Cleanlab Raises $25M in Series A Funding

From left to right, Cleanlab co-founders Curtis Northcutt, Anish Athalye, and Jonas Mueller

Cleanlab, a San Francisco, CA-based provider of automated data curation software designed to increase the value of every data point in enterprise artificial intelligence (AI), large language model (LLM), and analytics solutions, has raised $25M in Series A funding.

The round, which brought the company's total funding to $30M, was led by Menlo Ventures and TQ Ventures, with participation from existing investor Bain Capital Ventures (BCV) and new investor Databricks Ventures.

The company intends to use the funds to expand operations and its business reach.

Led by co-founders Curtis Northcutt, Anish Athalye, and Jonas Mueller, Cleanlab provides an automated data curation platform, Cleanlab Studio, that automatically adds smart metadata to real-world data, removing the vast majority of the curation work and turning that data into useful inputs for various models. This process increases the reliability and profit margin of enterprise analytics, LLM, and AI decisions. Cleanlab also automatically identifies the majority of a dataset that contains no issues, which raises the profit margins of enterprise pipelines by avoiding expensive data quality review and annotation for most of the data. Cleanlab Studio identifies and fixes issues in all types of datasets, including text, image, and tabular data.

The company has also just launched new features that address unreliable LLM outputs. Cleanlab’s Trustworthy Language Model (TLM) produces high-quality outputs, as LLMs such as ChatGPT and Falcon do, and adds a trustworthiness score to every output.

Today, over 10% of Fortune 500 companies (including AWS, JPMorgan Chase, Google, Oracle, and Walmart) and a variety of innovative startups (like ByteDance, HuggingFace, and Databricks) use Cleanlab to find and fix problems in sizable structured and unstructured visual, text, and tabular datasets.