We develop methods, benchmarks, and training systems that turn expert data into frontier AI
Featured research
Vision and impact
We help labs advance frontier models by working with domain experts to design and build complex, realistic datasets that drive model performance.
Benchmarking & Evaluation
Build benchmarks that define and advance the AI frontier
Scaling Subject Matter Expertise
Define how subject matter experts encode their knowledge into data
RL, Training, & Data Valuation
Drive dataset development based on feedback from RL and model training
Community and open science
Open benchmarks, conversations, and research for real-world AI performance.

Open Benchmarks Grants
Backed by a $3M commitment, the program funds open-source datasets, benchmarks, and evaluation artifacts that shape how frontier AI systems are built and evaluated.

Benchtalks

Reading Group
Deep research expertise
Technical advisors and distinguished affiliates
Browse research blogs and academic papers
Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.
Automating data augmentation by learning a generative sequence model over user-specified transformation functions.
Proposing a structure estimation method that is 100x faster than a maximum likelihood approach for training data.
Presenting Coral, a paradigm that infers generative model structure, significantly reducing the amount of data required to learn structure.
Introducing SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly.
Introducing Socratic learning, a paradigm that uses feedback from a discriminative model to automatically identify latent data subsets in training data.
Presenting a flexible interface layer for writing labeling functions based on experience.
A paradigm for labeling training datasets programmatically rather than by hand.
Introducing DDLite, an interactive development framework for data programming.