Expert Data.
Unparalleled quality.
Proud to partner with leading AI companies
Snorkel AI services and technology
Expert training and evaluation data
Custom models and evaluations
AI data curation technology
Expert data, specialized AI
Trusted by Leading AI Teams
Snorkel supports cutting-edge research labs and model development teams building the next generation of AI models.
A PhD-level benchmark for frontier LLMs
A leading LLM provider sought a dataset of multiple-choice questions that stretched beyond the limits of frontier LLMs. Snorkel AI developed a dataset that probed for PhD-level understanding, covering thousands of subdomains across humanities, STEM, and professional topics.
<20%
Pass rate by two frontier LLMs
1,000+
PhD-level sub-domains
The frontiers of multi-turn math reasoning
Snorkel provided a frontier LLM team with a dataset purpose-built to assess LLMs’ abilities to reason over math problems ranging from high school to graduate-level topics. Snorkel's differentiated approach to data development allowed the customer to control distribution across topics, skills, and complexity.
0%
Pass rate for frontier LLMs
900
Mathematical skills
Multi-turn, multi-agent AI assistant training data for a tech industry giant
A tech industry giant aimed to build better, more usable support assistants for its customers. We collaborated with them to build a deep, expert-crafted dataset of realistic multi-turn, multi-agent conversations—including simulated tool use.
3+
Tool calls per conversation, which averaged 9+ turns
15+
Reasoning scenarios represented
Multi-step, multi-turn, and multi-tool Deep Research data
A leading LLM provider hired Snorkel AI to create a dataset to enhance its models’ deep research capabilities. Together with our expert network, Snorkel researchers assembled a dataset where each data point included a complex user query, a high-quality research plan, and a fine-grained response quality evaluation rubric.
10+
Average interactions between model and user
30+
Evaluation criteria developed per task on average
Featured Benchmarks
Exclusive to Snorkel, these benchmarks are meticulously designed and validated by subject matter experts to probe frontier AI models on demanding, specialized tasks.
Research with real-world impact
Snorkel began in 2015 as the Snorkel Research project at the Stanford AI Lab in collaboration with Google, Intel, DARPA, and other leading organizations.
The Snorkel AI team and affiliated researchers have published more than 170 peer-reviewed research papers, with special recognition at venues such as NeurIPS, ICML, and ICLR.
See how Snorkel can help you get up to:
Faster Data Curation