Snorkel Expert data-as-a-service
Use Cases
From coding and agentic reasoning to text generation and more, discover how Snorkel enables AI teams to build the next generation of models with unparalleled speed and accuracy.
Agentic
The frontiers of multi-turn math reasoning
Snorkel provided a frontier LLM team with a dataset for assessing LLM math reasoning on challenges ranging from high school to graduate level. Our data development approach had experts correct responses and reasoning traces, and gave the customer control over the distribution of topics, skills, and complexity.
AI Voice assistant training data for a tech industry giant
A tech industry giant aimed to build better, more usable voice assistants for its customers. We collaborated with them to build a deep, expert-crafted dataset of realistic multi-turn, multi-agent conversations, including simulated tool use.
Robust agentic evaluation benchmarks
A Global 2000 telecom partnered with Snorkel to curate a gold-standard set of prompts, responses, and tool calls targeting reasoning and multi-step planning. This custom benchmark revealed critical model failures, enabling the team to target training and correction and reach production faster than manual review would have allowed.
Multi-step, multi-turn, and multi-tool Deep Research data
A leading LLM provider hired Snorkel AI to create a dataset to enhance its models’ deep research capabilities. Snorkel researchers assembled a dataset where each data point included a complex user query, a high-quality research plan, and a fine-grained response quality evaluation rubric.
Annotation
Grading LLM information retrieval and synthesis
An open-source LLM developer sought to improve its models’ ability to extract questions and answers from technical documents like textbooks and research papers. Snorkel experts graded and corrected model attempts to cite sources and answer questions from these documents, creating a golden set of retrievals.
Enabling FMs to understand charts
A leading LLM developer sought high-quality annotations of graphs, maps, and other visuals used to solve middle-school and high-school math problems. Snorkel experts reviewed documents and curated annotations (including chart elements, data points, and implied relationships) for training and evaluation purposes.
Coding
Alignment for better code generation
A frontier model developer sought to improve code generation outputs using human feedback. Snorkel rapidly assembled a team of qualified engineers to assess, review, and grade multiple candidate code responses to user queries, resulting in a rich training set to better align the model.
Training and evaluation data for code generation
A tech industry giant sought unique prompts and answers to train and evaluate its frontier LLMs’ code generation capabilities. Snorkel experts curated unique competition-style coding prompts with verifiable solutions and accompanying unit tests to validate samples automatically.
Multi-Modal
Image-based search for retail
An e-commerce giant aimed to let customers search for products by image and by feeling (such as “summer vibes”). Snorkel researchers generated pairs of user queries and associated results that boosted downstream search model performance.
Text Generation
A PhD-level benchmark for frontier LLMs
A leading LLM developer sought a dataset of multiple-choice questions that stretched beyond the limits of frontier LLMs. Snorkel AI developed a dataset that probed for PhD-level understanding, covering thousands of topics across humanities, STEM, and professional domains.
Q&A training data for customer billing SLM
A Fortune 500 telecom wanted an SLM to automatically answer customer billing questions. Using expert input and programmatic acceleration, Snorkel curated data that covered all expected question types and improved the model’s performance, enabling the team to deploy 10+ supported use cases to production.