Data development | Snorkel AI

Research-led data development

Datasets and environments that give frontier models domain expertise

Snorkel builds expert-authored datasets, evaluation environments, and benchmarks calibrated to push the limits of frontier model capability. Off-the-shelf or custom.

Where generic data runs out

Frontier model development stalls on data problems generic pipelines weren't built to solve, including distributional gaps in specialized domains, benchmark blind spots, and failure modes that only surface at scale. We build the data to solve them.

Two ways to get the data you need

The data frontier models need most is rarely the data that already exists. Snorkel delivers it two ways: off-the-shelf for well-defined task areas, or custom-built for the gaps only you can see.

Snorkel Data Series

Curriculum-structured datasets for the task areas frontier models are pushing hardest, with rubrics, reviewer guidance, difficulty tiers, and eval slices built in.

Custom data development

Bespoke datasets, evaluation environments, and benchmark expansions to target the exact failure surface you're trying to close.

SNORKEL DATA SERIES

Built for the task areas that matter now

Co-developed with leading frontier AI teams. Each series is curriculum-structured to build difficulty progressively across a task area, with the evaluation infrastructure to match. A look at a few areas we support:

Software Engineering

Repo-grounded software engineering tasks inside real codebases, spanning multiple languages and difficulty tiers.

Terminal Coding

Long-horizon agents in real containerized terminals — planning, execution, and error recovery.

Enterprise & Workplace Agents

Multi-turn, tool-rich professional workflows across industries, policies, and 100+ occupations.

Scientific Workflows & Research

Research execution and technical reasoning across scientific domains.

Computer Use

GUI interaction and desktop workflow execution across real applications.

CUSTOM DATA DEVELOPMENT

Closing gaps existing datasets can’t reach

Custom data development engagements start with the failure surface: what the model can't do, where it's brittle, and what the correct evaluation criteria are. From there, Snorkel builds the datasets, environments, and benchmark expansions needed to close it.

Task specification and rubric design

Bespoke dataset construction

RL environment development

Benchmark and eval expansion

Provenance and adjudication

PUBLISHED RESEARCH