From cutting-edge research to enterprise and frontier impact
Deep research roots
Born out of the Stanford AI Lab in 2019, Snorkel works in collaboration with leading research institutions, and Snorkel-affiliated researchers have published more than 170 peer-reviewed research papers on weak supervision, AI data development techniques, foundation models, and more, with special recognition at venues such as NeurIPS, ICML, and ICLR. Our researchers are closely affiliated with academic institutions including Stanford University, the University of Washington, Brown University, and the University of Wisconsin-Madison.
Featured benchmarks
Exclusive to Snorkel, these benchmarks are meticulously designed and validated by subject matter experts to probe frontier AI models on demanding, specialized tasks.
These are just a few of our featured benchmarks—new ones are added regularly, so check back often to see the latest from our research team.
SnorkelUnderwrite
Finance Reasoning
SnorkelSequences
Leaderboards
Challenging benchmarks for models and agents
Snorkel benchmarks are built with human expertise to test models on realistic tasks spanning coding, financial analysis, healthcare, and more. For example, our SnorkelUnderwrite benchmark includes multi-turn agentic tasks specific to the insurance industry.
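As a rough illustration of what such a task can look like (the schema and field names below are hypothetical, not the actual SnorkelUnderwrite format), a multi-turn agentic item can be modeled as a sequence of turns, each pairing an instruction with the tool calls an expert expects:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    """One step of a multi-turn agentic task."""
    user_message: str          # instruction the agent receives at this turn
    expected_tools: list[str]  # tools the agent is expected to invoke

@dataclass
class AgenticTask:
    """Hypothetical shape of a multi-turn benchmark item (illustrative only)."""
    task_id: str
    domain: str
    turns: list[Turn] = field(default_factory=list)

# Example: a two-turn underwriting-style task
task = AgenticTask(
    task_id="underwrite-0001",
    domain="insurance_underwriting",
    turns=[
        Turn("Pull the applicant's loss history.", ["query_claims_db"]),
        Turn("Recommend a premium tier with justification.", ["risk_calculator"]),
    ],
)
```

Structuring tasks this way lets graders check not only the final answer but also whether the agent took a sensible path to it.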
Rubrics
Aligning human expertise and automated evaluation
We investigate how to scalably develop rubrics that both comprehensively cover the desired agentic capabilities and can be reliably assessed by human experts and AI judges alike.
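A minimal sketch of the pattern, assuming a simple integer scoring scale (the criterion names and helper functions are illustrative, not Snorkel's rubric format): each criterion is scored independently by a human expert or an AI judge, and agreement between the two can then be measured directly:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """One rubric item: a named capability check with a plain-language description."""
    name: str
    description: str

# Hypothetical rubric for an agentic task; not Snorkel's actual criteria
RUBRIC = [
    Criterion("correctness", "Does the final answer satisfy the task requirements?"),
    Criterion("tool_use", "Were appropriate tools called, in a sensible order?"),
    Criterion("safety", "Did the agent avoid destructive or out-of-scope actions?"),
]

def score_response(response: str, judge: Callable[[str, Criterion], int]) -> dict[str, int]:
    """Score a response on every criterion; `judge` may wrap a human-entered
    score or an LLM call, so the same rubric serves both kinds of assessor."""
    return {c.name: judge(response, c) for c in RUBRIC}

def exact_agreement(human: dict[str, int], ai: dict[str, int]) -> float:
    """Fraction of criteria on which the human expert and AI judge agree."""
    return sum(human[k] == ai[k] for k in human) / len(human)
```

Measuring human-judge agreement per criterion is what makes it possible to tell which rubric items an AI judge can be trusted to score at scale.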
RL environments
Environments give agents a fully realized simulation
As tool calling and more open-ended application requirements outgrow simple test frameworks, agent validation requires techniques that reproduce real-world variability. For example, our contributions to Terminal-Bench (tbench.ai) include containerized simulation environments.
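The sketch below illustrates the containerized pattern in general terms; the harness, image, and commands are hypothetical stand-ins rather than the Terminal-Bench task format. Each task runs in a throwaway container, and a check command's exit code decides pass or fail:

```python
import subprocess

def run_task_in_container(image: str, setup_cmd: str, agent_cmd: str, check_cmd: str) -> bool:
    """Run an agent task inside a fresh Docker container and verify the outcome.

    Hypothetical harness: real frameworks such as Terminal-Bench define their
    own task formats; this only illustrates the containerized pattern.
    """
    # Each task gets its own container, so state never leaks between runs.
    script = f"{setup_cmd} && {agent_cmd} && {check_cmd}"
    result = subprocess.run(
        ["docker", "run", "--rm", image, "sh", "-c", script],
        capture_output=True,
        text=True,
    )
    # The check command's exit code determines pass/fail.
    return result.returncode == 0

# Example: verify that an agent's command produced the expected file
passed = run_task_in_container(
    image="ubuntu:22.04",
    setup_cmd="mkdir -p /work",
    agent_cmd="echo done > /work/report.txt",   # stand-in for the agent's actions
    check_cmd="test -s /work/report.txt",
)
print("task passed:", passed)
```

Because each container is discarded after its run (`--rm`), flaky state from one task cannot contaminate the next, which is what makes such environments reproducible at scale.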