
Blog

Ideas, updates, and practical guidance from the Snorkel team.

Closing the Evaluation Gap in Agentic AI

Announcing a $3M commitment to launch Open Benchmarks Grants

February 11, 2026
All articles
2026: The year of environments

We just returned from NeurIPS 2025, and we’re still processing everything we saw. The energy around data-centric AI has never been stronger—and we couldn’t be more grateful to the research community for pushing these ideas forward.

Dec 10, 2025
Part V: Future direction and emerging trends

This post explores how rubrics support agentic, multi-turn, tool-using, multimodal, and code-generating AI systems, and how rubrics themselves evolve with AI feedback and ensemble evaluation.

Dec 05, 2025
The self-critique paradox: Why AI verification fails where it’s needed most

TL;DR: We stress-tested the “generate → criticize → improve” loop on 50 visual reasoning tasks. The results were counterintuitive: on tasks where models already perform well, self-critique is corrosive, dragging accuracy from 98% down to 57%. Yet on tasks where models fail completely, it works like magic. This difficulty-dependent behavior poses a critical, hidden risk for RLFT pipelines. The promise vs. the reality…

Nov 26, 2025
A chat with the Terminal-Bench team

Snorkel Chief Scientist Fred Sala and Kobie Crawford chat with the Terminal-Bench team to unpack the design behind Terminal-Bench 2.0 and the new Harbor framework.

Nov 19, 2025
Intelligence per watt: A new metric for AI’s future

Snorkel AI contributes specialized datasets to Hazy Research’s “Intelligence-per-Watt” study, which examines how efficiently AI turns energy into intelligence.

Nov 12, 2025
Terminal-Bench 2.0: Raising the bar for AI agent evaluation

Terminal-Bench 2.0 launches today, marking a major leap in AI agent evaluation. Snorkel AI contributed key research and task design to this release.

Nov 07, 2025
Snorkeling in RL environments

We unpack what makes a high-quality RL environment for LLMs and show how we build realistic, enterprise-grade environments at Snorkel AI.

Nov 04, 2025
Introducing SnorkelSpatial

A procedurally generated and programmatically verified benchmark for evaluating spatial reasoning capabilities in LLMs. Large language models (LLMs) are achieving remarkable results on complex reasoning problems across domains, from mathematical proofs and logical puzzles to graduate-level science and engineering questions. Their spatial reasoning capabilities, however, are less well understood, even though such reasoning underlies many everyday tasks. We…

Oct 24, 2025
Scaling trust: rubrics in Snorkel’s quality process

Snorkel’s “Trusted Scale” philosophy. Welcome to Part 4 of Snorkel AI’s rubric series. In previous posts, we explored how rubrics enable structured evaluation (Part 1), the spectrum of rubric types and use cases (Part 2), and the science behind designing and validating them (Part 3). In this latest installment, we pull back the curtain on how Snorkel puts these principles…

Oct 16, 2025