author

Kobie Crawford

The latest from Kobie

Blog

SlopCodeBench: Measuring Code Erosion as Agents Iterate

SlopCodeBench reveals how AI coding agents degrade code quality over time—measuring “slop,” technical debt, and architectural erosion across iterations.

Jan 20, 2026 • Kobie Crawford


Blog

Introducing the Snorkel Agentic Coding Benchmark

Today, we’re sharing details about the Snorkel Agentic Coding benchmark—a comprehensive evaluation suite designed to test whether agents can handle the full complexity of software engineering work.

Jan 09, 2026 • Kobie Crawford


Blog

A chat with the Terminal-Bench team

Snorkel Chief Scientist Fred Sala and Kobie Crawford chat with the Terminal-Bench team to unpack the design behind Terminal-Bench 2.0 and the new Harbor framework.

Nov 19, 2025 • Kobie Crawford, Fred Sala


Blog

Intelligence per watt: A new metric for AI’s future

Snorkel AI contributes specialized datasets to Hazy Research’s “Intelligence-per-Watt” study, advancing how efficiently AI turns energy into intelligence.

Nov 12, 2025 • Kobie Crawford


Blog

Terminal-Bench 2.0: Raising the bar for AI agent evaluation

Terminal-Bench 2.0 launches today, marking a major leap in AI agent evaluation. Snorkel AI contributed key research and task design to this release.

Nov 07, 2025 • Kobie Crawford


Blog

Evaluating coding agent capabilities with Terminal-Bench: Snorkel’s role in building the next generation benchmark

Terminal-Bench, developed through a collaboration between Stanford University and Laude Institute, has quickly become the gold standard benchmark for evaluating AI agent capabilities in a command line environment. This comprehensive evaluation framework measures how effectively AI agents can perform complex, real-world tasks within terminal environments. At Snorkel AI, we’re excited to share that we’re one of the top collaborators contributing…

Sep 30, 2025 • Kobie Crawford, Jeong Shin, Tom Walshe


