author

Kobie Crawford

The latest from Kobie

SlopCodeBench: Measuring Code Erosion as Agents Iterate

SlopCodeBench reveals how AI coding agents degrade code quality over time—measuring “slop,” technical debt, and architectural erosion across iterations.

Jan 20, 2026
Introducing the Snorkel Agentic Coding Benchmark

Today, we’re sharing details about the Snorkel Agentic Coding benchmark—a comprehensive evaluation suite designed to test whether agents can handle the full complexity of software engineering work.

Jan 09, 2026
A chat with the Terminal-Bench team

Snorkel Chief Scientist Fred Sala and Kobie Crawford chat with the Terminal-Bench team to unpack the design behind Terminal-Bench 2.0 and the new Harbor framework.

Nov 19, 2025
Intelligence per watt: A new metric for AI’s future

Snorkel AI contributes specialized datasets to Hazy Research’s “Intelligence-per-Watt” study, advancing how efficiently AI turns energy into intelligence.

Nov 12, 2025
Terminal-Bench 2.0: Raising the bar for AI agent evaluation

Terminal-Bench 2.0 launches today, marking a major leap in AI agent evaluation. Snorkel AI contributed key research and task design to this release.

Nov 07, 2025
Evaluating coding agent capabilities with Terminal-Bench: Snorkel’s role in building the next generation benchmark

Terminal-Bench, developed through a collaboration between Stanford University and Laude Institute, has quickly become the gold-standard benchmark for evaluating AI agent capabilities in a command-line environment. This comprehensive evaluation framework measures how effectively AI agents can perform complex, real-world tasks within terminal environments. At Snorkel AI, we’re excited to share that we’re one of the top collaborators contributing…

Sep 30, 2025
