
Snorkel at AI Engineer Europe

London, UK, April 8-10, 2026
Booth G9

Join Snorkel at this three-day conference for AI engineers and builders focused on production-ready AI systems, and come meet the team.

Evals & Observability Breakout Track

The Art & Science of Benchmarking Agents

April 9, 2026

12:40-1 PM

Location: Moore

Our ability to measure AI has been outpaced by our ability to develop it, and this eval gap is one of the most important problems in AI. We need more enduring benchmarks to close this gap, and consequently advance entire new vectors of capabilities for the field.

In this talk, I'll share our learnings from evaluating agents, drawing on experience working with nearly all global frontier labs and leading academics. We'll discuss the science (i.e., mechanics that make benchmarks rigorous and effective) and art (i.e., intangibles driving ambitious and enduring benchmarks) of building great benchmarks.

I'll close by sharing some of the learnings from Open Benchmarks Grants, a $3M initiative in partnership with Hugging Face, Together AI, Prime Intellect, Factory, and others, and by highlighting some of the projects we're most excited about funding.

Presented by
Vincent Sunn Chen
Founding Team & Researcher
Expo Breakout Session

Task Fidelity Scaling Laws

Improving LLM agents isn’t just about bigger models or more data; task fidelity often matters more. Experiments show that fine-tuning on a small set of well-designed tasks can outperform training on many low-quality ones, because ambiguous specs teach models the wrong behaviors. The takeaway: fix your tasks before scaling your models.

Kobie Crawford
April 10, 2026
1:25-1:43 PM

Location: Shelley

Expo Breakout Session

Stop Making Models Bigger. Make Them Behave

Bigger models often reason better, but they don’t always behave better, especially with tools. This talk shows how a 4B model was fine-tuned to outperform a 235B model on financial analysis tasks by learning strong tool discipline with reinforcement learning, demonstrating that better behavior, not bigger models, can drive stronger real-world results.

Kobie Crawford
April 10, 2026
3:45-4:03 PM

Location: Wordsworth


Meet our team on-site


Vincent Sunn Chen

Founding Team and Researcher

Dan Oersnes-Leeming

Global VP Revenue

Zach Kleinbaum

Senior Applied AI Delivery Manager

Kobie Crawford

Developer Advocate

Matt Holek

Enterprise Advocate

Lexi Sobel

Community Manager

Partner with Snorkel Data Research Lab to build and evaluate AI that performs in the real world