Snorkel at AI Engineer World's Fair

Join Snorkel and thousands of your peers for 800+ sessions, keynotes and training at the world’s largest data, analytics and AI conference.

San Francisco, CA

June 29-July 2, 2026

Booth L-G12

Featured session

Towards Reliable Financial Agents: How a 4B Model Outsmarted a 235B Giant

June 30, 2026 · 3:45-4:05pm

Expo Stage 3

Bigger models often reason better, but they don’t always behave better – especially with tools. This talk shows how a 4B model was fine-tuned to outperform a 235B model on financial analysis tasks by learning strong tool discipline with reinforcement learning, demonstrating that better behavior – not bigger models – can drive stronger real-world results.

Speaker

Charlie Dickens

Senior Applied Research Scientist

Featured session

From Agent Traces to Agent Simulations: The next era of agent evaluation

July 1, 2026 · 12:05-12:25pm

Evals / Room 2005

This talk explores how executable simulation environments let teams repeatedly test agents across realistic tasks, compare models and harnesses, and uncover failure modes that trace review alone misses. Drawing from Snorkel's experience building simulation datasets at scale for major labs and contributions to projects like Agents' Last Exam and Terminal-Bench, we'll cover concrete engineering patterns for building these environments.

Speaker

Rustem Feyzkhanov

Senior Engineering Manager, AI Platform

Accepted paper

Benchmarking Agents in Insurance Underwriting Environments

UNDERWRITE is an expert-first benchmark for evaluating AI agents in insurance underwriting, built in close collaboration with domain practitioners to capture enterprise-realistic complexity: proprietary business knowledge, noisy tool interfaces, and imperfect data. It fills the gap left by open-domain benchmarks that overemphasize code and narrow accuracy metrics.

Amanda Dsouza, Ramya Ramakrishnan, Charles Dickens, Bhavishya Pohani, Christopher M Glaze

↳ Read the paper