Tom Walshe

Title: Staff Research Scientist
Company: Snorkel AI
Bio

Tom Walshe is a Staff Research Scientist at Snorkel AI. Before Snorkel, Tom worked in LegalTech and financial services, where he focussed on building end-to-end AI systems and researching data-centric AI. Prior to industry, Tom completed a PhD in Computer Science at the University of Oxford.

The latest from Tom Walshe

Evaluating Coding Agent Capabilities with Terminal-Bench: Snorkel’s Role in Building the Next Generation Benchmark

Terminal-Bench, developed through a collaboration between Stanford University and Laude Institute, has quickly become the gold standard benchmark for evaluating AI agent capabilities in a command line environment. This comprehensive evaluation framework measures how effectively AI agents can perform complex, real-world tasks within terminal environments. At Snorkel AI, we’re excited to share that we’re one of the top collaborators contributing…

September 30, 2025

The right tool for the job: An A-Z of rubrics

Rubrics turn fuzzy “good vs. bad” into measurable criteria for GenAI. In Part 2, we map what to measure (granularity and dataset-level vs instance-specific), where to measure (process vs outcome), and how to measure (humans, LLM-as-judge, code, reward models)—with examples like HHH, FLASK, HealthBench, and PaperBench.

September 2, 2025

LLM alignment techniques: 4 post-training approaches

Ensure your LLMs align with your values and goals using LLM alignment techniques. Learn how to mitigate risks and optimize performance.

March 4, 2025

Walking safely before building flying saucer seatbelts: introducing Enterprise Alignment

Snorkel takes a step on the path to enterprise superalignment with new data development workflows for enterprise alignment.