Chris Glaze

Applied Research Scientist, Snorkel AI


Chris Glaze is an Applied Research Scientist at Snorkel AI. He holds a PhD and has a demonstrated history of developing novel machine learning tools and mathematical models in academia and industry. His accomplishments span data mining, experimental research, and applications to digital technologies.

The latest from Chris

How Tool Discipline Let a 4B Model Outsmart a 235B Giant on Financial Tasks
Blog

The Snorkel research team collaborated with the rLLM team at UC Berkeley on the Agentica project, using their open-source rLLM framework to fine-tune Qwen3-4B-Instruct-2507, delivering a model that beats Qwen3-235B-A22B on Snorkel AI’s expert-curated financial benchmarks at 1/60th the size. A full breakdown of the results is published on the rLLM blog. The key insight? Just focus on…

Feb 18, 2026
Benchmarking Agents in Insurance Underwriting Environments
As AI agents integrate into enterprise applications, their evaluation demands benchmarks that reflect the complexity of real-world operations. Instead, existing benchmarks overemphasize open domains such as code, use narrow accuracy metrics, and lack authentic complexity. We present UNDERWRITE, an expert-first, multi-turn insurance underwriting benchmark designed in close collaboration with domain experts to capture real-world enterprise challenges. UNDERWRITE introduces critical realism factors often absent in current benchmarks: proprietary business knowledge, noisy tool interfaces, and imperfect simulated users requiring careful information gathering. Evaluating 13 frontier models, we uncover significant gaps between research lab performance and enterprise readiness: the most accurate models are not...
Research Paper
Accepted to CAIS 2026

Jan 31, 2026
Snorkel Team
The science of rubric design
Blog

Part 3 of our rubric series explains the science of rubric design. We show why rubrics should be treated like models (structured, measured, and iterated) to maximize objective alignment and inter-rater agreement. Learn how to choose hierarchy and scale points, track inter-annotator agreement (IAA) and LLM-as-a-judge (LLMAJ) alignment, and refine with domain experts, with examples like PaperBench and HealthBench.

Sep 11, 2025
Building the benchmark: inside our agentic insurance underwriting dataset
Blog

In this post, we unpack how Snorkel built a realistic benchmark dataset to evaluate AI agents in commercial insurance underwriting. From expert-driven data design to multi-tool reasoning tasks, see how our approach surfaces actionable failure modes that generic benchmarks miss—revealing what it really takes to deploy AI in enterprise workflows.

Jul 10, 2025
Evaluating AI agents for insurance underwriting
Blog

In this post, we will show you a specialized benchmark dataset we developed with our expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark uncovers several model-specific and actionable error modes, including basic tool use errors and a surprising number of insidious hallucinations from one provider. This is part of an ongoing series of benchmarks we are releasing across verticals…

Jun 26, 2025
How does the Snorkel Flow label model work?
Blog

The Snorkel Flow label model plays an instrumental role in driving the enterprise value we create. Here’s a peek at how it works.

Jun 18, 2024
Walking safely before building flying saucer seatbelts: introducing Enterprise Alignment
Blog

Snorkel takes a step on the path to enterprise superalignment with new data development workflows for enterprise alignment.

Building better enterprise AI: incorporating expert feedback in system development
Blog

Enterprises that aim to build valuable GenAI applications must view them at the systems level. LLMs are just one part of an ecosystem.

Jan 30, 2024
How we built better GenAI with programmatic data development
Blog

We used weak supervision to programmatically curate instruction tuning data for open-source LLMs to build a better GenAI.

Jul 19, 2023