
Resource library

Explore our complete library of resources including blogs, benchmarks, research papers, and more.

Blog

Why coding agents need better data, evals, and environments

Announcing a $3M commitment to launch Open Benchmarks Grants
May 11, 2026
Blog

Closing the Evaluation Gap in Agentic AI


February 11, 2026
Blog

Evaluating coding agent capabilities with Terminal-Bench: Snorkel’s role in building the next generation benchmark

September 30, 2025
Blog

Benchtalks #2: The future of coding benchmarks

Featuring John Yang (SWE-bench, ProgramBench)

June 3, 2026
Blog

Building FinQA: An Open RL Environment for Financial Reasoning Agents

March 30, 2026
Blog

The science of rubric design

September 11, 2025
Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data
A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set. In particular, they fail to model latent subsets in the training data in which the supervision sources perform differently than on average. We present Socratic learning, a paradigm that...
Research Paper

Nov 13, 2017
P. Varma et al., 2017
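To make the "single accuracy parameter" assumption concrete, here is a minimal Python sketch. It is not the Socratic learning method or Snorkel's generative model: each weak supervision source gets one accuracy that is applied to every example, and binary votes are combined by log-odds weighting, a naive-Bayes-style rule that assumes conditionally independent sources. The accuracies, the vote matrix, and the function name `weighted_vote` are hypothetical, chosen only for illustration.

```python
import math

# Labels are +1 / -1 for the two classes; 0 means the source abstains.
ABSTAIN = 0


def weighted_vote(votes, accuracies):
    """Combine one row of source votes into a label in {+1, -1}, or 0.

    Each source j has a single accuracy accuracies[j] (strictly between
    0 and 1) that is reused for every example: the assumption described
    in the abstract. A non-abstaining vote is weighted by
    log(a / (1 - a)), the log-odds that the source is correct.
    Returns 0 if all sources abstain or the weighted votes cancel.
    """
    score = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue
        score += vote * math.log(acc / (1.0 - acc))
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


# Hypothetical per-source accuracies: one number per source, not per subset.
accuracies = [0.9, 0.6, 0.6]

# Rows are examples, columns are sources.
vote_matrix = [
    [1, -1, -1],  # the accurate source disagrees with two weaker ones
    [0, -1, -1],  # the accurate source abstains
    [-1, 0, 1],   # the accurate source disagrees with one weaker source
]

labels = [weighted_vote(row, accuracies) for row in vote_matrix]
print(labels)
```

Because each weight is fixed per source, a source that is much less accurate on some latent subset of the data still receives the same weight on that subset; modeling such subsets is the gap the paper targets.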
Snorkel: Rapid Training Data Creation With Weak Supervision
This paper presents a flexible interface layer for writing labeling functions, informed by the authors' experience.
Research Paper

Oct 04, 2017
Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré
Data Programming: Creating Large Training Sets, Quickly
A paradigm for labeling training datasets programmatically rather than by hand.
Research Paper

Dec 20, 2016
A. Ratner et al., 2016
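The idea of labeling training data programmatically can be sketched in plain Python: a few labeling functions (heuristics) each vote on an unlabeled example or abstain, their outputs form a label matrix, and the votes are combined. This is only an illustration under stated assumptions. The spam-detection task, the keywords, and every function name here are hypothetical, and simple majority vote stands in for the generative model that data programming uses to learn the labeling functions' accuracies.

```python
from collections import Counter

SPAM, HAM, ABSTAIN = 1, 0, -1


# Hypothetical labeling functions: each encodes one heuristic and may abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    # In this toy setup, messages of three words or fewer are treated as normal replies.
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def apply_lfs(texts, lfs):
    """Build a label matrix: one row per example, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row):
    """Return the most common non-abstain label, or ABSTAIN on a tie or no votes."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


texts = [
    "Claim your prize at http://example.com",
    "ok thanks",
    "See you at the meeting tomorrow afternoon",
]
L = apply_lfs(texts, LABELING_FUNCTIONS)
print(L)
print([majority_vote(row) for row in L])
```

Examples that no labeling function covers (the third text) stay unlabeled, which is one reason coverage of the heuristics matters as much as their accuracy.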
Data Programming With DDLite: Putting Humans in a Different Part of the Loop
Introducing DDLite, an interactive development framework for data programming.
Research Paper

Dec 19, 2016
H. Ehrenberg et al., 2016