Resource library

Weakly Supervised Sequence Tagging from Noisy Rules

We propose a framework for training sequence tagging models with weak supervision consisting of multiple heuristic rules of unknown accuracy. In addition to supporting rules that vote on tags in the output sequence, we introduce a new type of weak supervision, called linking rules, that vote on how sequence elements should be grouped into spans with the same tag. These rules are an alternative to candidate span generators that require significantly more human effort. To estimate the accuracies of the rules and combine their conflicting outputs into training data, we introduce a new type of generative model, linked hidden Markov...

Research Paper

Apr 03, 2020 • E. Safranchik, et al.

Weakly Supervised Classification of Aortic Valve Malformations Using Unlabeled Cardiac MRI Sequences

This work formalizes a deep learning baseline for aortic valve classification and outlines a general strategy for using weak supervision to train machine learning models on unlabeled medical images at scale.

Research Paper

Dec 20, 2019 • J. Fries, et al., 2019

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving Data

While the detection and classification of simple objects encountered during autonomous driving sessions has been widely researched, the detection of complex objects and situations based on the combinations of objects in a scene remains relatively overlooked. This is especially difficult due to the cost of gathering labels for each complex scenario of interest before training a specialized model. To address this bottleneck of training data, we explore the applicability of weak supervision, or relying on higher level, noisier forms of supervision to label training data. Specifically, we use data programming, a paradigm that can learn the accuracy and dependency structure...

Research Paper

Dec 19, 2019 • Z. Weng, et al., 2019

Training Complex Models with Multi-Task Weak Supervision

We propose a framework for integrating and modeling weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting.

Research Paper

Dec 18, 2019 • A. Ratner, et al., 2019

The Role of Massively Multi-Task and Weak Supervision in Software 2.0

Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.

Research Paper

Dec 17, 2019 • A. Ratner, et al., 2019

Snuba: Automating Weak Supervision to Label Training Data

As deep learning models are applied to increasingly diverse problems, a key bottleneck is gathering enough high-quality training labels tailored to each task. Users therefore turn to weak supervision, relying on imperfect sources of labels like pattern matching and user-defined heuristics. Unfortunately, users have to design these sources for each task. This process can be time consuming and expensive: domain experts often perform repetitive steps like guessing optimal numerical thresholds and developing informative text patterns. To address these challenges, we present Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large,...

Research Paper

Dec 16, 2019 • P. Varma and C. Ré, 2019

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

This is a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision to bring development time and cost down by an order of magnitude. It also introduces Snorkel DryBell, a new weak supervision management system for this setting.

Research Paper

Dec 15, 2019 • S. Bach, et al., 2019

Slice-Based Learning: A Programming Model for Residual Learning

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes. While machine learning models can achieve quality performance on coarse-grained metrics like F1-score and overall accuracy, they may underperform on these critical subsets, which we define as slices, the key abstraction in our approach. To address slice-level performance, practitioners often train separate "expert" models on slice subsets or use multi-task hard parameter sharing. We propose Slice-based Learning, a new programming model in which the...

Research Paper

Dec 14, 2019 • V. Chen, et al., 2019

Scene Graph Prediction With Limited Labels

Research Paper

Dec 13, 2019 • V. Chen, et al., 2019
