Resource library

Osprey: Weak Supervision of Imbalanced Extraction Problems Without Code

Proposing Osprey, a weak-supervision system suited for highly imbalanced data, built on top of the Snorkel framework.

Research Paper

Dec 12, 2019 • E. Bringer et al., 2019

Multi-Resolution Weak Supervision for Sequential Data

Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence. We propose Dugong, the first framework...

Research Paper

Dec 11, 2019 • P. Varma et al., 2019

Medical Device Surveillance With Electronic Health Records

Showcasing state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data.

Research Paper

Dec 10, 2019 • A. Callahan et al., 2019

Learning Dependency Structures for Weak Supervision Models

Labeling training data is a key bottleneck in the modern machine learning pipeline. Recent weak supervision approaches combine labels from multiple noisy sources by estimating their accuracies without access to ground truth labels; however, estimating the dependencies among these sources is a critical challenge. We focus on a robust PCA-based algorithm for learning these dependency structures, establish improved theoretical recovery rates, and outperform existing methods on various real-world tasks. Under certain conditions, we show that the amount of unlabeled data needed can scale sublinearly or even logarithmically with the number of sources m, improving over previous efforts that ignore the...

Research Paper

Dec 09, 2019 • P. Varma et al., 2019

Interactive Programmatic Labeling for Weak Supervision

Demonstrating in synthetic and real-world experiments how two simple labeling function acquisition strategies outperform a random baseline.

Research Paper

Dec 08, 2019 • B. Cohen-Wang et al., 2019

Bootstrapping Conversational Agents with Weak Supervision

This paper presents a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision.

Research Paper

Dec 07, 2019 • N. Mallinar et al., 2019

A Machine-Compiled Database of Genome-Wide Association Studies

Describing GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms.

Research Paper

Dec 06, 2019 • V. Kuleshov et al., 2019

A Clinical Text Classification Paradigm Using Weak Supervision…

This work develops a rule-based NLP algorithm to automatically generate labels for the training data, and then uses pre-trained word embeddings as deep representation features for training machine learning models.

Research Paper

Dec 05, 2019 • Y. Wang et al., 2019

Training Classifiers with Natural Language Explanations

Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification). In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. A semantic parser converts these explanations into programmatic labeling functions that generate noisy labels for an arbitrary amount of unlabeled data, which is used to train a classifier. On three relation extraction tasks, we find that users are able to train classifiers with comparable F1 scores from 5–100× faster by providing explanations instead of just labels. Furthermore, given...

Research Paper

Dec 20, 2018 • B. Hancock et al., 2018
