The Power of Programmatic Labeling




Snorkel AI's technology is based on years of research represented in 40+ publications around programmatic labeling, weak supervision, and broader ML techniques.










Technology developed and deployed with the world’s leading organizations



Research —

40+ Peer-Reviewed Publications


Snorkel’s approach is informed by novel research into ML systems and weak supervision from the Stanford AI Lab and beyond, funded by DARPA, ONR, DoD, NIH, NSF, and many others.



NATURE COMMS

Weakly Supervised Classification of Aortic Valve Malformations Using …

J. Fries, et al, 2019
IEEE IVS

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving…

Z. Weng, et al, 2019
AAAI

Training Complex Models with Multi-Task Weak Supervision

A. Ratner, et al, 2019
ACL

Training Classifiers with Natural Language Explanations

B. Hancock, et al, 2018
Research Paper

Train and You’ll Miss It: Interactive Model Iteration with Weak Supervision…

M. Chen, et al, 2020
CIDR

The Role of Massively Multi-Task and Weak Supervision in Software 2.0

A. Ratner, et al, 2019
KDD

Software 2.0 and Snorkel: Going Beyond Hand-Labeled Data

C. Ré, 2018 (invited)
Research Paper

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

P. Varma, et al, 2017
VLDB

Snuba: Automating Weak Supervision to Label Training Data

P. Varma and C. Ré, 2019
SIGMOD

Snorkel: Fast Training Set Generation for Information Extraction

A. Ratner, et al, 2017
SIGMOD

Snorkel MeTaL: Weak Supervision for Multi-Task Learning

A. Ratner, et al, 2018
SIGMOD

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

S. Bach, et al, 2019
NEURIPS

Slice-based Learning: A Programming Model for Residual Learning…

V. Chen, et al, 2019
ICCV

Scene Graph Prediction With Limited Labels

V. Chen, et al, 2019
DEEM @ SIGMOD

Osprey: Weak Supervision of Imbalanced Extraction Problems without Code

E. Bringer, et al, 2019
NATURE COMMS

Ontology-driven weak supervision for clinical entity classification in electronic health records

J. A. Fries, et al, 2021
NEURIPS

Multi-Resolution Weak Supervision for Sequential Data

P. Varma, et al, 2019
NPJ MEDICINE

Medical Device Surveillance With Electronic Health Records

A. Callahan, et al, 2019
VLDB

Leveraging Organizational Resources to Adapt Models to New Data Modalities

S. Suri, et al, 2020
NEURIPS

Learning to Compose Domain-Specific Transformations for Data Augmentation

A. Ratner, et al, 2017
ICML

Learning the Structure of Generative Models without Labeled Data

S. Bach, et al, 2017
ICML

Learning Dependency Structures for Weak Supervision Models

P. Varma, et al, 2019
KDD

Interactive Programmatic Labeling for Weak Supervision

B. Cohen-Wang, et al, 2019
NEURIPS

Inferring Generative Model Structure with Static Analysis

P. Varma, et al, 2017
SIGMOD

Fonduer: Knowledge Base Construction from Richly Formatted Data

S. Wu, et al, 2018
ICML

Fast & Three-rious: Speed Up Weak Supervision with Triplet Methods

D. Fu, et al, 2020
ICWI

Deep Text Mining of Instagram Data without Strong Supervision

K. Hammar, et al, 2018
NEURIPS

Data Programming: Creating Large Training Sets, Quickly

A. Ratner, et al, 2016
SIGMOD

Data programming with DDLite: Putting Humans in a Different Part of the Loop

H. Ehrenberg, et al, 2016
CELL PATTERNS

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

J. Dunnmon, et al, 2020
AAAI

Bootstrapping Conversational Agents with Weak Supervision

N. Mallinar, et al, 2019
NATURE COMMS

A Machine-Compiled Database of Genome-Wide Association Studies

V. Kuleshov, et al, 2019
BMC MIDM

A Clinical Text Classification Paradigm Using Weak Supervision…

Y. Wang, et al, 2019












Conventional Approaches —


The Problem with Legacy AI


Conventional AI approaches rely on generic third-party models or armies of human labelers, or fall back on brittle rule-based systems.




Black box models or APIs
Black box models or APIs ignore the nuances of your data and objectives, and offer no way to customize, adapt, or audit their behavior.


Rules-based approaches
Rules-based approaches often generalize less well than ML models on complex, unstructured data, and they do not adapt easily to data drift or changing objectives.


Hand-labeled ML
Hand-labeled ML is notoriously expensive and slow, especially when subject matter experts are required, and it offers limited ability to iterate, adapt, audit, or stay privacy compliant.







New Approach to AI —


Programmatic Labeling


Snorkel introduces a radically new approach that lets users programmatically label massive amounts of training data by writing “labeling functions”. This approach has advanced the state of AI but, like any new paradigm, it has also introduced new challenges, which Team Snorkel has spent over half a decade researching. The result of this work is the Snorkel Flow platform.




Faster Development
Reduce development time by 10-100x with programmatic labeling.
Versioned, Auditable Training Data
Easily version and audit with code-based training data creation.
High-Accuracy Models
Increase predictive performance with massive training datasets.
Collaborative Workflows
Adaptable Applications
Adapt to changing data or business goals without re-labeling from scratch.
Privacy-Safe Labeling
Keep data in-house, or label without humans viewing the majority of the data.






The Snorkel Framework —


Weak Supervision


Snorkel’s framework is based on weak supervision, a classical but newly resurgent set of techniques proven in research as well as in hundreds of production deployments. The key idea in weak supervision is to train machine learning models using labels that are cheaper to produce but potentially less accurate, or “noisier”, than “ground truth” labels provided by groups of expert annotators. Such noisy, or so-called weak, labels are easier to acquire in massive quantities, which often results in higher-quality models overall.

Snorkel Flow extends and subsumes years of weak supervision research with the concept of a “labeling function”. In Snorkel Flow, users write labeling functions that capture heuristics drawn from domain knowledge of the data and that can leverage existing resources such as the outputs of models, expert systems, and knowledge bases.
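To make the idea of a labeling function concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow API: the spam/ham task, the function names, the `SPAM_DOMAINS` lookup set, and the `ABSTAIN` convention are all assumptions made for illustration. Each function encodes one heuristic and either votes for a label or abstains.

```python
from typing import Callable, List

# Hypothetical label scheme for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Stand-in for an existing resource such as a knowledge base (made-up entries).
SPAM_DOMAINS = {"free-prizes.example", "win-cash.example"}
GREETINGS = {"hi", "hello", "thanks"}


def lf_contains_offer(text: str) -> int:
    """Keyword heuristic: promotional phrasing suggests spam."""
    return SPAM if "limited offer" in text.lower() else ABSTAIN


def lf_known_spam_domain(text: str) -> int:
    """Knowledge-base heuristic: mentions of a listed domain suggest spam."""
    lowered = text.lower()
    return SPAM if any(domain in lowered for domain in SPAM_DOMAINS) else ABSTAIN


def lf_short_greeting(text: str) -> int:
    """Short messages that open with a greeting are probably legitimate."""
    words = text.split()
    if words and len(words) <= 6 and words[0].lower().strip(",") in GREETINGS:
        return HAM
    return ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_offer,
    lf_known_spam_domain,
    lf_short_greeting,
]


def apply_lfs(texts: List[str]) -> List[List[int]]:
    """Build a label matrix: one row per document, one column per function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in texts]


docs = [
    "Limited offer: claim your reward at win-cash.example",
    "Thanks, see you at noon",
    "Quarterly report attached for review",
]
label_matrix = apply_lfs(docs)
```

In this sketch the first document receives two spam votes, the second a single ham vote, and the third is left unlabeled by every function. Overlap and partial coverage of this kind are what the modeling step has to reconcile.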


Weak Supervision Interfaces + Modeling




Weak labels typically overlap and conflict, vary in accuracy and dataset coverage, and may even hide latent dependencies and correlations. Snorkel automatically models and combines their outputs using a generative model that looks for patterns of agreement and disagreement, then uses the resulting probabilistic labels to train a discriminative model. Team Snorkel has spent years on the theoretically and empirically grounded research advances that underpin Snorkel Flow, and continues to integrate the latest state-of-the-art advances.
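The sketch below illustrates how agreement and disagreement rates alone can reveal source accuracies without any ground-truth labels. It is a toy example, not Snorkel Flow's label model, and it rests on strong simplifying assumptions: exactly three sources, binary votes in {-1, +1} with no abstentions, balanced classes, sources that err independently given the true label, and each source better than chance. Under those assumptions the expected product of two sources' votes equals the product of their individual "strengths" (2 × accuracy − 1), so three pairwise agreement rates are enough to solve for each strength. All function names here are hypothetical.

```python
import math
import random


def simulate_votes(accuracies, n, seed=0):
    """Draw n hidden labels y in {-1, +1}; source i votes y with probability accuracies[i]."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        rows.append([y if rng.random() < p else -y for p in accuracies])
    return rows  # the hidden labels are deliberately discarded


def agreement(rows, i, j):
    """Mean of vote_i * vote_j: +1 if the two sources always agree, -1 if never."""
    return sum(r[i] * r[j] for r in rows) / len(rows)


def estimate_accuracies(rows):
    """Recover each source's accuracy from the three pairwise agreement rates.

    Assumes every pairwise rate is nonzero (true when all sources beat chance
    and there is enough data).
    """
    e01 = agreement(rows, 0, 1)
    e02 = agreement(rows, 0, 2)
    e12 = agreement(rows, 1, 2)
    strengths = (
        math.sqrt(abs(e01 * e02 / e12)),
        math.sqrt(abs(e01 * e12 / e02)),
        math.sqrt(abs(e02 * e12 / e01)),
    )
    # Convert strength back to accuracy; clip so the log-odds below stay finite.
    return [min(0.999, (1 + s) / 2) for s in strengths]


def probabilistic_label(row, accuracies):
    """P(y = +1 | votes) under the independence and balanced-class assumptions."""
    log_odds = sum(v * math.log(p / (1 - p)) for v, p in zip(row, accuracies))
    return 1 / (1 + math.exp(-log_odds))


TRUE_ACCURACIES = [0.9, 0.8, 0.7]  # used only to simulate data, never by the estimator
votes = simulate_votes(TRUE_ACCURACIES, n=20000)
estimated = estimate_accuracies(votes)
soft_labels = [probabilistic_label(row, estimated) for row in votes]
```

The values in `soft_labels` are probabilities rather than hard labels, so a downstream discriminative model can be trained with a loss that weights confident examples more heavily than contested ones. Real label models also have to handle abstentions, class imbalance, and correlated sources, which this sketch ignores.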




Intuition

Look at agreements & disagreements


Provably consistent matrix-completion-style algorithm over the inverse covariance






Community —


Snorkel: Recommended For Modern ML Practice


The Snorkel Framework is taught in several introductory and advanced machine learning courses and recommended by independent researchers, data scientists, and machine learning engineers.




COURSE

Stanford University: CS229 – Machine Learning

by Chris Ré
COURSE

Stanford University: CS 329S: Machine Learning Systems Design

by Chip Huyen
COURSE

Brown University: CSCI 2952-C – Learning with Limited Labeled Data

by Stephen Bach
BOOK

Advanced Natural Language Processing

by Ashish Bansal
BOOK

Practical Weak Supervision: Doing More with Less Data

by Wee Hyong Tok, Amit Bahree, Senja Filipi
BLOG

Generating Labels for Model Training Using Weak Supervision review

by F. Duplessis, S. Chow, S. Prince at Borealis AI
BLOG

Taking Snorkel for a Spin

by Fast Forward Labs at Cloudera
BLOG

Understanding Snorkel

by Anna Zubova
BLOG

Hand labeling is the Past. The Future is #NoLabel AI

by Russell Jurney
BLOG

Snorkel — A Weak Supervision System

by Shreya Ghelani