The Power of Programmatic Labeling




Snorkel AI's technology is based on years of research represented in 40+ publications around programmatic labeling, weak supervision, and broader ML techniques.










Technology developed and deployed with the world’s leading organizations



Research —

40+ Peer-Reviewed Publications


Snorkel’s approach is informed by novel research into ML systems and weak supervision from the Stanford AI Lab and beyond, funded by DARPA, ONR, DoD, NIH, NSF, and many others.















Conventional Approaches —


The Problem with Legacy AI


Conventional AI approaches rely on generic third-party models or armies of human labelers, or fall back on brittle rule-based systems.




Black box models or APIs
Black box models or APIs ignore the nuances of your data and objectives, and offer no way to customize, adapt, or audit their behavior.


Rules-based approaches
Rules-based approaches often don’t generalize as well as ML models on complex, unstructured data or adapt easily to data drift or changing objectives.


Hand-labeled ML
Hand-labeled ML is notoriously expensive and slow, especially when subject matter experts are required, and it offers limited ability to iterate, adapt, audit, or stay privacy-compliant.







New Approach to AI —


Programmatic Labeling


Snorkel introduces a radically new approach that enables users to programmatically label massive amounts of training data by writing “labeling functions.” While this approach has advanced the state of AI, like any new paradigm it has also introduced new challenges, which Team Snorkel has spent over half a decade researching. The result of this work is the Snorkel Flow platform.




Faster Development
Reduce development time by 10-100x with programmatic labeling.
Auditable AI
Easily version and audit with code-based training data creation.
High-Accuracy Models
Increase predictive performance with massive training datasets.
Collaborative Workflows
Bring together data scientists, developers, and domain experts to build solutions previously not possible.
Adaptable Applications
Adapt to changing data or business goals without re-labeling from scratch.
Privacy-Safe Labeling
Keep data in-house or label without humans viewing the majority of the data.






The Snorkel Framework —


Weak Supervision


Snorkel’s framework is based on weak supervision, a classical but newly resurgent set of techniques proven in research as well as hundreds of production deployments. The key idea in weak supervision is to train machine learning models using labels that are cheaper to produce but potentially less accurate or “noisier,” instead of “ground truth” labels provided by groups of expert annotators. Such noisy, or so-called weak, labels are easier to acquire in massive quantities, often resulting in higher-quality models overall.

Snorkel Flow extends and subsumes years of weak supervision research with the concept of a “labeling function.” In Snorkel Flow, users write labeling functions that capture heuristics from domain knowledge of the data and can leverage existing resources such as input from models, expert systems, and knowledge bases.
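To make the idea of a labeling function concrete, here is a minimal sketch in plain Python. It does not use the Snorkel Flow API; the spam/ham task, the function names, the label constants, and the sample messages are all hypothetical and chosen only for illustration. Each function encodes one heuristic and either votes for a class or abstains.

```python
import re

# Hypothetical label scheme for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    """Heuristic: messages containing a URL are likely spam."""
    return SPAM if re.search(r"https?://", text) else ABSTAIN

def lf_prize_keywords(text):
    """Heuristic: promotional keywords suggest spam."""
    pattern = r"\b(free|winner|prize)\b"
    return SPAM if re.search(pattern, text, re.IGNORECASE) else ABSTAIN

def lf_short_reply(text):
    """Heuristic: very short messages are usually ordinary replies."""
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_prize_keywords, lf_short_reply]

def apply_labeling_functions(texts):
    """Build a label matrix: one row per text, one column per function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in texts]

messages = [
    "Claim your FREE prize at http://example.com now",
    "see you at noon",
    "The quarterly report is attached for your review today",
]
label_matrix = apply_labeling_functions(messages)
for row in label_matrix:
    print(row)
```

The first message draws two spam votes, the second a single ham vote, and the third is left unlabeled by every function, which shows how coverage varies from one heuristic to the next.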


Weak Supervision Interfaces + Modeling




Weak labels typically overlap and conflict, vary in accuracy and dataset coverage, and may even hide latent dependencies and correlations. Snorkel automatically models and combines their outputs using a generative model that looks for patterns of agreement and disagreement, then uses the resulting probabilistic labels to train a discriminative model. Team Snorkel has spent years on the theoretically and empirically grounded research advances that form the foundations of Snorkel Flow, and continues to integrate the latest state-of-the-art advances.
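As a rough illustration of how conflicting votes can become a probabilistic label, the sketch below combines votes with a naive Bayes rule. This is a simplified stand-in, not Snorkel's generative model: it assumes binary labels in {-1, +1}, labeling functions that are conditionally independent given the true label, and accuracies that are already known. The accuracy values and the function name `probabilistic_label` are hypothetical.

```python
import math

ABSTAIN = 0  # a vote of 0 means the labeling function did not fire

def probabilistic_label(votes, accuracies, prior=0.5):
    """Return P(y = +1 | votes) under a naive Bayes assumption.

    votes: one entry per labeling function, each +1, -1, or ABSTAIN.
    accuracies: assumed probability that each function is correct
        when it votes (hypothetical values, each strictly in (0, 1)).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence in this sketch
        # A vote shifts the log-odds by the function's log accuracy ratio.
        log_odds += vote * math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))

accuracies = [0.9, 0.7, 0.6]
conflict = probabilistic_label([+1, -1, -1], accuracies)
agree = probabilistic_label([+1, +1, ABSTAIN], accuracies)
print(f"conflicting votes: P(y=+1) = {conflict:.2f}")
print(f"agreeing votes:    P(y=+1) = {agree:.2f}")
```

In the conflicting case, one accurate function outweighs two weaker ones, so the result leans positive even though a simple majority vote would say negative. This is the practical reason to weight votes by estimated accuracy rather than count them.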





Intuition


Look at agreements & disagreements


Provably consistent matrix completion-style algorithm over inverse covariance
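The intuition of learning from agreements and disagreements can be sketched with a much simpler estimator than the matrix completion-style algorithm named above. The example below uses a closed-form "triplet" estimate, which is a different and more restrictive technique. It assumes exactly three labeling functions, binary labels in {-1, +1}, no abstentions, conditional independence given the true label, the same accuracy on both classes, and every function better than chance. Under those assumptions the expected product of two functions' votes equals the product of their individual correlations with the true label, so three pairwise agreement rates determine all three accuracies. The simulated data and all names are hypothetical.

```python
import math
import random

def simulate_votes(accuracies, n, seed=0):
    """Simulate votes; each function is right with its given probability."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.choice((-1, 1))  # hidden true label, never used below
        rows.append([y if rng.random() < acc else -y for acc in accuracies])
    return rows

def mean_product(rows, i, j):
    """Empirical E[vote_i * vote_j]: agreement rate minus disagreement rate."""
    return sum(r[i] * r[j] for r in rows) / len(rows)

def triplet_accuracies(rows):
    """Estimate three accuracies from pairwise agreements only."""
    m01 = mean_product(rows, 0, 1)
    m02 = mean_product(rows, 0, 2)
    m12 = mean_product(rows, 1, 2)
    # With a_i = E[vote_i * y], independence gives m_ij = a_i * a_j,
    # hence a_0 = sqrt(m01 * m02 / m12), and likewise for the others.
    a0 = math.sqrt(abs(m01 * m02 / m12))
    a1 = math.sqrt(abs(m01 * m12 / m02))
    a2 = math.sqrt(abs(m02 * m12 / m01))
    # Convert correlation a_i = 2 * accuracy_i - 1 back to an accuracy.
    return [(1.0 + a) / 2.0 for a in (a0, a1, a2)]

true_accuracies = [0.9, 0.8, 0.7]
votes = simulate_votes(true_accuracies, n=40000)
estimated = triplet_accuracies(votes)
for true_acc, est_acc in zip(true_accuracies, estimated):
    print(f"true {true_acc:.2f}  estimated {est_acc:.3f}")
```

The estimator never sees the true labels, only how often the functions agree with one another. Taking the positive square root is what encodes the better-than-chance assumption.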






Community —


Snorkel: Recommended For Modern ML Practice


The Snorkel Framework is taught in several introductory and advanced machine learning courses and recommended by independent researchers, data scientists, and machine learning engineers.




COURSE

Stanford University: CS229 – Machine Learning

by Chris Ré
COURSE

Stanford University: CS 329S – Machine Learning Systems Design

by Chip Huyen
COURSE

Brown University: CSCI 2952-C – Learning with Limited Labeled Data

by Stephen Bach
BOOK

Advanced Natural Language Processing

by Ashish Bansal
BOOK

Practical Weak Supervision: Doing More with Less Data

by Wee Hyong Tok, Amit Bahree, Senja Filipi
BLOG

Generating Labels for Model Training Using Weak Supervision review

by F. Duplessis, S. Chow, S. Prince at Borealis AI
BLOG

Taking Snorkel for a Spin

by Fast Forward Labs at Cloudera
BLOG

Understanding Snorkel

by Anna Zubova
BLOG

Hand labeling is the Past. The Future is #NoLabel AI

by Russell Jurney
BLOG

Snorkel — A Weak Supervision System

by Shreya Ghelani