Advancing data-centric AI

Snorkel AI is rooted in years of research into data-centric AI
development, including programmatic labeling, weak supervision,
and other state-of-the-art techniques.
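
As a rough illustration of what programmatic labeling means in practice, the sketch below writes a few small heuristic functions ("labeling functions") that each vote on a label or abstain, then combines their votes. Everything in it is an assumption made for illustration: the spam/ham task, the function names, and the heuristics are hypothetical, and the plain majority vote is a simplified stand-in for the learned label model described in the data programming research, not Snorkel's implementation.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Heuristic: messages containing a URL are often spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: mentions of a prize suggest spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    """Heuristic: very short messages tend to be ordinary replies."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain when there is no vote or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return counts[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at http://example.com now friends",
        "thanks, see you",
        "The quarterly report is attached for your review",
    ]:
        print(majority_vote(message), message)
```

The resulting labels are noisy by design. In a weak supervision workflow they would serve as training data for a downstream model rather than as final predictions.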

Request a demo
Technology represented in 60+ academic papers and funded by [funder logos]

Paradigm shift for machine learning

Snorkel’s technology has been used to unlock new ML use cases in healthcare, criminology, and journalism, and to power mission-critical AI applications for Fortune 500 enterprises such as Chubb, Genentech, Google, and more.

Snorkel is a fundamentally new interface to ML without hand-labeled training data

Mike Tamir
Chief ML Scientist, Head of Machine Learning/AI at Susquehanna International Group and Data Science Faculty, UC Berkeley

For many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data.

Andrew Ng

Founder and CEO, Landing AI

Combining weak supervision and reinforcement learning enables AI systems to learn which actions can solve for which tasks. The result is a high-quality dataset and an optimized model.

Xuedong Huang

Technical Fellow and Azure AI CTO

Pioneering technology

Programmatic labeling
NeurIPS
Data Programming: Creating Large Training Sets, Quickly - A. Ratner, et al., 2016

KDD
Interactive Programmatic Labeling for Weak Supervision - B. Cohen-Wang, et al., 2019

ICCV
Scene Graph Prediction With Limited Labels - V. Chen, et al., 2019

SIGMOD
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale - S. Bach, et al., 2019

Nature Communications
Weakly Supervised Classification of Aortic Valve Malformations Using Unlabeled Cardiac MRI Sequences - J. Fries, et al., 2019

Cell Patterns
Cross-Modal Data Programming Enables Rapid Medical Machine Learning - J. Dunnmon, et al., 2020

ICML
Fast and Three-Rious: Speed up Weak Supervision With Triplet Methods - D. Fu, et al., 2020

VLDB
Leveraging Organizational Resources to Adapt Models to New Data Modalities - S. Suri, et al., 2020

Weak supervision
SIGMOD
Snorkel MeTaL: Weak Supervision for Multi-Task Learning - A. Ratner, et al., 2018

DEEM @ SIGMOD
Osprey: Weak Supervision of Imbalanced Extraction Problems Without Code - E. Bringer, et al., 2019

VLDB
Snuba: Automating Weak Supervision to Label Training Data - P. Varma and C. Ré, 2019

AAAI
Training Complex Models with Multi-Task Weak Supervision - A. Ratner, et al., 2019

EMNLP
Reference-based Weak Supervision for Answer Sentence Selection using Web Data - V. Krishnamurthy, et al.

NeurIPS
WRENCH: A Comprehensive Benchmark for Weak Supervision - J. Zhang, et al.

Data-centric AI
SIGMOD
Data Programming With DDLite: Putting Humans in a Different Part of the Loop - H. Ehrenberg, et al., 2016

NeurIPS
Learning to Compose Domain-Specific Transformations for Data Augmentation - A. Ratner, et al., 2017

VLDB
Snorkel: Rapid Training Data Creation With Weak Supervision - Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

NeurIPS
Slice-Based Learning: A Programming Model for Residual Learning - V. Chen, et al., 2019

CIDR
The Role of Massively Multi-Task and Weak Supervision in Software 2.0 - A. Ratner, et al., 2019

TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data - W. Piriyakulkij, et al.

ICLR
Multitask Prompted Training Enables Zero-Shot Task Generalization - V. Sanh, et al.

ICML
Adversarial Multiclass Learning under Weak Supervision with Performance Guarantees

Snorkel DryBell: Tackling content classification at Google

Snorkel was used to create classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, convert non-servable organizational resources to servable models for an average 52% performance improvement, and execute over millions of data points in tens of minutes.
Published at SIGMOD’19

Collaboration with US FDA and Veterans Affairs on text and image data

Snorkel delivered an average 132% improvement in predictive performance over prior heuristic approaches and came within an average 3.6% of the predictive performance of large hand-curated training sets.
Published at VLDB’19

Snuba: Outperforming automated approaches

In collaborations with users at research labs, Stanford Hospital, and on open-source datasets, Snorkel outperformed other automated approaches like semi-supervised learning by up to 14.4 F1 points.
Published at VLDB’19

Slice-based learning for language and vision tasks

Snorkel improved over baselines in slice-specific and overall performance by up to 19.0 and 4.6 F1 points, respectively, on applications spanning natural language understanding and computer vision benchmarks as well as production-scale industrial systems.

Cross-modal data programming

Snorkel yielded models that on average perform within 1.75 and 10.3 ROC-AUC points of those supervised with physician-years and physician-months of hand labeling, respectively, while using only person-days of developer time and clinician work, a time saving of 96%.

Research at Snorkel AI

Our research team works closely with partners in academia and industry to make data-centric AI ubiquitous. The Snorkel AI team regularly publishes in academic journals, contributes to open source projects, and applies research to the Snorkel Flow platform, and its members serve as faculty at the world's leading educational institutions.

Dive in

Press
September 20, 2021
Snorkel AI welcomes industry leaders to the team

August 9, 2021
This hot startup is now valued at $1 billion for its A.I. skills

February 24, 2021
The Data-First Enterprise AI Revolution

July 14, 2020
Meet The Stanford AI Lab Alums That Raised $15 Million To Optimize Machine Learning

Blog
February 4, 2022
Making Automated Data Labeling a Reality in Modern AI

January 25, 2022
The Principles of Data-Centric AI Development

January 5, 2022
Meet the Snorkelers

July 9, 2021
How to Use Snorkel to Build AI Applications

Research
2022
Universalizing Weak Supervision

2021
Ontology-driven weak supervision for clinical entity classification in electronic health records

2017
Snorkel: Rapid Training Data Creation with Weak Supervision

2016
Data Programming: Creating Large Training Sets, Quickly

Customer Stories
February 26, 2022
Genentech used Snorkel Flow to extract information from clinical trials

February 18, 2022
Google used Snorkel to build and adapt content classification models

2019
Intel used Snorkel to accelerate sales and marketing agents

2019
Apple built a Snorkel-based system to answer billions of queries in multiple languages

Let’s connect

Speed time to value, reduce costs, and unlock more AI possibilities with the Snorkel Flow platform.
Request a demo