Advancing data-centric AI

Snorkel AI is rooted in years of research into data-centric AI development, including programmatic labeling, weak supervision, and other state-of-the-art techniques.
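The core idea behind programmatic labeling can be sketched in a few lines of plain Python. This is an illustrative sketch, not Snorkel's API. The spam task, the three heuristics and the helper names (`apply_lfs`, `majority_vote`) are all hypothetical. Majority vote is also a deliberately simple stand-in for the learned label model that weak supervision methods such as data programming use to estimate and reweight each labeling function's accuracy.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label space for an illustrative spam-detection task.
ABSTAIN = -1
HAM = 0
SPAM = 1

# Each labeling function encodes one noisy heuristic and may abstain.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_contains_link, lf_mentions_prize, lf_short_reply]

def apply_lfs(texts: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    """Build the label matrix: one row per example, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]

def majority_vote(row: List[int]) -> int:
    """Combine non-abstaining votes; ties and all-abstain rows stay unlabeled."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    top = votes.most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return ABSTAIN  # tie between classes: leave unlabeled
    return top[0][0]

texts = [
    "Claim your prize at http://example.com",
    "thanks, see you soon",
    "Meeting notes attached for tomorrow's review session",
]
label_matrix = apply_lfs(texts, LFS)
labels = [majority_vote(row) for row in label_matrix]
print(labels)  # [1, 0, -1]
```

Rows where every heuristic abstains, or where the votes tie, stay unlabeled. A learned label model would instead weight each labeling function by its estimated accuracy before the resulting labels are used to train a downstream model.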

Technology represented in 60+ academic papers and funded by

Paradigm shift for machine learning

Snorkel’s technology has been used to unlock new ML use cases in healthcare, criminology, and journalism, and to power mission-critical AI applications for Fortune 500 enterprises including Chubb, Genentech, and Google.


Snorkel is a fundamentally new interface to ML without hand-labeled training data

Mike Tamir
Chief ML Scientist, Head of Machine Learning/AI at Susquehanna International Group and Data Science Faculty, UC Berkeley

For many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data.

Andrew Ng

Founder and CEO, Landing AI

Combining weak supervision and reinforcement learning enables AI systems to learn which actions solve which tasks. The result is a high-quality dataset and an optimized model.

Xuedong Huang

Technical Fellow and Azure AI CTO


Pioneering technology

Programmatic labeling
NEURIPS
Data Programming: Creating Large Training Sets, Quickly - A. Ratner et al., 2016

KDD
Interactive Programmatic Labeling for Weak Supervision - B. Cohen-Wang et al., 2019

ICCV
Scene Graph Prediction With Limited Labels - V. Chen et al., 2019

SIGMOD
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale - S. Bach et al., 2019

NATURE COMMS
Weakly Supervised Classification of Aortic Valve Malformations Using Unlabeled Cardiac MRI Sequences - J. Fries et al., 2019

CELL PATTERNS
Cross-Modal Data Programming Enables Rapid Medical Machine Learning - J. Dunnmon et al., 2020

ICML
Fast and Three-Rious: Speed up Weak Supervision With Triplet Methods - D. Fu et al., 2020

VLDB
Leveraging Organizational Resources to Adapt Models to New Data Modalities - S. Suri et al., 2020

Weak supervision
SIGMOD
Snorkel MeTaL: Weak Supervision for Multi-Task Learning - A. Ratner et al., 2018

DEEM @ SIGMOD
Osprey: Weak Supervision of Imbalanced Extraction Problems Without Code - E. Bringer et al., 2019

VLDB
Snuba: Automating Weak Supervision to Label Training Data - P. Varma and C. Ré, 2019

AAAI
Training Complex Models with Multi-Task Weak Supervision - A. Ratner et al., 2019

EMNLP
Reference-based Weak Supervision for Answer Sentence Selection using Web Data - V. Krishnamurthy et al.

NEURIPS
WRENCH: A Comprehensive Benchmark for Weak Supervision - J. Zhang et al.

Data-centric AI
SIGMOD
Data Programming With DDLite: Putting Humans in a Different Part of the Loop - H. Ehrenberg et al., 2016

NEURIPS
Learning to Compose Domain-Specific Transformations for Data Augmentation - A. Ratner et al., 2017

VLDB
Snorkel: Rapid Training Data Creation With Weak Supervision - A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, C. Ré

NEURIPS
Slice-Based Learning: A Programming Model for Residual Learning - V. Chen et al., 2019

CIDR
The Role of Massively Multi-Task and Weak Supervision in Software 2.0 - A. Ratner et al., 2019

TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data - W. Piriyakulkij et al.

ICLR
Multitask Prompted Training Enables Zero-Shot Task Generalization - V. Sanh et al.

ICML
Adversarial Multiclass Learning under Weak Supervision with Performance Guarantees

Snorkel DryBell: Tackling content classification at Google

Snorkel was used to create classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, convert non-servable organizational resources to servable models for an average 52% performance improvement, and execute over millions of data points in tens of minutes.
Published at SIGMOD’19

Collaboration with the US FDA and the US Department of Veterans Affairs on text and image data

Snorkel provided an average 132% improvement in predictive performance over prior heuristic approaches and came within an average of 3.6% of the predictive performance of large hand-curated training sets.
Published at VLDB’19

Snuba: Outperforming automated approaches

In collaborations with users at research labs and Stanford Hospital, as well as on open-source datasets, Snorkel outperformed other automated approaches, such as semi-supervised learning, by up to 14.4 F1 points.
Published at VLDB’19

Slice-based learning for language and vision tasks

Snorkel improved over baselines by up to 19.0 F1 points in slice-specific performance and up to 4.6 F1 points in overall performance on applications spanning natural language understanding and computer vision benchmarks, as well as production-scale industrial systems.

Cross-modal data programming

Snorkel yielded models that, on average, perform within 1.75 and 10.3 ROC-AUC points of models supervised with physician-years and physician-months of hand labeling, respectively, while using only person-days of developer and clinician time, a time saving of 96%.

Research at Snorkel AI

Our research team works closely with partners in academia and industry to make data-centric AI ubiquitous. The Snorkel AI team regularly publishes in academic journals, contributes to open-source projects, applies research to the Snorkel Flow platform, and includes faculty at the world's leading educational institutions.


Dive in

Press
November 17, 2022
Snorkel AI Accelerates Foundation Model Adoption with Data-centric AI


November 17, 2022
AI startup Snorkel preps a new kind of expert for enterprise AI


November 17, 2022
Snorkel dives into data labeling and foundation AI models


July 28, 2022
Here’s why a gold rush of NLP startups is about to arrive


Blog
November 17, 2022
Data-centric Foundation Model Development: Bridging the gap between foundation models and enterprise AI


November 17, 2022
Better not bigger: How to get GPT-3 quality at 0.1% the cost


November 3, 2022
Building an NLP application to analyze ESG factors in Earnings Calls using Snorkel Flow


August 4, 2022
The Future of Data-Centric AI 2022 day 1 highlights


Research
2022
Universalizing Weak Supervision


2021
Ontology-driven weak supervision for clinical entity classification in electronic health records


2017
Rapid Training Data Creation with Weak Supervision


2016
Data Programming: Creating Large Training Sets, Quickly


Customer Stories
September 30, 2022
How Schlumberger uses Snorkel Flow to enhance proactive well management


September 30, 2022
How a global custodial bank automated KYC verification with Snorkel Flow


September 28, 2022
How Memorial Sloan Kettering Cancer Center used Snorkel Flow to scale clinical trial screening


February 26, 2022
How Genentech extracted information for clinical trial analytics with Snorkel Flow



Are you ready to dive in?

Label data programmatically, train models efficiently, improve performance iteratively, and deploy applications rapidly—all in one platform.