AI beyond manual labeling




AI today is blocked by a lack of labeled data, not models. Unblock AI with the first data-centric AI development platform powered by a programmatic approach.







Technology developed and deployed with the world’s leading organizations





Approach —


Data-centric AI



Snorkel AI is leading the shift from model-centric to data-centric AI development with its unique programmatic approach.





Accelerated

Save time and costs by replacing manual labeling with rapid, programmatic labeling.





Accurate

Develop and deploy high-quality AI models via rapid, guided iteration on the part that matters: the training data.





Adaptable

Adapt to changing data or business goals by quickly changing code, not manually re-labeling entire datasets.





Governable

Version and audit data like code, leading to more responsive and ethical deployments.





Collaborative

Incorporate subject matter experts' knowledge by collaborating around a common interface: the data needed to train models.





Private

Reduce risk and meet compliance requirements by labeling programmatically and keeping data in-house rather than shipping it to external annotators.

















Platform —




Snorkel Flow




The only AI platform that lets you label data programmatically, train models efficiently, improve performance iteratively, and deploy applications rapidly.





01



Label & Build
Label and build training data programmatically in hours instead of months of hand-labeling

02



Integrate & Manage
Automatically clean, integrate, and manage programmatic training data from all sources


03



Train & Deploy
Train and deploy state-of-the-art machine learning models in-platform or via Python SDK


04



Analyze & Monitor
Analyze and monitor model performance to rapidly identify and correct error modes in the data













Solutions —




Endless Use Cases




Build and deploy AI applications, previously blocked on training data, on an enterprise-grade platform.



FINANCIAL SERVICES
News Analytics
Extract entities, events, and relationships to improve investment and risk strategies and more.
FINANCIAL SERVICES
Financial Spreading
Manage credit risk by collecting financial and non-financial data in any format from statements.
FINANCIAL SERVICES
Contract Intelligence
Extract and organize data from a wide variety of complex contracts efficiently.
FINANCIAL SERVICES
Account Identification
Confirm customer identity to open more accounts, improve ACH success rates, and reduce fraud.
FINANCIAL SERVICES
Know Your Customer (KYC)
Know your customers better to offer new services via robo-advisors or AI-assisted wealth management.
FINANCIAL SERVICES
Algo Trading
Employ state-of-the-art ML models, trained on your private data, and customized for your trading strategies.
FINANCIAL SERVICES
Process Automation
Automate back-office operations for accounting reconciliations, check validation, overdraft protection, and more.
FINANCIAL SERVICES
Credit Approval
Predict credit-worthiness with fairness and precision using extensive data and organizational resources.
FINANCIAL SERVICES
Cyber Risk Management
Investigate suspicious IP addresses or traffic patterns from network data to prevent cybersecurity breaches.
FINANCIAL SERVICES
Customer Service
Predict issues and route interactions to the right team or fine-tune IVR or chatbot responses.
FINANCIAL SERVICES
Smart Search
Create smart search indexes to locate records with specific attributes, like loan rates or credit scores.
FINANCIAL SERVICES
Anti Money Laundering
Identify money laundering patterns by extracting client ID, IBAN, and transaction details.
FINANCIAL SERVICES
Compliance Monitoring
Run comprehensive compliance analysis on contracts, emails, reports, and other data sources.
INSURANCE
Risk Classification
Classify policy documents based on behavior or occupation to assess risk.
INSURANCE
Claims Fraud Detection
Monitor documents and forms to identify potentially suspicious claims.
INSURANCE
Underwriting
Evaluate risks associated with a policy by extracting contextual data from contracts and forms.
INSURANCE
Claims Processing
Recognize entities like involved parties, loss amount, and the policyholder to process claims faster.
HEALTHCARE
Clinical Trial Matching
Determine clinical trial candidates by categorizing patient records.
HEALTHCARE
Condition Detection
Detect anomalies in patient health records to find potentially problematic medical conditions.
HEALTHCARE
Drug Discovery
Automate data extraction from clinical trial records for digital pathology.
HEALTHCARE
Clinical Decision Support
Extract previous health events from patient records to assist with diagnoses and treatment recommendations.
HEALTHCARE
Patient Identification
Find entities in patient records to identify health patterns and improve diagnoses.
TELECOM
Customer Segmentation
Build customized promotions by analyzing customer behavior and demographics.
TELECOM
Traffic Monitoring
Predict traffic with precision to efficiently reallocate resources in real time.
TELECOM
Interaction Analytics
Understand every customer interaction deeply by analyzing chats, emails, and tickets.
TELECOM
Intrusion Detection
Detect anomalous internet traffic and respond to stop malicious activity.
TELECOM
Geospatial Analysis
Tie information from documents to geospatial analysis to discover new market opportunities.
TELECOM
Network Optimization
Automatically monitor network performance, and detect and respond to issues immediately.
TELECOM
Customer Support
Understand customer sentiment and invest in better support options to alleviate friction points.
PUBLIC SECTOR
Back-Office Automation
Manage, sort, and process forms and documents while maintaining data security.
PUBLIC SECTOR
Cybersecurity
Identify network threats, protect against viruses and malware, model user behavior, and monitor emails.
PUBLIC SECTOR
Resource Management
Automate management processes and create resource allotment plans or predictive maintenance schedules.
PUBLIC SECTOR
Constituent Communications
Route constituent communications to the right department, offer chatbot services or analyze survey data.
PUBLIC SECTOR
Planning and Policy Making
Process constituent feedback and expert opinions on a large scale to inform policy-making.
PUBLIC SECTOR
Crime Detection
Support field investigation with machine learning to detect non-compliance, money laundering or other financial crimes.
PUBLIC SECTOR
Geospatial Apps
Augment image and vision-based GIS applications to use sensor data, coordinates, zip codes, and other textual data.
SOFTWARE
Search Engine Optimization
Identify named entities in customer search queries and optimize content on websites.
SOFTWARE
Email Filtering & Routing
Classify emails to remove spam and route queries to the correct channels.
SOFTWARE
Invoice Processing
Extract information from invoices or receipts for accounting or expense analysis.
SOFTWARE
Customer Support
Understand customer sentiment to maximize investments in improving customer engagement.
SOFTWARE
Content Moderation
Personalize and moderate content based on user behavior, attributes, and policies.
RETAIL
Product Recommendation
Enhance recommender systems by identifying entities (price, keywords, etc.) in product descriptions.
RETAIL
Product Catalogs
Extract product attributes from tables, lists, and forms for cataloging.
RETAIL
Customer Analytics
Extract detailed information from customer receipts to understand and analyze shopping behaviors.
RETAIL
Quality Assurance
Detect incidents in call logs to prevent loss of revenue, negative social posts, or calls from upset customers.
RETAIL
RETAIL
Analyze social media posts or consumer surveys to assess brand impressions.
RETAIL
Product Reviews
Classify customer reviews to analyze and understand shopping behaviors.








Technology —




Started at the Stanford AI Lab





The Snorkel AI founding team started the Snorkel Research Project at Stanford AI Lab in 2015, where we set out to explore a higher-level interface to machine learning through training data. This project was sponsored by Google, Intel, DARPA, and several other leading organizations. 





Snorkel's research is represented in 40+ academic papers.






Resources —



Learn More




Read about groundbreaking techniques for programmatic labeling and weak supervision developed by Team Snorkel and the broader data science community.




NATURE COMMS

Weakly supervised classification of aortic valve malformations using unlabeled cardiac MRI sequences

J. Fries, et al, 2019
IEEE IVS

Utilizing weak supervision to infer complex objects in autonomous driving data

Z. Weng, et al, 2019
MEDIUM

Understanding Snorkel

Anna Zubova
Research Paper

Trove: Ontology-driven Weak Supervision for Medical Entity Classification

J. Fries, et al, 2020
AAAI

Training Complex Models with Multi-Task Weak Supervision

A. Ratner, et al, 2019
ACL

Training Classifiers with Natural Language Explanations

B. Hancock, et al, 2018
Research Paper

Train and You’ll Miss It: Interactive Model Iteration with Weak Supervision…

M. Chen, et al, 2020
FAST FORWARD LABS

Taking Snorkel for a Spin

Fast Forward Labs at Cloudera
Research Paper

SwellShark: A Generative Model for Biomedical NER without Labeled Data

J. Fries, et al, 2017
AI4 CYBER SUMMIT

State of AI in Cyber

Ai4 Cyber Summit
Course

Stanford University: CS229 – Machine Learning

Chris Ré
Course

Stanford University: CS 329S: Machine Learning Systems Design

Chip Huyen
KDD

Software 2.0 and Snorkel: Going Beyond Hand-Labeled Data

C. Ré, 2018 (invited)
Research Paper

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

P. Varma, et al, 2017
VLDB

Snuba: Automating Weak Supervision to Label Training Data

P. Varma and C. Ré, 2019
VLDB

Snorkel: Rapid Training Data Creation with Weak Supervision

Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré
Video

Snorkel: Programming Training Data

Paroma Varma
SIGMOD

Snorkel: Fast Training Set Generation for Information Extraction

A. Ratner, et al, 2017
SNORKEL SCIENCE TALKS

Measuring NLP Progress with Sebastian Ruder

Team Snorkel
SIGMOD

Snorkel MeTaL: Weak Supervision for Multi-Task Learning

A. Ratner, et al, 2018
SIGMOD

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

S. Bach, et al, 2019
SNORKEL BLOG

Snorkel AI: Putting Data First in ML Development

Alex Ratner
TOWARDS DATA SCIENCE

Snorkel — A Weak Supervision System

Shreya Ghelani
NEURIPS

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

V. Chen, et al, 2019
ICCV

Scene Graph Prediction With Limited Labels

V. Chen, et al, 2019
MLSYS SEMINAR

Programmatically Building & Managing Training Data

Alex Ratner at Stanford MLSys Seminar, 2020
AICAMP

Programmatic Supervision for Machine Learning

Paroma Varma at AICamp, 2020
SNORKEL SCIENCE TALKS

Productionizing ML Research With Thomas Wolf

Team Snorkel