AI Beyond Hand-Labeling




Unlock a Better, Faster Way To Build Applications With Snorkel Flow, the First Truly Data-Centric AI Platform







Technology developed and deployed with the world’s leading organizations



Benefits —

Data-First AI Development

Unlock a radically new, data-first way to develop and deploy AI applications with programmatic labeling.






Faster Development

Reduce development time by 10-100x with programmatic labeling.




High-Accuracy Models

Increase predictive performance with massive training datasets.




Adaptable Applications

Adapt to changing data or business goals without re-labeling from scratch.




Auditable AI

Easily version and audit training data created with code.




Collaborative Workflows

Bring together data scientists, developers, and domain experts to build solutions previously not possible.




Privacy-Safe Labeling

Keep data in-house or label without humans viewing the majority of the data.
















Platform —




Snorkel Flow




The only AI platform that lets you label data programmatically, train models efficiently, improve performance iteratively, and deploy applications rapidly.





01


Label & Build
Label and build training data programmatically in hours instead of spending months hand-labeling

02


Integrate & Manage
Automatically clean, integrate, and manage programmatic training data from all sources

03


Train & Deploy
Train and deploy state-of-the-art machine learning models in-platform or via Python SDK

04


Analyze & Monitor
Analyze and monitor model performance to rapidly identify and correct error modes in the data



Instead of labeling millions of data points by hand, automatically label vast amounts of training data using programmatic labeling functions (based on rules, heuristics, ontologies, legacy systems, and more) via a no-code UI or Python SDK.
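To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow SDK or the open-source Snorkel API; the spam-filtering task, the function names (`lf_keyword_free_money`, `apply_lfs`, and so on), and the use of -1 for "abstain" are assumptions chosen for illustration. Each function encodes one rule or heuristic and may abstain, applying all of them yields a label matrix, and a simple majority vote turns each row into a training label.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label values for an email spam task; -1 means "abstain".
ABSTAIN, HAM, SPAM = -1, 0, 1

# A labeling function maps one raw example to a label or ABSTAIN.
LabelingFunction = Callable[[str], int]


def lf_keyword_free_money(text: str) -> int:
    """Rule: a phrase that almost always signals spam."""
    return SPAM if "free money" in text.lower() else ABSTAIN


def lf_many_exclamations(text: str) -> int:
    """Heuristic: heavy use of '!' suggests spam."""
    return SPAM if text.count("!") >= 3 else ABSTAIN


def lf_known_sender_prefix(text: str) -> int:
    """Stand-in for a legacy system: a trusted internal sender tag."""
    return HAM if text.startswith("[internal]") else ABSTAIN


LFS: List[LabelingFunction] = [
    lf_keyword_free_money,
    lf_many_exclamations,
    lf_known_sender_prefix,
]


def apply_lfs(texts: List[str], lfs: List[LabelingFunction]) -> List[List[int]]:
    """Build the label matrix: one row per example, one column per LF."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row: List[int]) -> Optional[int]:
    """Combine non-abstaining votes; return None if no LF voted or on a tie."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting LFs with equal support: leave unlabeled
    return ranked[0][0]


if __name__ == "__main__":
    emails = [
        "Claim your FREE MONEY now!!!",
        "[internal] Q3 planning notes attached",
        "Lunch on Friday?",
        "[internal] free money raffle",
    ]
    for text, row in zip(emails, apply_lfs(emails, LFS)):
        print(row, majority_vote(row), text)
```

Majority vote is only the simplest way to combine votes. The data programming and weak supervision papers in the Resources section instead model how accurate each labeling function is (and, in some of that work, how the functions are correlated) and weight their votes accordingly.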










Solutions —

Endless Use Cases

Build and deploy AI applications, previously blocked on training data, on an enterprise-grade platform.





FINANCIAL SERVICES




News Analytics


Extract entities, events, and relationships to improve investment and risk strategies and more.

FINANCIAL SERVICES




Financial Spreading


Manage credit risk by extracting financial and non-financial data from statements in any format.

FINANCIAL SERVICES



Contract Intelligence

Extract and organize data from a wide variety of complex contracts efficiently.

FINANCIAL SERVICES



Account Identification


Confirm customer identity to open more accounts, improve ACH success rates, and reduce fraud.

FINANCIAL SERVICES



Know Your Customer (KYC)


Know your customers better to offer new services via robo-advisors or AI-assisted wealth management.

FINANCIAL SERVICES



Algo Trading


Employ state-of-the-art ML models, trained on your private data, and customized for your trading strategies.

FINANCIAL SERVICES



Process Automation


Automate back-office operations for accounting reconciliations, check validation, overdraft protection, and more.

FINANCIAL SERVICES



Credit Approval


Predict creditworthiness with fairness and precision using extensive data and organizational resources.

FINANCIAL SERVICES



Cyber Risk Management


Investigate suspicious IP addresses or traffic patterns from network data to prevent cybersecurity breaches.

FINANCIAL SERVICES



Customer Service


Predict issues and route interactions to the right team or fine-tune IVR or chatbot responses.

FINANCIAL SERVICES



Smart Search


Create smart search indexes to locate records with specific attributes, like loan rates or credit scores.

FINANCIAL SERVICES



Anti-Money Laundering


Identify money laundering patterns by extracting client IDs, IBANs, and transaction details.

FINANCIAL SERVICES



Compliance Monitoring


Run comprehensive compliance analysis on contracts, emails, reports, and other data sources.

INSURANCE



Risk Classification

Classify policy documents based on behavior or occupation to assess risk.

INSURANCE



Claims Fraud Detection


Monitor documents and forms to identify potentially suspicious claims.

INSURANCE



Underwriting


Evaluate risks associated with a policy by extracting contextual data from contracts and forms.

INSURANCE



Claims Processing


Recognize entities like involved parties, loss amount, and the policyholder to process claims faster.

HEALTHCARE



Clinical Trial Matching

Determine clinical trial candidates by categorizing patient records.

HEALTHCARE




Condition Detection


Detect anomalies in patient health records to find potentially problematic medical conditions.

HEALTHCARE




Drug Discovery


Automate data extraction from clinical trial records for digital pathology.

HEALTHCARE




Clinical Decision Support


Extract previous health events from patient records to assist with diagnoses and treatment recommendations.

HEALTHCARE




Patient Identification


Find entities in patient records to identify health patterns and improve diagnoses.

TELECOM




Customer Segmentation

Build customized promotions by analyzing customer behavior and demographics.

TELECOM




Traffic Monitoring


Predict traffic with precision to efficiently reallocate resources in real time.

TELECOM




Interaction Analytics


Understand every customer interaction deeply by analyzing chats, emails, and tickets.

TELECOM




Intrusion Detection


Detect anomalous internet traffic and respond to stop malicious activity.

TELECOM




Geospatial Analysis


Tie information from documents to geospatial analysis to discover new market opportunities.

TELECOM




Network Optimization


Automatically monitor network performance and detect and respond to issues immediately.

TELECOM




Customer Support


Understand customer sentiment and invest in better support options to alleviate friction points.

PUBLIC SECTOR




Back-Office Automation


Manage, sort, and process forms and documents while maintaining data security.

PUBLIC SECTOR




Cybersecurity


Identify network threats, protect against viruses and malware, model user behavior, and monitor emails.

PUBLIC SECTOR




Resource Management


Automate management processes and create resource allotment plans or predictive maintenance schedules.

PUBLIC SECTOR




Constituent Communications


Route constituent communications to the right department, offer chatbot services, or analyze survey data.

PUBLIC SECTOR




Planning and Policy Making


Process constituent feedback and expert opinions on a large scale to inform policy-making.

PUBLIC SECTOR




Crime Detection


Support field investigation with machine learning to detect non-compliance, money laundering or other financial crimes.

PUBLIC SECTOR




Geospatial Apps


Augment image- and vision-based GIS applications with sensor data, coordinates, zip codes, and other textual data.

SOFTWARE



Search Engine Optimization

Identify named entities in customer search queries and optimize content on websites.

SOFTWARE




Email Filtering & Routing


Classify emails to remove spam and route queries to the correct channels.

SOFTWARE




Invoice Processing


Extract information from invoices or receipts for accounting or expense analysis.

SOFTWARE




Customer Support


Understand customer sentiment to maximize investments in improving customer engagement.

SOFTWARE




Content Moderation


Personalize and moderate content based on user behavior, attributes, and policies.

RETAIL



Product Recommendation

Enhance recommender systems by identifying entities (price, keywords, etc.) in product descriptions.

RETAIL




Product Catalogs


Extract product attributes from tables, lists, and forms for cataloging.

RETAIL




Customer Analytics


Extract detailed information from customer receipts to understand and analyze shopping behaviors.

RETAIL




Quality Assurance


Detect incidents in call logs to prevent loss of revenue, negative social posts, or calls from upset customers.

RETAIL




Brand Monitoring


Analyze social media posts or consumer surveys to assess brand impressions.

RETAIL




Product Reviews


Classify customer reviews to analyze and understand shopping behaviors.








Technology —

Originated at the Stanford AI Lab




Snorkel's technology is based on novel research carried out at the Stanford AI Lab. It has been co-developed with some of the world’s leading organizations, represented in over 40 research papers, and taught in several machine learning courses at top academic institutions.




Research represented in 40+ academic papers.






Resources —

Learn More

Read about groundbreaking techniques for programmatic labeling and weak supervision developed by Team Snorkel and the broader data science community.




NATURE COMMS

Weakly Supervised Classification of Aortic Valve Malformations Using …

J. Fries, et al., 2019
IEEE IVS

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving…

Z. Weng, et al., 2019
MEDIUM

Understanding Snorkel

Anna Zubova
Research Paper

Trove: Ontology-driven Weak Supervision for Medical Entity Classification

J. Fries, et al., 2020
AAAI

Training Complex Models with Multi-Task Weak Supervision

A. Ratner, et al., 2019
ACL

Training Classifiers with Natural Language Explanations

B. Hancock, et al., 2018
Research Paper

Train and You’ll Miss It: Interactive Model Iteration with Weak Supervision…

M. Chen, et al., 2020
CIDR

The Role of Massively Multi-Task and Weak Supervision in Software 2.0

A. Ratner, et al., 2019
FAST FORWARD LABS

Taking Snorkel for a Spin

Fast Forward Labs at Cloudera
Research Paper

SwellShark: A Generative Model for Biomedical NER without Labeled Data

J. Fries, et al., 2017
AI4 CYBER SUMMIT

State of AI in Cyber

Ai4 Cyber Summit
Course

Stanford University: CS229 – Machine Learning

Chris Ré
Course

Stanford University: CS 329S: Machine Learning Systems Design

Chip Huyen
KDD

Software 2.0 and Snorkel: Going Beyond Hand-Labeled Data

C. Ré, 2018 (invited)
Research Paper

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

P. Varma, et al., 2017
VLDB

Snuba: Automating Weak Supervision to Label Training Data

P. Varma and C. Ré, 2019
VLDB

Snorkel: Rapid Training Data Creation With Weak Supervision

A. Ratner, et al., 2018
Video

Snorkel: Programming Training Data

Paroma Varma
SIGMOD

Snorkel: Fast Training Set Generation for Information Extraction

A. Ratner, et al., 2017
SNORKEL SCIENCE TALKS

Measuring NLP Progress with Sebastian Ruder

Team Snorkel
SIGMOD

Snorkel MeTaL: Weak Supervision for Multi-Task Learning

A. Ratner, et al., 2018
SIGMOD

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

S. Bach, et al., 2019
SNORKEL BLOG

Snorkel AI: Putting Data First in ML Development

Alex Ratner
TOWARDS DATA SCIENCE

Snorkel — A Weak Supervision System

Shreya Ghelani
NEURIPS

Slice-based Learning: A Programming Model for Residual Learning…

V. Chen, et al., 2019
ICCV

Scene Graph Prediction With Limited Labels

V. Chen, et al., 2019
MLSYS SEMINAR

Programmatically Building & Managing Training Data

Alex Ratner at Stanford MLSys Seminar, 2020
AICAMP

Programmatic Supervision for Machine Learning

Paroma Varma at AICamp, 2020
SNORKEL SCIENCE TALKS

Productionizing ML Research With Thomas Wolf

Team Snorkel
MLSYS SEMINAR

Principles of Good Machine Learning Systems Design

Chip Huyen at Stanford MLSys Seminar, 2020
Book

Practical Weak Supervision: Doing More with Less Data

Wee Hyong Tok, Amit Bahree, Senja Filipi
DEEM @ SIGMOD

Osprey: Weak Supervision of Imbalanced Extraction Problems without Code

E. Bringer, et al., 2019
NATURE COMMS

Ontology-driven weak supervision for clinical entity classification in electronic health records

J. A. Fries, et al., 2021
NEURIPS

Multi-Resolution Weak Supervision for Sequential Data

P. Varma, et al., 2019
NPJ MEDICINE

Medical Device Surveillance With Electronic Health Records

A. Callahan, et al., 2019
OPEN CORE SUMMIT

Making ML Practical With Snorkel

Braden Hancock at Open Core Summit 2020
SNORKEL BLOG

Machine Learning Production Myths

Chip Huyen
VLDB

Leveraging Organizational Resources to Adapt Models to New Data Modalities

S. Suri, et al., 2020
NEURIPS

Learning to Compose Domain-Specific Transformations for Data Augmentation

A. Ratner, et al., 2017
ICML

Learning the Structure of Generative Models without Labeled Data

S. Bach, et al., 2017
ICML

Learning Dependency Structures for Weak Supervision Models

P. Varma, et al., 2019
KDD

Interactive Programmatic Labeling for Weak Supervision

B. Cohen-Wang, et al., 2019
NEURIPS

Inferring Generative Model Structure with Static Analysis

P. Varma, et al., 2017
SNORKEL BLOG

How To Overcome Practical Challenges for AI in the Public Sector

Charlie Greenbacker
SNORKEL BLOG

How to Overcome Practical Challenges for AI in Healthcare

Brandon Yang
SNORKEL BLOG

How To Overcome Practical Challenges for AI in Finance

Manas Joglekar
STANFORD HAI

How Machine Learning is Changing Software

Chris Ré at Stanford HAI, 2021
KDNUGGETS

Hand labeling is the Past. The Future is #NoLabel AI

Russell Jurney
BOREALIS AI BLOG

Generating Labels for Model Training Using Weak Supervision review

F. Duplessis, S. Chow, S. Prince at Borealis AI
SIGMOD

Fonduer: Knowledge Base Construction from Richly Formatted Data

S. Wu, et al., 2018
ICML

Fast & Three-rious: Speed Up Weak Supervision with Triplet Methods

D. Fu, et al., 2020
ICWI

Deep Text Mining of Instagram Data without Strong Supervision

K. Hammar, et al., 2018
SNORKEL BLOG

Debugging AI Applications Pipeline

Chip Huyen
NEURIPS

Data Programming: Creating Large Training Sets, Quickly

A. Ratner, et al., 2016
SIGMOD

Data programming with DDLite: Putting Humans in a Different Part of the Loop

H. Ehrenberg, et al., 2016
CELL PATTERNS

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

J. Dunnmon, et al., 2020
MEDIUM

Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision

Abraham Starosta
Course

Brown University: CSCI 2952-C – Learning with Limited Labeled Data

Stephen Bach
AAAI

Bootstrapping Conversational Agents with Weak Supervision

N. Mallinar, et al., 2019
Book

Advanced Natural Language Processing

Ashish Bansal
NATURE

A Machine-Compiled Database of Genome-Wide Association Studies

V. Kuleshov, et al., 2019
BMC MIDM

A Clinical Text Classification Paradigm Using Weak Supervision…

Y. Wang, et al., 2019