Information Extraction




Use Snorkel Flow to rapidly build AI-powered applications that extract information from unstructured text, PDFs, tables, and forms across millions of documents, without expensive hand-labeling.







Technology developed and deployed with the world’s leading organizations



Overview —

Targeted Applications to Tackle Any Entity


Extract useful data from tables, cells, and forms, linked to their headers, units, and references.



Faster, Lower-cost Development
Use programmatic labeling to develop high-quality AI applications in hours instead of spending weeks or months on expensive hand-labeling.
Rapidly Adaptable
Monitor for changes in the data, and rapidly adapt using built-in error analysis tools. Zoom in on errors to fine-tune training data & models with guided iteration.
High-accuracy Models
Leverage large amounts of labeled and unlabeled data, NLP primitives, and state-of-the-art model architectures to build high-accuracy models.
Flexible Integrations
Easily integrate labeling, training, and analysis pipelines defined over diverse input types–text, PDF, HTML, and more–with downstream applications using APIs or a Python SDK.
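
As a rough illustration of the programmatic labeling idea mentioned above, the sketch below writes a few small heuristic functions that vote on a label for each document and combines their votes by simple majority. This is not the Snorkel Flow API or SDK: every name here (the label constants, the `lf_*` functions, `majority_vote`) is hypothetical, the keyword rules are invented for the example, and a plain majority vote stands in for the learned label models that weak supervision systems typically use.

```python
from collections import Counter

# Hypothetical label values for a toy contract-classification task.
ABSTAIN = -1
OTHER = 0
LOAN = 1


def lf_mentions_interest_rate(text):
    # Heuristic: loan agreements usually mention an interest rate.
    return LOAN if "interest rate" in text.lower() else ABSTAIN


def lf_mentions_borrower(text):
    # Heuristic: the word "borrower" suggests a loan agreement.
    return LOAN if "borrower" in text.lower() else ABSTAIN


def lf_mentions_employee(text):
    # Heuristic: the word "employee" suggests some other contract type.
    return OTHER if "employee" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [
    lf_mentions_interest_rate,
    lf_mentions_borrower,
    lf_mentions_employee,
]


def majority_vote(text):
    """Combine labeling-function votes; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return counts[0][0]


if __name__ == "__main__":
    docs = [
        "The Borrower agrees to an interest rate of 5%.",
        "The employee shall report to the manager.",
        "This page intentionally left blank.",
    ]
    for doc in docs:
        print(majority_vote(doc), "|", doc)
```

In a sketch like this, the heuristics label documents in bulk, and documents where every function abstains or the votes tie are left unlabeled for review or for additional heuristics.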






Industry Use Cases —

Information Extraction Customized for Your Workflow


Build industry-specific AI applications combining state-of-the-art machine learning approaches with industry-specific best practices and last-mile connectors, all on an enterprise-scale platform.



FINANCIAL SERVICES



Contract Intelligence

Banks can classify contracts by terms and conditions to smoothly ensure regulatory compliance.
TELECOM & CYBER



Customer Segmentation

Telecom organizations can classify customer usage documents to target promotional offers.
HEALTHCARE



Clinical Trial Matching

Biotech organizations can classify patient records to identify actionable clinical trial candidates.
INSURANCE



Risk Classification

Insurance underwriters can classify policy documents by behavioral or occupational variables to assess risk.

SOFTWARE



Search Engine Optimization

Software companies can recognize named entities in customer search queries to optimize website content.
RETAIL



Product Recommendation

E-commerce sites can recognize entities in product descriptions (price, keywords, etc.) to improve recommender systems.






Case Study —

A top U.S. bank uses Snorkel Flow to quickly build AI applications that classify and extract information from contracts and other legal documents.



Problem




The bank estimated that, for a time-sensitive use case, labeling data by hand would take over a month.

Solution




With Snorkel Flow, the team produced an AI-powered contract intelligence application that was over 99% accurate in under 24 hours.

Results




The resulting AI application was quickly and easily adapted to new problems.

99.1%
Snorkel Flow Accuracy
<24hrs
From problem start
>250K
Documents processed







An End-to-end ML Platform —

Designed for Collaboration





For Data Scientists


  • Ready-to-use model zoo
  • Auto-generated analysis tools
  • Integrated Python notebooks

For Domain Experts


  • Rich data annotation suite
  • Intuitive, no-code labeling UI
  • Model error analysis reports

For Developers


  • Fully interoperable API and web UI
  • Write custom operators with Python SDK
  • Integrations to deploy models at scale






Resources —

Explore More About Snorkel


Learn more about groundbreaking techniques for programmatic labeling and weak supervision developed by Team Snorkel and the broader data science community.



NATURE COMMS

Weakly supervised classification of aortic valve malformations using unlabeled cardiac MRI sequences

J. Fries, et al, 2019
IEEE IVS

Utilizing weak supervision to infer complex objects in autonomous driving data

Z. Weng, et al, 2019
MEDIUM

Understanding Snorkel

Anna Zubova
Research Paper

Trove: Ontology-driven Weak Supervision for Medical Entity Classification

J. Fries, et al, 2020
AAAI

Training Complex Models with Multi-Task Weak Supervision

A. Ratner, et al, 2019
ACL

Training Classifiers with Natural Language Explanations

B. Hancock, et al, 2018
Research Paper

Train and You’ll Miss It: Interactive Model Iteration with Weak Supervision…

M. Chen, et al, 2020
FAST FORWARD LABS

Taking Snorkel for a Spin

Fast Forward Labs at Cloudera
Research Paper

SwellShark: A Generative Model for Biomedical NER without Labeled Data

J. Fries, et al, 2017
AI4 CYBER SUMMIT

State of AI in Cyber

Ai4 Cyber Summit
Course

Stanford University: CS229 – Machine Learning

Chris Ré
Course

Stanford University: CS 329S: Machine Learning Systems Design

Chip Huyen
KDD

Software 2.0 and Snorkel: Going Beyond Hand-Labeled Data

C. Ré, 2018 (invited)
Research Paper

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

P. Varma, et al, 2017
VLDB

Snuba: Automating Weak Supervision to Label Training Data

P. Varma and C. Ré, 2019
VLDB

Snorkel: Rapid Training Data Creation With Weak Supervision

A. Ratner, et al, 2018
Video

Snorkel: Programming Training Data

Paroma Varma
SIGMOD

Snorkel: Fast Training Set Generation for Information Extraction

A. Ratner, et al, 2017
SNORKEL SCIENCE TALKS

Measuring NLP Progress with Sebastian Ruder

Team Snorkel
SIGMOD

Snorkel MeTaL: Weak Supervision for Multi-Task Learning

A. Ratner, et al, 2018
SIGMOD

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

S. Bach, et al, 2019
SNORKEL BLOG

Snorkel AI: Putting Data First in ML Development

Alex Ratner
TOWARDS DATA SCIENCE

Snorkel — A Weak Supervision System

Shreya Ghelani
NEURIPS

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

V. Chen, et al, 2019
ICCV

Scene Graph Prediction With Limited Labels

V. Chen, et al, 2019
MLSYS SEMINAR

Programmatically Building & Managing Training Data

Alex Ratner at Stanford MLSys Seminar, 2020
AICAMP

Programmatic Supervision for Machine Learning

Paroma Varma at AICamp, 2020
SNORKEL SCIENCE TALKS

Productionizing ML Research With Thomas Wolf

Team Snorkel
MLSYS SEMINAR

Principles of Good Machine Learning Systems Design

Chip Huyen at Stanford MLSys Seminar, 2020
Book

Practical Weak Supervision: Doing More with Less Data

Wee Hyong Tok, Amit Bahree, Senja Filipi
DEEM @ SIGMOD

Osprey: Weak Supervision of Imbalanced Extraction Problems without Code

E. Bringer, et al, 2019
NATURE COMMS

Ontology-driven weak supervision for clinical entity classification in electronic health records

J. A. Fries, et al, 2021
NEURIPS

Multi-Resolution Weak Supervision for Sequential Data

P. Varma, et al, 2019
NPJ MEDICINE

Medical Device Surveillance With Electronic Health Records

A. Callahan, et al, 2019
OPEN CORE SUMMIT

Making ML Practical With Snorkel

Braden Hancock at Open Core Summit 2020
SNORKEL BLOG

Machine Learning Production Myths

Chip Huyen
VLDB

Leveraging Organizational Resources to Adapt Models to New Data Modalities

S. Suri, et al, 2020
NEURIPS

Learning to Compose Domain-Specific Transformations for Data Augmentation

A. Ratner, et al, 2017
ICML

Learning the Structure of Generative Models without Labeled Data

S. Bach, et al, 2017
ICML

Learning Dependency Structures for Weak Supervision Models

P. Varma, et al, 2019
KDD

Interactive Programmatic Labeling for Weak Supervision

B. Cohen-Wang, et al, 2019
NEURIPS

Inferring Generative Model Structure with Static Analysis

P. Varma, et al, 2017
SNORKEL BLOG

How To Overcome Practical Challenges for AI in the Public Sector

Charlie Greenbacker
SNORKEL BLOG

How to Overcome Practical Challenges for AI in Healthcare

Brandon Yang
SNORKEL BLOG

How To Overcome Practical Challenges for AI in Finance

Manas Joglekar
STANFORD HAI

How Machine Learning is Changing Software

Chris Ré at Stanford HAI, 2021
KDNUGGETS

Hand labeling is the Past. The Future is #NoLabel AI

Russell Jurney
BOREALIS AI BLOG

Generating Labels for Model Training Using Weak Supervision review

F. Duplessis, S. Chow, S. Prince at Borealis AI
SIGMOD

Fonduer: Knowledge Base Construction from Richly Formatted Data

S. Wu, et al, 2018
ICML

Fast & Three-rious: Speed Up Weak Supervision with Triplet Methods

D. Fu, et al, 2020
SNORKEL BLOG

Debugging AI Applications Pipeline

Chip Huyen
NEURIPS

Data Programming: Creating Large Training Sets, Quickly

A. Ratner, et al, 2016
SIGMOD

Data programming with DDLite: Putting Humans in a Different Part of the Loop

H. Ehrenberg, et al, 2016
CELL PATTERNS

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

J. Dunnmon, et al, 2020
MEDIUM

Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision

Abraham Starosta
Course

Brown University: CSCI 2952-C – Learning with Limited Labeled Data

Stephen Bach
AAAI

Bootstrapping Conversational Agents with Weak Supervision

N. Mallinar, et al, 2019
Book

Advanced Natural Language Processing

Ashish Bansal
NATURE

A Machine-Compiled Database of Genome-Wide Association Studies

V. Kuleshov, et al, 2019
BMC MIDM

A clinical text classification paradigm using weak supervision and deep representation

Y. Wang, et al, 2019