Alex Ratner

Learning Dependency Structures for Weak Supervision Models

Labeling training data is a key bottleneck in the modern machine learning pipeline. Recent weak supervision approaches combine labels from multiple noisy sources by estimating their accuracies without access to ground truth labels; however, estimating the dependencies among these sources is a critical challenge. We focus on a robust PCA-based algorithm for learning these dependency structures, establish improved theoretical recovery rates, and outperform existing methods on various real-world tasks. Under certain conditions, we show that the amount of unlabeled data needed can scale sublinearly or even logarithmically with the number of sources m, improving over previous efforts that ignore the...

Research Paper

Dec 09, 2019

P. Varma et al., 2019

Interactive Programmatic Labeling for Weak Supervision

Demonstrating in synthetic and real-world experiments how two simple labeling function acquisition strategies outperform a random baseline.

Research Paper

Dec 08, 2019

B. Cohen-Wang et al., 2019

A Machine-Compiled Database of Genome-Wide Association Studies

Describing GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms.

Research Paper

Dec 06, 2019

V. Kuleshov et al., 2019

Snorkel MeTaL: Weak Supervision for Multi-Task Learning

Presenting Snorkel MeTaL, an end-to-end system for multi-task learning.

Research Paper

Dec 18, 2018

A. Ratner et al., 2018

Snorkel: Fast Training Set Generation for Information Extraction

Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.

Research Paper

Dec 20, 2017

A. Ratner et al., 2017

Learning to Compose Domain-Specific Transformations for Data Augmentation

Automating data augmentation by learning a generative sequence model over user-specified transformation functions.

Research Paper

Dec 19, 2017

A. Ratner et al., 2017

Learning the Structure of Generative Models Without Labeled Data

Proposing a structure estimation method for generative models of training data that is 100x faster than a maximum likelihood approach.

Research Paper

Dec 18, 2017

S. Bach et al., 2017

SwellShark: A Generative Model for Biomedical Named Entity Recognition Without Labeled Data

Introducing SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly.

Research Paper

Nov 13, 2017

J. Fries et al., 2017

Snorkel: Rapid Training Data Creation With Weak Supervision

This paper presents a flexible interface layer to write labeling functions based on experience.

Research Paper

Oct 04, 2017

Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré
