Proposing a framework for integrating and modeling diverse weak supervision sources by viewing them as labeling different, related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting.
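As a rough illustration of this setting (not the paper's actual model), the sketch below shows two weak sources that label related sub-tasks of a fine-grained entity-typing problem at different granularities; the task hierarchy, rules, and names are made up for the example.

```python
# Minimal conceptual sketch: two weak sources label *related sub-tasks*
# of a fine-grained entity-typing problem. The multi-task label model in
# the paper would combine their votes; here we only build the label matrix.

# Illustrative hierarchy: the fine labels {DOCTOR, LAWYER, HOSPITAL} split
# into a coarse sub-task (PERSON vs. ORG) and a fine sub-task over persons.
ABSTAIN, PERSON, ORG = -1, 0, 1
DOCTOR, LAWYER = 0, 1

def coarse_source(mention: str) -> int:
    """Weakly labels the coarse sub-task: PERSON vs. ORG."""
    if "Dr." in mention or "Esq." in mention:
        return PERSON
    if "Hospital" in mention or "LLP" in mention:
        return ORG
    return ABSTAIN

def fine_source(mention: str) -> int:
    """Weakly labels the fine sub-task: DOCTOR vs. LAWYER (persons only)."""
    if "Dr." in mention:
        return DOCTOR
    if "Esq." in mention:
        return LAWYER
    return ABSTAIN

mentions = ["Dr. Ada Wong", "Bob Smith, Esq.", "General Hospital"]

# Each source votes on its own sub-task; combining these votes across
# sub-tasks is what the multi-task weak supervision framework models.
label_matrix = [(coarse_source(m), fine_source(m)) for m in mentions]
print(label_matrix)  # [(0, 0), (0, 1), (1, -1)]
```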
The Role of Massively Multi-Task and Weak Supervision in Software 2.0
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
A Machine-Compiled Database of Genome-Wide Association Studies
Describing GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms.
Software 2.0 and Snorkel: Beyond Hand-Labeled Data
This paper describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks.
Fonduer: Knowledge Base Construction From Richly Formatted Data
Introducing Fonduer, a machine-learning-based KBC system for richly formatted data.
Deep Text Mining of Instagram Data Without Strong Supervision
This paper showcases methods for unsupervised mining of fashion attributes from Instagram text, which can enable a new kind of user recommendation in the fashion domain.
Snorkel: Fast Training Set Generation for Information Extraction
Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.
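As a hedged sketch of the workflow Snorkel supports, the example below uses the labeling-function API from later open-source releases of the Snorkel package (which may differ from the version described in the paper); the spam-detection rules, column names, and data are illustrative only.

```python
# Sketch of Snorkel-style training set creation: write labeling functions,
# apply them to unlabeled data, then fit a label model to combine their
# noisy votes into probabilistic training labels.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_link(x):
    # Messages with URLs are often spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_reply(x):
    # Very short messages tend to be legitimate replies.
    return HAM if len(x.text.split()) < 5 else ABSTAIN

df_train = pd.DataFrame({"text": [
    "Check out http://spam.example.com now!!!",
    "thanks, see you tomorrow",
    "WIN a free prize at http://win.example.com",
]})

# Apply the labeling functions to build a (noisy) label matrix ...
applier = PandasLFApplier(lfs=[lf_contains_link, lf_short_reply])
L_train = applier.apply(df=df_train)

# ... then denoise and combine the votes into probabilistic labels
# that can supervise a downstream classifier.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100, seed=123)
probs = label_model.predict_proba(L=L_train)
print(probs.shape)  # (3, 2)
```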
Learning to Compose Domain-Specific Transformations for Data Augmentation
Automating data augmentation by learning a generative sequence model over user-specified transformation functions.
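The sketch below illustrates the basic setup, under made-up transformation functions: users specify simple, composable transformations, and sequences of them generate augmented examples. The paper learns a generative sequence model over which transformations to apply; random composition stands in for that learned policy here, purely for illustration.

```python
# Conceptual sketch: compose user-specified transformation functions (TFs)
# into augmentation sequences. Random choice is a stand-in for the learned
# generative sequence model described in the paper.
import random

def swap_adjacent_words(text: str) -> str:
    words = text.split()
    if len(words) < 2:
        return text
    i = random.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def drop_random_word(text: str) -> str:
    words = text.split()
    if len(words) < 2:
        return text
    del words[random.randrange(len(words))]
    return " ".join(words)

TFS = [swap_adjacent_words, drop_random_word]

def augment(text: str, seq_len: int = 3) -> str:
    """Apply a short sequence of TFs; the learned model would instead pick
    sequences likely to keep the transformed example class-preserving."""
    for _ in range(seq_len):
        tf = random.choice(TFS)  # stand-in for the learned sequence model
        text = tf(text)
    return text

print(augment("the quick brown fox jumps over the lazy dog"))
```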
Learning the Structure of Generative Models Without Labeled Data
Proposing a structure estimation method for generative models of training data that is 100x faster than a maximum-likelihood approach.
Data Programming: Creating Large Training Sets, Quickly
A paradigm for labeling training datasets programmatically rather than by hand.
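To make the idea concrete, here is a minimal conceptual sketch (not the paper's code): users write labeling functions that vote on unlabeled points in {-1, 0, +1}, with 0 meaning abstain, and a generative model over the functions' accuracies and overlaps turns those votes into probabilistic labels. A simple vote tally stands in for that learned model below, and the example rules are invented.

```python
# Data programming, conceptually: programmatic labeling functions emit
# noisy votes; a generative model (approximated here by a crude tally)
# combines them into probabilistic training labels.

def lf_mentions_refund(text: str) -> int:
    # Heuristic: refund requests are positive examples (e.g., complaints).
    return 1 if "refund" in text.lower() else 0

def lf_has_thanks(text: str) -> int:
    # Heuristic: thank-you messages are negative examples.
    return -1 if "thanks" in text.lower() else 0

LFS = [lf_mentions_refund, lf_has_thanks]

def probabilistic_label(text: str) -> float:
    """Crude stand-in: fraction of non-abstaining votes that are positive.
    Data programming instead learns source accuracies without ground truth."""
    votes = [lf(text) for lf in LFS]
    active = [v for v in votes if v != 0]
    if not active:
        return 0.5  # no evidence either way
    return sum(v > 0 for v in active) / len(active)

for msg in ["I want a refund now", "thanks for the quick fix", "hello"]:
    print(msg, "->", probabilistic_label(msg))
```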