Proposing a framework for integrating and modeling weak supervision sources by viewing them as labeling different related sub-tasks of a problem, a setting we refer to as multi-task weak supervision.
Outlining a vision for a Software 2.0 lifecycle centered on the idea that labeling training data can be the primary interface to Software 2.0 systems.
Describing GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms.
This paper describes Snorkel, a system that helps users shape, create, and manage training data for Software 2.0 stacks.
Introducing Fonduer, a machine-learning-based knowledge base construction (KBC) system for richly formatted data.
This paper showcases methods for unsupervised mining of fashion attributes from Instagram text, which can enable a new kind of user recommendation in the fashion domain.
Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.
Automating data augmentation by learning a generative sequence model over user-specified transformation functions.
Proposing a method for estimating the dependency structure among weak supervision sources without labeled data, which is 100x faster than a maximum likelihood approach.
A paradigm for labeling training datasets programmatically rather than by hand.
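The sketch below illustrates what "labeling programmatically" can look like on a toy spam-detection task. It is a minimal example under stated assumptions. Each hypothetical labeling function (`lf_contains_link`, `lf_mentions_prize`, `lf_short_reply`) encodes one heuristic and either votes a label or abstains. The votes are then combined by simple majority vote. Majority vote is a swapped-in stand-in and is not the learned label model these systems use. None of these names come from Snorkel's API.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label values for a toy spam task (assumption, not from the source).
ABSTAIN = -1
HAM = 0
SPAM = 1

# Hypothetical labeling functions: each encodes one heuristic and may abstain.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_reply,
]

def majority_vote(text: str) -> int:
    """Combine non-abstaining votes by majority; abstain on ties or no votes.

    This is a simple stand-in, not a learned model of source accuracies.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN  # every labeling function abstained
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between competing labels
    return ranked[0][0]

def label_dataset(texts: List[str]) -> List[int]:
    """Produce weak training labels for each example without hand labeling."""
    return [majority_vote(t) for t in texts]

if __name__ == "__main__":
    unlabeled = [
        "Claim your prize at http://example.com now",
        "ok thanks",
        "See the meeting notes from yesterday afternoon please",
    ]
    print(label_dataset(unlabeled))
```

Running the script prints `[1, 0, -1]`. The first message gets two SPAM votes. The second gets one HAM vote. On the third message every function abstains. Weighting all votes equally is the simplest possible combiner. A data-programming approach would instead estimate each source's accuracy, and possibly the correlations between sources, from the unlabeled votes themselves.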