This work presents a rigorous technique for efficiently selecting small subsets of the labelers so that a majority vote from such subsets has a provably low error rate.
Introducing FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions.
This paper provides a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares this rate to that of standard weak supervision.
Proposing a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting.
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
Presenting Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large, unlabeled dataset in the weak supervision setting.
This is a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision to bring development time and cost down by an order of magnitude, and it introduces Snorkel DryBell, a new weak supervision management system for this setting.
This work presents a robust PCA-based algorithm for learning these dependency structures, establishes improved theoretical recovery rates, and outperforms existing methods on various real-world tasks.
This work develops a rule-based NLP algorithm to automatically generate labels for the training data, and then uses the pre-trained word embeddings as deep representation features for training machine learning models.
This paper describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks.
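The common thread across these papers is the weak supervision setting: multiple noisy labeling sources vote on unlabeled data, and an aggregation step (at its simplest, a majority vote, as in the first entry above) turns those votes into training labels. A minimal sketch of that setting, with invented labeling functions for a toy spam task (none of these rules come from the systems above):

```python
# Minimal weak supervision sketch: noisy labeling functions vote on
# unlabeled text, and a majority vote aggregates their outputs into
# training labels. All rules here are illustrative toy examples.
from collections import Counter

ABSTAIN = -1  # a labeling function may decline to vote on an example

def lf_contains_spam_phrase(text):
    # Vote "spam" (1) if a telltale phrase appears, else abstain.
    return 1 if "free money" in text.lower() else ABSTAIN

def lf_long_message(text):
    # Vote "not spam" (0) for longer messages, else abstain.
    return 0 if len(text.split()) > 5 else ABSTAIN

def lf_has_link(text):
    # Vote "spam" (1) if the text contains a link, else abstain.
    return 1 if "http" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_spam_phrase, lf_long_message, lf_has_link]

def majority_vote(text):
    # Collect non-abstaining votes and return the most common label,
    # or ABSTAIN if no labeling function fired.
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

Systems like Snorkel replace the unweighted majority vote with a learned label model that estimates each source's accuracy and correlations, which is where the theoretical results summarized above come in.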