This work shows a rigorous technique for efficiently selecting small subsets of the labelers so that a majority vote from such subsets has a provably low error rate.
Introducing FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions.
This paper provides a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares this rate to that of standard weak supervision.
Presenting Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules.
Proposing a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting.
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
Proposing Slice-based Learning, a new programming model in which the slicing function (SF), a programmer abstraction, is used to specify additional model capacity for each slice.
Proposing Dugong, the first framework to model multi-resolution weak supervision sources with complex correlations to assign probabilistic labels to training data.
Showcasing state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data.
This work focuses on a robust PCA-based algorithm for learning these dependency structures, establishes improved theoretical recovery rates, and outperforms existing methods on various real-world tasks.