This work presents a rigorous technique for efficiently selecting small subsets of labelers so that a majority vote over each subset has a provably low error rate.
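The combination step this summary refers to, aggregating labeler votes by majority, can be sketched in a few lines of plain Python. This is a toy illustration of majority voting only, not the paper's subset-selection algorithm, and the votes below are made-up data:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the given votes
    (ties broken by first-seen order)."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical example: three labelers vote on four items. Two of the
# three agree on the correct label for every item, so the majority vote
# is right even though each individual labeler makes mistakes.
votes_per_item = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
predictions = [majority_vote(v) for v in votes_per_item]
print(predictions)  # -> [1, 0, 1, 0]
```

The subset-selection question the paper studies is which labelers to include in `votes_per_item` in the first place, so that this vote is provably accurate.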
Introducing FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions.
This paper provides a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares this rate to standard weak supervision.
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
Presenting Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large, unlabeled dataset in the weak supervision setting.
Proposing Osprey, a weak-supervision system suited for highly imbalanced data, built on top of the Snorkel framework.
Demonstrating in synthetic and real-world experiments how two simple labeling function acquisition strategies outperform a random baseline.
This paper presents a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision.
This paper describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks.
Introducing Socratic learning, a paradigm that uses feedback from a discriminative model to automatically identify latent subsets of the training data.