This work presents a rigorous technique for efficiently selecting small subsets of labelers such that a majority vote over the selected subset has a provably low error rate.
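As a rough illustration of the setup (not the paper's selection procedure), the sketch below simulates independent labelers and measures the empirical error rate of a majority vote over a few hand-picked subsets; all accuracies and subsets are invented for the example.

```python
# Minimal simulation: independent simulated labelers with known accuracies, and
# the empirical error rate of a majority vote over a chosen subset. The paper's
# contribution is how to *select* such subsets with provable guarantees; here
# the subsets are simply hand-picked for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_labels = rng.choice([-1, 1], size=10_000)

# Nine simulated labelers; each one independently flips the true label.
accuracies = np.array([0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.55, 0.60, 0.90])
votes = np.where(
    rng.random((len(accuracies), true_labels.size)) < accuracies[:, None],
    true_labels,
    -true_labels,
)

def majority_vote_error(subset):
    """Error rate of the sign of the summed votes from the given labelers."""
    mv = np.sign(votes[subset].sum(axis=0))
    return float(np.mean(mv != true_labels))

print(majority_vote_error([8]))              # best single labeler
print(majority_vote_error([4, 5, 8]))        # small, accurate subset
print(majority_vote_error(list(range(9))))   # all labelers, including weak ones
```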
Fast and Three-Rious: Speed up Weak Supervision With Triplet Methods
Introducing FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous approaches while requiring fewer assumptions.
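The key idea, shown here only in a stripped-down form (binary labels, no abstentions, conditionally independent sources, balanced classes), is that source accuracies can be recovered in closed form from the pairwise agreement rates within triplets of labeling functions; the sketch below is an illustrative reconstruction of that identity, not FlyingSquid's implementation.

```python
# Simplified triplet-method sketch: labels in {-1, +1}, no abstains,
# conditionally independent labeling functions, balanced classes. Under these
# assumptions E[l_i * l_j] = a_i * a_j with a_i = E[l_i * y], so each accuracy
# parameter can be solved for from three pairwise agreement rates.
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=100_000)
true_acc = np.array([0.9, 0.7, 0.6])        # P(l_i == y); illustrative values
L = np.where(rng.random((3, y.size)) < true_acc[:, None], y, -y)

agree = (L @ L.T) / y.size                  # empirical E[l_i * l_j]
a_hat = np.array([
    np.sqrt(agree[0, 1] * agree[0, 2] / agree[1, 2]),
    np.sqrt(agree[0, 1] * agree[1, 2] / agree[0, 2]),
    np.sqrt(agree[0, 2] * agree[1, 2] / agree[0, 1]),
])
# a_i = 2 * P(l_i == y) - 1; the square root only recovers |a_i|, so we assume
# each labeling function is better than random (a_i > 0).
print((a_hat + 1) / 2)                      # should be close to true_acc
```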
Train and You’ll Miss It: Interactive Model Iteration With Weak Supervision…
This paper provides a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares these scaling rates to those of standard weak supervision.
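One way to make the coverage/Lipschitzness trade-off concrete (purely as an illustration, not the paper's method): extend a source's votes to points it abstains on by copying the vote of the nearest covered point in a pretrained embedding space, but only within a radius where the label distribution can be assumed to vary slowly. The function name and radius below are assumptions for the sketch.

```python
# Illustrative sketch: extend a labeling source's votes through an embedding
# space. Votes are in {-1, 0, +1}, where 0 means the source abstained; an
# abstained point inherits the vote of its nearest covered neighbor, but only
# if that neighbor is within `radius`, so smoothness (Lipschitzness) of the
# label distribution keeps the copied vote reliable.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def extend_votes(embeddings, votes, radius=0.5):
    extended = votes.copy()
    covered = np.flatnonzero(votes != 0)
    if covered.size == 0:
        return extended
    nn = NearestNeighbors(n_neighbors=1).fit(embeddings[covered])
    abstained = np.flatnonzero(votes == 0)
    dist, idx = nn.kneighbors(embeddings[abstained])
    close = dist[:, 0] <= radius
    extended[abstained[close]] = votes[covered[idx[close, 0]]]
    return extended
```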
The Role of Massively Multi-Task and Weak Supervision in Software 2.0
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
Snuba: Automating Weak Supervision to Label Training Data
Presenting Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large, unlabeled dataset in the weak supervision setting.
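As a loose illustration of the idea (not Snuba's actual algorithm, which also prunes and iteratively selects heuristics for accuracy and diversity), the sketch below fits one tiny decision stump per feature on the small labeled set and turns each into a heuristic that abstains when its confidence falls below a threshold; the per-feature setup and the 0.8 threshold are assumptions made for the example.

```python
# Illustrative sketch of Snuba-style heuristic generation (not Snuba itself):
# fit one tiny model per feature on the small labeled set, and turn each into
# a labeling function that abstains (0) when its predicted probability is low.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def generate_heuristics(X_labeled, y_labeled, threshold=0.8):
    """y_labeled in {-1, +1}; returns one vote function per feature."""
    heuristics = []
    for j in range(X_labeled.shape[1]):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X_labeled[:, [j]], y_labeled)

        def vote(X, j=j, stump=stump):
            proba = stump.predict_proba(X[:, [j]])
            conf = proba.max(axis=1)
            labels = stump.classes_[proba.argmax(axis=1)]
            return np.where(conf >= threshold, labels, 0)  # 0 = abstain

        heuristics.append(vote)
    return heuristics
```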
Osprey: Weak Supervision of Imbalanced Extraction Problems Without Code
Proposing Osprey, a weak-supervision system suited for highly imbalanced data, built on top of the Snorkel framework.
Interactive Programmatic Labeling for Weak Supervision
Demonstrating in synthetic and real-world experiments how two simple labeling function acquisition strategies outperform a random baseline.
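The sketch below gives two plausible stand-ins for such acquisition strategies, one coverage-based and one disagreement-based; they are not necessarily the exact strategies evaluated in the paper.

```python
# Two simple heuristics for deciding which unlabeled examples to show a
# developer next when writing labeling functions: prioritize examples that
# current labeling functions mostly abstain on, or examples where the
# non-abstaining labeling functions disagree most.
import numpy as np

def next_examples_by_coverage(L, k=10):
    """L: (num_examples, num_lfs) votes in {-1, 0, +1}, 0 = abstain."""
    abstain_counts = (L == 0).sum(axis=1)
    return np.argsort(-abstain_counts)[:k]

def next_examples_by_disagreement(L, k=10):
    pos = (L == 1).sum(axis=1)
    neg = (L == -1).sum(axis=1)
    disagreement = np.minimum(pos, neg)
    return np.argsort(-disagreement)[:k]
```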
Bootstrapping Conversational Agents with Weak Supervision
This paper presents a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision.
Software 2.0 and Snorkel: Beyond Hand-Labeled Data
This paper describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks.
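In Snorkel, that interface is primarily labeling functions whose noisy votes are combined by a label model. Below is a minimal sketch of that workflow on a toy spam task; the dataframe, column name, and keyword rules are invented for the example, and import paths can vary slightly across Snorkel versions.

```python
# Minimal Snorkel sketch: write labeling functions, apply them to a dataframe,
# and combine their noisy votes with the label model. Toy data only.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_link(x):
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    return HAM if len(x.text.split()) < 5 else ABSTAIN

df_train = pd.DataFrame(
    {"text": ["check out http://spam.example", "ok thanks", "call now http://x"]}
)

applier = PandasLFApplier(lfs=[lf_contains_link, lf_short_message])
L_train = applier.apply(df_train)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=200, seed=0)
probs = label_model.predict_proba(L_train)  # probabilistic training labels
```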
Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data
Introducing Socratic learning, a paradigm that uses feedback from a discriminative model to automatically identify latent subsets in the training data.