Chris Ré

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving Data

While the detection and classification of simple objects encountered during autonomous driving sessions have been widely researched, the detection of complex objects and situations based on the combinations of objects in a scene remains relatively overlooked. This is especially difficult due to the cost of gathering labels for each complex scenario of interest before training a specialized model. To address this bottleneck of training data, we explore the applicability of weak supervision, or relying on higher-level, noisier forms of supervision to label training data. Specifically, we use data programming, a paradigm that can learn the accuracy and dependency structure...

Research Paper

Dec 19, 2019 • Z. Weng, et al., 2019

Training Complex Models with Multi-Task Weak Supervision

Proposing a framework for integrating and modeling weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting.

Research Paper

Dec 18, 2019 • A. Ratner, et al., 2019

The Role of Massively Multi-Task and Weak Supervision in Software 2.0

Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.

Research Paper

Dec 17, 2019 • A. Ratner, et al., 2019

Snuba: Automating Weak Supervision to Label Training Data

As deep learning models are applied to increasingly diverse problems, a key bottleneck is gathering enough high-quality training labels tailored to each task. Users therefore turn to weak supervision, relying on imperfect sources of labels like pattern matching and user-defined heuristics. Unfortunately, users have to design these sources for each task. This process can be time-consuming and expensive: domain experts often perform repetitive steps like guessing optimal numerical thresholds and developing informative text patterns. To address these challenges, we present Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large,...

Research Paper

Dec 16, 2019 • P. Varma and C. Ré, 2019

Slice-Based Learning: A Programming Model for Residual Learning

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes. While machine learning models can achieve high performance on coarse-grained metrics like F1-score and overall accuracy, they may underperform on these critical subsets; we define these as slices, the key abstraction in our approach. To address slice-level performance, practitioners often train separate "expert" models on slice subsets or use multi-task hard parameter sharing. We propose Slice-based Learning, a new programming model in which the...

Research Paper

Dec 14, 2019 • V. Chen, et al., 2019

Scene Graph Prediction With Limited Labels

Introducing a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using only a few labeled examples, producing training data for scene graph prediction models.

Research Paper

Dec 13, 2019 • V. Chen, et al., 2019

Osprey: Weak Supervision of Imbalanced Extraction Problems Without Code

Proposing Osprey, a weak-supervision system suited for highly imbalanced data, built on top of the Snorkel framework.

Research Paper

Dec 12, 2019 • E. Bringer, et al., 2019

Multi-Resolution Weak Supervision for Sequential Data

Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales with the length of the sequence. We propose Dugong, the first framework...

Research Paper

Dec 11, 2019 • P. Varma, et al., 2019

Medical Device Surveillance With Electronic Health Records

Showcasing state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data.

Research Paper

Dec 10, 2019 • A. Callahan, et al., 2019
