Chris Ré

Co-Founder, Snorkel AI
Professor @ Stanford University

Christopher (Chris) Ré is a professor in the department of computer science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work is to understand how software and hardware systems will change as a result of machine learning along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with widely used products from technology and enterprise companies including Google Ads, Gmail, YouTube, and Apple.

He has co-founded four companies based on his research into machine learning systems: SambaNova and Snorkel, along with two companies that are now part of Apple, Lattice (DeepDive) in 2017 and Inductiv (HoloClean) in 2020.

His research contributions have spanned database theory, database systems, and machine learning. His work has won the best paper or test-of-time awards at the premier venues in each area. He still can’t believe he won the MacArthur Foundation Fellowship.

The latest from Chris

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving Data
While the detection and classification of simple objects encountered during autonomous driving sessions has been widely researched, the detection of complex objects and situations based on the combinations of objects in a scene remains relatively overlooked. This is especially difficult due to the cost of gathering labels for each complex scenario of interest before training a specialized model. To address this bottleneck of training data, we explore the applicability of weak supervision, or relying on higher level, noisier forms of supervision to label training data. Specifically, we use data programming, a paradigm that can learn the accuracy and dependency structure...
Research Paper
Dec 19, 2019
Z. Weng, et al., 2019
Training Complex Models with Multi-Task Weak Supervision
Proposing a framework for integrating and modeling weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting.
Research Paper
Dec 18, 2019
A. Ratner, et al., 2019
The Role of Massively Multi-Task and Weak Supervision in Software 2.0
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
Research Paper
Dec 17, 2019
A. Ratner, et al., 2019
Snuba: Automating Weak Supervision to Label Training Data
As deep learning models are applied to increasingly diverse problems, a key bottleneck is gathering enough high-quality training labels tailored to each task. Users therefore turn to weak supervision, relying on imperfect sources of labels like pattern matching and user-defined heuristics. Unfortunately, users have to design these sources for each task. This process can be time consuming and expensive: domain experts often perform repetitive steps like guessing optimal numerical thresholds and developing informative text patterns. To address these challenges, we present Snuba, a system to automatically generate heuristics using a small labeled dataset to assign training labels to a large,...
Research Paper
Dec 16, 2019
P. Varma and C. Ré, 2019
Slice-Based Learning: A Programming Model for Residual Learning
In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes. While machine learning models can achieve quality performance on coarse-grained metrics like F1-score and overall accuracy, they may underperform on these critical subsets. We define these as slices, the key abstraction in our approach. To address slice-level performance, practitioners often train separate "expert" models on slice subsets or use multi-task hard parameter sharing. We propose Slice-based Learning, a new programming model in which the...
Research Paper
Dec 14, 2019
V. Chen, et al., 2019
Scene Graph Prediction With Limited Labels
Research Paper
Dec 13, 2019
V. Chen, et al., 2019
Osprey: Weak Supervision of Imbalanced Extraction Problems Without Code
Proposing Osprey, a weak-supervision system suited for highly imbalanced data, built on top of the Snorkel framework.
Research Paper
Dec 12, 2019
E. Bringer, et al., 2019
Multi-Resolution Weak Supervision for Sequential Data
Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence. We propose Dugong, the first framework...
Research Paper
Dec 11, 2019
P. Varma, et al., 2019
Medical Device Surveillance With Electronic Health Records
Showcasing state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data.
Research Paper
Dec 10, 2019
A. Callahan, et al., 2019