Chris Ré

Co-Founder, Snorkel AI
Professor @ Stanford University

Christopher (Chris) Ré is a professor in the department of computer science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work is to understand how software and hardware systems will change as a result of machine learning along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with widely used products from technology and enterprise companies including Google Ads, Gmail, YouTube, and Apple.

He has co-founded four companies based on his research into machine learning systems: SambaNova and Snorkel, along with two that are now part of Apple, Lattice (DeepDive) in 2017 and Inductiv (HoloClean) in 2020.

His research contributions have spanned database theory, database systems, and machine learning. His work has won the best paper or test-of-time awards at the premier venues in each area. He still can’t believe he won the MacArthur Foundation Fellowship.

The latest from Chris

Intelligence per watt: Measuring intelligence efficiency of local AI
Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to scale infrastructure at pace. Two advances enable us to rethink this paradigm: small LMs (<=20B active parameters) now achieve competitive performance to frontier models on many tasks, and local accelerators (e.g., Apple M4 Max) run these models at interactive latencies. This raises the question: can local inference viably redistribute demand from centralized infrastructure? Answering this requires measuring whether local LMs can accurately answer real-world queries and whether they can do so efficiently enough to...
Research Paper

Nov 11, 2025
Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré
WONDERBREAD: a benchmark for evaluating multimodal foundation models on business process management tasks
Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task: full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today: simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap, we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on...
Research Paper

Oct 01, 2024
Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Ré, Stanford University
Skill-It! A data-driven skills framework for understanding and training language models
The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of...
Research Paper

Oct 20, 2023
MF Chen, et al.
Efficiently Modeling Long Sequences with Structured State Spaces
This paper introduces the Structured State Space sequence model (S4), which uses a new parameterization for the state-space model to improve long-range dependency handling both mathematically and empirically.
Research Paper

Mar 29, 2022
A. Gu, et al.
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
This paper proposes cross-modal data programming (XMDP) for machine learning (ML) in medicine.
Research Paper

Nov 14, 2020
J. Dunnmon, et al., 2020
Train and You’ll Miss It: Interactive Model Iteration With Weak Supervision…
This paper provides a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares this rate to standard weak supervision.
Research Paper

Nov 13, 2020
M. Chen, et al., 2020
Low-Dimensional Hyperbolic Knowledge Graph Embeddings
Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns which must be preserved in the embedding space. For hierarchical data, hyperbolic embedding methods have shown promise for high-fidelity and parsimonious representations. However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs. In this work, we introduce a class of hyperbolic KG embedding models that simultaneously capture hierarchical and logical patterns. Our approach combines hyperbolic reflections and rotations with attention to model complex relational patterns. Experimental results on standard KG benchmarks show that...
Research Paper

Jul 05, 2020
I. Chami, et al.
Ivy: Instrumental Variable Synthesis for Causal Inference
A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates—which are not necessarily strong, or even valid, IVs—into a single "summary" that is plugged into causal effect estimators in place of an IV. In genetic epidemiology, such approaches are known as allele scores. Allele scores require...
Research Paper

Jun 02, 2020
Z. Kuang, et al.
Extracting chemical reactions from text using Snorkel
Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types. We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature...
Research Paper

May 27, 2020
E. Mallory, et al.