Chris Ré

Intelligence per watt: Measuring intelligence efficiency of local AI

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to scale infrastructure fast enough to keep pace. Two advances let us rethink this paradigm: small LMs (≤20B active parameters) now achieve performance competitive with frontier models on many tasks, and local accelerators (e.g., Apple M4 Max) run these models at interactive latencies. This raises the question: can local inference viably redistribute demand away from centralized infrastructure? Answering this requires measuring whether local LMs can accurately answer real-world queries and whether they can do so efficiently enough to...
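The abstract is cut off before it defines its metric. As a rough illustration only, the sketch below assumes that intelligence per watt is computed as task accuracy divided by average power draw during inference. The `Run` record, the helper names and all of the numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One evaluation of a model-accelerator pair (hypothetical record format)."""
    name: str
    correct: list[bool]           # whether each query was answered correctly
    power_samples_w: list[float]  # sampled power draw during the run, in watts


def accuracy(run: Run) -> float:
    return sum(run.correct) / len(run.correct)


def intelligence_per_watt(run: Run) -> float:
    """Assumed definition: accuracy divided by mean power draw (units: 1/W)."""
    avg_power = mean(run.power_samples_w)
    if avg_power <= 0:
        raise ValueError("average power must be positive")
    return accuracy(run) / avg_power


# Made-up numbers for illustration only.
runs = [
    Run("local-small-lm", [True, True, False, True], [40.0, 45.0, 35.0]),
    Run("cloud-frontier-lm", [True, True, True, True], [700.0, 650.0, 750.0]),
]
for r in runs:
    print(f"{r.name}: accuracy={accuracy(r):.2f}, IPW={intelligence_per_watt(r):.5f} per W")
```

With these invented numbers, the local run scores a higher IPW despite its lower accuracy. This is the kind of accuracy-versus-efficiency trade-off the abstract sets out to measure.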

Research Paper

Nov 11, 2025

Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré

WONDERBREAD: a benchmark for evaluating multimodal foundation models on business process management tasks

Existing ML benchmarks lack the depth and diversity of annotations needed to evaluate models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task: full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores how most BPM tools are applied today: simply documenting the relevant workflow takes 60% of the time in a typical process optimization project. To address this gap, we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on...

Research Paper

Oct 01, 2024

Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Ré, Stanford University

Skill-It! A data-driven skills framework for understanding and training language models

The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of...

Research Paper

Oct 20, 2023

MF Chen, et al.

Efficiently Modeling Long Sequences with Structured State Spaces

This paper introduces the Structured State Space sequence model (S4), which uses a new parameterization of the state space model to improve its handling of long-range dependencies, both mathematically and empirically.
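A small example helps make "state space model" concrete. A continuous-time linear SSM maps an input signal u(t) to an output y(t) through a hidden state x(t): x'(t) = A x(t) + B u(t) and y(t) = C x(t). The sketch below discretizes such a model with the bilinear transform and then runs it as a recurrence over a sequence. It is a plain, unstructured SSM with an arbitrarily chosen stable A, written for illustration. It does not implement S4's structured parameterization or its kernel computation.

```python
import numpy as np


def discretize_bilinear(A, B, step):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t) with step size `step`."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    A_bar = inv @ (I + (step / 2.0) * A)
    B_bar = inv @ (step * B)
    return A_bar, B_bar


def ssm_scan(A_bar, B_bar, C, u):
    """Run x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k over a 1-D input sequence u."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar[:, 0] * u_k
        ys.append(C[0] @ x)
    return np.array(ys)


rng = np.random.default_rng(0)
n = 4                                 # state size, arbitrary for this sketch
M = rng.standard_normal((n, n))
A = -np.eye(n) + 0.5 * (M - M.T)      # -I plus a skew-symmetric matrix: every eigenvalue has real part -1
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

A_bar, B_bar = discretize_bilinear(A, B, step=0.1)
u = np.sin(np.linspace(0, 6 * np.pi, 200))
y = ssm_scan(A_bar, B_bar, C, u)
print(y.shape)  # (200,)
```

The recurrence is linear and time-invariant, so the same output can also be written as a convolution of u with the kernel C A_bar^k B_bar. Roughly speaking, S4 centers on a structured parameterization of A under which this kernel can be computed efficiently for long sequences.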

Research Paper

Mar 29, 2022

A. Gu, et al.

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

This paper proposes cross-modal data programming (XMDP) for machine learning (ML) in medicine.

Research Paper

Nov 14, 2020

J. Dunnmon, et al., 2020

Train and You’ll Miss It: Interactive Model Iteration With Weak Supervision…

This paper provides a series of results on how performance scales with source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares this rate to that of standard weak supervision.
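The sketch below shows one way to see why Lipschitzness of labels in embedding space matters. If nearby points tend to share labels, a weak source's votes can be copied to unlabeled neighbors, which raises its coverage without much loss of accuracy. The setup is assumed for illustration:

- synthetic two-cluster embeddings;
- a single, perfectly accurate source with about 20% coverage;
- a hypothetical `extend_source` helper with a fixed radius.

It illustrates the coverage trade-off and is not the paper's procedure or results.

```python
import numpy as np

ABSTAIN = 0  # votes are +1 / -1; 0 means the source abstains on that point


def extend_source(votes, embeddings, radius):
    """Give each abstained point the vote of its nearest voting point, if within `radius` (hypothetical helper)."""
    extended = votes.copy()
    voted = np.flatnonzero(votes != ABSTAIN)
    if voted.size == 0:
        return extended
    for i in np.flatnonzero(votes == ABSTAIN):
        dists = np.linalg.norm(embeddings[voted] - embeddings[i], axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= radius:
            extended[i] = votes[voted[j]]
    return extended


def coverage(v):
    return float(np.mean(v != ABSTAIN))


def accuracy(v, labels):
    mask = v != ABSTAIN
    return float(np.mean(v[mask] == labels[mask]))


rng = np.random.default_rng(0)
# Two well-separated clusters, so labels vary smoothly (Lipschitz-like) in embedding space.
emb = np.concatenate([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
labels = np.array([-1] * 50 + [1] * 50)
votes = np.where(rng.random(100) < 0.2, labels, ABSTAIN)  # ~20% coverage, always correct
ext = extend_source(votes, emb, radius=1.0)

print(f"coverage {coverage(votes):.2f} -> {coverage(ext):.2f}")
print(f"accuracy after extension {accuracy(ext, labels):.2f}")
```

If the clusters overlapped, so that labels were not smooth in the embedding space, the same radius would copy wrong votes across the boundary.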

Research Paper

Nov 13, 2020

M. Chen, et al., 2020

Low-Dimensional Hyperbolic Knowledge Graph Embeddings

Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns that must be preserved in the embedding space. For hierarchical data, hyperbolic embedding methods have shown promise for high-fidelity and parsimonious representations. However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs. In this work, we introduce a class of hyperbolic KG embedding models that simultaneously capture hierarchical and logical patterns. Our approach combines hyperbolic reflections and rotations with attention to model complex relational patterns. Experimental results on standard KG benchmarks show that...
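For readers unfamiliar with hyperbolic embeddings, the sketch below shows two ingredients the abstract mentions. The first is distance in the Poincaré ball, here with a fixed curvature of -1. The second is a relation-specific rotation, applied as 2×2 Givens blocks on consecutive coordinate pairs, which requires an even dimension. This is an illustrative scoring function under those assumptions, not the paper's model. It omits the reflections and attention the abstract describes. The helper names are hypothetical.

```python
import numpy as np


def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball (curvature -1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)


def givens_rotate(x, thetas):
    """Rotate each coordinate pair (x[2i], x[2i+1]) by thetas[i]; norm-preserving, so points stay in the ball."""
    pairs = x.reshape(-1, 2)
    cos, sin = np.cos(thetas), np.sin(thetas)
    rotated = np.stack(
        [cos * pairs[:, 0] - sin * pairs[:, 1],
         sin * pairs[:, 0] + cos * pairs[:, 1]],
        axis=1,
    )
    return rotated.reshape(-1)


def score(head, relation_thetas, tail):
    """Higher is better: negative hyperbolic distance between the rotated head and the tail."""
    return -poincare_distance(givens_rotate(head, relation_thetas), tail)


rng = np.random.default_rng(0)
dim = 4
head = 0.3 * rng.uniform(-1, 1, dim)    # every coordinate is at most 0.3, so the norm is at most 0.6
thetas = rng.uniform(0, 2 * np.pi, dim // 2)
tail = givens_rotate(head, thetas)      # a "true" tail: exactly the rotated head
other = 0.3 * rng.uniform(-1, 1, dim)   # an unrelated entity
print(score(head, thetas, tail) >= score(head, thetas, other))  # True
```

Rotations alone cannot change how far a point is from the origin. In a hyperbolic embedding, that distance is what encodes depth in a hierarchy, so models in this family pair rotations with other operations.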

Research Paper

Jul 05, 2020

I. Chami, et al.

Ivy: Instrumental Variable Synthesis for Causal Inference

A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates—which are not necessarily strong, or even valid, IVs—into a single "summary" that is plugged into causal effect estimators in place of an IV. In genetic epidemiology, such approaches are known as allele scores. Allele scores require...
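The setup in this abstract can be simulated in a few lines. The sketch below assumes a linear model with an unobserved confounder u and builds several candidate instruments, each of which is individually weak. It combines them into an unweighted allele-score-style summary (their sum) and plugs that summary into the standard Wald ratio estimator cov(z, y) / cov(z, x). The coefficients and sample size are made up for illustration. Summing the candidates is the conventional allele-score baseline, not Ivy's synthesis method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200_000, 10   # samples and number of candidate instruments (illustrative)
true_effect = 2.0    # causal effect of x on y that we try to recover

z = rng.binomial(2, 0.3, size=(n, k)).astype(float)  # allele-like candidates: 0, 1, or 2 copies
u = rng.standard_normal(n)                            # unobserved confounder of x and y
x = z @ np.full(k, 0.1) + u + rng.standard_normal(n)  # each candidate only weakly moves x
y = true_effect * x + 3.0 * u + rng.standard_normal(n)  # z affects y only through x


def wald_estimate(instrument, x, y):
    """IV (Wald ratio) estimate of the effect of x on y: cov(z, y) / cov(z, x)."""
    return np.cov(instrument, y)[0, 1] / np.cov(instrument, x)[0, 1]


naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # ordinary regression slope, biased by u
score = z.sum(axis=1)                            # unweighted allele score
iv = wald_estimate(score, x, y)
print(f"naive OLS slope: {naive:.2f}")
print(f"IV estimate with allele score: {iv:.2f}")
```

The regression slope absorbs the confounder's influence and overshoots the true effect. The score-based estimate stays close to 2.0 because the score is independent of u.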

Research Paper

Jun 02, 2020

Z. Kuang, et al.

Extracting chemical reactions from text using Snorkel

Enzymatic and chemical reactions are key to understanding biological processes in cells. Curated databases of chemical reactions exist, but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature and can partially replace expert curation. However, such machine learning frameworks often require a large amount of labeled training data and therefore scale poorly to both larger document corpora and new relationship types. We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature...
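In Snorkel, domain knowledge is written as labeling functions: small heuristics that either vote on a label or abstain. The sketch below shows the idea for reaction-relation candidates in plain Python. It uses hypothetical keyword heuristics and combines their outputs with a simple majority vote. Snorkel itself combines votes with a learned label model that estimates each function's accuracy. Majority vote is used here as a simpler stand-in, and none of these functions come from the paper.

```python
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


# Hypothetical labeling functions over a candidate sentence mentioning a reaction.
def lf_catalyzes(sentence: str) -> int:
    return POSITIVE if "catalyzes" in sentence.lower() else ABSTAIN


def lf_converts(sentence: str) -> int:
    text = sentence.lower()
    return POSITIVE if "converts" in text and " to " in text else ABSTAIN


def lf_negation(sentence: str) -> int:
    text = sentence.lower()
    return NEGATIVE if "does not" in text or "no evidence" in text else ABSTAIN


LFS = [lf_catalyzes, lf_converts, lf_negation]


def majority_vote(sentence: str) -> int:
    """Combine non-abstaining votes by simple majority; abstain on ties or when no function votes."""
    counts = Counter(v for v in (lf(sentence) for lf in LFS) if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, top_n), *rest = counts.most_common()
    if rest and rest[0][1] == top_n:
        return ABSTAIN
    return top


sentences = [
    "Hexokinase catalyzes the phosphorylation of glucose and converts it to glucose-6-phosphate.",
    "The enzyme does not convert lactate under these conditions.",
    "Samples were stored at -80 C.",
]
for s in sentences:
    print(majority_vote(s), s)
```

The resulting (noisy) labels would then train a discriminative model that can generalize beyond the keywords.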

Research Paper

May 27, 2020

E. Mallory, et al.
