Image
author

Fred Sala

Chief Scientist
,
Snorkel AI
Assistant Professor @ University of Wisconsin-Madison

Frederic Sala is Chief Scientist at Snorkel AI and an assistant professor in the Computer Sciences Department at the University of Wisconsin-Madison. His research studies the fundamentals of data-driven systems and machine learning, with a focus on data-centric AI, foundation models, and automated machine learning. He and his group received the 2024 DARPA Young Faculty Award, a best student paper runner-up award at UAI ’22, the outstanding Ph.D. dissertation award from the UCLA Department of Electrical Engineering, the NSF Graduate Research Fellowship.

The latest from Fred

Weak-to-strong generalization through the data-centric lens
The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns...
Research Paper
Weak-to-strong generalization through the data-centric lens

The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively,…

Mar 01, 2025
Changho Shin, John Cooper, Frederic Sala Department of Computer Science University of Wisconsin-Madison
Learn more about Weak-to-strong generalization through the data-centric lens
Theoretical Physics Benchmark (TPBench)—a dataset and study of AI reasoning capabilities in theoretical physics
We introduce a benchmark to evaluate the capability of AI to solve problems in theoretical physics, focusing on high-energy theory and cosmology. The first iteration of our benchmark consists of 57 problems of varying difficulty, from undergraduate to research level. These problems are novel in the sense that they do not come from public problem collections. We evaluate our data set on various open and closed language models, including o3-mini, o1, DeepSeek-R1, GPT-4o and versions of Llama and Qwen. While we find impressive progress in model performance with the most recent models, our research-level difficulty problems are mostly unsolved. We...
Research Paper
Theoretical Physics Benchmark (TPBench)—a dataset and study of AI reasoning capabilities in theoretical physics

We introduce a benchmark to evaluate the capability of AI to solve problems in theoretical physics, focusing on high-energy theory and cosmology. The first iteration of our benchmark consists of 57 problems of varying difficulty, from undergraduate to research level. These problems are novel in the sense that they do not come from public problem collections. We evaluate our data…

Feb 01, 2025
Daniel J.H. Chung, Zhiqi Gao, Yurii Kvasiuk, Tianyi Li, Moritz Munchmeyer, Maja Rudolph, Frederic Sala, and Sai Chaitanya Tadepalli
Learn more about Theoretical Physics Benchmark (TPBench)—a dataset and study of AI reasoning capabilities in theoretical physics
Zero-Shot Robustification of Zero-Shot Models with Foundation Models
Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose ROBOSHOT, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use zero-shot language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful...
Research Paper
Zero-Shot Robustification of Zero-Shot Models with Foundation Models

Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose ROBOSHOT, a…

Sep 18, 2024
D. Adila, et al.
Learn more about Zero-Shot Robustification of Zero-Shot Models with Foundation Models
The ALCHEmist: automated labeling 500x cheaper than LLM data annotators
Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to...
Research Paper
The ALCHEmist: automated labeling 500x cheaper than LLM data annotators

Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather…

Sep 18, 2024
TH. Huang, et al.
Learn more about The ALCHEmist: automated labeling 500x cheaper than LLM data annotators
Product Manifold Representations for Learning on Biological Pathways
Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks. In this work, we investigate the effects of embedding pathway graphs in nonEuclidean mixed-curvature spaces and compare against traditional Euclidean graph representation...
Research Paper
Product Manifold Representations for Learning on Biological Pathways

Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is…

Sep 18, 2024
D. McNeela, et al.
Learn more about Product Manifold Representations for Learning on Biological Pathways
Pretrained Hybrids with MAD Skills
While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose Manticore, 1 a framework that addresses these challenges. Manticore automates the design of hybrid architectures while reusing pretrained models to create pretrained hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by...
Research Paper
Pretrained Hybrids with MAD Skills

While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must…

Sep 18, 2024
N. Roberts, et al.
Learn more about Pretrained Hybrids with MAD Skills
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model’s confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version...
Research Paper
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model’s confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is…

Sep 18, 2024
H. Vishwakarma, et al.
Learn more about Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
OTTER: Improving Zero-Shot Classification via Optimal Transport
Popular zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the...
Research Paper
OTTER: Improving Zero-Shot Classification via Optimal Transport

Popular zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in…

Sep 18, 2024
C. Shin, et al.
Learn more about OTTER: Improving Zero-Shot Classification via Optimal Transport
Multimodal Data Curation via Object Detection and Filter Ensembles
We propose an approach for curating multimodal data that we used for our entry in the 2023 DataComp competition filtering track. Our technique combines object detection and weak supervision-based ensembling. In the first of two steps in our approach, we employ an out-of-the-box zero-shot object detection model to extract granular information and produce a variety of filter designs. In the second step, we employ weak supervision to ensemble filtering rules. This approach results in a 4% performance improvement when compared to the best-performing baseline, producing the top-ranking position in the small scale track at the time of writing. Furthermore, in...
Research Paper
Multimodal Data Curation via Object Detection and Filter Ensembles

We propose an approach for curating multimodal data that we used for our entry in the 2023 DataComp competition filtering track. Our technique combines object detection and weak supervision-based ensembling. In the first of two steps in our approach, we employ an out-of-the-box zero-shot object detection model to extract granular information and produce a variety of filter designs. In the…

Sep 18, 2024
TH. Huang, et al.
Learn more about Multimodal Data Curation via Object Detection and Filter Ensembles
1 2 6 7

For models that need to be right. Not just good enough.