Fred Sala

Chief Scientist, Snorkel AI
Assistant Professor, University of Wisconsin-Madison

Frederic Sala is Chief Scientist at Snorkel AI and an assistant professor in the Computer Sciences Department at the University of Wisconsin-Madison. His research studies the fundamentals of data-driven systems and machine learning, with a focus on data-centric AI, foundation models, and automated machine learning. He and his group received the 2024 DARPA Young Faculty Award, a best student paper runner-up award at UAI ’22, the outstanding Ph.D. dissertation award from the UCLA Department of Electrical Engineering, and the NSF Graduate Research Fellowship.

The latest from Fred

AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels
AutoWS-Bench-101 is a framework for evaluating automated weak supervision techniques against baseline methods such as zero-shot foundation models and supervised learning, helping practitioners choose the best method for generating additional labels.
Research Paper

Mar 15, 2023
Snorkel Team
Lifting Weak Supervision To Structured Prediction
This paper shows that weak supervision can be used beyond classification, including for rankings, graphs, and manifolds, and that it can provide generalization guarantees nearly identical to those of models trained on clean data.
Research Paper

Mar 15, 2023
Vishwakarma, et al.
Generative Modeling Helps Weak Supervision (and Vice Versa)
This work proposes and theoretically justifies a model that fuses weak supervision and generative adversarial networks to improve both the estimation of unobserved labels and data augmentation. The model outperforms baseline weak supervision models on multiclass image classification datasets.
Research Paper

Mar 15, 2023
B. Boecking, et al.
Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision
Liger combines foundation model embeddings with weak supervision frameworks. It improves on existing weak supervision techniques by partitioning the embedding space and extending source votes within that space, yielding better performance on six benchmark NLP and video tasks.
Research Paper

Mar 15, 2023
M. Chen, et al.
Auto LF generation: Lots of little models, big benefits
Blog

Constructing labeling functions (LFs) is at the heart of using weak supervision. We often think of these labeling functions as programmatic expressions of domain expertise or heuristics. Indeed, much of the advantage of weak supervision is that it saves time: writing labeling functions and applying them to data at scale is much more efficient than hand-labeling huge numbers of…

May 31, 2022
Universalizing Weak Supervision
This paper proposes a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees.
Research Paper

Apr 04, 2022
C. Shin, et al.
Hidden network generating rules from partially observed complex networks
Complex biological, neuroscience, geoscience, and social networks exhibit heterogeneous, self-similar, higher-order topological structures that are usually characterized as multifractal. However, describing their topological complexity through a compact mathematical description and deciphering the rules that govern their topology have remained elusive, preventing a comprehensive understanding of networks. To overcome this challenge, we propose a weighted multifractal graph model capable of capturing the underlying generating rules of complex systems and characterizing their node heterogeneity and pairwise interactions. To infer the generating measure with hidden information, we introduce a variational expectation maximization framework. We demonstrate the robustness of the network...
Research Paper

Sep 01, 2021
R. Yang, et al.
Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation
Labeling data for modern machine learning is expensive and time-consuming. Latent variable models can be used to infer labels from weaker, easier-to-acquire sources operating on unlabeled data. Such models can also be trained using labeled data, raising a key question: should a user invest in a few labeled points or many unlabeled ones? We answer this via a framework centered on model misspecification in method-of-moments latent variable estimation. Our core result is a bias-variance decomposition of the generalization error, which shows that the unlabeled-only approach incurs additional bias under misspecification. We then introduce a correction that provably removes this bias in certain...
Research Paper

Mar 18, 2021
M. Chen, et al.
Cut out the annotator, keep the cutout: better segmentation with weak supervision
Constructing large, labeled training datasets for segmentation models is an expensive and labor-intensive process. This is a common challenge in machine learning, addressed by methods that require few or no labeled data points, such as few-shot learning (FSL) and weakly supervised learning (WS). Such techniques, however, have limitations when applied to image segmentation: FSL methods often produce noisy results and depend strongly on which few data points are labeled, while WS models struggle to fully exploit rich image information. We propose a framework that fuses FSL and WS for segmentation tasks, enabling users to train high-performing segmentation networks with very few hand-labeled...
Research Paper

Jan 12, 2021
S. Hooper, et al.