

Frederic Sala is Chief Scientist at Snorkel AI and an assistant professor in the Computer Sciences Department at the University of Wisconsin-Madison. His research studies the fundamentals of data-driven systems and machine learning, with a focus on data-centric AI, foundation models, and automated machine learning. He and his group received the 2024 DARPA Young Faculty Award, a best student paper runner-up award at UAI ’22, the outstanding Ph.D. dissertation award from the UCLA Department of Electrical Engineering, the NSF Graduate Research Fellowship.
The latest from Fred


AutoWS-Bench-101 is a framework for evaluating automated weak supervision techniques compared to other baseline methods such as zero-shot foundation models and supervised learning, in order to help practitioners choose the best method to generate additional labels.


This paper finds that weak supervision can be used beyond classification applications, including rankings, graphs, and manifolds, and can provide generalization guarantees nearly identical to models trained on clean data.


This work proposes and theoretically justifies a model that fuses weak supervision and generative adversarial networks to improve the estimate of unobserved labels and data augmentation, outperforming baseline weak supervision models on multiclass image classification datasets.


Liger, a combination of foundation models and weak supervision frameworks, improves existing weak supervision techniques by partitioning the embedding space and extending source votes in embedding space, resulting in improved performance on six benchmark NLP and video tasks.


Constructing labeling functions (LFs) is at the heart of using weak supervision. We often think of these labeling functions as programmatic expressions of domain expertise or heuristics. Indeed, much of the advantage of weak supervision is that we can save time—writing labeling functions and applying them to data at scale is much more efficient compared to hand-labeling huge numbers of…
This paper proposes a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees.
Complex biological, neuroscience, geoscience and social networks exhibit heterogeneous self-similar higher order topological structures that are usually characterized as being multifractal in nature. However, describing their topological complexity through a compact mathematical description and deciphering their topological governing rules has remained elusive and prevented a comprehensive understanding of networks. To overcome this challenge, we propose a weighted multifractal graph model…
Labeling data for modern machine learning is expensive and time-consuming. Latent variable models can be used to infer labels from weaker, easier-to-acquire sources operating on unlabeled data. Such models can also be trained using labeled data, presenting a key question: should a user invest in few labeled or many unlabeled points? We answer this via a framework centered on model…


Constructing large, labeled training datasets for segmentation models is an expensive and labor-intensive process. This is a common challenge in machine learning, addressed by methods that require few or no labeled data points such as few-shot learning (FSL) and weakly-supervised learning (WS). Such techniques, however, have limitations when applied to image segmentation—FSL methods often produce noisy results and are strongly…



