

Stephen Bach is the Eliot Horowitz Assistant Professor in the Computer Science Department at Brown University. Previously, he was a visiting scholar at Google and a postdoctoral scholar in the computer science department at Stanford University, advised by Christopher Ré.
He received his Ph.D. in computer science from the University of Maryland, where he was advised by Lise Getoor. His research focuses on weakly supervised, zero-shot, and few-shot machine learning. The goal of his work is to create methods and systems that drive down the labor cost of AI. He was a core contributor to the Snorkel framework, which was recognized with a Best of VLDB 2018 award. He co-led the team that developed the T0 family of large language models. That team was also among the proposers of instruction tuning, the process of fine-tuning language models with supervised training so that they follow instructions. Instruction tuning is now a standard part of training large language models. Stephen is also an advisor to Snorkel AI.
The latest from Stephen


We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary…


As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across all subgroups of a population. For instance, it should not be the case that explanations associated with instances belonging to, e.g., women, are less accurate than those associated with…


AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4’s safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe…


Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying "red cube" by reasoning over the constituents "red" and "cube". In this work, we focus on the ability of a…


We conducted research to reduce the amount of labeled data required to train machine learning systems. The pinnacle of this effort is the development of TAGLETS, a machine learning system that seamlessly integrates widely known collections of labeled data with a diverse array of machine learning algorithms, known as weak labelers. The system’s evolution has been significantly influenced by comprehensive…


The paper explores the use of pseudolabels, which are heuristic labels for unlabeled data, to enhance the performance of vision-language models like CLIP via prompt tuning. The authors investigate different learning paradigms and prompt modalities and find that iterative prompt-training strategies leveraging CLIP-based pseudolabels lead to significant improvements in CLIP’s image classification performance.


The paper introduces Alfred, a system for programmatic weak supervision (PWS) that creates training data for machine learning by prompting. It enables users to encode their subject matter expertise via natural language prompts for language and vision-language models.


We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. We develop CSP for compositional zero-shot learning, the task of predicting unseen attribute-object compositions (e.g., "old cat" and "young tiger"). VLMs have a flexible text encoder that can represent arbitrary classes as natural language prompts but they…


Zero-shot learning with Common Sense Knowledge Graphs is a general-purpose framework with a novel transformer graph convolutional network for generating class representations from common sense knowledge graphs, which improves over existing WordNet-based methods on zero-shot learning tasks.



