

Stephen Bach is the Eliot Horowitz Assistant Professor in the Computer Science Department at Brown University. Previously, he was a visiting scholar at Google and a postdoctoral scholar in the computer science department at Stanford University, where he was advised by Christopher Ré.
He received his Ph.D. in computer science from the University of Maryland, where he was advised by Lise Getoor. His research focuses on weakly supervised, zero-shot, and few-shot machine learning. The goal of his work is to create methods and systems that drive down the labor cost of AI. He was a core contributor to the Snorkel framework, which was recognized with a Best of VLDB 2018 award. He also co-led the team that developed the T0 family of large language models. That team was one of the proposers of instruction tuning, the process of fine-tuning language models with supervised training so that they follow instructions. Instruction tuning is now a standard part of training large language models. Stephen is also an advisor to Snorkel AI.
The latest from Stephen
This paper presents a rigorous approach for using a set of arbitrarily correlated weak supervision sources to solve a multiclass classification task when only a very small set of labeled data is available.


In this paper, we propose a learning algorithm for training deep neural networks when there is not sufficient labeled data. To improve the generalization capabilities of the deep model, we adopt a learning scheme to train two related tasks simultaneously. One is the original task (target), and the other is an auxiliary task (source). In order to create a related…


We propose a framework for training sequence tagging models with weak supervision consisting of multiple heuristic rules of unknown accuracy. In addition to supporting rules that vote on tags in the output sequence, we introduce a new type of weak supervision, called linking rules, that vote on how sequence elements should be grouped into spans with the same tag. These…
This is a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision to bring development time and cost down by an order of magnitude. It also introduces Snorkel DryBell, a new weak supervision management system for this setting.
Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.
Proposing a structure estimation method that is 100x faster than a maximum likelihood approach for training data.


This paper presents a flexible interface layer to write labeling functions based on experience.



