stephen bach (steve bach)
author

Stephen Bach

Applied Research Scientist, Brown University
Eliot Horowitz Assistant Professor, Computer Science Department

Stephen Bach is the Eliot Horowitz Assistant Professor in the Computer Science Department at Brown University. Previously, he was a visiting scholar at Google, and a postdoctoral scholar in the computer science department at Stanford University advised by Christopher Ré.

He received his Ph.D. in computer science from the University of Maryland, where he was advised by Lise Getoor. His research focuses on weakly supervised, zero-shot, and few-shot machine learning. The goal of his work is to create methods and systems that drive down the labor cost of AI. He was a core contributor to the Snorkel framework, which was recognized with a Best of VLDB 2018 award. He also co-led the team that developed the T0 family of large language models. The team was also one of the proposers of instruction tuning, which is the process of fine-tuning language models with supervised training to follow instructions. Instruction tuning is now a standard part of training large language models. Stephen is also an advisor to Snorkel AI.

The latest from Stephen

Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Detoxifying multilingual Large Language Models (LLMs) has become crucial due to their increasing global use. In this work, we explore zero-shot cross-lingual generalization of preference tuning in detoxifying LLMs. In contrast to prior work that suggests limited cross-lingual generalization for other safety tasks, we show that Direct Preference Optimization (DPO) training with only English data can significantly reduce toxicity in multilingual open-ended generations. For instance, the probability of mGPT-1.3B in generating toxic continuations drops from 46.8% to 3.9% across 17 different languages after training. Our results also generalize to other multilingual LLMs, such as BLOOM, Llama3, and Aya-23. Using mechanistic...
Research Paper

Sep 18, 2024
X. Li, et al.
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). While this approach is promising, accurately measuring the quality of generated PDDL code continues to pose significant challenges. First, generated PDDL code is typically evaluated using planning validators that check whether the problem can be solved with a planner. This method is insufficient because a language model might generate valid PDDL code that does not align with the natural language description of the task....
Research Paper

Sep 18, 2024
M. Zuo, et al.
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
Data scarcity in low-resource languages can be addressed with word-to-word translations from labeled task data in high-resource languages using bilingual lexicons. However, bilingual lexicons often have limited lexical overlap with task data, which results in poor translation coverage and lexicon utilization. We propose lexicon-conditioned data generation (LexC-Gen), a method that generates low-resource-language classification task data at scale. Specifically, LexC-Gen first uses high-resource-language words from bilingual lexicons to generate lexicon-compatible task data, and then it translates them into low-resource languages with bilingual lexicons via word translation. Across 17 extremely low-resource languages, LexC-Gen generated data is competitive with expert-translated gold data, and...
Research Paper

Sep 18, 2024
ZX. Yong, et al.
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning. We aim to enable zero-shot task adaptation of large language models on users’ specialized, private data. We train Bonito by fine-tuning a pretrained large language model on a new large-scale dataset with 1.65M examples created by remixing existing instruction tuning datasets into meta-templates. The meta-templates for a dataset produce training examples where the input is the unannotated text and the task attribute and the output consists of the instruction and the response. We use Bonito to generate synthetic tasks...
Research Paper

Sep 18, 2024
N. Nayak et al.
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
Recent works often assume that Vision-Language Model (VLM) representations are based on visual attributes like shape. However, it is unclear to what extent VLMs prioritize this information to represent concepts. We propose Extract and Explore (EX2), a novel approach to characterize important textual features for VLMs. EX2 uses reinforcement learning to align a large language model with VLM preferences and generates descriptions that incorporate the important features for the VLM. Then, we inspect the descriptions to identify the features that contribute to VLM representations. We find that spurious descriptions have a major role in VLM representations despite providing no helpful...
Research Paper

Sep 18, 2024
R. Esfandiarpoor, et al.
Language Models in the Loop: Incorporating Prompting into Weak Supervision
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data....
Research Paper

Aug 22, 2024
R. Smith et al.
Large language model training: how three training phases shape LLMs
Blog
Training large language models is a multi-layered stack of processes, each with its unique role and contribution to the model’s performance.

Feb 27, 2024
Learning to Generate Instructions to Adapt Language Models to New Tasks
We present Bonito, the first open-source model for conditional task generation: the problem of converting unannotated corpus into a collection of tasks for instruction tuning. Our goal is to enable efficient task adaptation of instruction tuned language models on users' specialized, private data without relying on proprietary API-access-only models like GPT-4. We create Bonito by remixing existing, general-purpose instruction tuning data into a new training mixture for conditional task generation. Bonito learns to generate new tasks conditioned on the text and desired task type. The generated instructions in the specialized domain can be used to further train language models. We...
Research Paper

Nov 26, 2023
N. Nayak et al.
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using brown sparrow instead of sparrow. However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and...
Research Paper

Nov 10, 2023
R. Esfandiarpoor, et al.
