Stephen Bach

Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes

This paper presents a mathematical analysis of zero-shot learning with attributes. It provides a tight lower bound on the worst-case error of the best map from attributes to classes and shows that this bound predicts how standard zero-shot methods behave in practice.

Research Paper

Mar 15, 2023 • A. Mazzetto, et al.

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

PromptSource is a system that provides a templating language, an interface, and a set of guidelines to create, share, and use natural language prompts to train and query language models.

Research Paper

Mar 09, 2023 • S. Bach, et al.

Blog

Better not bigger: How to get GPT-3 quality at 0.1% the cost

We created Data-centric Foundation Model Development to bridge the gaps between foundation models and enterprise AI. New Snorkel Flow capabilities (Foundation Model Fine-tuning, Warm Start, and Prompt Builder) give data science and machine learning teams the tools they need to effectively put foundation models (FMs) to use for performance-critical enterprise use cases. The need is clear: despite undeniable excitement about…

Nov 17, 2022 • Stephen Bach, Jason Fries, Braden Hancock

TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data

This paper describes TAGLETS, a system built to study techniques for automatically exploiting all three types of data (labeled, unlabeled, and auxiliary) and creating high-quality, servable classifiers.

Research Paper

Apr 28, 2022 • W. Piriyakulkij, et al.

Multitask prompted training enables zero-shot task generalization

This paper shows how a data-centric approach to generating high-quality training data at massive scale can improve a model's zero-shot abilities.

Research Paper

Apr 02, 2022 • V. Sanh, et al.

Learning from Multiple Noisy Partial Labelers

This work enables users to create partial labelers that output subsets of possible class labels, which greatly expands the expressivity of programmatic weak supervision.

Research Paper

Mar 28, 2022 • P. Yu, et al.

Semi-Supervised Aggregation of Dependent Weak Supervision Sources with Performance Guarantees

This work presents a rigorous technique for efficiently selecting small subsets of labelers so that a majority vote over each subset has a provably low error rate.

Research Paper

Jul 18, 2021 • A. Mazzetto, et al.

Extended Few-Shot Learning: Exploiting Existing Resources for Novel Tasks

In many practical few-shot learning problems, even though labeled examples are scarce, there are abundant auxiliary datasets that potentially contain useful information. We propose the problem of extended few-shot learning to study these scenarios. We then introduce a framework to address the challenges of efficiently selecting and effectively using auxiliary data in few-shot image classification. Given a large auxiliary dataset and a notion of semantic similarity among classes, we automatically select pseudo shots, which are labeled examples from other classes related to the target task. We show that naive approaches, such as (1) modeling these additional examples the same as...

Research Paper

Jul 03, 2021 • R. Esfandiarpoor, et al.

What will it take to generate fairness-preserving explanations?

In situations where explanations of black-box models may be useful, the fairness of the black box is also often a relevant concern. However, the link between the fairness of the black-box model and the behavior of explanations for the black box is unclear. We focus on explanations applied to tabular datasets, suggesting that explanations do not necessarily preserve the fairness properties of the black-box algorithm. In other words, explanation algorithms can ignore or obscure critical relevant properties, creating incorrect or misleading explanations. More broadly, we propose future research directions for evaluating and generating explanations such that they are informative and relevant from...

Research Paper

Jun 24, 2021 • J. Dai, et al.
