Braden Hancock

Co-founder, Snorkel AI

Braden is a co-founder and Head of Technology at Snorkel AI. Before Snorkel, Braden spent four years developing new programmatic approaches for efficiently labeling, augmenting, and structuring training data with the Stanford AI Lab, Facebook, and Google. Prior to that, he performed NLP and ML research at Johns Hopkins University and MIT Lincoln Laboratory and earned a B.S. in Mechanical Engineering from Brigham Young University.

The latest from Braden

Systems and methods for programmatic labeling of training data for machine learning models via clustering and language model prompting
Embodiments introduce an approach to semi-automatically generate labels for data based on implementation of a clustering or language model prompting technique and can be used to implement a form of programmatic labeling to accelerate the development of classifiers and other forms of models. The disclosed methodology is particularly helpful in generating labels or annotations for unstructured data. In some embodiments, the disclosed approach may be used with data in the form of text, images, or other form of unstructured data.
Research Paper

Sep 23, 2024
RN Smith, et al.
The Llama 3 Herd of Models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B...
Research Paper

Sep 18, 2024
A. Dubey, et al.
Language Models in the Loop: Incorporating Prompting into Weak Supervision
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data....
Research Paper

Aug 22, 2024
R. Smith et al.
DMLR: Data-centric Machine Learning Research – Past, Present and Future
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.
Research Paper

Nov 21, 2023
L. Oala, et al.
Better not bigger: How to get GPT-3 quality at 0.1% the cost
Blog

We created Data-centric Foundation Model Development to bridge the gaps between foundation models and enterprise AI. New Snorkel Flow capabilities (Foundation Model Fine-tuning, Warm Start, and Prompt Builder) give data science and machine learning teams the tools they need to effectively put foundation models (FMs) to use for performance-critical enterprise use cases. The need is clear: despite undeniable excitement about…

ICLR 2022 recap from Snorkel AI
Blog

We are honored to be part of the International Conference on Learning Representations (ICLR) 2022, where Snorkel AI founders and researchers will be presenting five papers on data-centric AI topics. The field of artificial intelligence moves fast! This is a world we are intimately familiar with at Snorkel AI, having spun out of academia in 2019. For over half a…

Apr 20, 2022
Blog
Making Automated Data Labeling a Reality in Modern AI

Moving from Manual to Programmatic Labeling
Labeling training data by hand is exhausting. It’s tedious, slow, and expensive, the de facto bottleneck most AI/ML teams face today [1]. Eager to alleviate this pain point of AI development, machine learning practitioners have long sought ways to automate this labor-intensive labeling process (i.e., “automated data labeling”) [2], and have reached for classic approaches…

Feb 04, 2022
How to Use Snorkel to Build AI Applications
Blog

The how, what, and why of Snorkel’s programmatic data labeling approach and the state-of-the-art Snorkel Flow platform. The year was 2015. For the first time, machine learning (ML) had outperformed humans in the annual ImageNet challenge.

Jul 09, 2021
Blog
3 Impractical Assumptions About AI to Avoid

Impractical ML assumptions are made every day in research, which limit its adoption. In the real world, these assumptions do not hold up. Learn more about how to avoid making these assumptions about AI application development.

May 04, 2021
