All articles on
Research

Auto LF generation: Lots of little models, big benefits

Constructing labeling functions (LFs) is at the heart of using weak supervision. We often think of these labeling functions as programmatic expressions of domain expertise or heuristics. Indeed, much of the advantage of weak supervision is that it saves time: writing labeling functions and applying them to data at scale is much more efficient than hand-labeling huge numbers of…

May 31, 2022

Building a COVID fact-checking system with external knowledge

Powerful resources to leverage as labeling functions. In this post, we’ll use the COVID-FACT dataset to demonstrate how to use existing resources as labeling functions (LFs) to build a fact-checking system. The COVID-FACT dataset contains 4086 claims about the COVID-19 pandemic, along with evidence for the claims and contradictory claims refuted by the evidence. The evidence retrieval is formulated…

Annie Yang portrayed
May 26, 2022

Panel discussion: Academic and industry perspectives on ethical AI

This post showcases a panel discussion on the academic and industry perspectives of ethical AI, which was moderated by Director of Federal Strategy and Growth, Alexis Zumwalt; Fouts Family Early Career Professor and Lead of Ethical AI (NSF AI Institute AI4OPT), Georgia Institute of Technology, Swati Gupta; Chief Data Officer, Department of the Navy, Thomas Sasala; Senior Manager of Responsible…

Dr. Bubbles, Snorkel AI's mascot
May 24, 2022

Programmatic labeling

The founding team of Snorkel AI has spent over half a decade—first at the Stanford AI Lab and now at Snorkel AI—researching programmatic labeling and other techniques for breaking through the biggest bottleneck in AI: the lack of labeled training data. This research has resulted in the Snorkel research project and 150+ peer-reviewed publications. Snorkel’s programmatic labeling technology has been…

May 22, 2022

Data-centric AI: A complete primer

The founding team of Snorkel AI has spent over half a decade, first at the Stanford AI Lab and now at Snorkel AI, researching data-centric techniques to overcome the biggest bottleneck in AI: the lack of labeled training data. In this video, Snorkel AI co-founder Paroma Varma gives an overview of the key principles of data-centric AI development. What is data-centric AI?…

May 17, 2022

Liger: Fusing foundation model embeddings & weak supervision

Showcasing Liger: a combination of foundation model embeddings to improve weak supervision techniques. Machine learning whiteboard (MLW) open-source series. In this talk, Mayee Chen, a PhD student in Computer Science at Stanford University, focuses on her work combining weak supervision and foundation model embeddings to improve two essential aspects of current weak supervision techniques. Check out the full episode here or…

May 9, 2022

Active learning: an overview

A primer on active learning presented by Josh McGrath. Machine learning whiteboard (MLW) open-source series. This video defines active learning, explores variants and design decisions made within active learning pipelines, and compares it to related methods. It contains references to some seminal papers in machine learning that we find instructive. Check out the full video below or on YouTube. Additionally, a…

May 4, 2022

Using few-shot learning language models as weak supervision

Utilizing large language models as zero-shot and few-shot learners with Snorkel for better quality and more flexibility. Large language models (LLMs) such as BERT, T5, GPT-3, and others are exceptional resources for applying general knowledge to your specific problem. Being able to frame a new task as a question for a language model (zero-shot learning), or showing it a few…

May 3, 2022

ICLR 2022 recap from Snorkel AI

We are honored to be part of the International Conference on Learning Representations (ICLR) 2022, where Snorkel AI founders and researchers will be presenting five papers on data-centric AI topics. The field of artificial intelligence moves fast! This is a world we are intimately familiar with at Snorkel AI, having spun out of academia in 2019. For over half a…

April 20, 2022

Algorithms that leverage data from other tasks with Chelsea Finn

The Future of Data-Centric AI Talk Series. Background: Chelsea Finn is an assistant professor of computer science and electrical engineering at Stanford University, whose research has been widely recognized, including in the New York Times and MIT Technology Review. In this talk, Chelsea talks about algorithms that use data from tasks you are interested in and data from other tasks…

March 31, 2022

Learning with imperfect labels and visual data with Anima Anandkumar

The future of data-centric AI talk series. Background: Anima Anandkumar holds dual positions in academia and industry. She is a Bren Professor at Caltech and the director of machine learning research at NVIDIA. Anima also has a long list of accomplishments ranging from the Alfred P. Sloan scholarship to the prestigious NSF CAREER Award and many more. She recently joined…

March 18, 2022

Weak Supervision Modeling with Fred Sala

Understanding the label model. Machine learning whiteboard (MLW) open-source series. Background: Frederic Sala is an assistant professor at the University of Wisconsin-Madison and a research scientist at Snorkel AI. Previously, he was a postdoc in Chris Ré’s lab at Stanford. His research focuses on data-driven systems and weak supervision. In this talk, Fred focuses on weak supervision modeling. This machine…

March 17, 2022

Making Automated Data Labeling a Reality in Modern AI

Moving from Manual to Programmatic Labeling. Labeling training data by hand is exhausting. It’s tedious, slow, and expensive: the de facto bottleneck most AI/ML teams face today [1]. Eager to alleviate this pain point of AI development, machine learning practitioners have long sought ways to automate this labor-intensive labeling process (i.e., “automated data labeling”) [2], and have reached for classic approaches…

February 4, 2022

The Principles of Data-Centric AI Development

The Future of Data-Centric AI Talk Series. Background: Alex Ratner is CEO and co-founder of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. He recently joined the Future of Data-Centric AI event, where he presented the principles of data-centric AI and where it’s headed. If you would like to watch his presentation in full,…

January 25, 2022

Prompting Methods with Language Models and Their Applications to Weak Supervision

Machine Learning Whiteboard (MLW) Open-source Series. Today, Ryan Smith, machine learning research engineer at Snorkel AI, talks about prompting methods with language models and some applications they have with weak supervision. In this talk, we’re essentially going to be using this paper as a template: it is a great survey of some methods in prompting from the last few years…

January 19, 2022