Constructing labeling functions (LFs) is at the heart of using weak supervision. We often think of these labeling functions as programmatic expressions of domain expertise or heuristics. Indeed, much of the advantage of weak supervision is that it saves time—writing labeling functions and applying them to data at scale is far more efficient than hand-labeling huge numbers of…
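To make this concrete, here is a minimal sketch of a keyword-based labeling function written with the open-source Snorkel `@labeling_function` decorator; the label values and the `text` field are illustrative assumptions, not details from the original post.

```python
# Minimal sketch of a heuristic labeling function (open-source Snorkel).
# The label scheme and the `text` attribute are assumptions for illustration.
from snorkel.labeling import labeling_function

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_mentions_refund(x):
    # Heuristic: reviews that mention a refund are likely negative; otherwise abstain.
    return NEGATIVE if "refund" in x.text.lower() else ABSTAIN
```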
Powerful resources to leverage as labeling functions In this post, we’ll use the COVID-FACT dataset to demonstrate how to use existing resources as labeling functions (LFs) to build a fact-checking system. The COVID-FACT dataset contains 4,086 claims about the COVID-19 pandemic, along with evidence for the claims and contradictory claims refuted by that evidence. The evidence retrieval is formulated…
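As a rough illustration of the pattern, an existing resource such as a small lexicon of contradiction cues can be wrapped directly as a labeling function. The phrase list, field names, and label values below are hypothetical and only sketch the idea.

```python
# Sketch: wrapping an existing resource (a hypothetical lexicon of refuting
# phrases) as a labeling function for claim verification.
from snorkel.labeling import labeling_function

ABSTAIN, REFUTED, SUPPORTED = -1, 0, 1

# Hypothetical external resource: contradiction cues collected elsewhere.
REFUTING_PHRASES = ["no evidence that", "contrary to", "has been debunked"]

@labeling_function()
def lf_refuting_language(x):
    # Vote REFUTED when the evidence contains a contradiction cue; otherwise abstain.
    evidence = x.evidence.lower()
    return REFUTED if any(p in evidence for p in REFUTING_PHRASES) else ABSTAIN
```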
This post showcases a panel discussion on the academic and industry perspectives of ethical AI, moderated by Alexis Zumwalt, Director of Federal Strategy and Growth. Panelists included Swati Gupta, Fouts Family Early Career Professor and Lead of Ethical AI (NSF AI Institute AI4OPT), Georgia Institute of Technology; Thomas Sasala, Chief Data Officer, Department of the Navy; and the Senior Manager of Responsible…
The founding team of Snorkel AI has spent over half a decade—first at the Stanford AI Lab and now at Snorkel AI—researching programmatic labeling and other techniques for breaking through the biggest bottleneck in AI: the lack of labeled training data. This research has resulted in the Snorkel research project and 150+ peer-reviewed publications. Snorkel’s programmatic labeling technology has been…
The founding team of Snorkel AI has spent over half a decade—first at the Stanford AI Lab and now at Snorkel AI—researching data-centric techniques to overcome the biggest bottleneck in AI: the lack of labeled training data. In this video, Snorkel AI co-founder Paroma Varma gives an overview of the key principles of data-centric AI development. What is data-centric AI?…
Showcasing Liger, which combines foundation model embeddings with weak supervision to improve current techniques. Machine learning whiteboard (MLW) open-source series In this talk, Mayee Chen, a PhD student in Computer Science at Stanford University, focuses on her work combining weak supervision with foundation model embeddings to improve two essential aspects of current weak supervision techniques. Check out the full episode here or…
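The sketch below captures only the general intuition of extending a labeling function's votes to nearby points in embedding space; it is not the Liger algorithm itself, and all names and inputs are assumptions.

```python
# Intuition-only sketch (not the Liger algorithm): copy a labeling function's
# votes onto nearby points in foundation-model embedding space.
# `embeddings` is (n, d); `lf_votes` has length n, with -1 meaning abstain.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def extend_votes(embeddings: np.ndarray, lf_votes: np.ndarray) -> np.ndarray:
    covered, abstained = lf_votes != -1, lf_votes == -1
    if not covered.any() or not abstained.any():
        return lf_votes
    # For each abstained point, borrow the vote of its nearest covered neighbor.
    nn = NearestNeighbors(n_neighbors=1).fit(embeddings[covered])
    _, idx = nn.kneighbors(embeddings[abstained])
    extended = lf_votes.copy()
    extended[abstained] = lf_votes[covered][idx[:, 0]]
    return extended
```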
A primer on active learning presented by Josh McGrath. Machine learning whiteboard (MLW) open-source series This video defines active learning, explores variants and design decisions made within active learning pipelines, and compares it to related methods. It contains references to some seminal papers in machine learning that we find instructive. Check out the full video below or on YouTube. Additionally, a…
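For readers who want a concrete anchor, the snippet below sketches the most common variant covered in such primers, pool-based uncertainty sampling; the classifier and data names are assumptions for illustration.

```python
# Minimal sketch of pool-based active learning via uncertainty sampling.
# The classifier, labeled set, and unlabeled pool are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_confident(model, X_pool: np.ndarray, batch_size: int = 10) -> np.ndarray:
    """Return indices of the pool points the model is least confident about."""
    probs = model.predict_proba(X_pool)
    confidence = probs.max(axis=1)              # probability of the top class
    return np.argsort(confidence)[:batch_size]  # lowest confidence first

# Typical loop: fit on the labeled set, query the least-confident points,
# send them to an annotator, add the new labels, and repeat.
# model = LogisticRegression().fit(X_labeled, y_labeled)
# query_indices = least_confident(model, X_pool)
```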
Utilizing large language models as zero-shot and few-shot learners with Snorkel for better quality and more flexibility Large language models (LLMs) such as BERT, T5, GPT-3, and others are exceptional resources for applying general knowledge to your specific problem. Being able to frame a new task as a question for a language model (zero-shot learning), or to show it a few…
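One way to picture this is to treat a zero-shot model's prediction as one more labeling function. The sketch below uses the Hugging Face zero-shot classification pipeline; the candidate labels and the 0.8 confidence threshold are assumptions, not settings from the post.

```python
# Sketch: an off-the-shelf zero-shot model used as a labeling function.
# Candidate labels and the confidence threshold are illustrative assumptions.
from snorkel.labeling import labeling_function
from transformers import pipeline

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

@labeling_function()
def lf_zero_shot(x):
    # Ask the model to pick a label, but abstain when it is not confident.
    result = zero_shot(x.text, candidate_labels=["positive", "negative"])
    if result["scores"][0] < 0.8:
        return ABSTAIN
    return POSITIVE if result["labels"][0] == "positive" else NEGATIVE
```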
We are honored to be part of the International Conference on Learning Representations (ICLR) 2022, where Snorkel AI founders and researchers will be presenting five papers on data-centric AI topics. The field of artificial intelligence moves fast! This is a world we are intimately familiar with at Snorkel AI, having spun out of academia in 2019. For over half a…
The Future of Data-Centric AI Talk Series Background Chelsea Finn is an assistant professor of computer science and electrical engineering at Stanford University, whose research has been widely recognized, including in the New York Times and MIT Technology Review. In this talk, Chelsea discusses algorithms that use data both from the tasks you are interested in and from other tasks…
The future of data-centric AI talk series Background Anima Anandkumar holds dual positions in academia and industry. She is a Bren Professor at Caltech and the director of machine learning research at NVIDIA. Anima also has a long list of accomplishments, ranging from the Alfred P. Sloan Research Fellowship to the prestigious NSF CAREER Award, among many others. She recently joined…
Understanding the label model. Machine learning whiteboard (MLW) open-source series Background Frederic Sala is an assistant professor at the University of Wisconsin-Madison and a research scientist at Snorkel AI. Previously, he was a postdoc in Chris Re’s lab at Stanford. His research focuses on data-driven systems and weak supervision. In this talk, Fred focuses on weak supervision modeling. This machine…
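For orientation, the label model in the open-source Snorkel library is fit on the matrix of labeling-function votes and produces probabilistic labels; below is a small sketch with a toy vote matrix (the data are made up for illustration).

```python
# Sketch: fitting the open-source Snorkel label model to a matrix of LF votes.
# Rows are examples, columns are labeling functions, and -1 means abstain.
import numpy as np
from snorkel.labeling.model import LabelModel

L_train = np.array([[1, -1,  0],
                    [1,  1, -1],
                    [0,  0,  0]])  # toy vote matrix for illustration

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=123)
probs = label_model.predict_proba(L=L_train)  # probabilistic training labels
```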
Moving from Manual to Programmatic Labeling Labeling training data by hand is exhausting. It’s tedious, slow, and expensive—the de facto bottleneck most AI/ML teams face today [1]. Eager to alleviate this pain point of AI development, machine learning practitioners have long sought ways to automate this labor-intensive labeling process (i.e., “automated data labeling”) [2], and have reached for classic approaches…
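In programmatic labeling, the manual step is replaced by applying a set of labeling functions across the dataset in one pass. A minimal sketch with the open-source Snorkel applier is below; `df_train` and the `lfs` list are assumed to exist already.

```python
# Sketch: applying labeling functions to a pandas DataFrame in one pass and
# inspecting their coverage, overlaps, and conflicts.
# `lfs` (a list of labeling functions) and `df_train` are assumed inputs.
from snorkel.labeling import PandasLFApplier, LFAnalysis

applier = PandasLFApplier(lfs=lfs)
L_train = applier.apply(df=df_train)                # (n_examples x n_LFs) vote matrix
print(LFAnalysis(L=L_train, lfs=lfs).lf_summary())  # per-LF coverage and conflict stats
```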
The Future of Data-Centric AI Talk Series Background Alex Ratner is CEO and co-founder of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. He recently joined the Future of Data-Centric AI event, where he presented the principles of data-centric AI and where it’s headed. If you would like to watch his presentation in full,…
Machine Learning Whiteboard (MLW) Open-source Series Today, Ryan Smith, machine learning research engineer at Snorkel AI, talks about prompting methods with language models and some of their applications to weak supervision. In this talk, we’re essentially going to be using this paper as a template—it’s a great survey of prompting methods from the last few years…
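As a tiny illustration of the prompting idea, classification can be recast as filling in a blank in a template; the template, verbalizer words, and model choice below are assumptions for illustration.

```python
# Sketch: a cloze-style prompt that recasts sentiment classification as
# predicting a masked word. The template and verbalizer are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def prompt_label(review: str) -> str:
    # Wrap the input in a template and read off the model's predicted word.
    prompt = f"{review} Overall, the movie was [MASK]."
    predictions = fill_mask(prompt)
    return predictions[0]["token_str"]  # e.g. "great" or "terrible"
```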