The future of data-centric AI talk series Background Michael DAndrea is the Principal Data Scientist at Genentech. He earned his MBA from Cornell University and a Master’s degree in Computing and Education from Columbia University. He currently works on using unstructured data sources for clinical trial analytics and his team is partnered with the Stanford “AI For Health” initiative as…
We’re excited to announce the Q4 2021 LTS release of Snorkel Flow, our data-centric AI development platform powered by programmatic labeling. This latest release introduces a number of new product capabilities and enhancements, from a streamlined programmatic data development interface, to enhanced auto-suggest for labeling functions, to new machine learning capabilities like AutoML, to significant performance enhancements for PDF data…
Moving from Manual to Programmatic Labeling Labeling training data by hand is exhausting. It’s tedious, slow, and expensive—the de facto bottleneck most AI/ML teams face today 1. Eager to alleviate this pain point of AI development, machine learning practitioners have long sought ways to automate this labor-intensive labeling process (i.e., “automated data labeling”) 2, and have reached for classic approaches…
The Future of Data-Centric AI Talk Series Background Alex Ratner is CEO and co-founder of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. He recently joined the Future of Data-Centric AI event, where he presented the principles of data-centric AI and where it’s headed. If you would like to watch his presentation in full,…
Machine Learning Whiteboard (MLW) Open-source Series Today, Ryan Smith, machine learning research engineer at Snorkel AI, talks about prompting methods with language models and some applications they have with weak supervision. In this talk, we’re essentially going to be using this paper as a template—this paper is a great survey over some methods in prompting from the last few years…
The Snorkel AI founding team started the Snorkel Research Project at Stanford AI Lab in 2015, where we set out to explore a higher-level interface to machine learning through training data. This project was sponsored by Google, Intel, DARPA, and several other leading organizations and the research was represented in over 40 academic conferences such as ACL, NeurIPS, Nature and…
The Future of Data-Centric AI Talk Series Background Roshni Malani received her PhD in Software Engineering from the University of California, San Diego, and has previously worked on Siri at Apple and as a founding engineer for Google Photos. She gave a presentation at the Future of Data-Centric AI virtual conference in September 2021. Her presentation is below, lightly edited…
Machine Learning Whiteboard (MLW) Open-source Series We launched the machine learning whiteboard series (MLW) was launched earlier this year as an open-invitation forum to brainstorm ideas and discuss the latest papers, techniques, and workflows in artificial intelligence. Everyone interested in learning about machine learning can participate in an informal and open environment. If you are interested in learning about ML,…
ScienceTalks with Abigail See. Diving into the misconceptions of AI, the challenges of natural language generation (NLG), and the path to large-scale NLG deployment In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Abigail See, an expert natural language processing (NLP) researcher and educator from Stanford University. We discuss Abigail’s path into machine learning (ML), her previous…
Machine Learning Whiteboard (MLW) Open-source Series For our new visitors, we started our machine learning whiteboard (MLW) series earlier this year as an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. In which, we emphasize an informal and open environment to everyone interested in learning about machine learning. So, if you are interested…
Enabling iterative development workflows with Snorkel Flow’s Application Studio. Consider this scenario— we’re AI engineers, and we’re building a social media monitoring application to track the sentiment of Fortune 500 company mentions in the news.
The Future of Data-Centric AI Talk Series Background Snorkel co-founder Chris Ré is an associate professor of Computer Science at Stanford University and an award-winning researcher in data-based theory and machine learning. He has co-founded four companies based on his research in machine learning systems. Chris recently presented at the Future of Data-Centric AI virtual event in September, where he…
ScienceTalks with Saam Motamedi We at Snorkel AI have received many requests from data scientists and machine learning engineers who aspire to be founders, where do they start and how should they get started on their entrepreneurial journey? We genuinely believe that data scientists and machine learning engineers will build the next generation of mega-enterprises. Over the summer, we’ve recorded…
Machine Learning Whiteboard (MLW) Open-source Series We started our machine learning whiteboard (MLW) series earlier this year as an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning.In this episode, Fait Poms, a Ph.D. student at Stanford…
Main takeaways from The Future of Data-Centric AI Event We recently hosted The Future of Data-Centric AI, where academia, research, and industry experts and practitioners came together to discuss the shift from model-centric AI development to data-centric AI and what lies ahead. This post gives you a quick overview of the event and top takeaways from over eight hours of…
Defining and Building Malleable ML Systems – Machine Learning Whiteboard (MLW) Open-Source Series As you may know, earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine…
Frontend Development Best Practices for Working With Lots of Data From Snorkel AI Engineering As a frontend engineer, it’s often easy to run into limitations when scaling large applications. At Snorkel AI, we often run into times where our users work with data that scales into the gigabytes when using Snorkel Flow. We have built Snorkel Flow around two core…
Snorkel Flow LTS Release Summer ‘21 By adopting Snorkel Flow, a data-centric AI development platform powered by programmatic labeling, our customers have changed how they build and deploy AI applications. We’ve seen our customers save tens-of-millions of dollars in manual labeling costs and person-years of time by applying weak supervision with Snorkel Flow.Over the last few months, we’ve been hard…
ScienceTalks with Paroma Varma In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Paroma Varma – a co-founder of Snorkel AI and one of the first and leading contributors to the Snorkel project. We discuss Paroma’s path into machine learning, her work in optimization and signal processing during her undergrad, weak supervision and image data during her…
Diving Into SliceLine – Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning.In this episode, Kaushik Shivakumar dives into…
Join the live discussion. Learn how to unlock data-centric AI and make AI development practical in your organization Working with vast unstructured and unlabeled data is one of the bottlenecks in the machine learning lifecycle. Machine learning models can only get as reliable and accurate as the data being fed to them. With a data-centric approach 1, your data science…
We started the Snorkel project at the Stanford AI lab in 2015 around two core hypotheses:
Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning.In this episode, Manan Shah dives into “Glean: Structured Extractions from…
The how, what, and why of Snorkel’s programmatic data labeling approach and the state-of-the-art Snorkel Flow platform. The year was 2015. For the first time, machine learning (ML) had outperformed humans in the annual ImageNet challenge.
Machine Learning Whiteboard (MLW) Open-source Series Our machine learning whiteboard (MLW) is an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in discovering more about machine learning.In this episode, Hiromu Hota, Vincent Sunn Chen, Daniel Y. Fu, and Frederic Sala dive…
In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Jason Fries – a research scientist at Stanford University’s Biomedical Informatics Research lab and Snorkel Research, and one of the first contributors to the Snorkel open-source library. We discuss Jason’s path into machine learning, empowering doctors and scientists with weak supervision, and utilizing organizational resources in biomedical applications of Snorkel. This episode is part…
Machine Learning Whiteboard (MLW) Open-source Series Earlier this year, we started our machine learning whiteboard (MLW) series, an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal and open environment to everyone interested in learning about machine learning.In this episode, our Co-founder and Head of Technology. Braden Hancock…
In this episode of Science Talks, Frederic Sala – an assistant professor of Computer Science at the University of Wisconsin Madison and a research scientist at Snorkel discusses his path into machine learning, the central thesis that ties together his multidisciplinary research, his thoughts on the future of weak supervision, as well as his decision to go into academia.
Impractical ML assumptions are made every day in research, which limit its adoption. In the real world, these assumptions do not hold up. Learn more about how to avoid making these assumptions about AI application development.
In this episode of Science Talks, Explosion AI’s Ines Montani sat down with Snorkel AI’s Braden Hancock to discuss her path into machine learning, key design decisions behind the popular spaCy library for industrial-strength NLP, the importance of bringing together different stakeholders in the ML development process, and more.This episode is part of the #ScienceTalks video series hosted by the…