

Proliferating web technology has contributed to information warfare in recent conflicts. Artificial Intelligence (AI) can play a significant role in stemming disinformation campaigns, cyber-attacks, and informing diplomacy in the rapidly evolving situation in Ukraine. Snorkel AI is dedicated to supporting the National Security community and other enterprise organizations with state-of-the-art AI technology. We see this as our responsibility in the…
Genentech, a global biotech leader and member of the Roche Group, leveraged Snorkel Flow to extract critical information from lengthy clinical trial protocol (CTP) pdf documents. They built AI applications that used NER, entity linking, text extraction, and classification models to determine inclusion/ exclusion criteria and to analyze Schedules of Assessments. Genentech’s team achieved 95-99% model accuracy by using Snorkel…
The future of data-centric AI talk series Background Michael DAndrea is the Principal Data Scientist at Genentech. He earned his MBA from Cornell University and a Master’s degree in Computing and Education from Columbia University. He currently works on using unstructured data sources for clinical trial analytics and his team is partnered with the Stanford “AI For Health” initiative as…
We’re excited to announce the Q4 2021 LTS release of Snorkel Flow, our data-centric AI development platform powered by programmatic labeling. This latest release introduces a number of new product capabilities and enhancements, from a streamlined programmatic data development interface, to enhanced auto-suggest for labeling functions, to new machine learning capabilities like AutoML, to significant performance enhancements for PDF data…
Moving from Manual to Programmatic Labeling Labeling training data by hand is exhausting. It’s tedious, slow, and expensive—the de facto bottleneck most AI/ML teams face today 1. Eager to alleviate this pain point of AI development, machine learning practitioners have long sought ways to automate this labor-intensive labeling process (i.e., “automated data labeling”) 2, and have reached for classic approaches…


The Future of Data-Centric AI Talk Series Background Alex Ratner is CEO and co-founder of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. He recently joined the Future of Data-Centric AI event, where he presented the principles of data-centric AI and where it’s headed. If you would like to watch his presentation in full,…
Machine Learning Whiteboard (MLW) Open-source Series Today, Ryan Smith, machine learning research engineer at Snorkel AI, talks about prompting methods with language models and some applications they have with weak supervision. In this talk, we’re essentially going to be using this paper as a template—this paper is a great survey over some methods in prompting from the last few years…
The Snorkel AI founding team started the Snorkel Research Project at Stanford AI Lab in 2015, where we set out to explore a higher-level interface to machine learning through training data. This project was sponsored by Google, Intel, DARPA, and several other leading organizations and the research was represented in over 40 academic conferences such as ACL, NeurIPS, Nature and…
The Future of Data-Centric AI Talk Series Background Roshni Malani received her PhD in Software Engineering from the University of California, San Diego, and has previously worked on Siri at Apple and as a founding engineer for Google Photos. She gave a presentation at the Future of Data-Centric AI virtual conference in September 2021. Her presentation is below, lightly edited…





