Latest posts

Data extraction from SEC filings (10-Ks) with Snorkel Flow

Leveraging Snorkel Flow to extract critical data from annual reports (10-Ks) Introduction Those who have never logged into EDGAR may be surprised by how much information is available in public companies' annual reports. You can find tactical details like the names of senior leadership and top shareholders, as well as more strategic information like earnings, risk factors, and the company's strategy and vision. Warren…

May 10, 2022

Liger: Fusing foundation model embeddings & weak supervision

Showcasing Liger, which fuses foundation model embeddings with weak supervision to improve existing weak supervision techniques. Machine learning whiteboard (MLW) open-source series In this talk, Mayee Chen, a PhD student in Computer Science at Stanford University, presents her work combining weak supervision and foundation model embeddings to improve two essential aspects of current weak supervision techniques. Check out the full episode here or…

May 9, 2022

AI in cybersecurity: an introduction and case studies

An introduction to AI in cybersecurity with real-world case studies from a Fortune 500 organization and a government agency Despite all the recent advances in artificial intelligence and machine learning (AI/ML) applied to a vast array of application areas and use cases, success with AI in cybersecurity remains elusive. The key component of building AI/ML applications is training data, which…

May 5, 2022

Active learning: an overview

A primer on active learning presented by Josh McGrath. Machine learning whiteboard (MLW) open-source series This video defines active learning, explores variants and design decisions made within active learning pipelines, and compares it to related methods. It contains references to some seminal papers in machine learning that we find instructive. Check out the full video below or on YouTube. Additionally, a…

May 4, 2022

Using few-shot learning language models as weak supervision

Utilizing large language models as zero-shot and few-shot learners with Snorkel for better quality and more flexibility Large language models (LLMs) such as BERT, T5, GPT-3, and others are exceptional resources for applying general knowledge to your specific problem. Being able to frame a new task as a question for a language model (zero-shot learning), or showing it a few…

May 3, 2022
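The post above describes framing a new task as a question for a language model (zero-shot learning) and using it for weak supervision. As a rough illustration only, and not the workflow from the post or the Snorkel Flow platform, below is a minimal sketch of using a zero-shot model as one weak labeling source via the open-source Snorkel labeling-function API; the model name, label set, and confidence threshold are illustrative assumptions.

# Minimal sketch: a zero-shot language model used as one weak labeling source.
# Illustrative only; model choice, labels, and threshold are assumptions.
import pandas as pd
from transformers import pipeline
from snorkel.labeling import labeling_function, PandasLFApplier

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

# Zero-shot classification frames the task as matching text against candidate labels.
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

@labeling_function()
def lf_zero_shot_sentiment(x):
    result = zero_shot(x.text, candidate_labels=["positive", "negative"])
    top_label, top_score = result["labels"][0], result["scores"][0]
    if top_score < 0.6:  # abstain when the model is not confident
        return ABSTAIN
    return POSITIVE if top_label == "positive" else NEGATIVE

df = pd.DataFrame({"text": ["Great quarterly results!", "Shares fell sharply."]})
L = PandasLFApplier([lf_zero_shot_sentiment]).apply(df)
print(L)  # one column of weak labels per labeling function

In practice, a zero-shot or few-shot labeling function like this would be combined with other labeling sources and denoised by a label model before training the final model.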

Accelerating AI in healthcare

How data-centric AI can speed up your end-to-end healthcare AI development and deployment Healthcare is a field that is awash in data, and managing it all is complicated and expensive. As an industry, it benefits tremendously from the ongoing development of machine learning and data-centric AI. The potential benefits of AI integration in healthcare can be broken down into two categories:…

April 29, 2022

Bill of materials for responsible AI: collaborative labeling

In our previous posts, we discussed how explainable AI is crucial to ensure the transparency and auditability of your AI deployments and how trustworthy AI adoption and its successful integration into our country’s critical infrastructure and systems are paramount. In this post, we dive into making trustworthy and responsible AI possible with Snorkel Flow, the data-centric AI platform for government and federal agencies. Collaborative labeling and…

April 28, 2022

ICLR 2022 recap from Snorkel AI

We are honored to be part of the International Conference on Learning Representations (ICLR) 2022, where Snorkel AI founders and researchers will be presenting five papers on data-centric AI topics The field of artificial intelligence moves fast!  This is a world we are intimately familiar with at Snorkel AI, having spun out of academia in 2019. For over half a…

April 20, 2022

Explainability through provenance and lineage

In our previous post, we discussed how trustworthy AI adoption and its successful integration into our country’s critical infrastructure and systems are paramount. In this post, we discuss how explainability in AI is crucial to ensure the transparency and auditability of your AI deployments. Outputs from trustworthy AI applications must be explainable in understandable terms based on the design and implementation of…

April 19, 2022

Spring 2022 Snorkel Flow release roundup

Latest features and platform improvements for Snorkel Flow 2022 is off to a strong start as we continue to make the benefits of data-centric AI more accessible to the enterprise. With this release, we’re further empowering AI/ML teams to drive rapid, analysis-driven training data iteration and development. Improvements include streamlined data exploration and programmatic labeling workflows, integrated active learning and AutoML,…

April 14, 2022

Introduction to trustworthy AI

The adoption of trustworthy AI and its successful integration into our country’s most critical systems is paramount to achieving the goal of employing AI applications to accelerate economic prosperity and national security. However, traditional approaches to developing AI applications suffer from a critical flaw that leads to significant ethics and governance concerns. Specifically, AI today relies on massive, hand-labeled training datasets…

April 7, 2022

How to better govern ML models? Hint: auditable training data

ML models will always have some level of bias. Rather than relying on black-box algorithms, how can we make the entire AI development workflow more auditable? How do we build applications where bias can be easily detected and quickly managed? Today, most organizations focus their model governance efforts on investigating model performance and the bias within the predictions. Data science…

April 6, 2022

Algorithms that leverage data from other tasks with Chelsea Finn

The Future of Data-Centric AI Talk Series Background Chelsea Finn is an assistant professor of computer science and electrical engineering at Stanford University, whose research has been widely recognized, including in the New York Times and MIT Technology Review. In this talk, Chelsea talks about algorithms that use data from tasks you are interested in and data from other tasks….

March 31, 2022

Learning with imperfect labels and visual data with Anima Anandkumar

The future of data-centric AI talk series Background Anima Anandkumar holds dual positions in academia and industry. She is a Bren professor at Caltech and the director of machine learning research at NVIDIA. Anima also has a long list of accomplishments ranging from the Alfred P. Sloan scholarship to the prestigious NSF career award and many more. She recently joined…

March 18, 2022

Weak Supervision Modeling with Fred Sala

Understanding the label model. Machine learning whiteboard (MLW) open-source series Background Frederic Sala is an assistant professor at the University of Wisconsin-Madison and a research scientist at Snorkel AI. Previously, he was a postdoc in Chris Ré's lab at Stanford. His research focuses on data-driven systems and weak supervision. In this talk, Fred focuses on weak supervision modeling. This machine…

March 17, 2022

Tips for using a data-centric AI approach

The future of data-centric AI talk series Background Andrew Ng is a machine-learning pioneer, founder and CEO of Landing AI, and a former team leader at Google Brain. Recently he gave a presentation to the Future of Data-Centric AI virtual conference, where he discussed some practical tips for responsible data-centric AI development. This presentation dives into tips for data-centric AI applicable…

March 9, 2022

Resilient enterprise AI application development

Using a data-centric approach to capture the best of rule-based systems and ML models for enterprise AI One of the biggest challenges to making AI practical for the enterprise is keeping the AI application relevant (and therefore valuable) in the face of ever-changing input data and evolving business objectives. Practitioners typically use one of two approaches to build these AI applications:…

March 3, 2022

How AI can be used to rapidly respond to information warfare in the Russia-Ukraine conflict

Proliferating web technology has contributed to information warfare in recent conflicts. Artificial intelligence (AI) can play a significant role in stemming disinformation campaigns and cyber-attacks, and in informing diplomacy in the rapidly evolving situation in Ukraine. Snorkel AI is dedicated to supporting the National Security community and other enterprise organizations with state-of-the-art AI technology. We see this as our responsibility in the…

February 28, 2022

How Genentech extracted information for clinical trial analytics with Snorkel Flow

Genentech, a global biotech leader and member of the Roche Group, leveraged Snorkel Flow to extract critical information from lengthy clinical trial protocol (CTP) PDF documents. They built AI applications that used NER, entity linking, text extraction, and classification models to determine inclusion/exclusion criteria and to analyze Schedules of Assessments. Genentech’s team achieved 95-99% model accuracy by using Snorkel…

February 26, 2022

Augmenting the clinical trial design process with information extraction

The future of data-centric AI talk series Background Michael D’Andrea is the Principal Data Scientist at Genentech. He earned his MBA from Cornell University and a Master’s degree in Computing and Education from Columbia University. He currently works on using unstructured data sources for clinical trial analytics and his team is partnered with the Stanford “AI For Health” initiative as…

February 22, 2022

Q4 LTS Release of Snorkel Flow

We’re excited to announce the Q4 2021 LTS release of Snorkel Flow, our data-centric AI development platform powered by programmatic labeling. This latest release introduces a number of new product capabilities and enhancements, from a streamlined programmatic data development interface, to enhanced auto-suggest for labeling functions, to new machine learning capabilities like AutoML, to significant performance enhancements for PDF data…

February 8, 2022

Making Automated Data Labeling a Reality in Modern AI

Moving from Manual to Programmatic Labeling Labeling training data by hand is exhausting. It’s tedious, slow, and expensive—the de facto bottleneck most AI/ML teams face today [1]. Eager to alleviate this pain point of AI development, machine learning practitioners have long sought ways to automate this labor-intensive labeling process (i.e., “automated data labeling”) [2], and have reached for classic approaches…

February 4, 2022

The Principles of Data-Centric AI Development

The Future of Data-Centric AI Talk Series Background Alex Ratner is CEO and co-founder of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. He recently joined the Future of Data-Centric AI event, where he presented the principles of data-centric AI and where it’s headed. If you would like to watch his presentation in full,…

January 25, 2022

Prompting Methods with Language Models and Their Applications to Weak Supervision

Machine Learning Whiteboard (MLW) Open-source Series  Today, Ryan Smith, machine learning research engineer at Snorkel AI, talks about prompting methods with language models and some applications they have with weak supervision. In this talk, we’re essentially going to be using this paper as a template—this paper is a great survey over some methods in prompting from the last few years…

January 19, 2022

Advancing Snorkel from research to production

The Snorkel AI founding team started the Snorkel Research Project at the Stanford AI Lab in 2015, where we set out to explore a higher-level interface to machine learning through training data. This project was sponsored by Google, Intel, DARPA, and several other leading organizations, and the research has been presented at over 40 academic venues such as ACL, NeurIPS, Nature and…

January 18, 2022

Building AI Applications Collaboratively Using Data-centric AI

The Future of Data-Centric AI Talk Series Background Roshni Malani received her PhD in Software Engineering from the University of California, San Diego, and has previously worked on Siri at Apple and as a founding engineer for Google Photos. She gave a presentation at the Future of Data-Centric AI virtual conference in September 2021. Her presentation is below, lightly edited…

January 14, 2022

Epoxy: Using Semi-Supervised Learning to Augment Weak Supervision

Machine Learning Whiteboard (MLW) Open-source Series We launched the machine learning whiteboard (MLW) series earlier this year as an open-invitation forum to brainstorm ideas and discuss the latest papers, techniques, and workflows in artificial intelligence. Everyone interested in learning about machine learning can participate in an informal and open environment. If you are interested in learning about ML,…

December 16, 2021

Artificial Intelligence (AI) Facts and Myths

Science Talks with Abigail See. Diving into the misconceptions of AI, the challenges of natural language generation (NLG), and the path to large-scale NLG deployment In this episode of Science Talks, Snorkel AI’s Braden Hancock chats with Abigail See, an expert natural language processing (NLP) researcher and educator from Stanford University. We discuss Abigail’s path into machine learning (ML), her previous…

November 23, 2021

PonderNet: Learning to Ponder by DeepMind

Machine Learning Whiteboard (MLW) Open-source Series For our new visitors, we started our machine learning whiteboard (MLW) series earlier this year as an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in AI. We emphasize an informal and open environment for everyone interested in learning about machine learning. So, if you are interested…

November 10, 2021