The Snorkel AI Blog

Unmasking Trafficking Risk in Commercial Sex Supply Chains with Machine Learning

Hamsa Bastani presented a summary of her and her co-authors’ ongoing work using machine learning and Snorkel AI’s tools to detect and track activities that are associated with a high risk for global sex trafficking.

Data Development, Data Labeling, NLP, Partners

Team Snorkel

January 20, 2023

Prompting and weak supervision to build better, smaller models

Snorkel AI co-founder and CEO Alex Ratner recently interviewed several Snorkel researchers about their published academic papers. In this video, Alex talks with Ryan Smith, Senior Applied Scientist at Snorkel, about the work he did on using foundation models to build compact, deployable, and effective models.

Annotation, Data-Centric AI, Evaluation, Foundation Models, NLP

Team Snorkel

January 19, 2023

FM Summit shows Foundation Model hurdles and potential

Snorkel AI held its Foundation Model Summit Jan 17, bringing together 12 presenters and over 600 attendees at 10 virtual sessions. The event drew registrants from across many sectors, including the tech industry, healthcare, and financial services.

Alignment, Data Development, Data Labeling, Data-Centric AI, Evaluation, Fine-Tuning, Foundation Models, LLMs, NLP, Product Releases

Matt Casey

January 18, 2023

Contrastive Learning boosts Foundation Model specialization

Snorkel AI co-founder and CEO Alex Ratner talks with Ananya Kumar about the work he did on improving the effectiveness of foundation models by using contrastive learning, image augmentations, and labeled subsamples.

Computer vision, Data-Centric AI, Fine-Tuning, Foundation Models, NLP

Team Snorkel

January 13, 2023

Adapting language-based models beyond English

While a majority of Natural Language Processing (NLP) models focus on English, the real world requires solutions that work with languages across the globe. This demo shows how effectively users can build cross-language models in Snorkel Flow.

Data Development, Data Labeling, Data-Centric AI, Fine-Tuning, Foundation Models, LLMs, NLP

Anastassia Kornilova, April Guo

January 12, 2023

How Pixability uses foundation models to accelerate NLP application development by months

Using Snorkel Flow, Pixability can now quickly build classifiers for massive amounts of YouTube data, a capability that was previously out of reach.

Data Development, Data Labeling, Data-Centric AI, Foundation Models, NLP

Nick Harvey

January 11, 2023

Speech AI Demystified | FDCAI Lightning Talk

Sirisha Rella, Technical Product Marketing Manager at Nvidia, recently gave a Lightning Talk presentation on “demystifying” speech AI at Snorkel AI’s Future of Data-Centric AI virtual conference.

Data-Centric AI, Fine-Tuning, NLP, Product Releases

Team Snorkel

January 10, 2023

Snorkel AI to host Foundation Model Virtual Summit, registration now open

Snorkel AI will hold a free Foundation Model Virtual Summit on Tuesday, January 17 where speakers from across the technology industry, including some from Google and Stanford University, will discuss the enterprise use of Foundation Models.

Data-Centric AI, Fine-Tuning, Foundation Models, NLP, Partners

Team Snorkel

January 5, 2023

Demo: Using Snorkel Flow to train Microsoft Azure Form Recognizer models

Snorkel Flow debuts a new integration with Microsoft Azure Form Recognizer to help organizations leverage Azure AI services.

Computer vision, Data-Centric AI, Fine-Tuning, Partners, Product Releases

Team Snorkel

January 5, 2023

Ask Me Anything approach bolsters foundation models

Researcher Simran Arora tells Snorkel CEO Alex Ratner how she improved foundation model effectiveness by using “Ask Me Anything”-style questions.

Data-Centric AI, Evaluation, Fine-Tuning, Foundation Models, NLP

Team Snorkel

January 4, 2023

Snorkel Flow 2022 year-end release roundup

See what’s in our latest Snorkel Flow release and how we’re accelerating data-centric AI development further.

Data Development, Data Labeling, Data-Centric AI, Foundation Models, LLMs, NLP, Product Releases

Aparna Lakshmiratan

January 3, 2023

Combining human and artificial intelligence with human-in-the-loop ML | FDCAI

More components in an ML lifecycle are designed to run on autopilot, but some tasks require human-in-the-loop ML, an active research topic that has seen an increasing number of publications in the last 10 years.

Annotation, Computer vision, Data-Centric AI, Evaluation, NLP

Team Snorkel

December 28, 2022

How a top 3 US bank used Snorkel Flow to automate 10-K review for their analysts

A central innovation team at a top US bank wanted to modernize its AI development and data annotation processes in order to create a custom natural language processing (NLP) model that could extract important financial information from 10-Ks. Manually reviewing these documents was taking up valuable time that could be better spent assisting customers. The team used Snorkel Flow’s data-centric AI development process and programmatic labeling to train a customized NLP model that could accurately extract information on interest rate swaps.

Annotation, Banking & Finance, Data Labeling, Data-Centric AI, NLP

Nick Harvey

December 23, 2022

How programmatic labeling can minimize data exposure

MIT Technology Review reported this week that workers in Venezuela contracted by an outsourced data annotation services provider shared customer data with each other on social media. The data consisted of low-angle pictures intended to be labeled, including one that featured a woman in a private moment in the bathroom. Programmatic labeling could have minimized this exposure.

Annotation, Data Labeling, Data-Centric AI

Devang Sachdev

December 21, 2022

How Georgetown University’s CSET uses Snorkel Flow to build NLP applications to inform policy research

Georgetown University’s CSET is building next-generation NLP applications using Snorkel Flow to classify complex research documents. Snorkel Flow drastically reduced labeling, model training, and iteration time and better equipped CSET’s data science team to collaborate closely with analysts to gather, process, and interpret data at scale.

Data Development, Data Labeling, Data-Centric AI, NLP, Partners

Nick Harvey

December 19, 2022

Seven research papers push foundation model boundaries

The recent debut of ChatGPT astounded the public with the power and speed of foundation models, but their enterprise use remains hampered by adaptation and deployment challenges. In the past year, Snorkel AI has researched several ways to overcome those challenges.

Data-Centric AI, Foundation Models, NLP

Matt Casey

December 15, 2022

Snorkel AI Partners with Advanced Analytics Consultancy Aimpoint Digital

Snorkel AI is delighted to announce a partnership with Aimpoint Digital, a premier analytics firm specializing in AI application development that builds, operationalizes, and scales data science solutions for biopharma, manufacturing, retail, and other major industries. Aimpoint Digital leads the industry in solving complex challenges and exploiting value-generating opportunities for organizations of all sizes through data. The company helps clients…

Data Development, Data Labeling, Data-Centric AI, Foundation Models, MLOps, NLP, Partners

Friea Berg

December 12, 2022

Supercharge data scientist and domain expert collaboration with Comments and Tags in Snorkel Flow

Labeling data manually can be a grind. Snorkel Flow slashes labeling time from months to minutes by allowing data scientists and domain experts to collaborate through labeling functions. Snorkel Flow offers two unique capabilities that further supercharge that collaboration: Comments and Tags.

Annotation, Data Labeling, Data-Centric AI

Marty Moesta

December 9, 2022

Snorkel AI Team presents research at NeurIPS 2022

The Snorkel AI team will present five research papers advancing weak supervision and programmatic labeling at the NeurIPS 2022 conference that started this week.

Data Labeling, Data-Centric AI, Evaluation, Foundation Models, NLP

Team Snorkel

November 29, 2022

Deepening Snorkel AI’s partnership with Microsoft Azure AI

Snorkel AI is excited to build on our partnership with Microsoft Azure to help enterprises and government agencies solve their most impactful problems and unlock value from their data using AI. Learn how Azure customers can easily deploy Snorkel Flow on their Azure cloud infrastructure to accelerate AI application development with data-centric workflows and programmatic labeling.

Data Labeling, Data-Centric AI, Fine-Tuning, Foundation Models, NLP, Partners, Product Releases

Henry Ehrenberg

November 22, 2022

Data-centric Foundation Model Development: Bridging the gap between foundation models and enterprise AI

Introducing new capabilities for Data-centric Foundation Model Development in Snorkel Flow. Powerful new large language or foundation models (FMs) like GPT-3, Stable Diffusion, BERT, and more have taken the AI space by storm, going viral, even beyond technical practitioners, thanks to incredible capabilities around text generation, image synthesis, and more. However, enterprises face fundamental barriers to using these foundation models on real,…

Data Development, Data Labeling, Data-Centric AI, Fine-Tuning, Foundation Models, NLP, Product Releases

Alex Ratner

November 17, 2022

Better not bigger: How to get GPT-3 quality at 0.1% the cost

We created Data-centric Foundation Model Development to bridge the gaps between foundation models and enterprise AI. New Snorkel Flow capabilities (Foundation Model Fine-tuning, Warm Start, and Prompt Builder) give data science and machine learning teams the tools they need to effectively put foundation models (FMs) to use for performance-critical enterprise use cases. The need is clear: despite undeniable excitement about…

Data Development, Data-Centric AI, Fine-Tuning, Foundation Models, NLP, Product Releases

Stephen Bach, Jason Fries, Braden Hancock

November 17, 2022

What can Data-Centric AI learn from data & ML engineering?

Databricks’ Chief Technologist: Data-Centric AI can learn from Data Engineering and ML Engineering in five ways: continuous updates, versioning, code-centric deployment, data privatization and actionable monitoring.

Annotation, Data Development, Data-Centric AI, Evaluation, MLOps

Team Snorkel

November 5, 2022

Building an NLP application to analyze ESG factors in Earnings Calls using Snorkel Flow

Create a data-centric AI application using Snorkel Flow to save your analysts the time they spend on manual labeling and information extraction related to environmental, social, and governance (ESG) factors from earnings call transcripts. Rapidly and accurately extract all existing and new factors from the transcripts to inform the right investment decision.

Annotation, Data Labeling, Data-Centric AI, Evaluation, MLOps, NLP

Amir Imani

November 3, 2022

Building Trustworthy AI applications with data-centric AI

AI is generally accepted as necessary for organizations across private and public sectors to build (or maintain) a competitive advantage. However, a major challenge to adopting AI successfully is our ability to build reliable, predictable, and equitable solutions. A critical flaw with traditional approaches to developing AI is the reliance on hand-labeled training datasets and/or “pre-trained” black-box models that are effectively ungovernable and unauditable. In this article, we explore the motivations and challenges for Trustworthy AI that we’ve encountered and discuss how core tenets of Data-Centric AI, including programmatic labeling, help ameliorate them.

Data Development, Data Labeling, Data-Centric AI, Evaluation, Foundation Models

Arjun Prakash

October 4, 2022

Top-10 US bank uses AI/ML to triage loan documents based on risk exposure

To meet the requirements of unexpected regulatory changes brought on by the pandemic, a top-10 US bank needed to urgently adapt its underperforming model-centric artificial intelligence and machine learning development approach to a data-centric one. The team used Snorkel Flow to automatically classify thousands of loan documents and extract critical clauses in just 24 hours, saving loan managers thousands of hours of manual document review.

Banking & Finance, Data Development, Data Labeling, Data-Centric AI, NLP

Nick Harvey

September 30, 2022

How Schlumberger uses Snorkel Flow to enhance proactive well management

Schlumberger is the world’s leading provider of technology and services for the energy industry, operating in over 120 countries. The company provides well maintenance and analytics services to the world’s biggest oil companies, and it believes that large-scale data analysis and artificial intelligence/machine learning will help it remain a leader in the market. One way it has achieved this is by building its own AI application using Snorkel Flow to automatically extract geological entities and critical field data across the variety of document structures and report types it receives from its customers.

Data Development, Data Labeling, Data-Centric AI, NLP

Nick Harvey

September 30, 2022

Improving upon Precision, Recall, and F1 with Gain metrics

This blog post introduces variants of Precision, Recall, and F1 metrics called Precision Gain, Recall Gain, and F1 Gain. The gain variants have desirable properties such as meaningful linear interpolation of PR curves and a universal baseline across tasks. This post explains what these benefits mean for you, shows how the gain metrics are calculated, and outlines some examples for intuitive comparison.
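As a rough illustration of how gain metrics can be computed, the sketch below uses the definitions from Flach and Kull's 2015 work on precision-recall-gain curves; the teaser does not state the formulas, so treating them as the post's definitions is an assumption. Here `pi` is the proportion of positive examples in the dataset, and the function names are hypothetical.

```python
def _gain(value: float, pi: float) -> float:
    """Rescale a metric so that value == pi maps to 0 and value == 1 maps to 1.

    Assumed definition (Flach & Kull, 2015): (value - pi) / ((1 - pi) * value).
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must be strictly between 0 and 1")
    if value <= 0.0:
        raise ValueError("the metric must be positive to compute a gain")
    return (value - pi) / ((1.0 - pi) * value)


def precision_gain(precision: float, pi: float) -> float:
    return _gain(precision, pi)


def recall_gain(recall: float, pi: float) -> float:
    return _gain(recall, pi)


def f1_gain(precision: float, recall: float, pi: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return _gain(f1, pi)


if __name__ == "__main__":
    pi = 0.25  # illustrative value: 25% of examples are positive
    print(precision_gain(0.5, pi), recall_gain(1.0, pi), f1_gain(0.5, 1.0, pi))
```

Under these definitions, a classifier that labels everything positive has precision equal to `pi` and therefore a Precision Gain of 0, which is the baseline shared across tasks. F1 Gain also equals the arithmetic mean of Precision Gain and Recall Gain, which is why straight-line interpolation is meaningful in gain space.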

Data-Centric AI, Evaluation, NLP

Bradley Fowler

September 8, 2022

Summer 2022 Snorkel Flow release roundup

On the heels of the second annual Future of Data-Centric AI event, we’re energized by what we learned from data scientists, machine learning engineers, and AI leaders who are adopting data-centric approaches to accelerate AI success. The Snorkel Flow platform provides these teams with a seamless workflow across training data creation, model training, and analysis—the scaffolding to make data-centric AI…

Annotation, Data Labeling, Data-Centric AI, Evaluation, MLOps, Product Releases

Molly Friederich

August 30, 2022

Introducing Continuous Model Feedback to drive rapid data quality improvement

Continuous Model Feedback, available in beta as part of the new Studio experience, is Snorkel Flow’s latest capability to make training data creation and model development more integrated, automated, and guided.

Data Development, Data Labeling, Data-Centric AI, MLOps, Product Releases

Molly Friederich

August 29, 2022
