Latest posts

How programmatic labeling can minimize data exposure

MIT Technology Review reported this week that workers in Venezuela contracted by an outsourced data annotation services provider shared customer data with each other on social media. The data consisted of low-angle pictures intended to be labeled, including one that showed a woman in a private moment in the bathroom. Programmatic labeling could have minimized this exposure.

Devang Sachdev
December 21, 2022

How Georgetown University’s CSET uses Snorkel Flow to build NLP applications to inform policy research

Georgetown University’s CSET is building next-generation NLP applications using Snorkel Flow to classify complex research documents. Snorkel Flow drastically reduced labeling, model training, and iteration time and better equipped CSET’s data science team to collaborate closely with analysts to gather, process, and interpret data at scale. 

Nick Harvey
December 19, 2022

Seven research papers push foundation model boundaries

The recent debut of ChatGPT astounded the public with the power and speed of foundation models, but their enterprise use remains hampered by adaptation and deployment challenges. In the past year, Snorkel AI has researched several ways to overcome those challenges. 

December 15, 2022

Snorkel AI Partners with Advanced Analytics Consultancy Aimpoint Digital

Snorkel AI is delighted to announce a partnership with Aimpoint Digital, a premier analytics firm specializing in AI application development that builds, operationalizes, and scales data science solutions for biopharma, manufacturing, retail, and other major industries. Aimpoint Digital leads the industry in solving complex challenges and exploiting value-generating opportunities for organizations of all sizes through data. The company helps clients…

December 12, 2022

Supercharge data scientist and domain expert collaboration with Comments and Tags in Snorkel Flow

Labeling data manually can be a grind. Snorkel Flow slashes labeling time from months to minutes by allowing data scientists and domain experts to collaborate through labeling functions. Snorkel Flow offers two unique capabilities that further supercharge that collaboration: Comments and Tags.

December 9, 2022

Snorkel AI Team presents research at NeurIPS 2022

The Snorkel AI team will present five research papers advancing weak supervision and programmatic labeling at the NeurIPS 2022 conference that started this week.

November 29, 2022

Deepening Snorkel AI’s partnership with Microsoft Azure AI

Snorkel AI is excited to build on our partnership with Microsoft Azure to help enterprises and government agencies solve their most impactful problems and unlock value from their data using AI. Learn how Azure customers can easily deploy Snorkel Flow on their Azure cloud infrastructure to accelerate AI application development with data-centric workflows and programmatic labeling.

November 22, 2022

Data-centric Foundation Model Development: Bridging the gap between foundation models and enterprise AI

Introducing new capabilities for Data-centric Foundation Model Development in Snorkel Flow. Powerful new large language or foundation models (FMs) like GPT-3, Stable Diffusion, BERT, and more have taken the AI space by storm, going viral, even beyond technical practitioners, thanks to incredible capabilities around text generation, image synthesis, and more. However, enterprises face fundamental barriers to using these foundation models on real,…

November 17, 2022

Better not bigger: How to get GPT-3 quality at 0.1% the cost

We created Data-centric Foundation Model Development to bridge the gaps between foundation models and enterprise AI. New Snorkel Flow capabilities (Foundation Model Fine-tuning, Warm Start, and Prompt Builder) give data science and machine learning teams the tools they need to effectively put foundation models (FMs) to use for performance-critical enterprise use cases. The need is clear: despite undeniable excitement about…

What can Data-Centric AI learn from data & ML engineering?

Databricks’ Chief Technologist: Data-Centric AI can learn from Data Engineering and ML Engineering in five ways: continuous updates, versioning, code-centric deployment, data privatization and actionable monitoring.

November 5, 2022

Building an NLP application to analyze ESG factors in Earnings Calls using Snorkel Flow

Create a data-centric AI application using Snorkel Flow to save your analysts the time spent on manual labeling and on extracting information about environmental, social, and governance (ESG) factors from earnings call transcripts. Rapidly and accurately extract all existing and new factors from the transcripts to make the right investment decision.

November 3, 2022

Building Trustworthy AI applications with data-centric AI

AI is generally accepted as necessary for organizations across private and public sectors to build (or maintain) a competitive advantage. However, a major challenge to adopting AI successfully is our ability to build reliable, predictable, and equitable solutions. A critical flaw with traditional approaches to developing AI is the reliance on hand-labeled training datasets and/or “pre-trained” black-box models that are effectively ungovernable and unauditable. In this article, we explore the motivations and challenges for Trustworthy AI that we’ve encountered and discuss how core tenets of Data-Centric AI, including programmatic labeling, help ameliorate them.

October 4, 2022

Top-10 US bank uses AI/ML to triage loan documents based on risk exposure

To meet the requirements of unexpected regulatory changes brought on by the pandemic, a top-10 US bank needed to urgently adapt its underperforming model-centric artificial intelligence and machine learning development approach to a data-centric one. The team used Snorkel Flow to automatically classify thousands of loan documents and extract critical clauses in just 24 hours, saving loan managers thousands of hours of manual document review.

Nick Harvey
September 30, 2022

How Schlumberger uses Snorkel Flow to enhance proactive well management

Schlumberger is the world’s leading provider of technology and services for the energy industry, operating in over 120 countries. The company provides well maintenance and analytics services to the world’s biggest oil companies, and it believes that large-scale data analysis and artificial intelligence/machine learning will help it remain a leader in the market. One way it has achieved this is by building its own AI application using Snorkel Flow to automatically extract geological entities and critical field data across the variety of document structures and report types it receives from its customers.

Nick Harvey
September 30, 2022

Improving upon Precision, Recall, and F1 with Gain metrics

This blog post introduces variants of the Precision, Recall, and F1 metrics called Precision Gain, Recall Gain, and F1 Gain. The gain variants have desirable properties such as meaningful linear interpolation of PR curves and a universal baseline across tasks. This post explains what these benefits mean for you, shows how the gain metrics are calculated, and outlines some examples for intuitive comparison.
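As a rough illustration of how such gain metrics can be computed, here is a minimal sketch. It assumes the common definition from the Precision-Recall-Gain literature, where a metric value x is rescaled as (x - pi) / ((1 - pi) * x) and pi is the proportion of positive examples; the post itself may use a different parameterization. The function names and the confusion-matrix counts are hypothetical.

```python
def gain(value: float, pi: float) -> float:
    """Rescale precision, recall, or F1 onto the gain scale.

    pi is the proportion of positives (assumed baseline): a value equal
    to pi maps to 0, and a perfect value of 1 maps to 1.
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must be strictly between 0 and 1")
    if value <= 0.0:
        raise ValueError("value must be positive")
    return (value - pi) / ((1.0 - pi) * value)


def gain_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute gain variants from confusion-matrix counts."""
    pi = (tp + fn) / (tp + fp + fn + tn)  # positive-class prevalence
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision_gain": gain(precision, pi),
        "recall_gain": gain(recall, pi),
        "f1_gain": gain(f1, pi),
    }


# Hypothetical counts, for illustration only.
metrics = gain_metrics(tp=40, fp=10, fn=20, tn=130)
print(metrics)
```

Under this definition, F1 Gain is the arithmetic mean of Precision Gain and Recall Gain, which is one reason linear interpolation becomes meaningful on the gain scale.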

September 8, 2022

Summer 2022 Snorkel Flow release roundup

On the heels of the second annual Future of Data-Centric AI event, we’re energized by what we learned from data scientists, machine learning engineers, and AI leaders who are adopting data-centric approaches to accelerate AI success. The Snorkel Flow platform provides these teams with a seamless workflow across training data creation, model training, and analysis—the scaffolding to make data-centric AI…

Molly Friederich
August 30, 2022

Introducing Continuous Model Feedback to drive rapid data quality improvement

Continuous Model Feedback, available in beta as part of the new Studio experience, is Snorkel Flow’s latest capability to make training data creation and model development more integrated, automated, and guided.

Molly Friederich
August 29, 2022

The Future of Data-Centric AI 2022 day 2 highlights

Snorkel AI just hosted the second day of The Future of Data-Centric AI conference 2022. Across 40+ sessions, 50+ data scientists, ML engineers, and AI leaders came together to share insights, best practices, and research on adopting data-centric approaches with thousands of attendees from all around the world. Aarti Bagul, a Snorkel AI ML Solutions Engineer and one of the…

Louis Bouchard
August 5, 2022

The Future of Data-Centric AI 2022 day 1 highlights

Snorkel AI just hosted the first day of The Future of Data-Centric AI conference 2022. This conference brings together data scientists, ML engineers, and AI leaders to share insights, best practices, and research on how to evolve the ML lifecycle from model-centric to data-centric approaches. This conference takes place over two days with 40+ sessions, 50+ speakers, and thousands of…

Louis Bouchard
August 4, 2022

10-Ks information extraction case studies

Building NLP techniques to understand 10-Ks is time-consuming, costly, and challenging. In this post, Machine Learning Engineer Aarti Bagul discusses three information extraction case studies on how banks around the world are building highly accurate NLP applications using Snorkel Flow’s AI platform. From retail banking to hedge fund investing, NLP is used across the financial industry. By processing and extracting…

July 6, 2022

Introducing Cluster View: Instant data insight made actionable to speed AI development

Programmatic labeling moves a classic technique from interesting to high-impact. So much of real-world AI development entails working with text data that’s messy; in fact, 80%+ of enterprise data is unstructured. And while state-of-the-art models get a lot of the glory, creating the training data that conveys what your model needs to learn is more often the biggest determiner of AI…

Molly Friederich
June 30, 2022

Data-centric approaches to multi-label classification

AI systems are well-suited to tasks involving recognizing and predicting data patterns. Supervised classification systems categorize unseen data into a finite set of discrete classes by learning from millions of hand-labeled sample points. These classifiers are powerful business tools – they automate document sorting, customer sentiment analysis, sales performance, and other distinct business problems. However, they also require an…

Kanyes Thaker
June 29, 2022

Data annotation guidelines and best practices

What is data annotation? Data annotation refers to the process of categorizing and labeling data for training datasets. This process plays a critical role in preparing data for machine learning models, as high-quality training data enables more accurate predictions and insights. In order for a training dataset to be usable, it must be categorized appropriately and annotated for a specific…

June 28, 2022

3 ways to use Snorkel’s Labeling Functions

Labeling functions are fundamental building blocks of programmatic labeling that encode diverse sources of weak labeling signals to produce high-quality labeled data at scale. Let’s start with the core motivation for labeling functions: over time, every major commercial organization and government agency builds various valuable, often bespoke knowledge resources. These resources include employee expertise, wikis and ontologies, business logic, and…
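To make the idea concrete, here is a minimal plain-Python sketch of labeling functions, each encoding one kind of knowledge resource mentioned above. Everything in it is an illustrative assumption: the complaint-detection task, the function names, the term list, and especially the majority-vote combiner, which is a deliberate simplification and not how Snorkel itself aggregates labeling function outputs.

```python
from collections import Counter

# Label values; ABSTAIN means the function has no opinion on this example.
ABSTAIN, NOT_COMPLAINT, COMPLAINT = -1, 0, 1

# Hypothetical stand-in for an ontology or wiki of known product issues.
KNOWN_ISSUES = {"broken", "defective", "damaged"}


def lf_expert_keyword(text: str) -> int:
    """Employee expertise: refund requests usually signal a complaint."""
    return COMPLAINT if "refund" in text.lower() else ABSTAIN


def lf_ontology_lookup(text: str) -> int:
    """Ontology lookup: any known issue term suggests a complaint."""
    words = set(text.lower().replace(",", " ").split())
    return COMPLAINT if words & KNOWN_ISSUES else ABSTAIN


def lf_business_rule(text: str) -> int:
    """Business logic: thank-you messages are routed as non-complaints."""
    return NOT_COMPLAINT if "thank you" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_expert_keyword, lf_ontology_lookup, lf_business_rule]


def majority_vote(text: str) -> int:
    """Combine non-abstaining votes; abstain when there are none or a tie."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


for message in ["The item arrived broken, I want a refund",
                "Thank you for the quick help"]:
    print(message, "->", majority_vote(message))
```

Each function is noisy and covers only part of the data, so the functions abstain rather than guess; the value comes from combining many such weak signals.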

Nic Acton
June 24, 2022

Clinical entity classification in electronic health records

Research recap: Ontology-driven weak supervision for clinical entity classification in electronic health records (EHRs). In this post, I summarize the research published in the academic paper Ontology-driven weak supervision for clinical entity classification in electronic health records by Jason Fries et al., which appeared in Nature Communications in 2021. Problem statement: Electronic health records (EHR) contain a rich…

Nazanin Makkinejad
June 17, 2022

Building AI models for financial document processing best practices

Highlighting the best practices for building and deploying AI models for financial document processing applications. AI has massive potential in the financial industry. Building AI models to automate information extraction, fraud detection, and compliance monitoring can provide efficient and faster responses and support repurposing domain experts’ labor to more meaningful tasks. Developing AI models is not just about having models…

Hoang Tran
June 15, 2022

The benefits of programmatic labeling for trustworthy AI

The following post is based on a talk discussing the benefits of programmatic labeling for trustworthy AI, which was presented as part of the Trustworthy AI: A Practical Roadmap for Government event that took place this past April, with Snorkel AI Co-founder and Head of Technology, Braden Hancock. If you would like to watch Braden’s presentation, we have included it…

June 9, 2022

Uncovering the unknowns of deep neural networks by Sharon Li

Learning about the challenges and opportunities behind deep neural networks. In this talk, Assistant Professor in Computer Science Sharon Li shares some exciting work about uncovering the unknowns of deep neural networks. She also shares some exciting challenges and opportunities in this domain. If you would like to watch Sharon’s presentation, we have included it below, or you can find…

June 8, 2022

Named entity extraction and recognition with Snorkel Flow

If you have ever been amazed at how Google accurately finds the answer to your question from just a few keywords, you’ve witnessed the power of named entity recognition (NER). By quickly and accurately identifying different entities in a sea of unstructured articles, like names of people, places, and organizations, the search engine can figure out each article’s main topics and…

June 7, 2022

A data-centric perspective on trustworthy and interpretable AI

The Future of Data-Centric AI talk series. In this talk, Assistant Professor of Biomedical Data Science at Stanford University, James Zou, discusses the work he and his team have been doing from a data-centric perspective to trustworthy and interpretable AI. If you would like to watch James’ presentation, we have included it below, or you can find the entire event…

June 6, 2022