Multi-Label Classification, Sequence Labeling, and More

Snorkel Flow LTS Release Summer ’21

By adopting Snorkel Flow, a data-centric AI development platform powered by programmatic labeling, our customers have changed how they build and deploy AI applications. We’ve seen our customers save tens of millions of dollars in manual labeling costs and person-years of time by applying weak supervision with Snorkel Flow.

Over the last few months, we’ve been hard at work improving Snorkel Flow to support additional machine learning (ML) task types and data modalities. We’re excited to announce the following additions to Snorkel Flow:

  • Multi-Label Classification: Snorkel Flow now supports multi-label classification models.
  • Additional Data Modalities: We’ve rolled out templates to classify and extract information from native PDFs, richly formatted documents, HTML data, conversational text, and time series data.
  • Sequence Labeling: Snorkel Flow now supports labeling data for sequence tagging and training transformer-style sequence tagging models.

Multi-Label Classification

Multi-label classifiers allow you to assign one or more classes to a single data point. They’re commonly used for tasks such as product categorization and content tagging. Unfortunately, manually annotating training data for multi-label classifiers is time-consuming and difficult: there can be a large number of classes (from dozens to more than 10,000!) with complex dependencies, and each class needs to be labeled and potentially modeled independently.

The complexity of creating labeled datasets by hand for multi-label classifiers makes it a perfect task type for programmatic labeling. Over the summer, we’ve added support to Snorkel Flow for building multi-label training sets and models. With our latest release, you can now:

  • Build labeling functions within Snorkel Flow that can assign multiple class labels to an individual data point.
  • Train and evaluate the performance of multi-label models inside Snorkel Flow.
  • Get feedback on how to improve your labeling functions, both for the overall model and on a per-class basis.
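
To make the first bullet more concrete, here is a minimal sketch in plain Python of what a multi-label labeling function could look like conceptually. It is not Snorkel Flow’s API: the function names, the class names, and the per-class majority vote are all hypothetical, and the majority vote is only a simple stand-in for the way weak supervision combines noisy votes.

```python
from collections import defaultdict

# Hypothetical vote values. A class missing from a function's output means "abstain".
PRESENT, ABSENT = 1, 0

def lf_mentions_battery(text):
    """Vote that 'electronics' applies when a battery is mentioned."""
    return {"electronics": PRESENT} if "battery" in text.lower() else {}

def lf_mentions_kids(text):
    """A single heuristic can vote on several classes at once."""
    if "kids" in text.lower():
        return {"toys": PRESENT, "industrial": ABSENT}
    return {}

def lf_mentions_remote_control(text):
    """Vote for two classes when a remote control is mentioned."""
    if "remote control" in text.lower():
        return {"electronics": PRESENT, "toys": PRESENT}
    return {}

def aggregate(text, lfs):
    """Combine votes with a per-class majority vote; ties are left unlabeled."""
    votes = defaultdict(list)
    for lf in lfs:
        for cls, vote in lf(text).items():
            votes[cls].append(vote)
    labels = {}
    for cls, class_votes in votes.items():
        positive = sum(class_votes)
        negative = len(class_votes) - positive
        if positive != negative:
            labels[cls] = PRESENT if positive > negative else ABSENT
    return labels

LFS = [lf_mentions_battery, lf_mentions_kids, lf_mentions_remote_control]
labels = aggregate("Remote control car for kids, battery included", LFS)
```

Because each class is voted on separately, a data point can end up with several positive labels, some negative labels, and many classes where every function abstained.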

Additional Data Modalities

A few months ago, we added the ability to create complex AI applications in Snorkel Flow that span multiple steps and address a variety of ML tasks. Our initial focus was on text- and document-based use cases. However, these data modalities only represent a portion of the data that enterprises possess.

We want Snorkel Flow to be our customers’ first choice for building machine learning models with weak supervision, regardless of their data modality. To that end, we’ve added application templates for the following data modalities to Snorkel Flow:

  • Native PDFs
  • Richly Formatted Documents
  • HTML Data
  • Conversational Text
  • Time Series

You can use these application templates to build multi-stage AI applications that pre- and post-process data and combine models such as a classifier, an extractor, and a linker.
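
As a rough illustration of what “multi-stage” means here, the sketch below chains a preprocessing step, a classifier, an extractor, a linker, and a postprocessing step in plain Python. It does not use Snorkel Flow’s templates or API. Every function, the regular expression, and the `KNOWN_VENDORS` table are hypothetical stand-ins, and in a real application each stage would be a trained model rather than a rule.

```python
import re

def preprocess(raw):
    """Normalize whitespace before any model sees the text."""
    return " ".join(raw.split())

def classify(doc):
    """Stand-in classifier: route documents that look like invoices."""
    return "invoice" if "invoice" in doc.lower() else "other"

def extract(doc):
    """Stand-in extractor: capitalized names that follow the word 'from'."""
    return re.findall(r"from ([A-Z][\w&]*(?: [A-Z][\w&]*)*)", doc)

# Hypothetical reference table the linker resolves mentions against.
KNOWN_VENDORS = {"acme corp": "V-001", "globex": "V-002"}

def link(mention):
    """Stand-in linker: map a mention to a known ID, or None if unknown."""
    return KNOWN_VENDORS.get(mention.lower())

def postprocess(record):
    """Drop unresolved links and remove duplicate IDs."""
    ids = {vendor_id for vendor_id in record["vendor_ids"] if vendor_id is not None}
    record["vendor_ids"] = sorted(ids)
    return record

def run_pipeline(raw):
    doc = preprocess(raw)
    label = classify(doc)
    if label != "invoice":
        # Later stages only run on documents the classifier routes to them.
        return {"label": label, "mentions": [], "vendor_ids": []}
    mentions = extract(doc)
    vendor_ids = [link(mention) for mention in mentions]
    return postprocess({"label": label, "mentions": mentions, "vendor_ids": vendor_ids})

result = run_pipeline("Invoice   #42 from Acme Corp\nfor services rendered")
```

The point of the structure is that each stage has a narrow job and can be labeled, trained, and evaluated separately.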

Sequence Labeling

Sequence labeling and transformer-style models have dramatically improved performance on named entity recognition (NER) and information extraction (IE) tasks. However, they come at the cost of needing more labeled data and increased annotation complexity. Instead of annotating entire documents, you need to annotate text sequences or tokens within documents. Fortunately, these models are a perfect fit for weak supervision!

With the latest release of Snorkel Flow, we’ve added support for:

  • Building labeling functions that support sequence labels
  • Training and tuning transformer-style sequence tagging ML models with your data
  • Evaluating and analyzing sequence tagging ML models
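
To show how a labeling function can produce sequence labels rather than one label per document, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not Snorkel Flow’s API: the regular expressions, the whitespace tokenizer, and the function names are hypothetical. The labeling function proposes character spans, which are then projected onto tokens as BIO tags (B = beginning of an entity, I = inside, O = outside).

```python
import re

# Hypothetical patterns; real labeling functions could use dictionaries or models too.
PATTERNS = {
    "MONEY": r"\$\d[\d,]*(?:\.\d+)?",
    "DATE": r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}\b",
}

def lf_regex_spans(text):
    """Labeling function: return (start, end, label) character spans."""
    spans = []
    for label, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), label) for m in re.finditer(pattern, text))
    return spans

def tokenize(text):
    """Whitespace tokenizer that keeps character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def spans_to_bio(text, spans):
    """Project character spans onto tokens as BIO tags."""
    tags = []
    for _, tok_start, tok_end in tokenize(text):
        tag = "O"
        for start, end, label in spans:
            if tok_start < end and tok_end > start:  # token overlaps this span
                prefix = "B" if tok_start <= start else "I"
                tag = f"{prefix}-{label}"
                break
        tags.append(tag)
    return tags

text = "Payment of $1,250.00 is due March 3 at noon"
tokens = [token for token, _, _ in tokenize(text)]
tags = spans_to_bio(text, lf_regex_spans(text))
```

Token-level tags like these are the kind of training signal a sequence tagging model consumes, which is why a few pattern-based functions can replace a large amount of token-by-token annotation.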

Other Features

We’ve also released these additional improvements to Snorkel Flow:

  • Support for Admission Roles: Adds the ability to designate a single sign-on (SSO) role for application access.
  • Additional IdP Support: Extends SSO support to IdPs that use OAuth and OIDC.
  • Data Peeking: You can now “peek” into datasets to preview the raw data without having to open a Parquet or CSV file.
  • Local Data Upload: Adds the ability to upload datasets directly from your local machine.

Request a Demo

Interested in finding out more about Snorkel Flow and how your organization can adopt data-centric AI? Request a demo to see how your organization can build and deploy AI applications to production faster and cheaper, while maintaining privacy and model quality.

Join Our Team

Interested in helping build Snorkel Flow? If you’re passionate about solving problems nearly every data science and developer team struggles with and want to shape the future of AI, we want to hear from you! We’re hiring for engineering, SRE, product, design, marketing, sales, solution engineering, and many other roles. Check out our careers page for more details.

