Advancing Snorkel from research to production

Snorkel Research Project


The Snorkel AI founding team started the Snorkel Research Project at the Stanford AI Lab in 2015, where we set out to explore a higher-level interface to machine learning through training data. The project was sponsored by Google, Intel, DARPA, and several other leading organizations, and the research has appeared in more than 40 academic venues, including ACL, NeurIPS, and Nature.
The Snorkel open source research library was developed primarily from 2015 to 2017 as a prototyping tool. Unlike Snorkel Flow, it is not a comprehensive platform for AI development: it is a Python library that contains a legacy base class for defining code-based Labeling Functions (LFs) and some early algorithms for combining LF votes.
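
To make the idea of code-based Labeling Functions and vote combination concrete, here is a minimal sketch in plain Python. It deliberately does not use the Snorkel library's API: the function names, the HAM/SPAM label scheme, and the example texts are all hypothetical, and a simple majority vote stands in for the library's combination algorithms.

```python
from collections import Counter

ABSTAIN = -1          # an LF returns this when it has no opinion
HAM, SPAM = 0, 1      # hypothetical label scheme for this illustration

# Each labeling function encodes one noisy heuristic (all hypothetical).
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

LFS = [lf_contains_link, lf_short_message, lf_mentions_prize]

def apply_lfs(texts, lfs=LFS):
    """Build a label matrix: one row per text, one column per LF."""
    return [[lf(t) for lf in lfs] for t in texts]

def majority_vote(votes):
    """Combine one row of LF votes; abstain on ties or when no LF voted."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

texts = [
    "Claim your prize at http://example.com today",
    "see you soon",
    "The quarterly report is attached for your review",
]
label_matrix = apply_lfs(texts)
labels = [majority_vote(row) for row in label_matrix]
print(labels)
```

The third text receives no label because every LF abstains on it, which is the typical signal that another heuristic is needed to cover that kind of example.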


Snorkel Flow


Snorkel Flow is a platform built by the original creators of the Snorkel Research Project, incorporating years of experience from applying weak supervision and programmatic labeling concepts to real-world ML problems.

In Snorkel Flow, users can label and manage data programmatically, train models, and identify model error modes so they can improve them iteratively in a rapid, data-centric workflow, through both SDK and no-code interfaces.

This shortens the development cycle and improves application quality significantly while also making it easier to manage bias and adapt to changes in production data or business objectives.

Snorkel Flow is used by some of the world’s most advanced organizations in banking, insurance, biotech, and telecommunications, as well as by several government agencies.

Snorkel Flow: a data-centric AI platform

Feature evolution

Snorkel Flow is an enterprise-grade platform built to make the core concepts of the Snorkel Research Project and data-centric AI practical for the enterprise. With Snorkel Flow, enterprises build and deploy accurate and adaptable AI applications rapidly.

Programmatic labeling
Columns: Feature, Snorkel OSS (SRP), Snorkel Flow (SF)
Data scientists write Labeling Functions (LFs) in Python code
Data scientists and domain expert users create LFs in a no-code, push-button UI
UI-based analysis, feedback, and suggestions to guide iterative LF development
Auto-suggest and auto-tuning of LFs
Built-in interactive data visualization with support for building LFs by drawing directly on data plots
Automated management and versioning of LFs
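
The kind of analysis that guides iterative LF development can be illustrated with a few summary statistics. The sketch below is a plain-Python illustration, not Snorkel Flow's or the open source library's implementation; the function name `lf_summary` and the vote matrix are hypothetical. For each LF it computes coverage (how often it votes), overlap (how often it votes alongside another LF), and conflict (how often it disagrees with another LF).

```python
ABSTAIN = -1  # value an LF emits when it does not vote

def lf_summary(label_matrix):
    """Per-LF coverage, overlap, and conflict rates.

    label_matrix is a list of rows, one row per example and one
    column per LF, with ABSTAIN marking a missing vote.
    """
    n_rows = len(label_matrix)
    n_lfs = len(label_matrix[0]) if n_rows else 0
    stats = []
    for j in range(n_lfs):
        covered = overlaps = conflicts = 0
        for row in label_matrix:
            if row[j] == ABSTAIN:
                continue
            covered += 1
            # Votes cast by the other LFs on the same example.
            others = [v for k, v in enumerate(row) if k != j and v != ABSTAIN]
            if others:
                overlaps += 1
            if any(v != row[j] for v in others):
                conflicts += 1
        stats.append({
            "coverage": covered / n_rows,
            "overlap": overlaps / n_rows,
            "conflict": conflicts / n_rows,
        })
    return stats

# Hypothetical votes from three LFs on four examples.
votes = [
    [1, -1, 1],
    [-1, 0, 1],
    [-1, 0, -1],
    [-1, -1, -1],
]
summary = lf_summary(votes)
for j, s in enumerate(summary):
    print(j, s)
```

An LF with low coverage may need to be broadened, while one with a high conflict rate is a candidate for review or refinement.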
Training dataset management
Columns: Feature, Snorkel OSS (SRP), Snorkel Flow (SF)
Basic algorithms for denoising and combining LF outputs
Advanced algorithms and automated tuning for denoising and combining LF outputs, including correlation analysis
One-click to execute LFs with automated parallelization and label model optimization
Automated management and versioning of training datasets
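
As a rough illustration of what "denoising and combining LF outputs" can mean, the sketch below weights each LF's vote by an assumed accuracy instead of counting votes equally. This is a textbook log-odds combination for binary labels under the assumptions that LFs err independently and that both classes are equally likely a priori; it is not the algorithm used by Snorkel Flow or the open source library, and the accuracies shown are hypothetical inputs rather than learned values.

```python
import math

ABSTAIN = -1  # value an LF emits when it does not vote

def weighted_vote(votes, accuracies):
    """Return P(label == 1) for one row of binary LF votes (0 or 1).

    Each accuracy must lie strictly between 0 and 1. An LF with
    accuracy 0.5 contributes nothing; more accurate LFs count more.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue
        weight = math.log(acc / (1.0 - acc))
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical accuracies: one strong LF and two weak ones.
accuracies = [0.9, 0.6, 0.6]

# The strong LF votes 0 and the two weak LFs vote 1.
p = weighted_vote([0, 1, 1], accuracies)
print(round(p, 3))
```

A plain majority vote would pick label 1 for this row, whereas the weighted combination leans toward label 0 because the single accurate LF outweighs the two weak ones. The output is also a probability rather than a hard label, so downstream training can take the remaining uncertainty into account.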
Model training and analysis
Columns: Feature, Snorkel OSS (SRP), Snorkel Flow (SF)
Train custom models using Python
One-click to train and tune pre-configured, state-of-the-art models via the built-in model zoo
Auto-generated UI-based model analysis with suggestions for model and LF improvement
Automated management and versioning of models
Application and model serving
Columns: Feature, Snorkel OSS (SRP), Snorkel Flow (SF)

Train custom models using Python

One-click endpoint creation for model/application serving
One-click model/application export for serving at scale
Deployment and security
Columns: Feature, Snorkel OSS (SRP), Snorkel Flow (SF)
REST API, monitoring services, and managed workers for job execution
Snorkel AI-hosted and managed hybrid cloud (AWS) deployment
Support for distributed deployment via Kubernetes
Encryption (in-transit and at-rest), authentication, and role-based access control (RBAC)
Managed SSO integration with SAML 2.0 support
Training and support
Columns: Feature, Snorkel OSS (SRP), Snorkel Flow (SF)
Enterprise training and support from Snorkel AI engineers