Snorkel AI is thrilled to announce our partnership with Databricks and seamless end-to-end integration across the Databricks Data Intelligence Platform. 

This integration lets Snorkel Flow users access data within Databricks with just a few clicks (as detailed here) and streamlines registering custom, use-case-specific models to the Databricks Workspace Model Registry. 

The synergy between Snorkel and Databricks enables data scientists to navigate their entire machine learning pipeline—from data access to model deployment—all within Snorkel Flow. 

Closing the loop with end-to-end integration across the Databricks platform

Snorkel Flow integrates seamlessly into existing enterprise workflows. Snorkel offers a full suite of third-party data connectors, making data stored in popular cloud repositories like Databricks quickly and easily accessible for data-centric AI development with Snorkel Flow. 

The new Databricks Model Registry integration equips Snorkel Flow users to automatically register custom, use case-specific models trained in Snorkel Flow to the Databricks platform, which provides a unified service for deploying, governing, querying, and monitoring models.

Data-centric AI development with Snorkel Flow

One of the most painstaking and time-consuming steps in developing AI applications is curating and labeling unstructured data. Snorkel AI eases this bottleneck with the Snorkel Flow AI data development platform.

Data science and machine learning teams use Snorkel Flow to intelligently capture knowledge from various sources—such as previously labeled data (even when imperfect), heuristics from subject matter experts, business logic, and even the latest foundation models and large language models—and then scale this knowledge to label large quantities of data.

As users integrate more sources of knowledge, the platform enables them to rapidly improve training data quality and model performance using integrated error analysis tools. Once they have completed the data labeling process, Snorkel Flow users can apply their labeled data to train predictive models or filter data for generative AI applications.

Snorkel Flow + Databricks Model Registry

Snorkel further streamlines the machine learning development process for organizations that rely on Databricks through a native integration with Databricks Model Registry built directly into the platform. After training, adapting, or distilling a model using the Snorkel Flow data development platform, users can easily register their custom, use case-specific models to the Databricks Workspace Model Registry with just a few clicks.

Here’s how it works:

  1. Register a new model registry using your Databricks workspace and access token.
  2. Fill out the experiment name in the format /Users/<your-username>/<experiment_name>, where <your-username> should be your Databricks username.
  3. Upon clicking the “Deploy” button, Snorkel Flow registers a model to your Databricks Workspace Model Registry.
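Step 2's experiment-name format is easy to get wrong, so here is a minimal sketch of a helper that builds and validates the `/Users/<your-username>/<experiment_name>` path (the function name is hypothetical, not part of Snorkel Flow or Databricks):

```python
def databricks_experiment_path(username: str, experiment_name: str) -> str:
    """Build the experiment path in the format /Users/<your-username>/<experiment_name>."""
    if not username or not experiment_name:
        raise ValueError("username and experiment_name must be non-empty")
    if "/" in experiment_name:
        raise ValueError("experiment_name must not contain '/'")
    return f"/Users/{username}/{experiment_name}"

# Example (illustrative username and experiment name):
path = databricks_experiment_path("jane.doe@example.com", "snorkel_contract_clf")
print(path)  # /Users/jane.doe@example.com/snorkel_contract_clf
```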

Once users register a model to the Databricks Workspace Model Registry, they can deploy it to Databricks Model Serving or use it on a Spark cluster.
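Models in a Workspace Model Registry are addressable via MLflow's standard `models:/<name>/<version>` URI scheme. The sketch below builds such a URI (the model name and version are illustrative); the load-and-predict step is shown only in comments because it requires an MLflow installation and a configured Databricks workspace:

```python
def registry_model_uri(model_name: str, version: int) -> str:
    """MLflow model registry URI, e.g. models:/my_model/1."""
    return f"models:/{model_name}/{version}"

uri = registry_model_uri("snorkel_contract_clf", 1)
print(uri)  # models:/snorkel_contract_clf/1

# With valid Databricks credentials, the registered model could then be
# loaded for batch scoring via standard MLflow calls:
# import mlflow
# mlflow.set_tracking_uri("databricks")
# model = mlflow.pyfunc.load_model(uri)
# predictions = model.predict(batch_df)
```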

In an upcoming release, Snorkel will expand this integration to allow registering a model to the Databricks Unity Catalog.

Learn how to get more value from your PDF documents!

Transforming unstructured data such as text and documents into structured data is crucial for enterprise AI development. On December 17, we’ll hold a webinar that explains how to capture SME domain knowledge and use it to automate and scale PDF classification and information extraction tasks.

Sign up here!