Snorkel AI + Databricks

Build better AI with a data-centric approach.
Efficiently transform unstructured data in your Databricks Data Intelligence Platform into custom ML and GenAI applications.


Build and deploy custom AI faster using Snorkel AI and Databricks

Accelerate production-ready AI with an efficient, end-to-end workflow: use Snorkel to curate the proprietary data that powers your AI and ML solutions, then build, deploy, and monitor them with Databricks Mosaic AI.

Enrich your data with your expertise

Instantly access unstructured data in Databricks via Snorkel Flow, then programmatically curate data with your specialized knowledge to meet your unique business requirements.

Build AI that speaks your language

Enhance Mosaic AI model development capabilities by developing data in Snorkel Flow to adapt and fine-tune models, fix RAG retrieval errors, and build custom LLM benchmarks.

Manage model lifecycle at scale

Register MLflow models and datasets adapted in Snorkel Flow with Databricks Unity Catalog and capitalize on Databricks lineage, quality, control, and data privacy capabilities.

Integration Highlights

Access data with a few clicks

  • Use the native Snorkel Flow connector to seamlessly access data unified in the Databricks Data Intelligence Platform

Create, tune, and evaluate production-quality AI faster

  • Efficiently label, filter, slice, sample, and augment unstructured data using Snorkel Flow
  • Use Mosaic AI MPT LLMs for AI-powered data curation in Snorkel Flow
  • Fine-tune and align Mosaic AI models with training datasets tuned with Snorkel Flow
  • Adapt and distill customized, domain-specific MLflow models in Snorkel Flow

Efficiently deploy, monitor, and manage models at scale

  • Automatically register MLflow models adapted with Snorkel Flow in Unity Catalog
  • Access and deploy MLflow models adapted with Snorkel Flow via Catalog Explorer
  • Seamlessly integrate Databricks evaluation workflows and metrics to build custom, fine-grained benchmarks in Snorkel