Databricks customers can now access millions of rows of data seamlessly within the Snorkel Flow platform thanks to a new Databricks connector. With a few clicks, users have access to massive amounts of Databricks data and can use Snorkel Flow’s data-centric approach to develop, fine-tune, and adapt ML models of all sizes, including multi-billion-parameter foundation models, using their own proprietary data and subject matter expertise.

The new Databricks connector adds to Snorkel Flow’s suite of third-party data connectors, making data stored in external repositories like Databricks quickly and easily accessible for AI application development.

We’re excited to announce this new connector in conjunction with our upcoming virtual event, The Future of Data-Centric AI. On June 7, the first day of the conference, Databricks Chief Technologist and Co-founder Matei Zaharia will discuss “Making LLM Applications Production Grade” at 1:30 PM PDT.

Weeks later, on June 29, Snorkel AI Founding Engineer and Product Director Vincent Chen will present “Building AI-Powered Products with Foundation Models” at the Databricks Data + AI Summit. Both events will benefit the AI and ML community and continue to advance the conversation around this exciting technology.


Data-centric AI development with Snorkel Flow

One of the most painstaking and time-consuming parts of developing AI applications is curating and labeling unstructured data. Snorkel AI addresses this bottleneck with Snorkel Flow, a novel data-centric AI platform.

Data science and machine learning teams use Snorkel Flow’s programmatic labeling to intelligently capture knowledge from various sources—such as previously labeled data (even when imperfect), heuristics from subject matter experts, business logic, and even the latest foundation models and large language models—and then scale this knowledge to label large quantities of data.

As users integrate more sources of knowledge, the platform enables them to rapidly improve training data quality and model performance using integrated error analysis tools.
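
To make the idea of programmatic labeling more concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow’s API: the labeling functions, label names, and the simple majority vote used to combine them are all illustrative assumptions, chosen only to show how several imperfect sources of knowledge can each vote on a label and be aggregated.

```python
from collections import Counter

# Hypothetical label values for a toy spam task (not from Snorkel Flow).
ABSTAIN = -1
HAM = 0
SPAM = 1


def lf_contains_offer(text):
    """Heuristic from a subject matter expert: 'free offer' suggests spam."""
    return SPAM if "free offer" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Business rule: very short messages are usually harmless."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


def lf_many_exclamations(text):
    """Heuristic: three or more exclamation marks suggests spam."""
    return SPAM if text.count("!") >= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_offer, lf_short_message, lf_many_exclamations]


def majority_vote(text):
    """Combine labeling function votes; abstain on ties or when no one votes.

    Majority vote is a simple stand-in for whatever aggregation a real
    platform applies.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN  # conflicting sources with equal support
    return top[0][0]


if __name__ == "__main__":
    for message in ["Claim your FREE OFFER now!!!", "hi there", "Free offer!"]:
        print(message, "->", majority_vote(message))
```

Each function encodes one source of knowledge and is allowed to abstain, so adding a new heuristic only means adding one more function to the list.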

Snorkel Flow + Databricks

Snorkel is further streamlining the machine learning development process for organizations that rely on Databricks with the new Databricks SQL connector built directly into the platform interface. This connector makes clients’ Databricks data accessible to Snorkel Flow with just a few clicks.


Here’s how it works:

  • Select “Databricks SQL” as a data source when creating a new dataset in Snorkel Flow.
  • Enter Databricks SQL connection details and credentials. To make sure sensitive credentials are never exposed, all credentials are encrypted end-to-end.
  • Use SQL queries to access relevant data, select splits, and identify inconsistencies that may cause issues.
  • Select the unique identifier column or choose to have Snorkel Flow autogenerate one.
  • Snorkel Flow will then ingest the dataset, making it immediately referenceable throughout the platform.
  • Data can then be labeled programmatically using a data-centric AI workflow in Snorkel Flow to quickly generate high-quality training sets over complex, highly variable data. Snorkel Flow includes templates to classify and extract information from unstructured text, native PDFs, richly formatted documents, HTML data, conversational text, and more.
  • Newly labeled datasets can then be used to either train custom ML models or fine-tune pre-built models.
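
The query-related steps above (selecting a split, spotting inconsistencies, and choosing or generating a unique identifier) might look roughly like the sketch below. It uses Python’s built-in sqlite3 module with an in-memory database purely as a stand-in for a Databricks SQL source so that it runs anywhere; the `documents` table, its columns, and the helper names are hypothetical and do not come from Snorkel Flow or Databricks.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table standing in for a Databricks SQL source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id TEXT, text TEXT, split TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [
            ("a1", "Invoice overdue", "train"),
            ("a2", "Contract renewal", "train"),
            ("a2", "Contract renewal (copy)", "train"),  # duplicated identifier
            ("a3", "Support ticket", "valid"),
        ],
    )
    return conn


def select_split(conn, split):
    """Use a SQL query to pull only the rows belonging to one split."""
    cur = conn.execute(
        "SELECT doc_id, text FROM documents WHERE split = ? ORDER BY rowid",
        (split,),
    )
    return cur.fetchall()


def duplicate_ids(conn):
    """Identify an inconsistency: identifier values that appear more than once."""
    cur = conn.execute(
        "SELECT doc_id FROM documents "
        "GROUP BY doc_id HAVING COUNT(*) > 1 ORDER BY doc_id"
    )
    return [row[0] for row in cur.fetchall()]


def autogenerate_ids(rows):
    """Replace the source identifier with a generated sequential one."""
    return [(new_id, text) for new_id, (_, text) in enumerate(rows)]


if __name__ == "__main__":
    conn = build_demo_db()
    train = select_split(conn, "train")
    print("train rows:", train)
    print("duplicate ids:", duplicate_ids(conn))
    print("with generated ids:", autogenerate_ids(train))
```

In this toy data the `doc_id` column is not unique, which is the kind of situation where letting the platform autogenerate an identifier would be the safer choice.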

The new Databricks connector is currently in private preview and will be generally available soon.

Making it easier than ever to get value out of your data

The new Databricks connector joins our suite of third-party connectors in the Snorkel Flow platform. Each makes it easier for our customers to get their data onto the Snorkel Flow platform, where they can rapidly and iteratively build probabilistic training sets and construct valuable, deployable models faster.

To learn more about how Databricks and Snorkel can help your enterprise build and deploy powerful, valuable machine learning applications, join Snorkel AI at The Future of Data-Centric AI and Databricks at Databricks Data + AI Summit.

Learn more

If you’d like to learn how the Snorkel AI team can help you develop high-quality LLMs or deliver value to your organization from generative AI, contact us to get started. See what Snorkel can do to accelerate your data science and machine learning teams. Book a demo today.