
How to import Databricks data into Snorkel Flow

June 2, 2023

Databricks customers can now access millions of rows of data seamlessly within the Snorkel Flow platform thanks to a new Databricks connector. With a few clicks, users have access to massive amounts of Databricks data and can use Snorkel Flow’s data-centric approach to develop, fine-tune, and adapt ML models of all sizes—including multi-billion parameter foundation models—using their own proprietary data and subject matter expertise.

The new Databricks connector adds to Snorkel Flow’s suite of third-party data connectors, making data stored in external repositories like Databricks quickly and easily accessible for AI application development.

We’re excited to announce this new connector in conjunction with our upcoming virtual event, The Future of Data-Centric AI. On June 7, the first day of the conference, Databricks Chief Technologist and Co-founder Matei Zaharia will discuss “Making LLM Applications Production Grade” at 1:30 PM PDT.

Weeks later, on June 29, Snorkel AI Founding Engineer and Product Director Vincent Chen will present “Building AI-Powered Products with Foundation Models” at the Databricks Data + AI Summit. Both events will benefit the AI and ML community and continue to advance the conversation around this exciting technology.


Data-centric AI development with Snorkel Flow

One of the most painstaking and time-consuming parts of developing AI applications is curating and labeling unstructured data. Snorkel AI addresses this bottleneck with Snorkel Flow, a novel data-centric AI platform.

Data science and machine learning teams use Snorkel Flow’s programmatic labeling to intelligently capture knowledge from various sources—such as previously labeled data (even when imperfect), heuristics from subject matter experts, business logic, and even the latest foundation models and large language models—and then scale this knowledge to label large quantities of data.

As users integrate more sources of knowledge, the platform enables them to rapidly improve training data quality and model performance using integrated error analysis tools.
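
To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's API: the three labeling functions, the toy spam task, and the simple majority vote are all assumptions chosen for illustration, and the majority vote stands in for whatever aggregation the platform actually performs.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions for a toy spam task; each encodes one heuristic.
def lf_contains_offer(text):
    return "SPAM" if "free offer" in text.lower() else ABSTAIN

def lf_has_greeting(text):
    return "HAM" if text.lower().startswith(("hi", "hello")) else ABSTAIN

def lf_many_exclamations(text):
    return "SPAM" if text.count("!") >= 3 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_offer, lf_has_greeting, lf_many_exclamations]

def majority_label(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes by simple majority; ties and no votes abstain."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]

docs = [
    "Hello team, notes from today's meeting attached.",
    "FREE OFFER!!! Click now!!!",
    "Quarterly report is ready.",
]
labels = [majority_label(d) for d in docs]
```

Documents that no heuristic covers stay unlabeled, which is the kind of gap that error analysis would surface and that an additional source of knowledge could close.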

Snorkel Flow + Databricks

Snorkel is further streamlining the machine learning development process for organizations that rely on Databricks with the new Databricks SQL connector built directly into the platform interface. This connector makes clients’ Databricks data accessible to Snorkel Flow with just a few clicks.


Here’s how it works:

  • Select “Databricks SQL” as a data source when creating a new dataset in Snorkel Flow.
  • Enter Databricks SQL connection details and credentials. To make sure sensitive credentials are never exposed, all credentials are encrypted end-to-end.
  • Use SQL queries to access relevant data, select splits, and identify inconsistencies that may cause issues.
  • Select the unique identifier column or choose to have Snorkel Flow autogenerate one.
  • Snorkel Flow will then ingest the dataset, making it immediately referenceable throughout the platform.
  • Data can then be labeled programmatically using a data-centric AI workflow in Snorkel Flow to quickly generate high-quality training sets over complex, highly variable data. Snorkel Flow includes templates to classify and extract information from unstructured text, native PDFs, richly formatted documents, HTML data, conversational text, and more.
  • Newly labeled datasets can then be used to either train custom ML models or fine-tune pre-built models.
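
The query, split, and identifier steps above can be sketched with ordinary SQL. The example below is an illustration under stated assumptions, not Snorkel Flow's ingestion logic: an in-memory SQLite database stands in for a Databricks SQL warehouse so the code runs anywhere, and the `tickets` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a Databricks SQL warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (ticket_id TEXT, body TEXT, split TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("t1", "Cannot log in", "train"),
        ("t2", "Refund request", "train"),
        ("t2", "Refund request (duplicate id)", "valid"),
        ("t3", None, "test"),
    ],
)

def rows_per_split(conn):
    """Count rows in each split."""
    return dict(conn.execute("SELECT split, COUNT(*) FROM tickets GROUP BY split"))

def duplicated_ids(conn):
    """Return ids that appear more than once and so cannot serve as a unique identifier."""
    rows = conn.execute(
        "SELECT ticket_id FROM tickets GROUP BY ticket_id HAVING COUNT(*) > 1"
    )
    return [r[0] for r in rows]

def null_body_count(conn):
    """Count rows whose text column is missing."""
    return conn.execute("SELECT COUNT(*) FROM tickets WHERE body IS NULL").fetchone()[0]

def with_uid(conn):
    """Use ticket_id as the identifier if it is unique; otherwise autogenerate one."""
    rows = conn.execute(
        "SELECT ticket_id, body, split FROM tickets ORDER BY rowid"
    ).fetchall()
    if duplicated_ids(conn):
        return [(i, *row) for i, row in enumerate(rows)]  # autogenerated integer id
    return [(row[0], *row) for row in rows]

splits = rows_per_split(conn)
dupes = duplicated_ids(conn)
missing = null_body_count(conn)
records = with_uid(conn)
```

Here the duplicate `t2` rules out `ticket_id` as an identifier, so the sketch falls back to a generated integer, and the null `body` is the sort of inconsistency worth catching before ingestion.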

The new Databricks connector is currently in private preview and will be generally available soon.

Making it easier than ever to get value out of your data

The new Databricks connector joins our suite of third-party connectors in the Snorkel Flow platform. Each makes it easier for our customers to get their data onto the Snorkel Flow platform, where they can rapidly and iteratively build probabilistic training sets and construct valuable, deployable models faster.

To learn more about how Databricks and Snorkel can help your enterprise build and deploy powerful, valuable machine learning applications, join Snorkel AI at The Future of Data-Centric AI and Databricks at Databricks Data + AI Summit.

Friea Berg
VP of Strategy

As VP of Strategy for Snorkel, Friea Berg leverages over a decade of channel experience to help the world’s most innovative enterprises realize the promise of AI using proprietary data. Friea joined Snorkel to build the startup’s channel strategy from the ground up. Under her leadership, Snorkel has built successful partnerships with Google, Microsoft, AWS, Databricks, Snowflake, and Hugging Face plus unlocked new routes-to-market via Marketplace and global resellers. Partners are now integral to every team at Snorkel, one of CRN’s 10 Hottest Data Science/ML Startups in 2022 and one of Forbes’s 50 most promising AI startups in the world in 2023.

Prior to diving into startups, Friea held leadership, alliance, and business development positions at Splunk, NetApp, and other technology leaders. At Splunk she built and scaled global strategic partnerships with Google, Cisco, and Palo Alto Networks. She also led a team that incubated first-of-a-kind ‘market maker’ partnerships with Deloitte, SAP, Cerner, Salesforce, and others.

Hiromu Hota
Machine Learning Engineer

Hiromu Hota is a Staff Engineer at Snorkel AI, where he brings extensive expertise in applied machine learning as the Tech Lead Manager and Lead Machine Learning Engineer. Prior to Snorkel AI, he held roles as a Senior Researcher and Researcher at Hitachi, focusing on advanced research and development. Hiromu also serves as a Visiting Scholar at Stanford University’s School of Engineering, where he contributes to academic advancements in computational science and engineering.

With a background that includes software engineering at Hitachi Data Systems and internships, Hiromu holds a Ph.D. in Computational Science and Engineering and a Master of Engineering from Nagoya University, underscoring his deep technical knowledge and academic achievements.

Connect with Hiromu to discuss machine learning, computational science, or collaborative opportunities in applied research and engineering.
