Snorkel Flow 2023.R4: enhanced UI + PDF and Databricks tools

January 9, 2024

We’re thrilled to announce the release of Snorkel Flow 2023.R4, a continuation of our commitment to creating a robust AI data development platform that empowers enterprises to accelerate custom AI data breakthroughs by 100x. This release introduces new features and enhancements designed to streamline processes and boost performance for even the most challenging scenarios.

Before we delve into the details, here’s a quick tl;dr of what is included in this latest release:

  • New Unified Prompting UI and RAG: Enhanced interface for a more intuitive user experience.
  • Advanced PDF Annotation: Simplifies labeling and boosts efficiency for large documents.
  • Databricks MLflow Deployment Integration: Streamlined deployment of machine learning models to the Databricks MLflow Model Registry.
  • Performance and usability improvements: Snorkel Flow Studio and datasets load 2x faster.

Enhanced PDF capabilities

Working with PDF data often involves annotating entire documents, not just individual tokens—especially when dealing with large files or sparse entities.

With Snorkel Flow 2023.R4, users can now sample specific documents by document ID. This simplifies annotation by letting you target whole documents rather than sampling annotation data by spans.

Whether you’re handling large PDFs or focusing on individual structures such as tables and diagrams, these improvements enhance your overall experience and efficiency.
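To make the difference between span-level and document-level sampling concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow SDK: the `SPANS` records, the `doc_id` field, and the functions `group_by_document`, `select_documents`, and `sample_documents` are all hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical span records: each span carries the ID of the PDF it came from.
SPANS = [
    {"doc_id": "doc-a", "text": "Invoice total"},
    {"doc_id": "doc-a", "text": "Due date"},
    {"doc_id": "doc-b", "text": "Table 1"},
    {"doc_id": "doc-c", "text": "Figure 2"},
    {"doc_id": "doc-c", "text": "Signature"},
]


def group_by_document(spans):
    """Group span records under the document they belong to."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["doc_id"]].append(span)
    return dict(grouped)


def select_documents(spans, doc_ids):
    """Return whole documents for the requested IDs, skipping unknown IDs."""
    grouped = group_by_document(spans)
    return {doc_id: grouped[doc_id] for doc_id in doc_ids if doc_id in grouped}


def sample_documents(spans, k, seed=0):
    """Randomly pick up to k documents and return all of their spans."""
    grouped = group_by_document(spans)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    chosen = rng.sample(sorted(grouped), min(k, len(grouped)))
    return {doc_id: grouped[doc_id] for doc_id in chosen}


if __name__ == "__main__":
    picked = select_documents(SPANS, ["doc-a", "doc-c"])
    print({doc_id: len(spans) for doc_id, spans in picked.items()})
```

Because selection happens on document IDs, every span of a chosen document stays together, which is the property that matters when entities are sparse or a file is very large.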

New enhanced Unified Prompting UI and RAG capabilities

Our new unified prompting interface enables users to construct more freeform prompts, giving you the flexibility and control needed to programmatically operate on your data. You can now prompt foundation models for multi-label classification, process data in batches by selecting the batch size, and run a prompt on a subset before scaling to your entire dataset.

This will not only enhance efficiency across all your Snorkel Flow applications but also provide a safer, more controlled environment for testing and iteration.
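The batch-size and subset controls described above can be pictured with a short, generic Python sketch. Everything here is an assumption made for illustration and not Snorkel Flow's API: `run_prompt`, `batched`, the `limit` parameter, the `TEMPLATE` string, and the keyword-based `fake_model` that stands in for a real foundation model.

```python
from typing import Callable, Iterator, List, Optional, Sequence

# Hypothetical prompt template; {text} is filled in for each record.
TEMPLATE = "Which labels apply to this text? {text}"

TEXTS = [
    "Invoice arrived late with the delivery",
    "Where is my delivery?",
    "Hello",
    "Invoice attached",
]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_model(prompts: List[str]) -> List[List[str]]:
    """Stand-in for a foundation model: multi-label keyword matching."""
    keywords = {"billing": "invoice", "shipping": "delivery"}
    return [
        [label for label, word in keywords.items() if word in prompt.lower()]
        for prompt in prompts
    ]


def run_prompt(
    template: str,
    texts: Sequence[str],
    model: Callable[[List[str]], List[List[str]]],
    batch_size: int,
    limit: Optional[int] = None,
) -> List[List[str]]:
    """Run the prompt over the first `limit` texts (or all of them) in batches."""
    subset = texts if limit is None else texts[:limit]
    outputs: List[List[str]] = []
    for batch in batched(subset, batch_size):
        prompts = [template.format(text=text) for text in batch]
        outputs.extend(model(prompts))
    return outputs


if __name__ == "__main__":
    # Try the prompt on two records first, then scale to the full dataset.
    print(run_prompt(TEMPLATE, TEXTS, fake_model, batch_size=2, limit=2))
    print(run_prompt(TEMPLATE, TEXTS, fake_model, batch_size=2))
```

Running on a small `limit` first keeps a flawed prompt cheap to discover; once the outputs look right, dropping the limit applies the same prompt to every record.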

Additionally, we’ve implemented an all-new retrieval-augmented generation (RAG) integration (Alpha) into the prompting workflow. RAG is becoming a market standard for enhancing prompting workflows to improve the accuracy of generative AI outputs. If you want to see this functionality in action, sign up for a demo.

Configurable multi-label annotation defaults

Consistent labels are vital for any AI/ML project; without them, model performance suffers. This is especially important when multiple teams work on the same data.

Without a shared understanding of what each label means, team members might misinterpret the data, causing errors in the dataset and any insights derived from it. To prevent this, 2023.R4 lets users configure multi-label annotation defaults to match individual team preferences.

Databricks MLflow deployment integration

Snorkel Flow is purposefully engineered to seamlessly integrate into an enterprise’s existing MLOps workflow. We continually work to make it as fast and as easy as possible for our customers to deploy what they’ve built in Snorkel Flow into production.

As of the 2023.R4 release, we’re thrilled to expand our existing Databricks support to incorporate the Databricks MLflow Model Registry. This new native integration enables users to deploy their machine learning models directly to the registry, streamlining the deployment process.

Simplified onboarding with enhanced documentation

In this release, we’ve dedicated considerable effort to simplifying and streamlining the critical initial phases of AI development.

Our SDK now boasts improved documentation for both built-in and custom operators, complemented by an intuitive interface for easy retrieval of node data and metadata. These enhancements are designed to make the onboarding process as smooth and efficient as possible while reducing the effort needed to wrangle large amounts of data.

Continuous improvements to enterprise performance and stability

In the 2023.R4 release, we’ve implemented numerous enhancements to bolster the overall performance and stability of the platform. These improvements are designed to benefit a diverse range of deployments and infrastructures. Notably, Snorkel Flow Studio and datasets now load in half the time. The Studio data viewer has also been upgraded for more responsive interaction. Additionally, we’ve expanded our sequence tagging support from 10 to 25 classes, broadening the scope and capabilities of our platform to meet your complex needs.

And that wraps up the 2023.R4 Snorkel Flow release. Until the next one!

Learn More

Follow Snorkel AI on LinkedIn, Twitter, and YouTube to be the first to see new posts and videos!

Nick Harvey
Director of Product Marketing
