New GenAI features, data annotation: Snorkel Flow 2024.R2
We’re excited to announce the release of Snorkel Flow 2024.R2. This release features two new GenAI product suites and the general availability of Multi-Schema Annotation, which boosts subject matter expert (SME) efficiency and supports complex use cases. Enterprise admins also gain secure, flexible foundation model access through integrations with Azure ML, Azure OpenAI, and Amazon SageMaker. Additionally, we’ve enhanced security features, added a new home page, and improved the user experience.
Learn more below.
This post was a joint effort. Product Manager Piyush Puri worked on the GenAI Annotations and Evaluations features, as well as the multi-schema annotations features. Product Manager Marty Moesta worked on the LLM Fine-tuning and alignment features. Product Manager Daniel Xu worked on enterprise readiness features. Product Manager Kristina Liapchin worked on the PDF and foundation model integration features. Product Manager Venkatesh Rao and Head of Partnerships Friea Berg worked on building native integrations with Amazon AI tools.
Two new GenAI product suites
Two new GenAI Snorkel product suites are launching in Beta.
Generative AI Annotation & Evaluation [Beta]
The Snorkel Flow AI data development platform now supports large language model (LLM) evaluation by providing two tailored data viewers that make it easy for SMEs to annotate the output of LLM systems.
The first, Single Response Review, lets users view their prompt, prompt prefix, response, and retrieved context items in a clean interface, so they can judge whether an error originated in retrieval or in generation. Using the new multi-schema annotation capabilities, users can customize which label schemas they collect annotations for.
The second, Response Ranking, lets users view multiple LLM responses at once and rank them against each other. As in Single Response Review, the prompt, prompt prefix, and retrieved context items are available in the UI.
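Ranked responses are also the raw material for preference-based alignment such as DPO. The minimal sketch below assumes a best-first list of response strings (a hypothetical layout, not Snorkel Flow’s export format) and expands one ranking into pairwise records; the `prompt`/`chosen`/`rejected` field names follow a common convention for preference datasets and are an assumption here.

```python
from itertools import combinations


def ranking_to_preference_pairs(prompt, ranked_responses):
    """Expand a best-first ranking into (chosen, rejected) preference pairs.

    `ranked_responses` is assumed to be ordered best first. combinations()
    preserves input order, so the first element of each pair always
    outranks the second.
    """
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_responses, 2)
    ]


pairs = ranking_to_preference_pairs(
    "Summarize the refund policy.",
    ["Response A", "Response B", "Response C"],
)
print(len(pairs))  # 3
```

A ranking of n responses yields n(n-1)/2 pairs, so three ranked responses give three training pairs.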
Snorkel’s new Evaluation suite also supports aggregating domain expert annotations, LLM-as-judge verdicts, and other hybrid approaches. Users can view the quality of their outputs across fine-grained data slices in a single view.
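One simple way to combine SME annotations with LLM-as-judge verdicts and report quality per slice is sketched below. It assumes a hypothetical record layout (a slice name, an optional SME verdict, and a judge verdict) and a hybrid policy that is our assumption, not Snorkel’s documented behavior: the SME verdict wins whenever one exists, and the judge fills the gaps.

```python
from collections import defaultdict


def slice_quality(records):
    """Per-slice pass rate, preferring SME labels over LLM-as-judge labels.

    Each record is a dict with hypothetical keys:
      "slice": str, "sme": True/False/None, "judge": True/False
    True means the response was judged acceptable.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        # Hybrid policy (assumed): trust the SME when one annotated the
        # response, otherwise fall back to the LLM-as-judge verdict.
        verdict = r["sme"] if r["sme"] is not None else r["judge"]
        totals[r["slice"]] += 1
        passes[r["slice"]] += int(verdict)
    return {s: passes[s] / totals[s] for s in totals}


records = [
    {"slice": "billing", "sme": True, "judge": False},
    {"slice": "billing", "sme": None, "judge": False},
    {"slice": "shipping", "sme": None, "judge": True},
]
print(slice_quality(records))  # {'billing': 0.5, 'shipping': 1.0}
```

Other hybrid policies, such as weighting judge verdicts by their measured agreement with SMEs, fit the same per-slice loop.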
LLM Fine-tuning & Alignment [Beta]
Use Snorkel to programmatically curate a high-quality, diverse training dataset that is passed to an LLM for fine-tuning. The pipeline returns generated responses to the Snorkel Flow platform for immediate response quality labeling, error analysis, and iterative development.
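The sketch below is a generic illustration of programmatic curation for the two goals named above, quality and diversity, not Snorkel Flow’s pipeline. It assumes each candidate example already carries a quality score and a category (hypothetical fields), keeps examples above a threshold, drops repeated prompts, and caps how many examples any one category contributes.

```python
from collections import defaultdict


def curate(examples, min_quality=0.8, max_per_category=2):
    """Select a high-quality, diverse fine-tuning set.

    Each example is a hypothetical dict with "prompt", "response",
    "category", and a "quality" score in [0, 1].
    """
    seen_prompts = set()
    per_category = defaultdict(int)
    curated = []
    # Visit the best examples first so the per-category cap keeps the top ones.
    for ex in sorted(examples, key=lambda e: e["quality"], reverse=True):
        key = ex["prompt"].strip().lower()  # crude duplicate key
        if ex["quality"] < min_quality or key in seen_prompts:
            continue  # low quality, or a repeat of a prompt already kept
        if per_category[ex["category"]] >= max_per_category:
            continue  # cap reached: keep the set from skewing toward one topic
        seen_prompts.add(key)
        per_category[ex["category"]] += 1
        curated.append(ex)
    return curated


examples = [
    {"prompt": "Reset password?", "response": "...", "category": "account", "quality": 0.9},
    {"prompt": "reset password? ", "response": "...", "category": "account", "quality": 0.85},
    {"prompt": "Cancel order?", "response": "...", "category": "orders", "quality": 0.95},
    {"prompt": "Track order?", "response": "...", "category": "orders", "quality": 0.7},
]
print([ex["prompt"] for ex in curate(examples)])  # ['Cancel order?', 'Reset password?']
```

Sorting by quality before applying the cap means that when a category is over-represented, its best examples are the ones that survive.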
Snorkel supports a diverse set of LLM adaptation strategies, including:
- Instruction tuning, also known as supervised fine-tuning (SFT)
- Alignment via direct preference optimization (DPO)
- Coming soon: more advanced techniques such as pre-training and Kahneman-Tversky Optimization (KTO)
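For reference, the objective published with DPO (Rafailov et al., 2023) is shown below. It describes the general technique, not Snorkel’s specific implementation. Here x is a prompt, y_w and y_l are the preferred and dispreferred responses from a preference pair, π_θ is the model being trained, π_ref is a frozen reference model (typically the SFT model), σ is the logistic function, and β controls how far π_θ may drift from π_ref.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Minimizing this loss raises the likelihood of preferred responses relative to dispreferred ones without training a separate reward model.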
Multi-Schema Annotation ships in GA
We are excited to announce upgraded manual annotation capabilities within Snorkel Flow. Customers can now get started with manual annotation in a far simpler workflow that is independent of programmatic labeling.
This reorganization of annotation in our platform lets us support annotation for multiple schemas at once. It unblocks customers who want to set up more complex manual annotation workflows, such as collecting classification and extraction annotations at the same time.
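To make “multiple schemas at once” concrete, here is a sketch of a record holding one annotator’s classification and extraction labels for the same document. The class and field names are hypothetical and chosen only for illustration; they are not Snorkel Flow’s data model.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


@dataclass
class DocumentAnnotation:
    """One annotator's labels for a single document across several schemas."""
    doc_id: str
    text: str
    # schema name -> chosen class (classification schemas)
    classification: dict[str, str] = field(default_factory=dict)
    # schema name -> labeled spans (extraction schemas)
    extraction: dict[str, list[Span]] = field(default_factory=dict)

    def spans_text(self, schema):
        """Return the text covered by each span in an extraction schema."""
        return [self.text[s.start:s.end] for s in self.extraction.get(schema, [])]


doc = DocumentAnnotation(doc_id="doc-1", text="Invoice total: $420 due 2024-07-01")
doc.classification["doc_type"] = "invoice"
doc.extraction["amounts"] = [Span(15, 19, "total")]
print(doc.spans_text("amounts"))  # ['$420']
```

Keeping both schema types on one record means an SME labels the document once, in a single pass, rather than revisiting it for each task.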
We are eager to invest more in manual annotation capabilities in the coming releases.
Secure and flexible foundation model access and integrations
A UI-based suite for foundation model (FM) integration and external model management is now generally available in Snorkel Flow.
We recognize the complexity of the previous SDK-only workflow and the need for a comprehensive FM overview. With this release, admins get a clear overview of all FM vendors and can set up FM integrations and manage external models directly within the UI.
This update also introduces new integrations with Azure ML, Azure OpenAI, and Amazon SageMaker, allowing customers to use their preferred FM providers based on their business needs and existing relationships.
Enterprise readiness features
This release adds data governance and identity and access management (IAM) features to help IT admins manage their Snorkel Flow instance.
Admins can now:
- Restrict users’ ability to download data locally from Snorkel Flow.
- Let users upload PDF and image files directly into Snorkel Flow without routing through the MinIO Console, making it easier to create PDF and image applications.
- Synchronize entitlement and role information stored in Active Directory with Snorkel Flow through Security Assertion Markup Language (SAML) and OpenID Connect (OIDC) single sign-on (SSO) integrations.
- Set timeouts so that user sessions are automatically logged out after a period of inactivity.
- Export support bundle logs from Kubernetes-based installations.
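As an illustration of what synchronizing Active Directory roles involves, the sketch below resolves a user’s group memberships, as they might arrive in a SAML attribute or an OIDC groups claim, to a single application role. The group names, role names, and “most privileged role wins” rule are all hypothetical; Snorkel Flow’s actual mapping may differ.

```python
# Hypothetical mapping from Active Directory group names to application roles.
GROUP_TO_ROLE = {
    "ml-platform-admins": "admin",
    "ml-developers": "developer",
    "sme-annotators": "annotator",
}

# Most privileged first; this ordering is an assumption for the sketch.
ROLE_PRIORITY = ["admin", "developer", "annotator"]


def resolve_role(groups, default=None):
    """Map IdP group memberships to the most privileged matching role."""
    roles = {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}
    for role in ROLE_PRIORITY:
        if role in roles:
            return role
    return default  # no mapped group: fall back (None means no access)


print(resolve_role(["sme-annotators", "ml-developers"]))  # developer
print(resolve_role(["finance"]))  # None
```

Because the role is recomputed from the groups the identity provider sends at each sign-in, changes made in Active Directory take effect without editing users in the application.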
Additionally, Snorkel offers managed virtual private cloud (VPC) installation options on AWS and Azure, alongside Snorkel Hosted, Private VPC, and on-prem deployments. By granting Snorkel limited privileged access to their cloud accounts, IT admins can speed up initial infrastructure onboarding while reducing support and management overhead throughout each upgrade and release cycle.
PDF support for Checkboxes and Tables
We have introduced out-of-the-box checkbox and table detection for PDF documents, so users can write labeling functions that draw on these elements and improve their data development on PDFs.
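As a rough illustration of how detected checkboxes can feed a labeling function, the sketch below votes on a form based on a checkbox whose nearby text mentions consent. The element dictionary layout and label names are hypothetical, not Snorkel Flow’s PDF representation; returning -1 to abstain mirrors the convention of the open-source Snorkel library.

```python
# Label values; ABSTAIN = -1 follows open-source Snorkel, the rest are hypothetical.
ABSTAIN = -1
CONSENT_DENIED = 0
CONSENT_GIVEN = 1


def lf_consent_checkbox(elements):
    """Vote on a form based on a detected checkbox whose text mentions consent.

    `elements` is a hypothetical list of detected PDF elements, for example
    {"type": "checkbox", "checked": True, "text": "I consent to ..."}.
    Returns ABSTAIN when no relevant checkbox was detected.
    """
    for el in elements:
        if el.get("type") == "checkbox" and "consent" in el.get("text", "").lower():
            return CONSENT_GIVEN if el.get("checked") else CONSENT_DENIED
    return ABSTAIN


page = [
    {"type": "table", "text": "Line items"},
    {"type": "checkbox", "checked": True, "text": "I consent to data processing"},
]
print(lf_consent_checkbox(page))  # 1
```

Abstaining when no checkbox is found keeps the function from guessing on documents it cannot speak to, leaving those to other labeling functions.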
Native integration with Amazon AI tools for LLM fine-tuning
Snorkel Flow users can now iteratively fine-tune LLMs in place. As part of a multi-year strategic collaboration with AWS announced earlier this year, Snorkel has released a new beta integration with Amazon SageMaker that lets AI teams push datasets to SageMaker JumpStart to iteratively prompt, fine-tune, and evaluate LLMs.
This delivers an end-to-end solution: AI developers can access unstructured data in Amazon S3 buckets and rapidly curate high-quality, diverse training datasets using Snorkel Flow. They can then configure and connect an open-source base model via the SageMaker JumpStart SDK and send a curated dataset from Snorkel to JumpStart for in-place LLM fine-tuning.
Once models reach production-quality performance with Snorkel Flow, users can deploy a fine-tuned LLM or a distilled, task-specific small language model (SLM) to Amazon SageMaker, taking advantage of AWS’s broad set of capabilities for secure, private, responsible AI.
Thank you!
This release is available now to all Snorkel Flow users. Our top priority is ensuring a smooth update process.
Thank you for your continued trust in Snorkel Flow to power your AI and data needs. We extend our gratitude to our beta participants who have been instrumental in refining this update. Looking ahead, Snorkel Flow has an exciting roadmap filled with innovative features and improvements designed to enhance your experience even further.
Learn how to get more value from your PDF documents!
Transforming unstructured data such as text and documents into structured data is crucial for enterprise AI development. On December 17, we’ll hold a webinar that explains how to capture SME domain knowledge and use it to automate and scale PDF classification and information extraction tasks.
Jennifer Lei is a Senior Product Manager at Snorkel, where she leads various document intelligence use cases. She has a background in driving cloud and AI projects through her product role at Microsoft Azure, complemented by strategic experience with Microsoft’s Corporate Strategy team and at McKinsey & Company.