Explore the new GenAI Evaluation Suite: Snorkel 2024.R3

October 9, 2024

Over the last two years, we’ve been obsessive in our commitment to helping our customers get Generative AI (GenAI) into production. We’ve delivered some exciting results and, along the way, built powerful workflows into our product.

Just like traditional machine learning (ML) development, data drives enterprise generative AI development. We’re excited to share the latest product updates to our data-centric workflows for generative AI.

Snorkel’s GenAI Evaluation Suite is available in public preview

Our customers are moving beyond “vibe checks” and using Snorkel’s new Generative AI Evaluation Suite to ensure their pipelines are ready for production.

We built our Evaluation Suite on three core principles:

GenAI evaluation needs to be specialized in an efficient and flexible way

Snorkel allows users to rapidly onboard and evaluate their data. The platform lets users measure how well responses adhere to out-of-the-box or custom criteria, using a mix of ground truth-based and evergreen auto-evaluators. Users can also compare the results of multiple experiments side by side.
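To make the idea concrete, here is a minimal sketch in plain Python of scoring responses against named criteria and comparing experiments side by side. It is not the Snorkel SDK: `Example`, `exact_match`, `is_concise`, `evaluate`, and `compare` are hypothetical names, and the sketch assumes an "evergreen" evaluator is one that needs no reference answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Example:
    prompt: str
    response: str
    reference: Optional[str] = None  # ground truth, when one exists


# Hypothetical evaluator type: returns a score in [0, 1],
# or None when the criterion does not apply to the example.
Evaluator = Callable[[Example], Optional[float]]


def exact_match(ex: Example) -> Optional[float]:
    """Ground truth-based: only applies when a reference answer exists."""
    if ex.reference is None:
        return None
    return float(ex.response.strip().lower() == ex.reference.strip().lower())


def is_concise(ex: Example, max_words: int = 50) -> Optional[float]:
    """Reference-free: judges the response on its own (assumed 'evergreen')."""
    return float(len(ex.response.split()) <= max_words)


def evaluate(examples: list[Example], criteria: dict[str, Evaluator]) -> dict:
    """Average each criterion over the examples it applies to."""
    report = {}
    for name, evaluator in criteria.items():
        scores = [s for s in (evaluator(ex) for ex in examples) if s is not None]
        report[name] = sum(scores) / len(scores) if scores else None
    return report


def compare(experiments: dict[str, list[Example]],
            criteria: dict[str, Evaluator]) -> dict:
    """Score several experiments with the same criteria, side by side."""
    return {name: evaluate(exs, criteria) for name, exs in experiments.items()}
```

Calling `compare` with the outputs of two pipeline versions and a shared `criteria` dictionary yields one score per criterion per experiment, which is the shape a side-by-side comparison needs.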

GenAI evaluation needs to be fine-grained

In Snorkel, users can now programmatically slice their data via slicing functions. Data slices allow users to identify high-priority subsets of inputs like:

  • Question topics
  • Different languages
  • Specific customer scenarios
  • Jailbreak attempts
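As an illustration, a slicing function can be thought of as a predicate over a single data point. The sketch below is plain Python rather than the Snorkel SDK; the function names, the keyword heuristics, and the `prompt` and `correct` fields are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical type: a slicing function returns True when a row belongs to the slice.
SlicingFunction = Callable[[dict], bool]


def sf_billing_topic(row: dict) -> bool:
    """Question topic: crude keyword match for billing questions."""
    text = row["prompt"].lower()
    return any(keyword in text for keyword in ("invoice", "refund", "billing"))


def sf_non_ascii_text(row: dict) -> bool:
    """Rough proxy for non-English input: any non-ASCII character."""
    return any(ord(ch) > 127 for ch in row["prompt"])


def sf_jailbreak_attempt(row: dict) -> bool:
    """Flags one well-known prompt-injection phrase."""
    return "ignore previous instructions" in row["prompt"].lower()


SLICING_FUNCTIONS: dict[str, SlicingFunction] = {
    "billing_topic": sf_billing_topic,
    "non_ascii_text": sf_non_ascii_text,
    "jailbreak_attempt": sf_jailbreak_attempt,
}


def apply_slices(rows: list[dict], sfs: dict[str, SlicingFunction]) -> dict:
    """Map each slice name to the indices of the rows it contains."""
    return {
        name: [i for i, row in enumerate(rows) if sf(row)]
        for name, sf in sfs.items()
    }


def accuracy_by_slice(rows: list[dict], slices: dict) -> dict:
    """Fraction of rows marked correct within each non-empty slice."""
    return {
        name: sum(bool(rows[i]["correct"]) for i in idxs) / len(idxs)
        for name, idxs in slices.items()
        if idxs
    }
```

Reporting a metric per slice rather than one aggregate number is what makes the evaluation fine-grained: a pipeline can score well overall while failing badly on a small but high-priority slice.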

GenAI evaluation must enable users to find and fix errors in one platform, in an iterative and programmatic way

AI developers need a platform that takes them from evaluation to development in a single click. Evaluation dashboards give users direct insight into data errors and let them launch workflows in Snorkel to address those errors directly.

[Screenshot of data hotspot modal]

To complement the new Evaluation Suite, we’ve developed a cookbook, available in our documentation, that provides a guided evaluation experience for your AI teams. If you’re interested in learning how Snorkel’s Evaluation Suite could help your team, please reach out to us.

Key enhancements to the LLM fine-tuning workflow

After working closely with our public preview customers, we are thrilled to announce key improvements to our LLM fine-tuning workflow! As a reminder, our fine-tuning workflow follows five distinct steps.

This release offers core stability improvements, as well as new tooling to enhance your data development efforts when crafting your fine-tuning training sets.

  • AI developers can now safely connect to their LLM providers and leverage freeform prompting in the SDK.
  • Snorkel now supports synthetic data generation techniques in the SDK to address issues with sparse or missing data.
  • We’ve added key improvements to logging and performance with fine-tuning providers, notably Amazon SageMaker.

Snorkel offers guided workflows for fine-tuning and alignment via our LLM fine-tuning and alignment cookbook, available in our documentation.

Key improvements to Generative AI annotation workflows

Our annotation workflows enable data scientists to collaborate seamlessly with subject matter experts (SMEs) and scale their expertise. To help SMEs share their feedback, we introduced two new views in our annotation studio:

  1. Single response view: SMEs can annotate, per their defined label schema, the LLM response and individual pieces of context used to generate the response.
  2. Ranking view: This view enables SMEs to rank different responses to help create a preference dataset.
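One common way to turn a ranking into a preference dataset is to expand it into pairwise comparisons. The sketch below shows that expansion in plain Python; `ranking_to_pairs` and the `prompt`, `chosen`, and `rejected` field names are hypothetical and do not describe Snorkel's export format.

```python
from itertools import combinations


def ranking_to_pairs(prompt: str, ranked_responses: list[str]) -> list[dict]:
    """Expand a best-first ranking into (chosen, rejected) preference pairs.

    combinations() preserves input order, so the first element of each
    pair is always the response that was ranked higher.
    """
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_responses, 2)
    ]


# Usage: an SME ranked three candidate responses, best first.
pairs = ranking_to_pairs(
    "How do I reset my password?",
    ["response A", "response B", "response C"],
)
```

A ranking of n responses yields n(n-1)/2 pairs, so the three ranked responses above produce three preference records.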

Marty Moesta
Lead Product Manager, Generative AI

Marty Moesta is the lead product manager for Snorkel’s Generative AI products and services. Before that, Marty was part of the founding go-to-market team at Snorkel, focusing on success management and field engineering with Fortune 100 strategic customers across financial services, insurance, and health care. Prior to Snorkel, Marty was a Director of Technical Product Management at Tanium.