Explore the new GenAI Evaluation Suite: Snorkel 2024.R3

October 9, 2024

Over the last two years, we’ve been obsessively committed to helping our customers get Generative AI (GenAI) into production. We’ve delivered some exciting results, and along the way we’ve built powerful workflows into our product.

Just like traditional machine learning (ML) development, data drives enterprise generative AI development. We’re excited to share the latest product updates to our data-centric workflows for generative AI.

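Learn more about:

  • Snorkel’s GenAI Evaluation Suite, now in public preview
  • Key enhancements to the LLM fine-tuning workflow
  • Key improvements to Generative AI annotation workflows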

Snorkel’s GenAI Evaluation Suite is available in public preview

Our customers are moving beyond “vibe checks” and using Snorkel’s new Generative AI Evaluation Suite to ensure their pipelines are ready for production.

We built our Evaluation Suite on three core principles:

GenAI evaluation needs to be specialized in an efficient and flexible way

Snorkel allows users to rapidly onboard and evaluate their data. Users can measure how well outputs adhere to out-of-the-box or custom criteria, using a mix of ground truth-based and evergreen auto-evaluators, and can compare the results of multiple experiments side by side.
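
To illustrate what mixing evaluator types and comparing experiments looks like in practice, here is a minimal, framework-agnostic Python sketch. It does not use Snorkel's SDK: the `Example` record, the `token_overlap` and `is_concise` evaluators, and the `evaluate`/`compare` helpers are hypothetical stand-ins. One evaluator scores a response against a ground-truth answer; the other checks a criterion that needs no reference answer.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class Example:
    prompt: str
    response: str
    reference: Optional[str] = None  # ground-truth answer, if one exists


# Hypothetical evaluator shape: one example in, a score in [0, 1] out,
# or None when the evaluator does not apply to that example.
Evaluator = Callable[[Example], Optional[float]]


def token_overlap(ex: Example) -> Optional[float]:
    """Ground truth-based (hypothetical): share of reference tokens found in the response."""
    if ex.reference is None:
        return None
    ref = set(ex.reference.lower().split())
    resp = set(ex.response.lower().split())
    return len(ref & resp) / len(ref) if ref else None


def is_concise(ex: Example, max_words: int = 50) -> Optional[float]:
    """Reference-free (hypothetical criterion): 1.0 if the response has at most max_words words."""
    return 1.0 if len(ex.response.split()) <= max_words else 0.0


EVALUATORS: Dict[str, Evaluator] = {"token_overlap": token_overlap, "concise": is_concise}


def evaluate(examples: List[Example]) -> Dict[str, float]:
    """Average each evaluator over the examples it applies to (NaN if none)."""
    report = {}
    for name, fn in EVALUATORS.items():
        scores = [s for s in (fn(ex) for ex in examples) if s is not None]
        report[name] = mean(scores) if scores else float("nan")
    return report


def compare(experiments: Dict[str, List[Example]]) -> None:
    """Print a side-by-side table: one row per criterion, one column per experiment."""
    reports = {exp: evaluate(exs) for exp, exs in experiments.items()}
    print("criterion".ljust(15) + "".join(exp.ljust(12) for exp in reports))
    for name in EVALUATORS:
        print(name.ljust(15) + "".join(f"{reports[exp][name]:<12.2f}" for exp in reports))
```

For example, `compare({"baseline": baseline_examples, "tuned": tuned_examples})` prints both experiments' mean scores per criterion in adjacent columns. Keeping evaluators as plain functions in one registry makes it cheap to add a custom criterion without touching the comparison logic.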

GenAI evaluation needs to be fine-grained

In Snorkel, users can now slice their data programmatically with slicing functions. Data slices let users identify high-priority subsets of inputs, such as:

  • Question topics
  • Different languages
  • Specific customer scenarios
  • Jailbreak attempts
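
As a rough illustration of the idea, a slicing function can be thought of as a predicate that marks which inputs belong to a slice, so evaluation scores can be broken down per slice. The sketch below is plain Python, not Snorkel's SDK; the record fields (`question`, `language`), the keyword lists, and all function names are hypothetical and only mirror three of the slice types listed above.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
SlicingFunction = Callable[[Record], bool]

# Hypothetical phrases used to flag likely jailbreak attempts.
JAILBREAK_MARKERS = ("ignore previous instructions", "pretend you are", "developer mode")


def sf_billing_question(x: Record) -> bool:
    """Question topic (hypothetical keyword heuristic): billing-related questions."""
    return any(w in x["question"].lower() for w in ("invoice", "billing", "refund"))


def sf_non_english(x: Record) -> bool:
    """Language: any record not tagged as English."""
    return x.get("language", "en") != "en"


def sf_jailbreak_attempt(x: Record) -> bool:
    """Jailbreak attempts: questions containing a known manipulation phrase."""
    q = x["question"].lower()
    return any(m in q for m in JAILBREAK_MARKERS)


SLICING_FUNCTIONS: Dict[str, SlicingFunction] = {
    "billing_question": sf_billing_question,
    "non_english": sf_non_english,
    "jailbreak_attempt": sf_jailbreak_attempt,
}


def score_by_slice(records: List[Record], scores: List[float]) -> Dict[str, float]:
    """Mean evaluation score overall and within each slice (NaN for empty slices)."""
    report = {"overall": sum(scores) / len(scores)}
    for name, sf in SLICING_FUNCTIONS.items():
        in_slice = [s for r, s in zip(records, scores) if sf(r)]
        report[name] = sum(in_slice) / len(in_slice) if in_slice else float("nan")
    return report
```

A slice whose mean score sits well below `overall` is a candidate hotspot, which is the kind of fine-grained signal an aggregate score alone would hide.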

GenAI evaluation must enable users to find and fix errors in one platform, in an iterative and programmatic way

AI developers need a platform that takes them from evaluation to development in a single click. Evaluation dashboards give users direct insight into data errors and let them launch workflows in Snorkel that address those errors directly.

[Screenshot of data hotspot modal]

To complement the new Evaluation Suite, we’ve published a cookbook in our documentation that gives your AI teams a guided evaluation experience. If you’re interested in learning how Snorkel’s Evaluation Suite could help your team, please reach out at [insert best way to contact us].

Key enhancements to the LLM fine-tuning workflow

After working closely with our public preview customers, we’re thrilled to announce key improvements to our LLM fine-tuning workflow! As a reminder, our fine-tuning workflow follows five distinct steps.

This release offers core stability improvements, as well as new tooling to enhance your data development efforts when crafting your fine-tuning training sets.

  • AI developers can now safely connect to their LLM providers and leverage freeform prompting in the SDK.
  • Snorkel now supports synthetic data generation techniques in the SDK to address issues with sparse or missing data.
  • We’ve improved logging and performance for integrations with fine-tuning providers, notably Amazon SageMaker.

Snorkel offers guided workflows for fine-tuning and alignment via our LLM fine-tuning and alignment cookbook, available in our documentation.

Key improvements to Generative AI annotation workflows

Our annotation workflows let data scientists collaborate seamlessly with subject matter experts (SMEs) to scale their expertise. To help SMEs share their feedback, we’ve introduced two new views in our annotation studio:

  1. Single response view: SMEs can annotate the LLM response, along with each individual piece of context used to generate it, according to their defined label schema.
  2. Ranking view: This view enables SMEs to rank different responses to help create a preference dataset.
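
The post does not describe how rankings are exported, but one common way to turn ranked responses into a preference dataset is to expand each ranking into pairwise (chosen, rejected) examples, the format used by pairwise methods such as reward-model training or DPO. The following is a minimal sketch of that idea; the function name and the `prompt`/`chosen`/`rejected` field names are hypothetical, not Snorkel's export format, and ties are not handled.

```python
from itertools import combinations
from typing import Dict, List


def ranking_to_preference_pairs(prompt: str, ranked_responses: List[str]) -> List[Dict[str, str]]:
    """Expand one SME ranking (best first) into every implied pairwise preference.

    Hypothetical helper: a ranking of n responses yields n * (n - 1) / 2 pairs,
    because each response is preferred over every response ranked below it.
    """
    pairs = []
    for better, worse in combinations(ranked_responses, 2):
        # combinations() preserves input order, so `better` is always ranked higher.
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs
```

For a ranking `["A", "B", "C"]`, this produces the pairs A over B, A over C, and B over C. Expanding to all pairs extracts the most training signal from each ranking, at the cost of pairs from the same prompt being correlated.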

Marty Moesta
Lead Product Manager, Generative AI

Marty Moesta is the lead product manager for Snorkel’s Generative AI products and services. Before that, Marty was part of the founding go-to-market team at Snorkel, focusing on success management and field engineering with Fortune 100 strategic customers across financial services, insurance, and health care. Prior to Snorkel, Marty was a Director of Technical Product Management at Tanium.
