Today we’re launching two new products on our AI Data Development Platform that together create a complete solution for enterprises to specialize AI systems with expert data at scale. We are also announcing our $100M Series D financing, led by Addition.

The first wave of generative AI gave us powerful chatbots and co-pilots, built on the vast amounts of data available on the public internet and aligned with generalist annotator crowds.

Now, we’re approaching a new frontier of agentic AI systems that reason, use tools, and act autonomously in specialized, high-impact settings.

However, agentic systems will not be deployed unless they become as trusted as human experts—and learning only from public internet data can’t possibly get us to this level of accuracy and trust, no matter how advanced the underlying LLMs become.

Building expert-level agentic AI requires expert-level data, and the next frontier of AI will be driven by applying expertise to develop that data for evaluating and tuning specialized AI at scale.

At Snorkel, our mission is exactly this—to enable every enterprise to turn expert knowledge into specialized AI at scale. Today, we’re excited to announce the general availability of two new products within Snorkel’s AI Data Development Platform, creating a complete solution to help enterprises actually deploy agentic AI to production in mission-critical settings:

  1. Snorkel Evaluate – which enables enterprises to evaluate their AI systems with trust and accuracy at scale using our programmatic data development technology.
  2. Snorkel Expert Data-as-a-Service – which is already powering leading LLM developers with frontier AI datasets.

We built these new offerings based on what we've learned working side-by-side with our amazing customers to get real AI systems into production, including 7 of the top 10 US banks, Fortune 500 companies, federal agencies, and leading LLM providers.

Increasingly, we see enterprise AI leaders drawing on both the differentiated edge of their internal expertise and data and the accelerant of proprietary datasets built with scaled external expertise. We're excited to now support both of these approaches in one unified platform for bringing sophisticated AI models and agentic systems to production.

We’re also excited to release a new enterprise-inspired agentic AI benchmark dataset and walkthrough to showcase the power of what can be done with Snorkel Evaluate and Expert Data-as-a-Service together. We believe new, domain-specific benchmarks and evaluation tools are critical to drive agentic system development in a safe and successful way, and will be releasing new industry-leading benchmarks regularly—several more to come soon!

Finally, we’re incredibly excited to announce Snorkel’s $100M Series D at a $1.3B valuation, led by Addition with participation from Prosperity7 Ventures and QBE Ventures, existing investors Greylock and Lightspeed, and others to support our continued research and innovation pushing the frontiers of specialized AI.

To learn more, join Snorkel AI and innovators from Accenture, BNY, Comcast, Stanford University, QBE, University of Wisconsin-Madison, and more on June 26 for an exclusive virtual live event, Developing Specialized Enterprise AI Agents.

Now, let’s dive in and briefly explore the two newly launched products!

Snorkel Evaluate

First: we’re incredibly excited to announce Snorkel Evaluate, our AI evaluation platform for specialized data development and labeling in enterprise settings where vibe checks and out-of-the-box metrics driven by simple LLM prompts are just not enough.

Evaluation is the new entrypoint to the AI development cycle—but there’s a major gap between what’s available on the market today and what it actually takes to develop specialized, trustworthy evaluations for real enterprise applications.

Imagine (and this is barely a metaphor) that you were running a standardized test for students, like the SAT, but asked the students to write their own exam questions, grade their own tests, and figure out on their own where to improve their performance! This is largely the state of AI evaluation today.

In Snorkel Evaluate, we're bringing our unique programmatic data development and labeling technology to close these gaps: defining what your AI system is supposed to do, how its performance is graded, and where to go next, in highly specialized, aligned ways. It includes workflows for the following (a simplified sketch of the idea appears after the list):

  • Scalably generating and curating benchmark evaluation datasets—the collection of prompts and expected responses or actions that define what your AI system is supposed to do.
  • Developing specialized evaluators that label or grade an AI system's outputs and actions, defining how your AI system is supposed to perform, and aligning them to unique enterprise objectives and standards, going beyond off-the-shelf LLM-as-a-judge approaches that aren't accurate enough for specialized tasks.
  • Labeling the fine-grained slices of your benchmark dataset that correspond to meaningful subtasks or error modes, and that give actionable guidance on where an AI system needs to be improved.
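
To make this concrete, here is a minimal, hypothetical sketch of the idea behind programmatic evaluators and fine-grained slices, written in plain Python rather than Snorkel Evaluate's actual API. Small, code-defined graders score each benchmark example, and slicing functions tag the subsets of the benchmark where results should be reported separately. All names, criteria, and data below are illustrative assumptions.

```python
# A minimal, hypothetical sketch of programmatic evaluators and fine-grained
# slices -- NOT Snorkel's actual API. It scores agent responses against simple,
# code-defined criteria and reports accuracy per slice so you can see where
# the system needs improvement.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Example:
    prompt: str
    response: str          # the AI system's output
    reference: str         # expected answer or action, curated by experts

# --- Programmatic evaluators: small functions that grade one example ---------
def matches_reference(ex: Example) -> bool:
    """Pass if the response contains the expected answer."""
    return ex.reference.lower() in ex.response.lower()

def follows_policy(ex: Example) -> bool:
    """Pass if the response avoids promising refunds (a made-up enterprise rule)."""
    return "guaranteed refund" not in ex.response.lower()

EVALUATORS = {"correctness": matches_reference, "policy": follows_policy}

# --- Slicing functions: tag examples with meaningful subtasks / error modes --
def slice_billing(ex: Example) -> bool:
    return "invoice" in ex.prompt.lower() or "billing" in ex.prompt.lower()

def slice_multi_step(ex: Example) -> bool:
    return ex.prompt.lower().count("then") >= 1  # crude proxy for multi-step asks

SLICES = {"billing": slice_billing, "multi_step": slice_multi_step}

def evaluate(benchmark: list[Example]) -> None:
    scores = defaultdict(list)  # (evaluator, slice) -> list of pass/fail results
    for ex in benchmark:
        tags = ["overall"] + [name for name, fn in SLICES.items() if fn(ex)]
        for eval_name, eval_fn in EVALUATORS.items():
            passed = eval_fn(ex)
            for tag in tags:
                scores[(eval_name, tag)].append(passed)
    for (eval_name, tag), results in sorted(scores.items()):
        rate = 100 * sum(results) / len(results)
        print(f"{eval_name:12s} | {tag:10s} | {rate:5.1f}% ({len(results)} examples)")

if __name__ == "__main__":
    benchmark = [
        Example("Why is my invoice higher this month?",
                "Your plan changed tiers.", "plan changed"),
        Example("Reset my password, then update billing email.",
                "Done. I also issued a guaranteed refund.", "password reset"),
    ]
    evaluate(benchmark)
```

Snorkel Evaluate builds far more on top of this basic pattern, including expert alignment, scale, and tooling; the sketch is only meant to show why code-defined evaluators and slices yield more actionable signal than a single aggregate score or an unaligned LLM judge.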

With Snorkel Evaluate, enterprises can rapidly build specialized, scalable evaluations that are tightly aligned to their unique use cases, settings, and standards—leading to real production value. Our early design partners are seeing high-impact results:

  • Rox, a leading agentic AI startup for revenue organizations, built specialized evaluators with 99%+ accuracy, up from 75% with a basic LLM-as-a-judge approach, enabling enough trust to ship a critical outbound email feature.
  • A top-5 telecommunications company built specialized evaluators for its agentic customer service (CSR) system that averaged 88% accuracy in under a week, up from an average of 55% with basic LLM-as-a-judge baselines.

“We’re at an inflection point where AI agents must deliver real enterprise value. To unlock Claude’s full potential, we need new evaluation approaches with domain expertise and human feedback. Anthropic is committed to working with innovators like Snorkel to ensure AI systems are refined, reliable, and aligned to enterprise needs.”

—Kate Jensen, Head of Revenue, Anthropic

Snorkel Expert Data-as-a-Service

Next: we’re incredibly excited to announce Snorkel Expert Data-as-a-Service—a white-glove service for AI evaluation and post-training datasets, built from the ground up for specialized, expert-level data.

As we move into the current wave of specialized, high-impact agentic AI, LLM developers have realized that the key to success is not more data, but the right high-quality expert datasets, with the right distributions for specific domains and use cases.

Sometimes, this expert data lives in your organization, or in the minds of your own experts; we’ve long been focused on this setting at Snorkel. But often, scaled expertise outside of your organization is a critical and complementary accelerant for achieving the breadth, depth, and speed needed in AI today.

With Snorkel Expert Data-as-a-Service, we're excited to now support this latter approach as well, through a global network of experts across thousands of domains spanning STEM, vertical and professional, and consumer and lifestyle areas, building specialized datasets for leading LLM developers in frontier areas like:

  • Agentic – including multi-turn interaction with users, multi-step reasoning, and multi-tool use across a variety of domain-specific and consumer settings (a hypothetical example record follows this list)
  • Expert knowledge and reasoning – across thousands of subdomains in STEM/academic, vertical/professional, and consumer domains
  • Coding – across a variety of languages, frameworks, and task types, including unit tests and more nuanced evaluation rubrics, complex long-sequence and multi-turn software engineering tasks, and more
  • Multi-modal – including text, PDFs, images, video, and code
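
To make the first category a bit more concrete, here is a hypothetical sketch of what a single record in a multi-turn, multi-tool agentic dataset might look like. The domain, field names, and tool call are all illustrative assumptions, not a Snorkel data format; it's shown as a plain Python dict so it can be serialized to JSON.

```python
# A hypothetical example of a single record in a multi-turn, multi-tool agentic
# dataset -- illustrative only, not a Snorkel format. Shown as a plain Python
# dict so it can be serialized to JSON for training or evaluation pipelines.
import json

example_record = {
    "domain": "insurance_claims",  # assumed vertical/professional domain
    "conversation": [
        {"role": "user", "content": "Was my windshield claim from March approved?"},
        {"role": "assistant",
         "reasoning": "Need the claim record before answering.",
         "tool_call": {"name": "lookup_claim",
                       "args": {"month": "March", "type": "windshield"}}},
        {"role": "tool", "name": "lookup_claim",
         "content": {"status": "approved", "payout": 412.50}},
        {"role": "assistant",
         "content": "Yes, your March windshield claim was approved for $412.50."},
    ],
    "expert_annotations": {
        "final_answer_correct": True,   # graded by a domain expert
        "tool_use_appropriate": True,
        "rubric_notes": "Cited the payout amount; did not speculate beyond the record.",
    },
}

print(json.dumps(example_record, indent=2))
```

Real datasets in this category would cover many domains, longer trajectories, and richer expert rubrics; the point is simply that each record pairs a full interaction with expert judgments about it.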

With Snorkel Expert Data-as-a-Service, we’re able to deliver custom, expert-level datasets with higher quality, more precise distributional control, and greater delivery speed by leveraging the same programmatic data development and labeling technology in our AI Data Development Platform, pioneered over our past decade of research and development.

Scaled expert data is the new rocket fuel for specialized AI systems and agents. Increasingly, enterprises will mix their own unique, in-house expertise and data with proprietary datasets developed using outsourced expertise to achieve the acceleration they need in today's AI market. In both cases, it's about the right data, not just more data, and we're excited to support this with our unique technology platform and new Expert Data-as-a-Service.

Expert Data as the Key to Durable Differentiation in AI

As models and infrastructure in the AI space continue to standardize, developing unique, proprietary expert data for evaluation and tuning will become the centerpiece of AI development, and the key to a differentiated edge.

We believe the leaders in enterprise agentic AI will combine their unique internal expertise and data with the accelerant of scaled external expert data, in a rapid, iterative cycle of evaluation and tuning—and now, you can do this all in one unified, enterprise-grade platform.

If this seems relevant to something you’re building with AI models or agentic systems, let us know—we’re excited to talk!  For more detail on what we’ve just launched:

  • Check out the open source benchmark dataset preview and walkthrough we just released around building an enterprise AI agent;
  • And mark your calendar for June 26th to join our event on developing specialized enterprise AI agents. 

We’re excited to see what we can build together!