Six months ago, Snorkel AI announced Snorkel Foundry, an offering for building specialized generative AI solutions using our state-of-the-art programmatic data development technology in Snorkel Flow. The message resonated with enterprises focused on getting generative AI out of the demo stage and into production, and we quickly found ourselves collaborating on high-value use cases.

Our generative AI (GenAI) customers span verticals including banking, online retail, and telecommunications. They employ world-class data science teams and domain experts, but they realize that successful GenAI initiatives demand a sharper focus on data.

In this post, we highlight a few of our customers’ successes.  

Let’s dive in.

Data development operations required for state-of-the-art generative AI.

Top-line results: up to 54 points of improvement!

With Snorkel Foundry, we aim to help our customers build production-ready GenAI as fast as possible. The primary workhorse in these engagements is the Snorkel Flow platform, which lets teams rapidly curate training data for model fine-tuning, prompting, and retrieval-augmented generation (RAG).

Here are some of the top-line results from a selection of GenAI projects conducted over the last six months.

How Snorkel Foundry helps customers

Each Foundry customer tried to build a GenAI application in-house and found that “out of the box” solutions fell short of their needs. Building production-grade GenAI, they found, requires a systematic approach to developing data—which is why they came to Snorkel.

The Snorkel Foundry team helped customers label, clean, slice, sample, filter, and/or augment their data. These efforts improved mission-critical aspects of our customers’ application pipelines—which often contained generative models, non-generative models, knowledge bases, and third-party tools.

Recurring themes

Recurring themes among our Foundry customers include:

Subject matter experts (SMEs) must be in the loop

Domain-specific AI requires consistent SME guidance. Much like domain experts, domain-specific AI needs on-the-job training. Your SMEs need to write the curriculum, grade the exams, and provide constructive feedback. Snorkel Flow offers a single application where data scientists and SMEs collaborate. Every Foundry engagement begins and ends with SME interaction.

Many components can (and often should) be fine-tuned

We’ve seen large performance gains from improving not just one, but multiple components in an ML system.

For example, RAG-powered Q&A systems struggle to compose accurate and helpful responses if they’re not fed the relevant supporting information—no matter how capable the user-facing LLM is. Even within the RAG portion of the system, fine-tuning can be applied to the chunking model, the embedding engine, the chunk re-ranker, etc.

Mileage will vary depending on the use case, but for high-value applications with high production bars, fine-tuning multiple components typically yields the best results.
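To make the "many components" point concrete, here is a minimal Python sketch of a retrieval stage in which the chunker, the embedder, and the re-ranker are separate, swappable pieces. Everything in it is an illustrative assumption rather than Snorkel Flow's API: the class `RagRetriever`, the helper functions, and the toy bag-of-words "embedding" are hypothetical stand-ins for the learned models a real system would fine-tune.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable

# Hypothetical stand-in for a learned embedding: a sparse bag-of-words vector.
Vector = Counter


def chunk_by_sentence(document: str) -> list[str]:
    """Toy chunker: one chunk per sentence (split on periods)."""
    return [s.strip() for s in document.split(".") if s.strip()]


def embed_bag_of_words(text: str) -> Vector:
    """Toy embedder: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_by_overlap(query: str, chunks: list[str]) -> list[str]:
    """Toy re-ranker: order candidates by the number of shared query words."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)


@dataclass
class RagRetriever:
    # Each component can be replaced (or fine-tuned) independently of the others.
    chunker: Callable[[str], list[str]] = chunk_by_sentence
    embedder: Callable[[str], Vector] = embed_bag_of_words
    reranker: Callable[[str, list[str]], list[str]] = rerank_by_overlap

    def retrieve(self, query: str, documents: list[str], k: int = 2) -> list[str]:
        # 1. Chunk every document.
        chunks = [c for d in documents for c in self.chunker(d)]
        # 2. Rank chunks by embedding similarity to the query.
        q_vec = self.embedder(query)
        ranked = sorted(chunks, key=lambda c: cosine(q_vec, self.embedder(c)), reverse=True)
        # 3. Re-rank a short candidate list and keep the top k.
        return self.reranker(query, ranked[: k * 2])[:k]
```

Because each stage is just a callable, a team could swap in a fine-tuned embedder or re-ranker (for example, `RagRetriever(reranker=my_reranker)`, where `my_reranker` is whatever function they supply) and measure its effect without touching the rest of the pipeline.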

Systematic, high-quality synthetic data generation is increasingly relevant

Our customers recognize that their customer data is critical for fine-tuning custom AI. However, privacy restrictions often prevent them from using it. Snorkel’s data development suite helps our customers programmatically generate high-quality synthetic data aligned to their objectives and ready for downstream fine-tuning.
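As a rough illustration of what "programmatic" synthetic data generation can look like, the Python sketch below fills hand-written templates with slot values and then applies a simple quality filter. This is a generic template-based technique, not a description of Snorkel's data development suite; the intents, templates, and function names (`generate_examples`, `filter_examples`) are all hypothetical.

```python
import random

# Hypothetical intents and templates for a banking assistant.
TEMPLATES = {
    "check_balance": [
        "What is the balance on my {account} account?",
        "How much money is in my {account} account?",
    ],
    "report_fraud": [
        "I see a charge I don't recognize on my {account} account.",
        "Someone used my {account} account without permission.",
    ],
}
ACCOUNTS = ["checking", "savings", "credit card"]


def generate_examples(n: int, seed: int = 0) -> list[dict]:
    """Generate n labeled examples by filling templates with slot values."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for _ in range(n):
        intent = rng.choice(sorted(TEMPLATES))
        template = rng.choice(TEMPLATES[intent])
        text = template.format(account=rng.choice(ACCOUNTS))
        rows.append({"text": text, "label": intent})
    return rows


def filter_examples(rows: list[dict], min_words: int = 4) -> list[dict]:
    """Quality filter: drop exact duplicates and very short texts."""
    seen = set()
    kept = []
    for row in rows:
        key = row["text"].lower()
        if key in seen or len(row["text"].split()) < min_words:
            continue
        seen.add(key)
        kept.append(row)
    return kept


if __name__ == "__main__":
    for row in filter_examples(generate_examples(20)):
        print(row["label"], "|", row["text"])
```

No real customer text enters the process, which is the appeal when privacy restrictions apply. In practice the templates would likely be replaced by a generative model and the filter by checks that SMEs help define.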

What’s next for Snorkel Foundry?

The real-world results delivered in the first six months of the Snorkel Foundry initiative have exceeded even our expectations. We are grateful for the opportunity to partner with some of the world’s largest organizations and to help them build high-quality custom AI. 

If you’re excited by data development for generative AI, see our open roles.

If your company is struggling to get its GenAI initiatives into production, and you believe that data is the bottleneck, reach out to us. We’d love to hear from you!