Today, I’m incredibly excited to announce our new offering, Snorkel Custom, to help enterprises cross the chasm from flashy chatbot demos to real production AI value. Combining our programmatic data development platform, Snorkel Flow, with hands-on support from our team of AI experts, Snorkel Custom engagements start with co-development of custom, use-case-specific evaluation benchmarks and end with a production-quality LLM tuned on your unique data and optimized for your unique use case.

Enterprises are under mounting pressure to show value from AI investments, but are realizing that off-the-shelf LLMs are rarely enough to meet production requirements. Instead, LLMs have to be tuned for enterprises’ unique use cases, and success here depends on the quality of the labeled, curated data that tuning relies on.

The Snorkel team has spent the last decade pioneering the practice of AI data development and making it programmatic like software development. Today, we help some of the world’s most sophisticated enterprises label and develop their data for tuning LLMs with our flagship platform, Snorkel Flow. Now, we’re excited to widen the aperture and support the entire LLM development process, from initial benchmark definition and evaluation to distillation and serving optimization, based on our unique technology and experience with the most critical element of all these steps: the data.

Accelerating AI development as a partner

The launch of Snorkel Custom is in response to three key trends we’ve seen in the enterprise space:

  1. Sky-high expectations and the gap to production: Modern LLMs create mind-blowing demos, and many of these demos went straight to the C-suite in 2023. Expectations were set high, budgets were locked, and now teams are realizing that converting flashy chatbot demos into production AI is a much longer road than it seemed.
  2. Overwhelming demand and limited bandwidth: Enterprise AI teams are flooded with new use cases to take to production. With limited bandwidth, competition for talent and a ticking clock to get to production, teams are struggling to keep pace.
  3. Data-centric challenges throughout the development lifecycle: While we’ve focused on training and tuning models for years, new generative use cases introduce unique challenges around key steps like benchmark definition, evaluation, and model optimization, all of which depend on data labeling and curation.

With Snorkel Custom, we support the entire demo-to-production pipeline, using the combination of our platform, Snorkel Flow, and our team of AI experts. The end result is a faster, customized, production-ready LLM tuned on your unique data, delivering greater accuracy and cost-effectiveness.

The five key stages of crossing the chasm

Snorkel Custom engagements are structured around what we see as the five steps to production LLM development, each driven by our programmatic data development platform, Snorkel Flow, and with the support of our team:

  1. Custom Benchmark and Evaluation Development: Most public benchmarks are irrelevant to actual enterprise use cases. We work with customer subject matter experts (SMEs) and product managers to define a custom, use-case-specific benchmark.
  2. Data Labeling and Development: We use Snorkel Flow to programmatically label, curate, and develop data for LLM fine-tuning and alignment with state-of-the-art efficiency and accuracy, as well as to support human review and annotation for evaluation.
  3. LLM Fine-Tuning and Alignment: Our platform works as a neutral layer on top of all available open- and closed-source LLMs, so that you can tune the base LLM of your choice and then easily swap in new ones as the frontier advances.
  4. LLM Distillation and Cost Optimization: Most enterprise use cases do not need a massive generalist model. In fact, using an LLM with generalized scope places a heavy burden on compute resources and creates unnecessary risk around out-of-scope interactions. Snorkel’s advanced model distillation techniques train small, specialized LLMs that improve use-case-specific accuracy and significantly lower the cost of ownership.
  5. Model Serving and Maintenance: We deploy the optimized models to a production environment on our SOC 2 certified cloud infrastructure, or to an organization’s internal infrastructure. Importantly, our unique programmatic data development platform supports rapid adaptation of datasets and models, so enterprises can easily keep pace with changing conditions and objectives.

These stages are underpinned by our platform, Snorkel Flow, which also eases the transition toward self-sufficiency in AI model development and maintenance.

Snorkel Custom: formalizing a proven model

Snorkel Custom grows out of our experience partnering closely with large enterprises across multiple verticals to help them get provable value from GenAI. For example, a top-10 US bank customer began a project with GPT-4 and retrieval-augmented generation (RAG), and soon found that this off-the-shelf LLM system had only 25% accuracy on key business queries. Our team collaborated with the bank’s experts, deploying Snorkel Flow to programmatically label and curate data. In less than two months, model accuracy rose from 25% to over 90%.

In another example, Wayfair and Snorkel’s team used Snorkel Flow to develop custom LLMs with a programmatic data development approach. This led to a 10x faster development cycle and improved model precision, enabling Wayfair to surface more relevant products for customers and improving cart performance and conversion rates.

See how Wayfair labeled millions of images in months rather than years and improved the accuracy of their models by 20+ points.

Forward together

This is the year when enterprises have to turn AI hype into real production value. The key to this is using their unique data, at all stages of AI development, to evolve off-the-shelf LLMs into custom LLMs that actually work in their production settings.

We are extremely excited to support all stages of this data-centric development journey with Snorkel Custom, and to ensure that our customers cross the demo-to-production chasm with a strong, repeatable model of AI development that leverages their data’s unparalleled value.