Label data in minutes, not months, with Snorkel Flow

Improve model quality 10-100x faster
Snorkel Flow enhances the performance of large language models on custom tasks while reducing model development time and resources.

Programmatically label and correct
Snorkel Flow enables data scientists and subject-matter experts to rapidly and programmatically label training data and correct model errors. Express your rich domain knowledge and organizational resources as labeling functions, including via a no-code interface and natural-language prompts, whether the goal is to improve and adapt large language ("foundation") models on complex domain-specific tasks or to create large training sets.
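To make the idea concrete, here is a minimal sketch using the open-source Snorkel library that Snorkel Flow grew out of; Snorkel Flow's own SDK and no-code interface differ, and the spam/ham task and heuristics below are hypothetical examples, not part of the product:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_offer(x):
    # Heuristic: promotional language suggests spam.
    return SPAM if "offer" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_all_caps_words(x):
    # Heuristic: shouting ("FREE", "WIN") suggests spam.
    return SPAM if any(w.isupper() and len(w) > 2 for w in x.text.split()) else ABSTAIN

@labeling_function()
def lf_short_reply(x):
    # Heuristic: very short messages are usually legitimate replies.
    return HAM if len(x.text.split()) < 5 else ABSTAIN

df = pd.DataFrame({"text": [
    "Limited time OFFER, click now to win",
    "Thanks, see you then.",
    "WIN a free offer today",
    "Running late, start without me please",
]})

# Apply all labeling functions, then combine their noisy, overlapping votes
# into probabilistic training labels with a generative label model.
lfs = [lf_contains_offer, lf_all_caps_words, lf_short_reply]
L_train = PandasLFApplier(lfs).apply(df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=200, seed=42)
print(label_model.predict_proba(L_train))
```

Each labeling function is a cheap, imperfect heuristic; the label model weights and denoises them, so no single function has to be exactly right.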

Discover errors through guided analysis
Rapidly discover the most important slices of your dataset to label next, whether you are labeling from scratch or correcting the errors, biases, and "hallucinations" of large language or foundation models. Snorkel Flow's guided analysis suite employs state-of-the-art active learning and data-quality analysis tools so you can iterate on your data efficiently.
Commercial and open-source models are supported.
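As a simple illustration of one idea behind guided error discovery, the sketch below ranks unlabeled examples by model uncertainty (the margin between the top two predicted class probabilities), a standard active-learning heuristic. Snorkel Flow's analysis suite is considerably richer; the probabilities here are hypothetical:

```python
import numpy as np

def uncertainty_ranking(probs: np.ndarray) -> np.ndarray:
    """Return example indices sorted from most to least uncertain."""
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]  # top-1 minus top-2
    return np.argsort(margins)  # smallest margin = most uncertain first

# Hypothetical predicted probabilities for 4 examples over 3 classes.
probs = np.array([
    [0.34, 0.33, 0.33],  # nearly uniform: highest labeling priority
    [0.90, 0.05, 0.05],  # confident: lowest priority
    [0.50, 0.45, 0.05],
    [0.70, 0.20, 0.10],
])
print(uncertainty_ranking(probs))  # -> [0 2 3 1]
```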

Export labeled data, fine tune, or distill
Deploying LLMs can be complex and expensive. Snorkel Flow gives you complete control to deploy on the infrastructure of your choice, with no constraints or lock-in. Choose the deployment strategy that suits your needs: fine-tune LLMs, distill them into smaller task-specific models, or export the labeled data.
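A minimal sketch of the "distill" option: once training labels exist (produced upstream, e.g. by a label model or an LLM), train a small task-specific model that is cheap to serve. This uses scikit-learn as a stand-in for brevity; the texts and labels are hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["limited time offer, click now", "thanks, see you then",
         "win a free prize today", "meeting moved to 3pm"]
labels = [1, 0, 1, 0]  # labels produced upstream (e.g., by a label model)

# A small TF-IDF + logistic regression model distills the labeled data
# into something far cheaper to serve than an LLM.
small_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
small_model.fit(texts, labels)
print(small_model.predict(["free offer just for you"]))
```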
"Iterating on training data and models took months before. It literally takes minutes using Snorkel Flow."
Swaroop Kalasapur
Head of Software Technology Innovation Center
Customize LLMs and foundation models
Unlock the full potential of your AI with Snorkel GenFlow and Snorkel Foundry to build custom large language models and generative AI applications powered by your proprietary enterprise data and expert knowledge.
Instruction-tuning and RLHF with Snorkel GenFlow
Rapidly build, manage, and deploy generative AI applications (e.g., summarization, question answering, chat) with Snorkel GenFlow by programmatically curating, scoring, filtering, and sampling instructions and responses for instruction tuning with RLHF and other methods. Improve performance and reliability on specific tasks using your proprietary data.
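To illustrate programmatic curation of instruction-response pairs, here is a sketch with two hypothetical quality checks; these are stand-ins for the kinds of scoring and filtering functions one might express, not Snorkel GenFlow's actual API:

```python
from dataclasses import dataclass

@dataclass
class Pair:
    instruction: str
    response: str

def long_enough(p: Pair) -> bool:
    # Reject empty or one-word responses.
    return len(p.response.split()) >= 3

def no_refusal(p: Pair) -> bool:
    # Reject boilerplate refusals that teach the model to dodge tasks.
    return "as an ai" not in p.response.lower()

CHECKS = [long_enough, no_refusal]

def curate(pairs):
    # Keep only pairs that pass every quality check.
    return [p for p in pairs if all(check(p) for check in CHECKS)]

pairs = [
    Pair("Summarize the report.", "The report covers Q3 revenue growth."),
    Pair("Summarize the report.", "As an AI, I cannot do that."),
    Pair("Translate to French.", "Bonjour."),
]
print(len(curate(pairs)))  # -> 1
```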
Pre-train domain-specific LLMs with Snorkel Foundry
Build custom LLMs with Snorkel Foundry by programmatically sampling, filtering, cleaning, and augmenting proprietary data for domain-specific pre-training. Use your data as a differentiator by adapting powerful but generic base models into domain-specific specialists that can serve as the foundation for all internal AI applications, both predictive and generative.
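For a flavor of the corpus preparation involved, here is a minimal sketch of exact deduplication plus a simple length filter; real pre-training pipelines (including, presumably, Snorkel Foundry's) use fuzzier deduplication and much richer quality filters:

```python
import hashlib

def clean_corpus(docs, min_words=50):
    """Drop short fragments and exact duplicates from a document stream."""
    seen, kept = set(), []
    for doc in docs:
        text = doc.strip()
        if len(text.split()) < min_words:
            continue  # fragments too short to help pre-training
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        kept.append(text)
    return kept
```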

Interoperable with your AI stack
Snorkel Flow supports cloud and on-premises infrastructure, providing enterprise-grade security, governance, and integrations with popular platforms and tools.
Data ingest
Quickly and securely integrate with data pipelines or upload data locally.
Model training
Train custom models or choose from leading model frameworks with optional AutoML.
Production serving
Deploy your models within Snorkel Flow or export to the service of your choice.

Infrastructure
Host Snorkel Flow within the secure infrastructure of your choice.
Power specialized NLP and computer vision
Build unique, mission-critical NLP and computer vision applications tailored to your domain, across your organization.
Supported ML tasks
Snorkel supports a wide range of ML tasks, including classification, extraction, summarization, and question answering.
Text classification
Information extraction
Conversation analysis
Entity linking
Relation extraction
Image classification
Supported data types
Snorkel seamlessly supports diverse data types, so you can build AI applications over unstructured text, semi-structured text, and images.
Conversational text
Text documents
Native PDFs
Web data
Images
Our research
Our team has authored 80+ open-access publications and has been featured in top peer-reviewed journals.

Multitask Prompted Training Enables Zero-Shot Task Generalization

Language Models in the Loop: Incorporating Prompting into Weak Supervision

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
