Q4 LTS Release of Snorkel Flow
We’re excited to announce the Q4 2021 LTS release of Snorkel Flow, our data-centric AI development platform powered by programmatic labeling. This latest release introduces a number of new product capabilities and enhancements, from a streamlined programmatic data development interface, to enhanced auto-suggest for labeling functions, to new machine learning capabilities like AutoML, to significant performance enhancements for PDF data and more. Many of our users have already delivered impact using preview versions of these features, contributing to the tens of millions of dollars and person-years of time saved using Snorkel Flow’s data-centric approach to AI development.
Try out a whole new programmatic AI development experience
The next generation of Snorkel Flow Studio — our low-code interface for data scientists and ML engineers to programmatically develop training data sets — is here! The new Studio experience is now available in Beta, and introduces a new data exploration, programmatic labeling, model training, and analysis workflow to Snorkel Flow. The new interface makes the data-centric development process even more iterative and feedback-driven, emphasizing users’ favorite features of the original Studio interface while letting you develop models more efficiently than ever before. The new Studio interface includes:
- Search-based interface for labeling function development with text data: quickly try out and see the results of new labeling patterns with the new, expressive search experience, complementing the existing templated labeling function builders
- Integrated labeling function auto-suggest: Snorkel Flow notifies you when it identifies suggested search-based labeling functions
- Rapid model performance feedback: with each new labeling function and model configuration change, Snorkel Flow will automatically retrain and update your model to give rapid feedback on training data development progress
- Guided error analysis: identify error modes with Snorkel Flow’s auto-generated model analyses to drive iteration on training data and models, all from the same interface
Advanced labeling function auto-suggest
The Q4 LTS release includes a Beta of our enhanced labeling function auto-suggest engine. This brings well-studied research techniques for automatically generating labeling functions from natural language and structured data into the Studio interface [Varma 2018, Chen 2019]. High-quality labeling function suggestions efficiently surface previously unseen patterns in your data while still keeping human experts in the loop.
With the new engine, you can generate high-quality labeling function suggestions from text n-grams and numerical values, and even train model-based labeling functions over multiple features. You can then accept or reject the suggested labeling functions based on your domain knowledge. We’ve also added advanced features for setting precision and coverage thresholds, so that suggestions more effectively complement human-written labeling functions.
This powerful approach of integrating automated insights and domain knowledge to efficiently produce high-quality training data is a core tenet of the Snorkel Flow platform, and we’re excited to be expanding this feature set in the coming releases.
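To make the idea of n-gram suggestions with precision and coverage thresholds concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow engine or its API: the function name `suggest_ngram_lfs`, its parameters, and the toy data are all hypothetical, and it only considers single-word rules estimated from a small labeled sample.

```python
from collections import Counter, defaultdict


def suggest_ngram_lfs(texts, labels, min_precision=0.9, min_coverage=0.2):
    """Propose keyword rules of the form "if `token` appears, vote `label`".

    Hypothetical helper for illustration only (not a Snorkel Flow API).
    precision = share of texts containing the token that carry the majority label
    coverage  = share of all texts that contain the token
    """
    n = len(texts)
    token_label_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        # Count each token once per text so coverage is a fraction of texts.
        for token in set(text.lower().split()):
            token_label_counts[token][label] += 1

    suggestions = []
    for token, counts in token_label_counts.items():
        total = sum(counts.values())
        label, hits = counts.most_common(1)[0]
        precision = hits / total
        coverage = total / n
        if precision >= min_precision and coverage >= min_coverage:
            suggestions.append((token, label, precision, coverage))
    # Highest-coverage suggestions first; ties broken by precision, then token.
    return sorted(suggestions, key=lambda s: (-s[3], -s[2], s[0]))


# Toy labeled sample (made up for this example).
texts = [
    "win a free prize now",
    "free money offer",
    "meeting at noon",
    "lunch meeting tomorrow",
    "claim your free gift",
]
labels = ["SPAM", "SPAM", "HAM", "HAM", "SPAM"]

suggestions = suggest_ngram_lfs(texts, labels, min_precision=0.9, min_coverage=0.4)
for token, label, precision, coverage in suggestions:
    print(f"contains '{token}' -> {label} (precision={precision:.2f}, coverage={coverage:.2f})")
```

In this sketch, raising `min_coverage` trims rules that fire on only a handful of examples, while `min_precision` trims noisy ones. A domain expert would still review each surviving suggestion before accepting it.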
AutoML for one-click model search
We’re excited to announce the release of Snorkel Flow’s AutoML suite. One of the most important components of a data-centric development workflow is rapid feedback about model performance to guide iterative training data development. Snorkel Flow now automatically searches over model architectures, featurization techniques, and hyperparameters using state-of-the-art optimization techniques that are available out of the box. We’re excited to bring even more ML modeling frameworks and tools directly into the platform with each LTS release, and to make them even easier to leverage with our AutoML suite.
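As a rough illustration of what searching over featurization techniques and hyperparameters means, the sketch below runs an exhaustive grid search over a tiny configuration space and keeps the configuration with the best validation accuracy. Exhaustive search is swapped in here purely for simplicity; it is not the optimization technique Snorkel Flow uses, and every name (`grid_search`, `FEATURIZERS`, the toy threshold model and data) is an assumption made for this example.

```python
import itertools

# Two hypothetical featurizers that map a text to a single number.
FEATURIZERS = {
    "length": len,
    "exclamations": lambda text: text.count("!"),
}

# Toy validation set (made up): 1 = promotional, 0 = not promotional.
VALIDATION = [
    ("Buy now!!!", 1),
    ("See you at lunch", 0),
    ("Limited offer!! Act now!", 1),
    ("Thanks!", 0),
    ("Quarterly report attached", 0),
]


def evaluate(config):
    """Validation accuracy of a one-feature threshold model for `config`."""
    featurize = FEATURIZERS[config["featurizer"]]
    correct = 0
    for text, label in VALIDATION:
        prediction = 1 if featurize(text) >= config["threshold"] else 0
        correct += prediction == label
    return correct / len(VALIDATION)


def grid_search(search_space, evaluate_fn):
    """Score every configuration and return the first best one with its score."""
    names = sorted(search_space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


search_space = {"featurizer": ["length", "exclamations"], "threshold": [1, 2, 3]}
best_config, best_score = grid_search(search_space, evaluate)
print(best_config, best_score)
```

A real search space would also include model architectures and would typically use a smarter strategy than trying every combination, but the feedback loop has the same shape: propose a configuration, score it on held-out data, keep the best.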
Expanded feature set for the annotation workspace
The Snorkel Flow annotation workspace is our streamlined interface for non-technical subject matter experts to efficiently add ground truth labels, comments, and custom tags to data points. This drives more efficient collaboration between data science and subject matter expert teams to create ground truth labels for model evaluation, inject domain knowledge for labeling function development, and more. You can learn more about our vision for collaborative development between data scientists and subject matter experts in Snorkel Flow in our recent blog post.
In the Q4 LTS release, we’ve added new tools for both annotators and administrators of the annotation workspace:
- Support for conversational (e.g., chatbot) use cases: add ground truth labels, comments, and custom tags directly to utterances and conversations using a specialized conversation viewer
- Support for information extraction use cases: add ground truth labels, comments, and custom tags directly to spans using a specialized document viewer
- Visualizations for per-class inter-annotator agreement: visualize inter-annotator agreement matrices over specific classes to identify classes that are difficult for annotators to separate
- Create batches from filters: send custom groups of data points to annotators based on attributes from Snorkel Flow’s filtering interface, including data point content, model predictions and confidence, labeling function votes, tags, and more
- Statistics for subsets of annotators: measure annotation rate and inter-annotator agreement for custom subsets of annotators
In coming LTS releases, we plan to add further enhancements, including tools for more efficient batch tagging and search-based interfaces similar to the Studio workspace for data scientists.
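For readers unfamiliar with inter-annotator agreement, the following sketch shows one common way to compute it for two annotators: a pairwise agreement matrix over classes, plus Cohen's kappa as a chance-corrected summary. This is a generic illustration, not a description of how Snorkel Flow computes its statistics; the function names and the toy annotations are hypothetical.

```python
from collections import Counter


def agreement_matrix(labels_a, labels_b):
    """Count how often annotator A chose class i while annotator B chose class j."""
    return Counter(zip(labels_a, labels_b))


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical class for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy annotations from two annotators on the same six data points (made up).
annotator_a = ["pos", "pos", "neg", "neg", "neu", "neu"]
annotator_b = ["pos", "pos", "neg", "neu", "neu", "neg"]

matrix = agreement_matrix(annotator_a, annotator_b)
kappa = cohens_kappa(annotator_a, annotator_b)

for (class_a, class_b), count in sorted(matrix.items()):
    print(f"A={class_a:<3} B={class_b:<3} count={count}")
print(f"kappa={kappa:.2f}")
```

Off-diagonal cells of the matrix, such as the `neg`/`neu` pairs in this toy data, point to the classes annotators find hardest to separate.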
Additional features and enhancements
- Enhanced compute engine performance and scalability: We’ve improved memory management, horizontal scaling, and overall throughput across the platform. Our performance benchmarks demonstrate over 4x speedups for large data points like PDF documents, and significantly larger speedups when using improved horizontal scaling.
- Additional administrative tools: Admins can now create invite links (with configurable seat limits, roles, and expiration times) and send them to new users for self-registration, making it far more convenient to add users to Snorkel Flow in a secure manner. In addition, we’ve added license expiration warnings so administrators can take action to update license keys when needed.
Request a Demo
Interested in finding out more about Snorkel Flow and how your organization can adopt data-centric AI? Request a demo to see how your organization can build and deploy AI applications to production faster and cheaper, while maintaining privacy and model quality.
Join Our Team
Interested in helping build Snorkel Flow? If you’re passionate about solving problems nearly every data science and developer team struggles with and want to shape the future of AI, we want to hear from you! We’re hiring for engineering, SRE, product, design, marketing, sales, solution engineering, and many other roles. Check out our careers page for more details.
Henry Ehrenberg is a co-founder of Snorkel AI, focused on technical strategy and engineering. He has been a core Snorkel team member since the project's origins in the Stanford AI Lab, building the open-source research library and conducting research on programmatic data labeling and augmentation.
Before Snorkel AI, Henry was the tech lead for Facebook Applied AI's representation learning team. Henry earned his master's degree in computational and mathematical engineering from Stanford University, and his bachelor's degree in applied mathematics from Yale University.