Deepening Snorkel AI’s partnership with Microsoft Azure AI
Many organizations adopting data-centric AI rely on Microsoft Azure for compute, data, and machine learning infrastructure. We’re excited to build on our partnership with Microsoft to help enterprises and government agencies solve their most impactful problems and unlock value from their data using AI. Azure customers can easily deploy Snorkel Flow on their Azure cloud infrastructure to accelerate AI application development with data-centric workflows and programmatic labeling. Snorkel Flow also complements Azure AI services and the Azure Machine Learning platform to build solutions for document intelligence, conversational AI, and more.
“We have been fortunate to partner with Snorkel AI to leverage their amazing labeling capability … to automatically label fields with pipeline management fully integrated so new, custom AI can be easily trained and created for our customers. These can be deployed with Snorkel Flow’s simplified end-to-end automation, and the model development cycle can be fully improved.”
Xuedong Huang, Technical Fellow and CTO, Azure AI, discusses the Snorkel AI and Azure AI partnership in a recent talk.
With Snorkel Flow, Microsoft Azure customers can scale automation to the “iceberg under the surface” of valuable but unstructured and unlabeled enterprise data, including raw text, PDF documents, conversation transcripts, and more. Snorkel Flow integrates with a range of Azure AI Services so that users can quickly generate high-quality training sets over complex, highly variable data, train custom ML models or fine-tune prebuilt Azure models, and deploy solutions into production. Organizations can get Snorkel Flow running quickly and securely: the platform runs on Azure Kubernetes Service (AKS) within customer clouds and consumes unstructured data from various Azure data services, including Azure Blob Storage and Azure Data Lake Storage (ADLS).
A powerful example of how Snorkel AI helps organizations apply Azure AI services to their proprietary data and custom objectives is the new Snorkel Flow integration for Microsoft Azure Form Recognizer (currently in private preview). Azure Form Recognizer is an AI service that provides pre-built and customizable models for analyzing forms and PDFs. In addition to pre-built models for standard documents such as W-2s, invoices, receipts, and business cards, Form Recognizer supports custom training, which fine-tunes its neural document models for proprietary datasets, non-standard formats, and custom objectives.
Of course, as with most supervised learning approaches, this process requires high-quality labeled training data. Creating a training set of documents through traditional manual data labeling by subject matter experts is often prohibitively slow and expensive, significantly delaying the time to value for these projects and often making it challenging to achieve production-grade accuracy.
This is where Snorkel Flow comes in. Rather than annotating thousands of documents by hand, users programmatically label training data by encoding human and enterprise knowledge (e.g., pattern matches, knowledge base lookups, business logic, and more) using labeling functions. Snorkel Flow also orchestrates document content preprocessing (including OCR, layout information, and more through Form Recognizer), allows you to kick off custom Form Recognizer training jobs directly from the UI, and auto-generates performance analyses over custom Form Recognizer models to guide your next steps.
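To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's API: the label scheme, the function names, the example sentences, and the keyword list are all hypothetical, and the votes are combined with a simple majority vote, which is a much cruder stand-in for the way Snorkel weighs and denoises labeling functions. Each function encodes one piece of knowledge (a pattern match, a lookup against a small term list, a business rule) and either votes for a label or abstains.

```python
import re
from collections import Counter

# Hypothetical label scheme for this sketch.
ABSTAIN = -1     # the labeling function has no opinion
NOT_AMOUNT = 0   # the sentence does not state a payment amount
AMOUNT = 1       # the sentence states a payment amount

# Hypothetical stand-in for a knowledge base of payment-related terms.
PAYMENT_TERMS = re.compile(r"\b(rent|deposit|purchase price)\b", re.IGNORECASE)
DOLLAR_PATTERN = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?")


def lf_dollar_pattern(text):
    """Pattern match: a dollar figure suggests a payment amount."""
    return AMOUNT if DOLLAR_PATTERN.search(text) else ABSTAIN


def lf_payment_term(text):
    """Knowledge base lookup: known payment terms suggest an amount."""
    return AMOUNT if PAYMENT_TERMS.search(text) else ABSTAIN


def lf_no_digits(text):
    """Business logic: a sentence without digits cannot state an amount."""
    return NOT_AMOUNT if not re.search(r"\d", text) else ABSTAIN


LABELING_FUNCTIONS = [lf_dollar_pattern, lf_payment_term, lf_no_digits]


def majority_vote(votes):
    """Combine votes, ignoring abstentions; abstain on ties or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_text(text):
    """Apply every labeling function to one sentence and combine the votes."""
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])


# Hypothetical example sentences in the style of a real estate contract.
SENTENCES = [
    "Tenant shall pay monthly rent of $4,250.00 on the first of each month.",
    "This agreement is governed by the laws of the State of New York.",
    "The security deposit is due at signing.",
]

if __name__ == "__main__":
    for sentence in SENTENCES:
        print(label_text(sentence), sentence)
```

In this sketch the third sentence receives conflicting votes (a payment term but no digits), so the combined label abstains. Conflicts like this are where iterating on the labeling functions, rather than relabeling by hand, improves the training set.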
The integration between Snorkel Flow and Azure Form Recognizer unlocks rapid development for document-based use cases across many industries. Top US banks, healthcare and insurance companies, and other Fortune 500 organizations have used Snorkel Flow to extract information from complex documents such as 10-K reports, clinical trial protocols, technical manuals, rent rolls, legal contracts, and more. For example, Snorkel Flow with Form Recognizer can rapidly and accurately extract dollar amounts from large and varied real estate contract data sets. Our customers have accelerated their AI development by 10-100x while dramatically reducing the cost and effort required to label data. Now, enterprise data science and machine learning teams can use Azure Form Recognizer in their workflows for state-of-the-art OCR, layout recognition, and neural document model training, alongside the automated labeling, data-centric development, and foundation models available in Snorkel Flow.
We’re excited to continue deepening our partnership with Microsoft Azure AI to accelerate AI development for enterprises. Schedule a custom demo tailored to your use case and Azure stack with our ML experts today.
Henry Ehrenberg is a co-founder of Snorkel AI, focused on technical strategy and engineering. He has been a core Snorkel team member since the project's origins in the Stanford AI Lab, building the open-source research library and conducting research on programmatic data labeling and augmentation.
Before Snorkel AI, Henry was the tech lead for Facebook Applied AI's representation learning team. Henry earned his master's degree in computational and mathematical engineering from Stanford University, and his bachelor's degree in applied mathematics from Yale University.