Artificial intelligence is reshaping virtually every industry. Within the financial services sector, for example, McKinsey estimates that AI has the potential to generate an additional $1 trillion in annual value while Autonomous Research predicts that by 2030 AI will allow operational costs to be cut by 22%. And yet – despite AI’s potential – Gartner predicts that by 2024 half of current AI deployments will be either delayed or canceled outright.

The unfortunate reality is that despite spending millions on AI initiatives and assembling incredibly talented teams, most organizations are struggling to achieve meaningful AI outcomes.

Snorkel AI and Google Cloud have partnered to help organizations successfully transform raw, unstructured data into actionable AI-powered systems. The combination of Google Cloud services with Snorkel AI’s data-centric AI platform accelerates training data curation for ML development 10-100x [1] and empowers enterprises to solve some of their most critical challenges by accessing all of their knowledge and data to build AI systems.

Snorkel Flow easily deploys on Google Cloud infrastructure, ingests data from Google Cloud data sources, and integrates with Google Cloud’s AI and Data Cloud services. Snorkel AI and Google Cloud are extending our partnership to include high-value integrations that further streamline MLOps workflows—for example, Snorkel Flow’s native integration with Google BigQuery and a new integration with Vertex AI to train and deploy powerful AI applications faster and at scale.

Snorkel AI redefines AI application development with a data-centric approach

Snorkel AI addresses the biggest blocker to AI deployment: the massive hand-labeled training datasets needed to train ML models. Enterprises have a treasure trove of valuable insights embedded in files, contracts, conversation transcripts, emails, and other unstructured formats. Labeling unstructured data for ML projects has traditionally involved humans tagging each data point manually. This time-consuming, labor-intensive process is costly – and often infeasible – when enterprises need to extract insights from large volumes of complex data sources or proprietary data requiring specialized knowledge from clinicians, lawyers, financial analysts, or other internal experts. Worst of all, when data drifts or business requirements inevitably change, the process restarts from scratch and experts have to spend their time relabeling massive amounts of data.

Snorkel AI solves this bottleneck with Snorkel Flow, the data-centric AI platform. Data science and machine learning teams use Snorkel Flow’s programmatic labeling technology to encode and combine knowledge from sources such as previously labeled data (even when imperfect), heuristics from subject matter experts, business logic, knowledge bases, and even foundation models and then scale it to label large quantities of data at machine speed. Users are able to rapidly improve training data quality and model performance using integrated error analysis and model-guided feedback to develop highly accurate and adaptable AI applications.
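To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's API: the labeling functions, the label names, and the `majority_vote` helper are all hypothetical, and a simple majority vote stands in for the more sophisticated way a real platform would weigh and combine noisy sources. The point is only to show how a few heuristics from subject matter experts can be encoded as functions and applied to every data point at machine speed.

```python
from collections import Counter

# A labeling function returns a label, or ABSTAIN when its heuristic does not apply.
ABSTAIN = None

# Hypothetical heuristics a subject matter expert might supply for support tickets.
def lf_mentions_refund(text):
    return "complaint" if "refund" in text.lower() else ABSTAIN

def lf_says_unacceptable(text):
    return "complaint" if "unacceptable" in text.lower() else ABSTAIN

def lf_says_thanks(text):
    return "praise" if "thank" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_refund, lf_says_unacceptable, lf_says_thanks]

def majority_vote(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine the non-abstaining votes; abstain when there are none or on a tie."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]

def label_dataset(texts):
    """Apply every labeling function to every text and return one label per text."""
    return [majority_vote(t) for t in texts]

if __name__ == "__main__":
    tickets = [
        "I want a refund, this is unacceptable.",
        "Thank you for the quick help!",
        "Where is my order?",
    ]
    for ticket, label in zip(tickets, label_dataset(tickets)):
        print(label, "<-", ticket)
```

Because the labels come from code rather than manual tagging, changing a heuristic and rerunning `label_dataset` relabels the whole dataset, which is the property that makes it cheap to adapt when data or requirements change.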

Fast-track production AI applications with Vertex AI

With training data creation unblocked, data scientists can harness the full power of Google Cloud’s end-to-end platform to fast-track AI applications and analytics development. Traditionally, data scientists have had to engage other teams to set up infrastructure and serve models, complicating and delaying the process. Vertex AI accelerates the training and deployment of ML models in production by abstracting the most technically complex processes, empowering data scientists to focus on building world-class ML models without having to manage the underlying infrastructure.

In addition to providing data scientists with the autonomy to productionize their models for batch or online serving, Vertex AI enables data scientists to continuously monitor data and models in production using Model Monitoring to detect training-serving skew or feature drift.  
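As a rough illustration of what detecting training-serving skew means, the sketch below compares how often each category of a single feature appears in training data versus serving data, using the L-infinity distance (the largest absolute difference in category frequency). This is not Vertex AI's implementation or API: the function names and the 0.3 threshold are assumptions chosen purely for illustration.

```python
from collections import Counter

def category_frequencies(values):
    """Return the relative frequency of each category in a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}

def l_infinity_distance(train_values, serving_values):
    """Largest absolute gap in category frequency between the two samples."""
    train = category_frequencies(train_values)
    serving = category_frequencies(serving_values)
    categories = set(train) | set(serving)
    return max(abs(train.get(c, 0.0) - serving.get(c, 0.0)) for c in categories)

def has_drifted(train_values, serving_values, threshold=0.3):
    """Flag the feature when the distance exceeds an (assumed) alert threshold."""
    return l_infinity_distance(train_values, serving_values) > threshold

if __name__ == "__main__":
    # Hypothetical values of a 'document_type' feature.
    training = ["contract"] * 5 + ["email"] * 5
    serving = ["contract"] * 9 + ["email"] * 1
    print(l_infinity_distance(training, serving))
    print(has_drifted(training, serving))
```

In a managed setting the monitoring service performs this kind of comparison on a schedule; the sketch only shows the underlying idea of comparing the two distributions against a threshold.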

Snorkel Flow + Google Cloud Vertex AI

Snorkel AI has partnered with Google Cloud to enable data scientists to quickly generate high-quality training data over complex, unstructured data sources, train custom ML models or fine-tune pre-built models, including the latest foundation models and LLMs, and rapidly deploy ML models into production. Snorkel AI is now making it even easier for organizations to train, deploy, and monitor models with the new Snorkel Flow integration for Vertex AI (currently in private preview).

Snorkel Flow’s integration with Vertex AI streamlines and accelerates the MLOps process:

  • Snorkel Flow consumes unstructured data from Google Cloud data services such as Google Cloud Storage (GCS) and BigQuery. Snorkel Flow integrates natively with BigQuery, enabling data scientists to access data with just a few clicks.
  • Data is labeled programmatically using a data-centric AI workflow in Snorkel Flow. Snorkel Flow includes templates to classify and extract information from unstructured text, native PDFs, richly formatted documents, HTML data, conversational text, and more.
  • Data scientists can use high-quality training datasets created with Snorkel Flow to train AutoML models for text classification or custom use cases in Vertex AI. Alternatively, Vertex AI Endpoints can be used to rapidly deploy models trained in Snorkel Flow.
  • Vertex AI Model Monitoring helps maintain model performance by detecting data and model drift.
  • When data drift is detected, input feature values can be submitted to Snorkel Flow as signal. It’s easy to quickly adjust parameters as part of Snorkel AI’s workflow and rapidly regenerate the entire training set so models can be retrained in minutes using Vertex AI.

Real-World Impact

Top US banks, healthcare, insurance, and other Fortune 500 enterprises have used Snorkel Flow to extract information from complex documents such as 10-K reports, clinical trial protocols, technical manuals, rent rolls, legal contracts, and more.

The combination of Snorkel AI and Vertex AI equips organizations to address challenges specific to their business requirements using proprietary unstructured data. Example use cases include:

  • Customize patient treatments using EHR data. Electronic Health Records (EHRs) contain rich information – clinical notes, laboratory results, diagnoses, etc. – that can be used to tailor treatments to each patient. Traditionally, training classifiers for named entity recognition (NER) and cue-based entity classification has relied on hand-labeled training data, which requires considerable domain expertise. Academic research and customer experience have shown that Snorkel can outperform hand labeling while delivering much faster, explainable results.
  • Save costs with predictive well maintenance. Oil companies generate massive volumes of unstructured data in daily drilling reports, well maintenance logs, and other files. Rich information is buried within tabular PDFs with variable formatting. With Snorkel Flow, one leading energy provider built an AI application in 3 days that reduced the time to extract information from oil well drilling reports from up to 3 hours per report to a few seconds.  

Better Together: Snorkel AI + Google Cloud

Together, Snorkel AI and Google Cloud enable Fortune 500 enterprises to operationalize unstructured data and accelerate AI to keep pace with the rapidly evolving needs of the business.

“The accuracy of any machine learning model is only as good as the data it was trained on, which is why we are delighted to partner with Snorkel AI to help data science teams eliminate the bottleneck of manual labeling. The combination of Snorkel AI’s programmatic, data-centric approach to labeling with Google Cloud’s ability to help organizations build, deploy and scale efficient AI models is a game-changer. Together, we are helping enterprises capitalize on the promise of AI and large language models to improve business processes, innovate, and ultimately transform their businesses.”

Dr. Ali Arsanjani, Director of Cloud Partner Engineering at Google Cloud

Learn More

We’re excited to team with Google Cloud to help accelerate AI development across industries. Schedule a custom demo tailored to your use case with our ML experts today.

References

[1] Snorkel AI documented customer results reflect 45x, 52%, 98% and similar improvements vs. hand-labeling: https://snorkel.ai/case-studies/