A global insurance leader with a strong history of using data and statistics to inform policy decisions wanted to use AI to unlock leading-indicator analysis. However, before they could build a solution, they were blocked by the prohibitively time-intensive data labeling the project would require from their subject matter experts.

Thanks to Snorkel Flow, the data science team was able to unblock data labeling and build a document intelligence application that accurately performs root-cause classification tasks. By improving claims analysis, the application reduces the number of required onsite visits and increases analyst productivity by 20%.

Using Snorkel Flow to automate root-cause analysis of insurance claims
This global insurance leader provides its customers with a portfolio of policies, including coverage for potential “large loss” events. In addition to the coverage, the insurer also provides high-value loss review and consulting services to help customers avoid future loss events and litigation costs. To improve this service, the AI/ML team wanted to build a solution to automate root-cause analysis. 


Insurance claims are very detailed documents, and each one is different from the next. In order to sort, process, and extract critical root-cause data from these claims, this global insurance leader has a large team of analysts and adjusters. In addition to document review, these experts often need to go on-site to gather more information. The team spends an average of 36,000 hours each year analyzing over 70,000 claims. This limits their ability to deliver timely guidance for customers on how to avoid large loss events.

Knowing that reducing the effort required for root-cause analysis would deliver significant time and cost savings for the company, the AI/ML team turned to automation. Before Snorkel Flow, they tried to automate root-cause prediction using an external database, but the data lagged and claims information was incomplete.

NLP task: Loss source classification

To overcome this limitation and build a machine learning solution, the team needed high-quality training data. However, they were blocked at the start when they determined data labeling would require well over a thousand hours from high-value experts (including VP-level contributors), representing an estimated $500,000 expense. 

Beyond cost, the team faced several challenges:

  • Lacked efficient tooling and relied largely on Jupyter notebooks with ineffective model analysis.
  • Rich information was buried within verbose free-form text documents and difficult to normalize.
  • Poor collaboration with domain experts across lines of business blocked the data science team from making progress.


The objective: significantly reduce the time needed to generate the volume of ground-truth data required for customized, easily adaptable NLP models, while also cutting the time SMEs spend producing that data.


Using Snorkel Flow’s unmatched programmatic data labeling, model analysis, and guided data iteration, the team built a previously impossible document intelligence application in just a few days. The application automatically classifies the claim root cause as either “human behavior” or “physical control.” This saves the claims team thousands of hours, drastically reduces the number of on-site visits to gather additional information, and ultimately improves analyst productivity by 20%.

The data science team was able to collaborate with the loss control experts in Snorkel Flow’s user-friendly Annotator Suite to create ground truth data for analysis and write 38 labeling functions to capture domain experts’ knowledge and existing resources. Under the hood, a Snorkel Flow label model then intelligently reconciled these labeling functions to accurately auto-label over 29,000 training data points. The team also noticed that the data labeled by Snorkel Flow was on average 20% more accurate than their previous hand-labeling attempts, while reducing the required time from 1,400 to just 25 hours.
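Snorkel Flow's labeling functions and label model are proprietary, but the underlying idea — encoding domain heuristics as small programs whose votes are reconciled into training labels — can be sketched in plain Python. The keyword rules and claim texts below are hypothetical, and the simple majority vote stands in for Snorkel Flow's far more sophisticated label model:

```python
from collections import Counter

# Hypothetical label values for the two root-cause classes.
ABSTAIN, HUMAN_BEHAVIOR, PHYSICAL_CONTROL = -1, 0, 1

# Labeling functions: small heuristics that vote on a claim's
# root cause, or abstain when they have no opinion.
def lf_operator_error(claim: str) -> int:
    return HUMAN_BEHAVIOR if "operator error" in claim.lower() else ABSTAIN

def lf_training_gap(claim: str) -> int:
    return HUMAN_BEHAVIOR if "training" in claim.lower() else ABSTAIN

def lf_equipment_failure(claim: str) -> int:
    keywords = ("valve", "corrosion", "sprinkler")
    return PHYSICAL_CONTROL if any(k in claim.lower() for k in keywords) else ABSTAIN

LFS = [lf_operator_error, lf_training_gap, lf_equipment_failure]

def label(claim: str) -> int:
    """Reconcile LF votes by majority; abstain if no LF fires."""
    votes = [v for lf in LFS if (v := lf(claim)) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

claims = [
    "Forklift operator error during loading caused structural damage.",
    "Corrosion of a relief valve led to the release.",
    "Cause undetermined pending inspection.",
]
labels = [label(c) for c in claims]
print(labels)  # [0, 1, -1]
```

Because each heuristic is a function rather than a hand-applied label, a few dozen of them (38 in this team's case) can auto-label tens of thousands of claims in minutes, and a weak or conflicting rule can be fixed in one place instead of re-annotating documents.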

Snorkel Flow NLP solution: Loss source classification

Next, the data science team trained a machine learning model in-platform using the training data. This model generalized beyond the data set labeled by Snorkel Flow's label model, and the platform provided analysis of where the model was confused and what action to take to improve it. With this guided error analysis, the team built a model that accurately identifies claim root cause with an error rate of less than 10%.
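Snorkel Flow's guided error analysis is platform-specific, but its starting point — comparing model predictions against expert ground truth to find where the model is confused — can be illustrated with a simple confusion matrix. The class indices and labels below are hypothetical:

```python
# Two hypothetical root-cause classes, indexed 0 and 1.
CLASSES = ["human behavior", "physical control"]

def confusion_matrix(y_true, y_pred, n_classes=2):
    """counts[i][j] = number of examples of true class i predicted as class j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

# Hypothetical expert labels vs. model predictions.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
errors = sum(cm[i][j] for i in range(2) for j in range(2) if i != j)
error_rate = errors / len(y_true)
print(cm)          # [[3, 1], [1, 3]]
print(error_rate)  # 0.25
```

Off-diagonal cells point to the confused slices of data; in a labeling-function workflow, the fix is typically a new or corrected labeling function targeting those slices rather than more manual annotation.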

  • Unified platform for training data creation, including guided error analysis for efficient, effective iteration.
  • Support for complex data allowed the team to activate rich, previously difficult-to-obtain data buried in unstructured text.
  • Improved collaboration between domain experts and data scientists across labeling, troubleshooting, and iteration.

Now that the company has a repeatable process for data-centric AI development using Snorkel Flow, they can use leading indicator insights to help their customers proactively avoid future loss events. In doing so, they’re able to increase customer value and create competitive differentiation.



Results

  • An estimated $500,000 saved in data labeling costs
  • 56x faster data labeling (1,400 hours reduced to 25) with 20% better accuracy than the previous hand-labeled approach
  • Reduced loss-cause analysis time, including fewer required on-site visits