A global consulting firm whose roots go back over a century has earned its reputation for rigorous analysis and for anticipating change. To improve its ability to provide on-demand insights for its worldwide auditing team, the company wanted to use artificial intelligence, but it quickly ran into problems creating training data for its machine learning models. 

Using Snorkel Flow, the team was able to automate data labeling and build an NLP application in a matter of days instead of months. With the platform’s data-centric development lifecycle and programmatic labeling, they labeled 16x more data in a matter of hours and used it to build an NLP application with a 15-point higher F1 score. In all, the time-to-value for this project was 3x faster than with a previous solution. 

Using AI to support audit relevance and improve operational efficiency

This global “big four” consulting company strives to provide its diverse team of experts with the most current and relevant accounting, auditing, and industry information. Over the last 170 years, the company has learned that to anticipate shifts in regulations and proactively help its clients adapt, it must stay up to date on the latest trends and policy changes. To maintain that pulse, the firm’s more than 300,000 experts, spread across 150 countries, spend hours each day in its proprietary system manually reviewing accounting, auditing, and industry information. The data science team set out to leverage AI/ML both to reduce this costly effort and to innovate by helping identify which of the firm’s clients are most likely to be audited. 

Challenge

The consulting company’s reputation depends on its ability to quickly and accurately navigate, coordinate, manage, and drive consistent audits globally, irrespective of size, complexity, or location. Maintaining that leading-edge advantage requires significant manual effort and deep domain expertise. The firm’s experts use its internal system to scour news outlets and parse dense documents looking for signals, such as the mention of a fraud scheme or liquidity issues, that might trigger a client to be audited. The team estimated that each auditor search takes about 10 minutes and costs $50–60 on average. If the team could reduce the time spent searching by surfacing more relevant articles sooner, it would save the company millions annually. 

For one project, the data science team was tasked with streamlining news monitoring to anticipate change—be that movements in the capital markets, regulatory trends, or technological innovation—so the firm can proactively help clients adapt. They saw an opportunity to use custom NLP models to automatically analyze, categorize, and extract key client information from various sources, making it easily accessible to the auditor.

However, as they started working on the project, they quickly ran into problems. First, it took three experts a week to label 500 training data points; to achieve the volume required, they were going to need more than 52 person-weeks. After a few iterations of manually labeling data, they found it was nearly impossible to adapt to changes in data or business goals on the fly; instead, they would have to start labeling training data from scratch. These were some of the challenges that stood in their way:

  • Rich information was buried within verbose free-form text documents and difficult to normalize.
  • Time to label training data was prohibitive given the amount of manual effort involved.
  • Hand-labeled data could not adapt to changing market conditions mentioned in news articles or to how those conditions affect audits.

Goal

Significantly reduce the time required of subject matter experts (SMEs) to generate the training data needed for customized NLP models, while also ensuring these models are easily adaptable. 

Solution

Within a few days, the firm and the Snorkel AI team built an NLP application to classify whether news articles are relevant to the original search term. The team harnessed Snorkel Flow’s programmatic labeling capabilities to break through previous bottlenecks and label over 10,000 news articles in just a few hours, a task that would have taken an expensive human audit team nearly a year.
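
Snorkel Flow’s labeling functions follow the pattern popularized by the open-source Snorkel library. The sketch below illustrates that pattern for the relevance classifier using hypothetical keyword heuristics and a toy article table; the column names, keywords, and sample data are assumptions for illustration, not the firm’s actual functions.

```python
# A minimal sketch of programmatic labeling with the open-source Snorkel
# library. Snorkel Flow's interface differs; the keywords, column names,
# and sample articles below are illustrative assumptions only.
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function
from snorkel.labeling.model import LabelModel

RELEVANT, NOT_RELEVANT, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_fraud_mention(x):
    # A mention of a fraud scheme is one signal of audit relevance.
    return RELEVANT if "fraud" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_liquidity_mention(x):
    # Liquidity issues are another trigger auditors search for.
    return RELEVANT if "liquidity" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_unrelated_section(x):
    # Articles from clearly unrelated sections are marked not relevant.
    return NOT_RELEVANT if x.section in {"sports", "lifestyle"} else ABSTAIN

lfs = [lf_fraud_mention, lf_liquidity_mention, lf_unrelated_section]

# Toy stand-in for the news corpus; in practice this would be thousands of rows.
df_articles = pd.DataFrame({
    "text": [
        "Regulator probes alleged fraud scheme at retail chain",
        "Lender flags liquidity issues ahead of earnings",
        "Local team wins championship title",
    ],
    "section": ["business", "markets", "sports"],
})

# Apply every labeling function to every article, then combine their noisy,
# overlapping votes into training labels with Snorkel's label model.
L_train = PandasLFApplier(lfs=lfs).apply(df=df_articles)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=200, seed=123)
train_labels = label_model.predict(L=L_train)
```

Each labeling function encodes one piece of the auditors’ domain knowledge, and the label model resolves their overlapping votes into training labels, which is how thousands of articles can be labeled in hours rather than person-weeks.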

NLP information extraction with Snorkel Flow

Not only was the data labeled using Snorkel Flow 32% more accurate than the team’s previous hand-labeling methods, but the team also used Snorkel Flow’s guided error analysis to rapidly improve model quality and drive fine-grained corrections, raising their F1 score from 70 to 85 in a matter of days. This greatly accelerated the development cycle and produced a more trustworthy model for the auditing team. During these rapid iteration cycles, the team also saw first-hand how Snorkel Flow made it possible to react quickly to changing market conditions and business needs by adjusting labeling functions rather than relabeling data manually.
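
The F1 gains above were measured against expert-provided ground truth. As a rough, generic illustration of that evaluation step, shown here with scikit-learn and made-up data rather than Snorkel Flow’s built-in model training and error analysis, the loop is: train an end model on the programmatically labeled articles, score it on a small hand-labeled dev set, then adjust labeling functions and repeat.

```python
# Rough sketch of the evaluate-and-iterate step: train a simple end model on
# programmatically labeled articles and track F1 against a small expert-labeled
# dev set. All data here is made up; Snorkel Flow provides its own model
# training and guided error analysis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

train_texts = [
    "Regulator probes alleged fraud scheme at retail chain",
    "Lender flags liquidity issues ahead of earnings",
    "Local team wins championship title",
]
train_labels = [1, 1, 0]  # produced by the label model, not by hand

dev_texts = [
    "Audit firm warns of fraud risk in supplier contracts",
    "City council approves new stadium",
]
dev_labels = [1, 0]       # small hand-labeled ground truth from the SMEs

vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(train_texts), train_labels)
preds = clf.predict(vectorizer.transform(dev_texts))
print(f"Dev F1: {f1_score(dev_labels, preds):.2f}")
```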

Using Snorkel Flow, this consulting firm realized a number of benefits:

  • Support for complex data allowed the team to activate rich, previously difficult-to-obtain information buried in unstructured text.
  • Training data was auto-labeled by capturing labeling expertise as labeling functions, which Snorkel Flow applied intelligently en masse.
  • Adaptability was ensured through rapid code edits to labeling functions rather than wholesale manual relabeling, as the sketch below illustrates.
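
To make that last point concrete: in the labeling-function pattern, a newly important market signal becomes a small code edit followed by a re-run over the corpus, not another round of hand labeling. The keyword list below is hypothetical and continues the style of the earlier sketch.

```python
# Hypothetical update: new audit-relevant themes start appearing in the news.
# Adapting means editing one heuristic and re-applying it over the corpus;
# no article needs to be relabeled by hand.
from snorkel.labeling import labeling_function

RELEVANT, ABSTAIN = 1, -1  # same label constants as the earlier sketch

AUDIT_TRIGGER_TERMS = {"fraud", "liquidity", "restatement", "going concern"}

@labeling_function()
def lf_audit_trigger_terms(x):
    text = x.text.lower()
    return RELEVANT if any(term in text for term in AUDIT_TRIGGER_TERMS) else ABSTAIN
```

Re-applying the updated labeling functions and refitting the label model regenerates training labels for the entire corpus in a single pass.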

Collaboration between the SMEs and data scientists was another major improvement. Using Snorkel Flow’s Annotator Suite, the audit managers efficiently lent their expertise to write labeling functions, provide ground-truth labels, and troubleshoot errors during iteration. By fundamentally enhancing the workflow, rather than relying on tedious manual labeling and trading CSV files back and forth, the team made better use of auditors’ expertise while reducing the time required of these SMEs by one-third.

Results

  • 10,000 news articles auto-labeled in days instead of months
  • 15% increase in model accuracy over hand labeling
  • 3x faster development while requiring one-third less SME time