Consulting giant eliminates a year of labeling time with Snorkel
News articles auto-labeled in days vs. months
Increase in model accuracy over hand labeling
Faster app development with 50% less SME time

A global consulting firm with a 170+ year history and a reputation for rigorous analysis aimed to use AI to improve its ability to provide on-demand insights for its worldwide auditing team. The project quickly stalled, however, when the company encountered difficulty creating training data for the machine learning algorithms.
Partnering with Snorkel, the firm automated data labeling and built an NLP application in days instead of months. Snorkel’s industry-best experts, using our proprietary technology, collaborated with the firm’s team to label a trove of data in a matter of hours, then used the labeled data to build an application with a 14-point higher F1 score.
In all, time-to-value for this project was 3x faster than with a previous solution.
Using AI to support audit relevance and improve operational efficiency
This global “big four” consulting company strives to provide its diverse team of experts with the most current and relevant accounting, auditing, and industry information. Over the last 170 years, the company has learned that to anticipate shifts in regulations and proactively help its clients adapt, it must stay up to date on the latest trends and policy changes.
To keep its finger on this pulse, the firm’s more than 300K experts, spread across 150 countries, spent hours each day manually reviewing accounting, auditing, and industry information. The data science team set out to leverage AI/ML both to reduce this costly effort and to innovate by helping identify which of the firm’s clients are most likely to be audited.
Challenge
The consulting company’s reputation depends on its ability to quickly and accurately navigate, coordinate, manage, and drive consistent audits globally, irrespective of size, complexity, or location. Maintaining that leading-edge advantage requires significant effort and deep domain expertise. Its experts use the company’s internal system to scour news outlets and parse dense documents for signals, such as the mention of a fraud scheme or liquidity issues, that might trigger an audit of a client. The team estimated that each auditor search lasts 10 minutes and costs $50–60 on average. If the team could reduce the time spent searching by surfacing more relevant articles sooner, it would save the company millions annually.
For one project, the data science team was tasked with streamlining news monitoring to anticipate change—be that movements in the capital markets, regulatory trends, or technological innovation—so the firm can proactively help clients adapt. They saw an opportunity to use custom NLP models to automatically analyze, categorize, and extract key client information from various sources, making it easily accessible to the auditor.
However, as they started working on the project, they quickly ran into problems. First, it took three experts a week to label 500 training data points; to reach the volume required, they were going to need more than 52 person-weeks. Second, after a few iterations of manually labeling data, they found it was nearly impossible to adapt to changes in data or business goals on the fly; instead, they would have to start labeling training data from scratch. These were some of the challenges that stood in their way:
- Rich information was buried within verbose free-form text documents and difficult to normalize.
- Time to label training data was prohibitive, given the amount of manual effort involved.
- Models and labels could not easily adapt to the changing market conditions mentioned in news articles, or to how those conditions affect audits.
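The labeling estimate above can be checked with some quick arithmetic. The sketch below is illustrative only: it assumes a target of 10,000 labeled articles (the volume cited in the Solution section; the firm's original estimate does not state the target it used) and assumes the observed pace of three experts labeling 500 items per week would hold throughout. The function name `labeling_effort` is hypothetical.

```python
import math

# Figures from the case study: three experts needed one week to label 500 items.
EXPERTS = 3
WEEKS_PER_BATCH = 1
LABELS_PER_BATCH = 500

# Assumption: the target volume is the 10,000 articles mentioned in the
# Solution section; the original estimate does not state its target.
TARGET_LABELS = 10_000


def labeling_effort(target_labels,
                    labels_per_batch=LABELS_PER_BATCH,
                    experts=EXPERTS,
                    weeks_per_batch=WEEKS_PER_BATCH):
    """Return (person_weeks, calendar_weeks) needed to hand-label the target."""
    batches = math.ceil(target_labels / labels_per_batch)
    calendar_weeks = batches * weeks_per_batch
    person_weeks = calendar_weeks * experts
    return person_weeks, calendar_weeks


person_weeks, calendar_weeks = labeling_effort(TARGET_LABELS)
print(f"{person_weeks} person-weeks over {calendar_weeks} calendar weeks")
```

Under these assumptions, hand labeling would take 60 person-weeks, which is consistent with the estimate of more than 52 person-weeks.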
Goal
Significantly reduce the time required of SMEs to generate the training data needed for customized NLP models, while also ensuring these models are easily adaptable.
Solution
Within a few days, the firm’s data science team and Snorkel’s experts, using our proprietary technology, built an NLP application to classify whether news articles are relevant to the original search term. The team harnessed Snorkel’s programmatic labeling capabilities to break through previous bottlenecks and label over 10,000 news articles in just a few hours—a task that would have taken an expensive human audit team nearly a year.
Data labeled using Snorkel’s proprietary technology was 32% more accurate than data produced by the firm’s previous hand-labeling methods. Snorkel’s experts also used guided error analysis features to rapidly improve model quality and drive fine-grained corrections, bringing the F1 score up from 70 to 85 in a matter of days. This greatly accelerated the development cycle and provided a more trustworthy model for the auditing team. During these rapid innovation cycles, the team also saw first-hand how Snorkel’s technology made it possible to react quickly to changing market conditions and business needs by adjusting labeling functions rather than relabeling manually.
By working with Snorkel, this consulting firm realized a number of benefits:
- Snorkel’s support for complex data allowed the team to activate rich, previously difficult-to-obtain data buried in unstructured text.
- Automated labeling captured SME expertise as labeling functions, which Snorkel’s technology applied intelligently at scale.
- Rapid code edits to labeling functions improved adaptability and eliminated the need for manual relabeling.
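To make the idea of a labeling function concrete, here is a minimal sketch in plain Python. It is not Snorkel's API or the firm's actual rules: the function names, the keyword rules, the example headlines, and the client name "Acme Corp" are all hypothetical, and a simple majority vote stands in for the more sophisticated way Snorkel's technology combines labeling function outputs. The "fraud" and "liquidity" signals are borrowed from the examples mentioned earlier.

```python
from collections import Counter

# Label values; ABSTAIN means a rule has no opinion about the article.
RELEVANT, NOT_RELEVANT, ABSTAIN = 1, 0, -1


# Hypothetical labeling functions: each encodes one SME heuristic as code.
def lf_mentions_fraud(article, search_term):
    text = article.lower()
    if search_term.lower() in text and "fraud" in text:
        return RELEVANT
    return ABSTAIN


def lf_mentions_liquidity(article, search_term):
    text = article.lower()
    if search_term.lower() in text and "liquidity" in text:
        return RELEVANT
    return ABSTAIN


def lf_missing_search_term(article, search_term):
    if search_term.lower() not in article.lower():
        return NOT_RELEVANT
    return ABSTAIN


LABELING_FUNCTIONS = [
    lf_mentions_fraud,
    lf_mentions_liquidity,
    lf_missing_search_term,
]


def label_article(article, search_term):
    """Combine labeling function votes by simple majority (a stand-in only)."""
    votes = [lf(article, search_term) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # Abstain on ties rather than guessing.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


# Made-up headlines for illustration.
articles = [
    "Regulators open fraud probe into Acme Corp accounting",
    "Acme Corp faces liquidity issues as lenders pull back",
    "Local bakery wins regional award",
    "Acme Corp sponsors charity run",
]
labels = [label_article(a, "Acme Corp") for a in articles]
print(labels)
```

Adapting to a new business need in this sketch means editing or adding a function and re-running it over the whole corpus, rather than relabeling each article by hand.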
Collaboration between the SMEs and data scientists also improved substantially. With Snorkel’s annotation suite, audit managers efficiently lent their expertise to write labeling functions, provide ground truth labels, and troubleshoot errors during iteration. By streamlining the workflow and no longer relying on tedious manual labeling or trading CSV files back and forth, the team made better use of auditors’ expertise while reducing the time required of these SMEs by one-third.