To meet the requirements of unexpected regulatory changes brought on by the pandemic, a top-10 US bank needed to urgently adapt its underperforming model-centric artificial intelligence and machine learning development approach to a data-centric one. The team used Snorkel Flow to automatically classify thousands of loan documents and extract critical clauses in just 24 hours, saving loan managers thousands of hours of manual document review.

Triaging loan documents based on risk exposure with AI/ML
In March 2020, the benchmark interest rates at which major global banks borrow from one another plummeted due to the pandemic. LIBOR, the London Interbank Offered Rate, fell from 1.8% earlier in the year to 0.7% in March. Banks all over the globe had to reevaluate their risk exposure as many went from earning interest to owing interest practically overnight.


When faced with the sudden onset of the pandemic, this top-10 US bank had to reassess its lending policies and risk exposure quickly. The bank needed to review loan contracts to determine the impact of rate changes. Given the urgency and the number of loans to review, the bank couldn’t rely on humans to triage risk. They knew they needed to use a machine-learning model. However, based on their experience with a recent project (before Snorkel Flow), they had a few blocking challenges to address:

  • Labeling training data for an ML solution was prohibitively slow, given the reliance on manual labeling carried out by domain experts and the inability to outsource.
  • Lack of adaptability to various document structures, including unseen PDF and tabular formats.
  • Poor collaboration between domain experts and data scientists made it difficult to resolve ambiguous labels.
Sample loan document templates to illustrate the diversity of formats


Reduce data labeling and development time by auto-labeling—without reducing data or AI application quality.


Leveraging Snorkel Flow, the data science team, working closely with Snorkel AI experts, built an ML model that achieved better-than-human accuracy on more than 250,000 documents in under 24 hours. They worked alongside finance and tax subject matter experts within the bank to capture their heuristics as labeling functions. The Snorkel Flow platform intelligently combined these to auto-label a high-quality training data set. This labeled training data was then used to train a model that successfully extracted the key information from the 250,000 documents, greatly expediting the bank’s ability to review loans. Moving forward, any time a central bank’s interest rate changes, or the team needs to process a new loan format or document structure, they can quickly change a few labeling functions instead of going back through the slog of relabeling data by hand.
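To make the idea of labeling functions concrete, here is a minimal Python sketch. It is not the Snorkel Flow API and not the bank's actual heuristics, which the case study does not disclose. The label names, the three rules, and the `auto_label` helper are all hypothetical, and a simple majority vote stands in for the platform's more sophisticated way of combining labeling-function outputs.

```python
import re
from collections import Counter

# Hypothetical label values for a clause-level classification task.
ABSTAIN = -1
NOT_RATE_CLAUSE = 0
RATE_CLAUSE = 1


def lf_mentions_libor(text: str) -> int:
    """Illustrative heuristic: an explicit benchmark mention suggests a rate clause."""
    return RATE_CLAUSE if re.search(r"\blibor\b", text, re.IGNORECASE) else ABSTAIN


def lf_spread_over_benchmark(text: str) -> int:
    """Illustrative heuristic: a 'plus 1.25%' style spread suggests a rate clause."""
    pattern = r"\bplus\s+\d+(\.\d+)?\s*%"
    return RATE_CLAUSE if re.search(pattern, text, re.IGNORECASE) else ABSTAIN


def lf_notice_boilerplate(text: str) -> int:
    """Illustrative heuristic: notice or governing-law language is not a rate clause."""
    pattern = r"\b(governing law|notices? shall)\b"
    return NOT_RATE_CLAUSE if re.search(pattern, text, re.IGNORECASE) else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_libor, lf_spread_over_benchmark, lf_notice_boilerplate]


def auto_label(text: str) -> int:
    """Combine votes by simple majority; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Made-up clauses, used only to exercise the sketch.
clauses = [
    "Interest shall accrue at LIBOR plus 1.25% per annum.",
    "This Agreement is subject to the governing law of New York.",
    "The Borrower shall deliver audited financial statements annually.",
]
labels = [auto_label(clause) for clause in clauses]
print(labels)
```

In a setup like this, supporting a new document format or a changed benchmark rate means editing or adding one function and re-running `auto_label` over the corpus, rather than relabeling each document by hand.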

Using Snorkel Flow, this top US bank overcame its original challenges with:

  • Auto-labeled training data by capturing labeling expertise as labeling functions and applying them intelligently to 250k documents with better-than-human quality (99.1%).
  • Ensured adaptability with rapid code edits to labeling functions, not wholesale manual relabeling.
  • Improved collaboration between domain experts and data scientists across labeling, troubleshooting, and iteration.

Instead of spending six months labeling data by hand to get the training datasets they need to improve their model, the team relies on Snorkel Flow. Shifting from a model-centric approach to a data-centric one yielded significant improvements in both team productivity and model performance.


250k documents processed with better-than-human quality


99.1% accuracy achieved with the extraction model