From hours to seconds on CLO contract review with 94% end-user acceptance

Impact

25% → 94% SME acceptance rate

3 weeks to measurable outcomes

48 high-value topics with hallucinations eliminated

The challenge

A top-10 US bank manages collateralized loan obligation (CLO) portfolios totaling billions in assets, each governed by contracts of up to 500 pages. Portfolio managers, risk teams, and client-facing bankers need fast, accurate answers: which loans are eligible, whether coverage tests are at risk, and how cash flows are distributed. A missed coverage threshold triggers automatic payment changes, and loan eligibility windows close within hours. Yet manually searching the documents was taking hours per query.

The bank built an early prototype using GPT-4 with prompting and RAG that earned executive buy-in. Pre-deployment, when expert analysts reviewed outputs on complex queries involving nested definitions, cross-referenced terms, and layered conditions, only 25% of responses were acceptable. Improving accuracy was a critical challenge, but the deeper issue was trust. For analysts to rely on a system for decisions with real financial consequences, they needed confidence that the AI system would perform at the level of the best human expert.

The solution

Snorkel’s applied AI engineers worked alongside the bank’s subject matter experts to build the data and eval foundation the OpenAI-powered system needed. Rather than asking experts to manually label large volumes of data, Snorkel worked with experts to encode their judgment programmatically into high-quality expert datasets at scale. To expand coverage further, the Snorkel team generated synthetic training data curated to reflect the right distribution of difficulty, edge cases, and failure modes.

The team also redesigned how contracts were processed. Drawing on Snorkel’s research into document structure and retrieval optimization, the team split contracts using semantic similarity and document structure analysis rather than fixed token windows, keeping each logical section of a contract intact. Custom classifier models added metadata, tagging sections by date references, defined terms, and structural elements, so that retrieval surfaced the right context for each query. The embedding model was fine-tuned on domain-specific data to better distinguish between legally similar but semantically distinct passages.
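To make the idea of structure-aware processing concrete, here is a minimal sketch in Python. It is not Snorkel's or the bank's implementation. The heading pattern, the regex heuristics standing in for the custom classifier models, and every name (`Chunk`, `chunk_by_structure`, `SAMPLE`) are assumptions made for illustration, and the sketch leaves out the semantic-similarity and embedding steps entirely.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed heading style: lines such as "Section 1.1 Definitions" or "ARTICLE II".
HEADING = re.compile(
    r"^(?:Section\s+\d+(?:\.\d+)*|ARTICLE\s+[IVXLC]+)\b.*$", re.MULTILINE
)
# Simple regex heuristics standing in for trained classifier models.
DATE_REF = re.compile(r"\b(?:Payment Date|Determination Date|\d{4}-\d{2}-\d{2})\b")
DEFINED_TERM = re.compile(r'"([A-Z][A-Za-z ]+)"\s+means')


@dataclass
class Chunk:
    heading: str
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_by_structure(document: str) -> List[Chunk]:
    """Split on section headings so each logical section stays in one chunk."""
    starts = [m.start() for m in HEADING.finditer(document)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(document)]

    chunks = []
    for begin, end in zip(bounds, bounds[1:]):
        section = document[begin:end].strip()
        if not section:
            continue
        chunks.append(
            Chunk(
                heading=section.splitlines()[0],
                text=section,
                metadata={
                    "has_date_reference": bool(DATE_REF.search(section)),
                    "defined_terms": DEFINED_TERM.findall(section),
                },
            )
        )
    return chunks


# Toy contract text, invented for this example.
SAMPLE = '''Section 1.1 Definitions
"Collateral Obligation" means a loan that satisfies the eligibility criteria.
Section 2.1 Coverage Tests
The tests are measured on each Determination Date.
'''

for chunk in chunk_by_structure(SAMPLE):
    print(chunk.heading, chunk.metadata)
```

A fixed token window could cut a definition off from the clause that depends on it. Splitting at headings avoids that, and the metadata gives retrieval something to filter on. A line-anchored regex will mis-split on cross-references that happen to begin a line, which is one reason a production system would likely need trained models rather than rules like these.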

Snorkel encoded the bank’s definition of correctness into an eval dataset, enabling 40+ experiments in the first sprint to identify exactly what was holding performance back. Snorkel continued iterating after hitting the initial bar, progressively eliminating hallucinations across the bank’s top 48 high-value topics. The custom eval harness empowers the bank to adopt new capabilities and update models without rebuilding from scratch or losing the encoded expertise that makes the system trustworthy.
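As a rough illustration of the kind of signal an eval harness can give, the sketch below computes acceptance rates overall and per topic from a set of accept/reject judgments. It is an assumption-laden toy, not the bank's harness. The `Judgment` record, the topic labels, and the four sample judgments are all hypothetical and do not reflect the bank's data or results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Judgment:
    topic: str      # hypothetical topic label, e.g. "coverage tests"
    accepted: bool  # whether the response met the encoded definition of correctness


def acceptance_by_topic(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Return the fraction of accepted responses for each topic."""
    totals: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for judgment in judgments:
        totals[judgment.topic] += 1
        accepted[judgment.topic] += int(judgment.accepted)
    return {topic: accepted[topic] / totals[topic] for topic in totals}


def overall_acceptance(judgments: Iterable[Judgment]) -> float:
    """Return the fraction of accepted responses across all topics."""
    items: List[Judgment] = list(judgments)
    if not items:
        return 0.0
    return sum(j.accepted for j in items) / len(items)


# Toy judgments, invented for this example only.
toy_run = [
    Judgment("eligibility", True),
    Judgment("eligibility", False),
    Judgment("coverage tests", True),
    Judgment("coverage tests", True),
]

print(overall_acceptance(toy_run))   # 0.75
print(acceptance_by_topic(toy_run))  # {'eligibility': 0.5, 'coverage tests': 1.0}
```

Breaking the rate down by topic shows where a given experiment still falls short, so the next experiment can target that topic instead of the system as a whole.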

The outcome

The improvement happened in documented sprints. The first three-week sprint moved SME acceptance from 25% to 79%. Each experiment was guided by the eval harness’s signal on exactly where the system was failing. Sprint 2 reached 89% acceptance. Phase 2 extended coverage to the bank’s full 48 high-value CLO topics, achieving a 94% high-quality response rating with hallucinations eliminated across the entire test set.

For a bank managing hundreds of CLO structures, the compounding effect is significant. Every hour saved per query across dozens of daily document questions translates directly into faster portfolio decisions, more responsive client service, and reduced operational risk at scale. 
