The challenge
A top-10 US bank manages collateralized loan obligation (CLO) portfolios totaling billions in assets, each governed by contracts up to 500 pages long. Portfolio managers, risk teams, and client-facing bankers need fast, accurate answers: which loans are eligible, whether coverage tests are at risk, how cash flows distribute. A missed coverage threshold triggers automatic payment changes, and loan eligibility windows close in hours. Manually searching the documents was taking hours per query.
The bank built an early prototype using GPT-4 with prompting and RAG that earned executive buy-in. Before deployment, however, expert analysts reviewed its outputs on complex queries involving nested definitions, cross-referenced terms, and layered conditions, and found only 25% of responses acceptable. Improving accuracy was critical, but the deeper issue was trust. For analysts to rely on a system for decisions with real financial consequences, they needed confidence that it would perform at the level of the best human expert.
The solution
Snorkel’s applied AI engineers worked alongside the bank’s subject matter experts to build the data and eval foundation the OpenAI-powered system needed. Rather than asking experts to manually label large volumes of data, Snorkel worked with experts to encode their judgment programmatically into high-quality expert datasets at scale. To expand coverage further, the Snorkel team generated synthetic training data curated to reflect the right distribution of difficulty, edge cases, and failure modes.
The team also redesigned how contracts were processed. Drawing on Snorkel’s research into document structure and retrieval optimization, the team split contracts using semantic similarity and document structure analysis rather than fixed token windows, keeping each logical section of a contract intact. Custom classifier models added metadata, tagging sections by date references, defined terms, and structural elements, so that retrieval surfaced the right context for each query. The embedding model was fine-tuned on domain-specific data to better distinguish between passages that look alike in legal wording but differ in meaning.
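To make the idea of structure-aware splitting concrete, here is a minimal sketch in Python. It is not the bank's or Snorkel's pipeline: the heading pattern, the defined-term pattern, the list of date phrases, and the names `Chunk` and `split_by_section` are all assumptions made for illustration. It also uses simple regular expressions where the real system used custom classifier models and semantic similarity.

```python
import re
from dataclasses import dataclass, field

# Assumed heading style: "Section 1.2 ..." or "ARTICLE IV ..." at the start of a line.
HEADING = re.compile(r'^(Section[ \t]+\d+(?:\.\d+)*|ARTICLE[ \t]+[IVXLC]+)\b.*$', re.MULTILINE)
# Assumed defined-term style: a quoted, capitalized phrase such as "Collateral Obligation".
DEFINED_TERM = re.compile(r'["“]([A-Z][A-Za-z ]+?)["”]')
# Assumed date phrases; a regex stand-in for the classifier-based tagging described above.
DATE_REF = re.compile(r'\b(?:Payment|Determination|Closing|Maturity) Date\b')


@dataclass
class Chunk:
    heading: str
    text: str
    metadata: dict = field(default_factory=dict)


def _make_chunk(heading: str, body: str) -> Chunk:
    """Attach simple retrieval metadata to one logical section."""
    return Chunk(
        heading=heading,
        text=body,
        metadata={
            "defined_terms": sorted(set(DEFINED_TERM.findall(body))),
            "has_date_reference": bool(DATE_REF.search(body)),
        },
    )


def split_by_section(contract: str) -> list[Chunk]:
    """Split on section headings so each chunk is one intact logical section."""
    matches = list(HEADING.finditer(contract))
    chunks = []
    # Any text before the first heading is kept as its own chunk.
    first = matches[0].start() if matches else len(contract)
    preamble = contract[:first].strip()
    if preamble:
        chunks.append(_make_chunk("Preamble", preamble))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(contract)
        chunks.append(_make_chunk(m.group(1), contract[m.start():end].strip()))
    return chunks


sample = (
    'Section 1.1 Definitions\n'
    '"Collateral Obligation" means any loan acquired before the Closing Date.\n'
    'Section 2.3 Coverage Tests\n'
    'The ratio is tested monthly.\n'
)
for chunk in split_by_section(sample):
    print(chunk.heading, chunk.metadata)
```

A fixed token window could cut the definition of "Collateral Obligation" in half or merge it with the coverage-test section. Splitting at headings keeps each section whole, and the metadata lets a retriever filter by defined term or date reference before ranking.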
Snorkel encoded the bank’s definition of correctness into an eval dataset, enabling 40+ experiments in the first sprint to identify exactly what was holding performance back. Snorkel continued iterating after hitting the initial bar, progressively eliminating hallucinations across the bank’s top 48 high-value topics. The custom eval harness empowers the bank to adopt new capabilities and update models without rebuilding from scratch or losing the encoded expertise that makes the system trustworthy.
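As a rough illustration of what an eval harness reports, the Python sketch below aggregates expert reviews into a per-topic acceptance rate and hallucination count, then ranks the weakest topics. The `Review` fields, the function names, and the topic labels are hypothetical. The source does not describe the harness's actual schema or metrics beyond SME acceptance and hallucinations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    topic: str          # hypothetical topic label, e.g. "coverage_tests"
    accepted: bool      # an expert judged the response acceptable
    hallucinated: bool  # an expert flagged an unsupported claim


def summarize(reviews):
    """Group reviews by topic and compute acceptance rate and hallucination count."""
    by_topic = defaultdict(list)
    for review in reviews:
        by_topic[review.topic].append(review)
    return {
        topic: {
            "n": len(items),
            "acceptance_rate": sum(r.accepted for r in items) / len(items),
            "hallucinations": sum(r.hallucinated for r in items),
        }
        for topic, items in by_topic.items()
    }


def overall_acceptance(reviews):
    """Share of all reviewed responses that experts accepted."""
    return sum(r.accepted for r in reviews) / len(reviews) if reviews else 0.0


def weakest_topics(report, k=3):
    """Topics with the lowest acceptance rate first, to direct the next experiment."""
    return sorted(report, key=lambda t: report[t]["acceptance_rate"])[:k]


reviews = [
    Review("coverage_tests", accepted=True, hallucinated=False),
    Review("coverage_tests", accepted=False, hallucinated=True),
    Review("eligibility", accepted=True, hallucinated=False),
    Review("eligibility", accepted=True, hallucinated=False),
]
report = summarize(reviews)
print(overall_acceptance(reviews))
print(weakest_topics(report, k=1))
```

Because the reviews are stored as data rather than rerun by hand, the same set can be scored again after a model or prompt change, which is what lets a team compare many experiments against a fixed definition of correctness.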
The outcome
The improvement happened in documented sprints. The first three-week sprint moved SME acceptance from 25% to 79%. Each experiment was guided by the eval harness’s signal on exactly where the system was failing. Sprint 2 reached 89% acceptance. Phase 2 extended coverage to the bank’s full 48 high-value CLO topics, achieving a 94% high-quality response rating with hallucinations eliminated across the entire test set.
For a bank managing hundreds of CLO structures, the compounding effect is significant. Every hour saved per query across dozens of daily document questions translates directly into faster portfolio decisions, more responsive client service, and reduced operational risk at scale.