Conversational, decision-grade responses in 15 seconds

Impact

15.2-second responses vs. hours

98.6% safety & governance score

+5 pts decision usefulness with GPT-5.4-mini upgrade

The challenge

A global media SaaS company analyzes hundreds of millions of sources daily, from public news, social, and broadcast feeds to proprietary analyst-curated databases. Their competitive advantage is the layer on top of publicly available data: in-house human editorial teams, proprietary scoring and analytics frameworks, and years of analyst judgment refined into decision-grade intelligence. When a crisis signal is building or a competitor’s narrative is gaining traction, speed and accuracy matter enormously. Historically, getting an answer meant waiting hours for a human analyst to manually aggregate information across multiple sources.

The company’s AI team set out to make that synthesis conversational and instant. The hard part was encoding the institutional expertise that makes their output decision-grade, because that output informs decisions that can run into tens or hundreds of millions of dollars.

The solution

Snorkel designed and built a multi-agent conversational intelligence system that orchestrates specialized agents across the company’s data sources and returns grounded, decision-ready answers in seconds. The system includes an evaluation harness customized with the client team’s own institutional knowledge: what makes answers useful for decision-makers, what counts as properly grounded, and which safety and governance boundaries matter for individual use cases.

When GPT-5.4-mini was released, Snorkel could quickly assess the impact of upgrading from GPT-4.1-mini. The harness showed a 5-point lift in decision usefulness, a 100% pass rate on safety-critical refusal checks, and an improvement from 82.6% to 98.6% on broader governance checks, which cover avoiding internal jargon and keeping unrelated details out of responses. These results made a clear, data-backed case for upgrading to GPT-5.4-mini.
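For readers curious what this kind of model-to-model comparison can look like in practice, here is a minimal Python sketch. It is not Snorkel's harness: the `CheckResult` record, the `pass_rates` and `compare` helpers, the model labels, and the toy records are all hypothetical, made up for illustration, and are not the data behind the figures above. It assumes each evaluation check yields a simple pass/fail result tagged with a category.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One pass/fail outcome from a hypothetical evaluation check."""
    model: str      # label of the model under test
    category: str   # e.g. "safety_refusal" or "governance"
    passed: bool


def pass_rates(results):
    """Return {(model, category): fraction of checks passed}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for r in results:
        entry = counts[(r.model, r.category)]
        entry[0] += int(r.passed)
        entry[1] += 1
    return {key: passed / total for key, (passed, total) in counts.items()}


def compare(results, baseline, candidate):
    """Return the per-category change in pass rate, in percentage points.

    Only categories evaluated for both models are reported.
    """
    rates = pass_rates(results)
    base_cats = {c for (m, c) in rates if m == baseline}
    cand_cats = {c for (m, c) in rates if m == candidate}
    return {
        c: (rates[(candidate, c)] - rates[(baseline, c)]) * 100
        for c in sorted(base_cats & cand_cats)
    }


# Toy records, invented for illustration only.
TOY_RESULTS = [
    CheckResult("baseline-model", "governance", True),
    CheckResult("baseline-model", "governance", True),
    CheckResult("baseline-model", "governance", True),
    CheckResult("baseline-model", "governance", False),
    CheckResult("candidate-model", "governance", True),
    CheckResult("candidate-model", "governance", True),
    CheckResult("candidate-model", "governance", True),
    CheckResult("candidate-model", "governance", True),
    CheckResult("baseline-model", "safety_refusal", True),
    CheckResult("baseline-model", "safety_refusal", True),
    CheckResult("candidate-model", "safety_refusal", True),
    CheckResult("candidate-model", "safety_refusal", True),
]

if __name__ == "__main__":
    for category, delta in compare(
        TOY_RESULTS, "baseline-model", "candidate-model"
    ).items():
        print(f"{category}: {delta:+.1f} points")
```

Keeping the checks separate from the model being tested is what allows the same suite to be rerun against a new model and the per-category differences to be read off directly.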

The outcome

The agent replaces a process that used to take hours, delivering answers in an average of 15 seconds with safety scores that meet client requirements. As models continue to evolve, the eval-first foundation lets the client test, compare, and swap models without rebuilding the agent or losing the expert judgment that makes it trustworthy.
