How Rox achieved 99% accuracy with Snorkel Evaluate

Impact
99%

Achieved accuracy with specialized evaluators

+24

Point improvement in shipped critical outbound email feature

The challenge

Rox’s ability to ensure outbound emails are fully accurate and aligned with each customer’s brand and objectives is a key differentiator. However, when developing models, Rox found that off-the-shelf evaluation approaches could not deliver the quality required for critical custom evaluation tasks. Initially, Rox wrote its own LLM-as-a-judge. While the model seemed to score well, the Rox team wanted higher confidence before deploying to production.

The solution

Using the Snorkel Evaluation Suite, Rox scored the judge against human experts and found that it agreed with them only around 75% of the time. The team used Snorkel to iterate on the judge and increase alignment. The aligned judge then surfaced an issue with the prototype outbound model, which used the wrong recipient name around 11% of the time, enabling Rox to correct the model’s behavior.
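To make the alignment figure concrete, here is a minimal sketch of how agreement between a judge and human experts could be measured. It is not Rox's or Snorkel's implementation: the function name `agreement_rate` and the toy labels are hypothetical, and it assumes each email receives a simple pass/fail verdict from both the judge and a human expert.

```python
from typing import Sequence


def agreement_rate(judge_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Return the fraction of examples where the judge matches the human expert."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not judge_labels:
        raise ValueError("need at least one labeled example")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Hypothetical toy data: True means "email is acceptable", False means "email has an issue".
human = [True, True, False, True, False, True, True, False]
judge = [True, False, False, True, True, True, True, False]

rate = agreement_rate(judge, human)
print(f"judge/human agreement: {rate:.0%}")  # prints "judge/human agreement: 75%"
```

A judge that agrees with experts on only three of every four examples, as in this toy data, cannot be trusted to catch a failure mode that occurs in roughly one of every nine emails, which is why alignment had to improve before the judge's findings were actionable.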

The outcome

Rox achieved 99%+ accuracy with specialized evaluators, giving the team sufficient trust to ship a critical outbound email feature.

Rox is redefining the revenue stack with our AI-powered sales platform. Off-the-shelf models aren’t capable of delivering the quality we need to ensure our agents are accurately personalizing outbound emails. With Snorkel Evaluate we have been able to confidently assess our outbound email agent, then identify and fix issues to achieve human-level accuracy. The level of visibility and control Snorkel delivers is a huge advantage as we build trustworthy, agentic AI at scale.


Shriram Sridharan, co-founder, Rox

Enterprises facing aggressive revenue targets without more headcount are turning to agentic AI innovator Rox. Rox is redefining the revenue stack with its AI-powered sales productivity platform, starting with the Rox sales agent swarm, which provides agents that can perform at the level of top sales reps.
