CUSTOMER STORY

How Rox achieved 99% accuracy with Snorkel Evaluate

99%

Achieved accuracy with specialized evaluators

+24

Point improvement in shipped critical outbound email feature

Enterprises facing aggressive revenue targets without more headcount are turning to agentic AI innovator Rox. Rox is redefining the revenue stack with its AI-powered sales productivity platform, starting with the Rox sales agent swarm, which provides agents that can perform at the level of top sales reps.

Rox is redefining the revenue stack with our AI-powered sales platform. Off-the-shelf models aren’t capable of delivering the quality we need to ensure our agents are accurately personalizing outbound emails. With Snorkel Evaluate we have been able to confidently assess our outbound email agent, then identify and fix issues to achieve human-level accuracy. The level of visibility and control Snorkel delivers is a huge advantage as we build trustworthy, agentic AI at scale.

– Shriram Sridharan, co-founder, Rox

Challenge

Rox’s ability to ensure outbound emails are fully accurate and aligned with each customer’s brand and objectives is a key differentiator. However, when developing models, Rox found that off-the-shelf evaluation approaches could not deliver the required quality for critical custom evaluation tasks. Initially, Rox wrote its own LLM-as-a-judge. While the model seemed to score well, the Rox team wanted higher confidence before production deployment.

Solution

Using the Snorkel Evaluation Suite, Rox scored the judge against human experts and found it aligned only around 75% of the time. The team used Snorkel to iterate on the judge to increase alignment. The aligned judge then surfaced an issue with the prototype outbound model: it used the wrong recipient name around 11% of the time. With that finding, Rox was able to correct the model’s behavior.
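
To make the alignment figure concrete, here is a minimal sketch of how agreement between an LLM judge and human experts could be measured. It is not Snorkel's API or Rox's implementation; the function name `agreement_rate` and the toy labels are hypothetical, chosen only so the result matches the roughly 75% starting point described above.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[bool], human: Sequence[bool]) -> float:
    """Return the fraction of examples where the judge matches the human label."""
    if len(judge) != len(human):
        raise ValueError("judge and human label lists must be the same length")
    if not judge:
        raise ValueError("need at least one labeled example")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


# Hypothetical pass/fail verdicts on eight emails (illustrative data only).
human_labels = [True, True, False, True, False, True, True, False]
judge_labels = [True, False, False, True, True, True, True, False]

rate = agreement_rate(judge_labels, human_labels)
print(f"Judge agrees with human experts on {rate:.0%} of examples")
```

A judge that agrees with experts on only three out of four examples cannot be trusted to catch a failure that occurs about one time in nine, which is why alignment came before model debugging.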

Outcomes

Achieved 99%+ accuracy with specialized evaluators, giving Rox sufficient trust to ship a critical outbound email feature.
