CUSTOMER STORY

How an F500 telecom uses Snorkel AI to measure and improve virtual assistant CX

Industry: Technology

Results at a glance:

  • 7x faster evaluation
  • 79 macro F1 score on conversation rating
  • 81% accuracy in predicting turn-level satisfaction


Our client, one of the largest telecommunications companies in the U.S., engages with millions of customers annually through its digital support agent. The agent handles inquiries ranging from billing to troubleshooting to scheduling service appointments. The enterprise faced challenges in measuring how these interactions impact customer satisfaction. Historically, their most reliable data collection method, post-interaction surveys, yielded sparse and skewed data, and efforts to supplement surveys with manual evaluations proved slow and expensive.

The company partnered with Snorkel AI to develop AI-driven solutions that could predict customer satisfaction at both the conversation and individual message levels. The project produced real-time monitoring tools that filled in the gaps left by surveys and helped pinpoint where customer sentiment shifted, enabling the organization to intervene proactively.


The problem

Our client values customer satisfaction as a core business goal, but their ability to track it across virtual assistant interactions was limited. The company typically tried to collect experience ratings at the end of interactions. Customers answered these surveys infrequently—and typically when dissatisfied—making the data unreliable in its raw form.

Furthermore, manually labeling interactions to analyze satisfaction trends meant slow turnarounds and high costs. Our client needed a way to predict satisfaction scores across unlabeled conversations and to understand how specific agent actions impacted those scores turn by turn.

Challenges

Our client faced three primary challenges:

  • Sparse and unbalanced feedback. Our client could only collect survey responses for a small percentage of virtual assistant conversations. Moreover, those responses skewed negative, making it hard to evaluate which specific actions led to good or poor customer outcomes.
  • Slow, costly manual evaluation. Supplementing surveys by manually labeling interactions proved too slow and expensive to scale.
  • Lack of granularity. While users reported overall satisfaction for some conversations, our client lacked a way to assess agent performance within a conversation—i.e., which specific responses helped or hurt the customer experience.

Goal

The project aimed to create two models:

  1. Conversation-level satisfaction model: To score unlabeled conversations for batch analysis and dashboard integration.
  2. Turn-level agent quality model: To evaluate each agent response in real time, enabling proactive correction or intervention.

Our client aimed to use these models to analyze historical performance and create guardrails to improve the quality of ongoing interactions.

Solution

Our team partnered with the client’s subject matter experts (SMEs) to label and generate valuable training data. Here’s how that played out.

SME-aligned LLM-as-a-judge

As a starting point, our team worked with the client’s SMEs to build an effective and reliable “LLM-as-a-judge.” They tasked this judge with rating each assistant response in historical conversations on a three-point scale: -1 (poor), 0 (neutral), or +1 (good). They also instructed the judge to provide a written rationale for each rating.
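For illustration, here is a minimal sketch of what such a judge call might look like, assuming an OpenAI-style chat-completions API. The prompt wording, model choice, and JSON schema are assumptions, not the client’s actual implementation:

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat API would work

client = OpenAI()

JUDGE_PROMPT = """You are rating a telecom virtual assistant's response to a customer.
Score the response -1 (poor), 0 (neutral), or +1 (good) and explain why.
Reply as JSON: {"rating": <-1|0|1>, "rationale": "<one sentence>"}"""

def judge_turn(conversation_so_far: str, assistant_response: str) -> dict:
    """Rate one assistant turn with the LLM judge (illustrative schema)."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # judge model is an assumption
        temperature=0,
        response_format={"type": "json_object"},  # ask for parseable JSON
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": (
                f"Conversation so far:\n{conversation_so_far}\n\n"
                f"Assistant response to rate:\n{assistant_response}"
            )},
        ],
    )
    return json.loads(resp.choices[0].message.content)  # {"rating": ..., "rationale": ...}
```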

The team’s baseline prompt achieved a 54.8% alignment rate with SMEs. Over several iterations, the team gathered feedback from SMEs, updated guidelines, and added few-shot examples to the prompt, boosting its alignment rate by nearly 13 points to 67.7%.
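The case study doesn’t specify the exact alignment metric, but a simple per-turn agreement rate against SME labels, which the numbers above are consistent with, would look like this (an assumption, for illustration):

```python
def alignment_rate(judge_ratings: list[int], sme_ratings: list[int]) -> float:
    """Fraction of turns where the judge's -1/0/+1 rating matches the SME label."""
    assert len(judge_ratings) == len(sme_ratings)
    matches = sum(j == s for j, s in zip(judge_ratings, sme_ratings))
    return matches / len(sme_ratings)

# e.g. 0.548 for the baseline prompt, rising to 0.677 after prompt iteration
```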

With a better-aligned LLM judge, the team was ready to move to step two.

Synthetic training data

With a scalable and reliable proxy for SME judgment, the team identified good and bad responses from historical data. They used these to construct “preference pairs” by asking an LLM to write bad responses to prompts for which they already had good ones, and vice versa. These pairs served as the basis for a process-based reward model (PRM) built on a lightweight 1.5B-parameter Qwen model.
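A sketch of how such pairs might be assembled from judged historical turns; `llm_rewrite` is a hypothetical helper that prompts an LLM to produce a deliberately worse (or better) response to the same prompt:

```python
def make_preference_pairs(rated_turns: list[dict], llm_rewrite) -> list[dict]:
    """Build (prompt, chosen, rejected) pairs from judge-rated historical turns.

    rated_turns: dicts with "prompt", "response", and a -1/0/+1 "rating"
    llm_rewrite: hypothetical helper that asks an LLM to rewrite a response
                 as deliberately "good" or "bad" for the same prompt
    """
    pairs = []
    for turn in rated_turns:
        if turn["rating"] == 1:        # good response: synthesize a bad counterpart
            pairs.append({
                "prompt": turn["prompt"],
                "chosen": turn["response"],
                "rejected": llm_rewrite(turn["prompt"], target="bad"),
            })
        elif turn["rating"] == -1:     # bad response: synthesize a good counterpart
            pairs.append({
                "prompt": turn["prompt"],
                "chosen": llm_rewrite(turn["prompt"], target="good"),
                "rejected": turn["response"],
            })
    return pairs
```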

This distilled 1.5B Qwen model outperformed both a naively prompted GPT-4o baseline and a separate 27B-parameter reward model benchmark. It also returned predictions fast enough to be used in real time.
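One standard way to train such a reward model is a Bradley-Terry pairwise loss over the preference pairs; the checkpoint name and setup below are assumptions, not the team’s exact recipe:

```python
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A small Qwen backbone with a scalar head; the exact checkpoint is an assumption.
MODEL = "Qwen/Qwen2.5-1.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=1)

def reward(prompt: str, response: str):
    """Scalar quality score for one assistant response."""
    inputs = tokenizer(prompt + "\n" + response, return_tensors="pt", truncation=True)
    return model(**inputs).logits.squeeze(-1)

def pairwise_loss(pair: dict):
    """Bradley-Terry loss: push the chosen response's reward above the rejected one's."""
    margin = reward(pair["prompt"], pair["chosen"]) - reward(pair["prompt"], pair["rejected"])
    return -F.logsigmoid(margin).mean()

# Training would minimize pairwise_loss over the preference pairs with a standard optimizer.
```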

Conversation-level model

Separately, the team trained a classifier to predict satisfaction for entire conversations. It incorporated multiple features:

  • Intent/action features (tool-use tokens)
  • Turn-by-turn agent quality features
  • Turn-by-turn sentiment features
  • Agent tokens
  • Customer tokens

Using this approach, the model achieved a macro F1 score of 79, a 39-point lift over the baseline.
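As a rough sketch of this setup, assuming per-turn scores from the models above plus simple token and tool-use counts (the actual feature engineering and model family aren’t disclosed, and `conversations` is a hypothetical dataset of survey-labeled conversations):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def conversation_features(conv: dict) -> np.ndarray:
    """Collapse per-turn signals into one fixed-length vector (illustrative aggregation)."""
    quality = np.array(conv["turn_quality"])      # turn-level agent quality scores
    sentiment = np.array(conv["turn_sentiment"])  # turn-level customer sentiment scores
    return np.array([
        quality.mean(), quality.min(),            # agent quality aggregates
        sentiment.mean(), sentiment[-1],          # sentiment trend and final state
        conv["num_tool_calls"],                   # intent/action (tool-use) signal
        conv["agent_token_count"],                # agent verbosity
        conv["customer_token_count"],             # customer verbosity
    ])

X = np.stack([conversation_features(c) for c in conversations])
y = np.array([c["satisfied"] for c in conversations])  # survey label where available
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```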

Results: granular insight, better predictions, and scalable impact

The models our client built with our team yielded several concrete benefits.

Scaling coverage

Our client had an enormous quantity of unlabeled, unscored conversations. The new models allow them to score all of those conversations retroactively, along with every conversation going forward. This dramatically improves our client’s customer experience measurement coverage while reducing reliance on manual labeling and biased, opt-in surveys.

Better operational dashboards

The conversation-level model allows our client to feed up-to-date data into its existing dashboards, making them more comprehensive and actionable.

Real-time virtual assistant agent quality monitoring

The turn-level model lets our client catch low-quality agent responses immediately, allowing real-time corrections before customers become dissatisfied.
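In practice, such a guardrail can be as simple as thresholding the turn-level model’s score before a response is sent; the threshold and fallback policy below are illustrative assumptions:

```python
QUALITY_FLOOR = 0.0  # cutoff on the turn-level quality score; tuning is assumed

def guarded_response(prompt: str, draft: str, reward_model, fallback) -> str:
    """Gate each assistant turn on the turn-level quality model's score."""
    if reward_model(prompt, draft) < QUALITY_FLOOR:
        return fallback(prompt)  # e.g. regenerate, rewrite, or escalate to a human
    return draft
```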

A/B testing for virtual assistant updates

Our client previously had no good way to predict the impact of changes to their assistant. The new models allow them to simulate how product changes might impact customer satisfaction before rolling them out.
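Reusing the conversation-level sketch above, an offline comparison of two assistant variants might look like this (hypothetical; assumes class 1 of `clf` means “satisfied”):

```python
def predicted_satisfaction(variant_conversations: list[dict], clf) -> float:
    """Mean predicted probability of satisfaction over simulated conversations."""
    X = np.stack([conversation_features(c) for c in variant_conversations])
    return clf.predict_proba(X)[:, 1].mean()  # assumes class 1 == "satisfied"

# lift = predicted_satisfaction(convs_variant_b, clf) - predicted_satisfaction(convs_variant_a, clf)
```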

What’s next

Our client initially developed the models to monitor conversations about appointment scheduling, but they intend to scale the framework to other common support categories, including billing and technical troubleshooting.

Ultimately, this project shifts the focus from retrospective customer satisfaction analysis to a proactive system for managing—and improving—customer experience at scale.

Learn more about what Snorkel can do for your organization

Snorkel AI offers multiple ways for enterprises to uplevel their AI capabilities. See which Snorkel option is right for you. Book a demo today.
