
How a global telecom scaled agentic AI with synthetic data

Impact
20+

Task-specific data quality evaluators

+35 points

Function calling accuracy (according to MMLU)

2 months

To design and build custom data curation and evaluation frameworks

The challenge

An Asian telecom leader aimed to expand its offerings with a flagship AI personal assistant. However, the team faced critical roadblocks:

  • Poor personalization with unreliable outputs
  • Fragmented, manual development workflows
  • Lack of scalable, metrics-driven evaluation systems

These gaps made it challenging to iterate quickly, inflating development costs and stalling deployment.

To overcome these issues, the company partnered with Snorkel AI to radically improve how it created and evaluated data for agentic systems. In under two months, we built modular, scalable data curation pipelines. These pipelines produced high-volume, high-quality training data for key planning and reasoning use cases, delivering a model performance boost and laying the groundwork for production-ready AI systems that are faster, cheaper, and more effective.

Turning AI ambition into reality

Our client, an Asian telco giant, serves 30M+ subscribers and operates across broadband, digital content, and enterprise services. Recently, it began expanding into AI infrastructure and applications, including a “do-it-all” personal assistant app.

Despite major investments, the telco giant’s early agentic AI prototypes struggled with:

  • Context retention: Models couldn’t maintain context across multi-turn conversations
  • Generic responses: Plans were vague and impersonal
  • Tool use: Agents either didn’t call tools or used them incorrectly
  • Vague evaluation: Feedback relied on manual “vibe checks” with no hard metrics to measure progress
  • Slow iteration: Manual reviews and a rigid system design slowed improvement

These challenges stemmed from a lack of scalable data development and evaluation infrastructure. Without high-quality training or benchmark datasets, progress was slow, models underperformed, and iteration cycles stalled.


The goal

The goal was to build a best-in-class AI personal assistant powered by open-source models. The company had explored building on proprietary APIs but wanted a reliable internal model that it could control. The project initially focused on use cases such as meal and trip planning, which required agentic reasoning, tool calling, and constraint handling.

To create models that could reliably complete these tasks, the company needed:

  • Scalable data pipelines for generating and curating training and evaluation data
  • Custom evaluation rubrics for advanced behaviors (e.g., multi-turn planning, tool chaining)
  • Task-specific models fine-tuned to reliably perform on the app’s core applications

The solution

Our client worked with us to create a reusable, modular data pipeline that spanned the full AI development lifecycle—from data creation to evaluation.

Data generation for agentic use cases

Snorkel’s team helped the telco build infrastructure to programmatically generate high-quality, multi-turn conversations that included:

  • Persona and scenario creation: to simulate diverse user profiles and intents
  • Tool use modeling: including validation and formatting of tool calls
  • Constraint-driven planning: to enforce adherence to user goals and preferences
  • Scenario diversity: to balance representations across user types and intents
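
As an illustration of the persona, scenario, and diversity steps above, here is a minimal Python sketch. The persona names, the budget field, and the `build_scenarios` helper are hypothetical and not taken from the client's pipeline; the sketch only shows one simple way to balance coverage across user types and intents before conversations are generated.

```python
import itertools
import random

# Hypothetical user types and intents, chosen only for illustration.
PERSONAS = ["student", "busy parent", "frequent traveler"]
INTENTS = ["meal planning", "trip planning"]


def build_scenarios(per_cell, seed=0):
    """Return scenario specs with equal counts for every (persona, intent) pair."""
    rng = random.Random(seed)
    scenarios = []
    for persona, intent in itertools.product(PERSONAS, INTENTS):
        for _ in range(per_cell):
            scenarios.append({
                "persona": persona,
                "intent": intent,
                # A toy constraint that a generated plan would have to respect.
                "budget_usd": rng.choice([50, 100, 200]),
            })
    # Shuffle so downstream generation does not see scenarios grouped by type.
    rng.shuffle(scenarios)
    return scenarios


scenarios = build_scenarios(per_cell=2)
```

Each scenario spec could then seed a multi-turn conversation generator, so that every user type and intent is represented equally in the resulting dataset.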

To keep this generated data reliable, Snorkel’s experts built a suite of more than 20 task-specific data quality evaluators that could automatically assess:

  • Tool call correctness and format
  • Constraint adherence
  • Plan quality and coherence
  • Action sequencing and reasoning behavior
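
To make the idea of automated data quality evaluators concrete, the sketch below shows two toy checks in Python: one for tool call format and one for constraint adherence. The tool schema, field names, and function names are assumptions made for illustration; the source does not describe how the actual evaluators were implemented.

```python
import json

# Hypothetical tool schema: tool name -> required argument names and types.
TOOL_SCHEMA = {
    "search_restaurants": {"city": str, "max_price_usd": (int, float)},
}


def check_tool_call(raw_call):
    """Return (passed, reason) for a tool call serialized as a JSON string."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "tool call is not a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOL_SCHEMA:
        return False, "unknown tool"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "missing arguments object"
    for arg, expected_type in TOOL_SCHEMA[name].items():
        if arg not in args:
            return False, f"missing argument: {arg}"
        if not isinstance(args[arg], expected_type):
            return False, f"wrong type for argument: {arg}"
    return True, "ok"


def check_budget(plan_items, budget_usd):
    """Return True when the plan's total cost stays within the user's budget."""
    return sum(item["cost_usd"] for item in plan_items) <= budget_usd


good = '{"name": "search_restaurants", "arguments": {"city": "Lisbon", "max_price_usd": 40}}'
bad = '{"name": "search_restaurants", "arguments": {"city": "Lisbon"}}'
print(check_tool_call(good))
print(check_tool_call(bad))
print(check_budget([{"cost_usd": 30}, {"cost_usd": 25}], budget_usd=50))
```

Checks like these return a pass or fail signal per datapoint, which is what lets them run over large generated datasets without manual review.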

This new infrastructure supplemented the team’s existing manual review process—reducing reliance on “vibe checks” and academic benchmarks, and enabling faster iteration, continuous evaluation, and integration of real-world feedback loops.

Rapid, scalable impact

In just two months, we built:

  • A custom evaluation framework for advanced agentic tasks
  • Scalable, reusable pipelines for generating 60K+ training and evaluation datapoints
  • Fine-tuned OSS models with 8% higher performance over Llama base models

The results

Better models, faster development, lower costs

Working with Snorkel, the telco accelerated iteration cycles and unblocked development by automating training and evaluation pipelines. Fine-tuning with curated synthetic data raised the performance of its chosen open-source LLM by 8% above baseline on internal evaluation metrics, and by more than that on some tasks.

This move away from proprietary models reduced API costs and gave the team greater control over deployment. With reusable pipelines now in place, the team can rapidly spin up new datasets and expand its agentic assistant capabilities to more use cases.
