How A Leading International Telecom Provider Scaled Agentic AI with High-quality Synthetic Data
An Asian telecom leader aimed to expand its offerings with a flagship AI personal assistant. However, the team faced critical roadblocks:
- Poor personalization with unreliable outputs
- Fragmented, manual development workflows
- Lack of scalable, metrics-driven evaluation systems
These gaps made it challenging to iterate quickly, inflating development costs and stalling deployment.
To overcome these issues, the company partnered with Snorkel AI to radically improve how it created and evaluated data for agentic systems. In under two months, we built modular, scalable data curation pipelines. These pipelines enabled high-volume, high-quality training data for key planning and reasoning use cases, delivering an 8% model performance boost and laying the groundwork for production-ready AI systems that are faster, cheaper, and more effective.
Turning AI ambition into reality
Our client, an Asian telco giant, serves 30M+ subscribers and operates across broadband, digital content, and enterprise services. Recently, it began expanding into AI infrastructure and applications, including a “do-it-all” personal assistant app.
Despite major investments, the telco giant’s early agentic AI prototypes struggled with:
- Context retention: Models couldn’t maintain context across multi-turn conversations
- Generic responses: Plans were vague and impersonal
- Tool use: Agents either didn’t call tools or used them incorrectly
- Vague evaluation: Feedback relied on manual “vibe checks” with no hard metrics to measure progress
- Slow iteration: Manual reviews and a rigid system design slowed improvement
These challenges stemmed from a lack of scalable data development and evaluation infrastructure. Without high-quality training or benchmark datasets, progress was slow, models underperformed, and iteration cycles stalled.
Goal
The goal was to build a best-in-class AI personal assistant powered by open-source models. The company had explored building on proprietary APIs but wanted a reliable internal model it could control. The project initially focused on use cases such as meal and trip planning, which required agentic reasoning, tool calling, and constraint handling.
To create models that could reliably complete these tasks, the company needed:
- Scalable data pipelines for generating and curating training and evaluation data
- Custom evaluation rubrics for advanced behaviors (e.g., multi-turn planning, tool chaining)
- Task-specific models fine-tuned to reliably perform on the app’s core applications
Solution
Our client worked with us to create a reusable, modular data pipeline that spanned the full AI development lifecycle—from data creation to evaluation.
Data generation for agentic use cases
Snorkel’s team helped the telco build infrastructure to programmatically generate high-quality, multi-turn conversations (a minimal sketch follows the list below) that included:
- Persona and scenario creation: to simulate diverse user profiles and intents
- Tool use modeling: including validation and formatting of tool calls
- Constraint-driven planning: to enforce adherence to user goals and preferences
- Scenario diversity: to balance representations across user types and intents
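The client’s actual pipeline is not public, but a minimal sketch of persona- and scenario-driven generation could look like the following. Every name here (the Persona and Scenario classes, the tool names, and the llm callable standing in for a model client) is an illustrative assumption, not the Snorkel implementation.

```python
# Minimal sketch of persona/scenario-driven conversation generation.
# Persona, Scenario, generate_conversation, and the tool names are
# illustrative assumptions; `llm` is any callable mapping a prompt
# string to generated text.
import itertools
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Persona:
    name: str
    preferences: List[str]        # e.g. dietary or budget constraints


@dataclass
class Scenario:
    intent: str                   # e.g. "plan a weekend trip"
    required_tools: List[str]     # tools the agent is expected to call


def generate_conversation(persona: Persona,
                          scenario: Scenario,
                          llm: Callable[[str], str],
                          turns: int = 4) -> List[Dict]:
    """Simulate one multi-turn conversation for a persona/scenario pair."""
    history: List[Dict] = []
    for turn in range(turns):
        prompt = (
            f"Persona: {persona.name}; preferences: {persona.preferences}\n"
            f"Intent: {scenario.intent}\n"
            f"Available tools: {scenario.required_tools}\n"
            f"Conversation so far: {json.dumps(history)}\n"
            "Write the next user message and the assistant reply, "
            "including any tool calls as JSON."
        )
        history.append({"turn": turn, "exchange": llm(prompt)})
    return history


# Sweep the persona x scenario grid so coverage stays balanced by
# construction rather than ad hoc sampling.
personas = [Persona("budget traveler", ["low cost"]),
            Persona("vegan foodie", ["vegan", "local cuisine"])]
scenarios = [Scenario("plan a weekend trip", ["search_flights", "book_hotel"]),
             Scenario("plan a dinner menu", ["search_recipes"])]

dataset = [generate_conversation(p, s, llm=lambda prompt: "<model output>")
           for p, s in itertools.product(personas, scenarios)]
```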
To vet this generated data, Snorkel’s experts built a suite of more than 20 task-specific data quality evaluators (sketched in code after the list below) that could automatically assess:
- Tool call correctness and format
- Constraint adherence
- Plan quality and coherence
- Action sequencing and reasoning behavior
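As a rough illustration of what such evaluators can check automatically, the sketch below scores a single datapoint for tool-call format and constraint adherence. The tool schemas and the JSON shape of a tool call are assumptions made for the example, not the evaluators built for this engagement.

```python
# Illustrative rule-based evaluators, assuming tool calls are emitted as
# JSON objects of the form {"tool": ..., "args": {...}}; the schemas and
# example values are hypothetical stand-ins.
import json
from typing import List

TOOL_SCHEMAS = {
    "search_flights": {"origin", "destination", "date"},
    "book_hotel": {"city", "check_in", "nights"},
}


def tool_call_is_valid(raw_call: str) -> bool:
    """Tool call correctness/format: the call must parse as JSON, name a
    known tool, and supply exactly the expected arguments."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return False
    expected = TOOL_SCHEMAS.get(call.get("tool"))
    return expected is not None and set(call.get("args", {})) == expected


def constraints_respected(plan_text: str, banned_terms: List[str]) -> bool:
    """Constraint adherence: reject plans mentioning anything the user
    explicitly excluded (e.g. non-vegan dishes for a vegan persona)."""
    lowered = plan_text.lower()
    return not any(term.lower() in lowered for term in banned_terms)


# Score a generated datapoint before admitting it to the training set.
raw = '{"tool": "book_hotel", "args": {"city": "Seoul", "check_in": "2025-05-01", "nights": 2}}'
print(tool_call_is_valid(raw))                                          # True
print(constraints_respected("Day 1: grilled steak dinner", ["steak"]))  # False
```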
This new infrastructure supplemented the team’s existing manual review process—reducing reliance on “vibe checks” and academic benchmarks, and enabling faster iteration, continuous evaluation, and integration of real-world feedback loops.
Rapid, scalable impact
In just two months, we built:
- A custom evaluation framework for advanced agentic tasks
- Scalable, reusable pipelines for generating 60K+ training and evaluation datapoints
- Fine-tuned OSS models with 8% higher performance than Llama base models
Results: Better models, faster development, lower costs
Working with Snorkel, the telco accelerated iteration cycles and unblocked development by automating training and evaluation pipelines. Through fine-tuning on curated synthetic data, the project increased the performance of its chosen open-source LLM by 8% above baseline on internal evaluation metrics, and by more on some tasks.
This move away from proprietary models reduced API costs and gave the team greater control over deployment. With reusable pipelines now in place, the team can rapidly spin up new datasets and expand its agentic assistant capabilities to more use cases.
Ready to get started?
Take the next step and see how you can accelerate AI development by 100x.