
How Pixability uses foundation models to accelerate NLP application development by months

January 11, 2023

Pixability is a data and technology company that allows advertisers to quickly pinpoint the right content and audience on YouTube. To help brands maximize their reach, Pixability needs to continuously and accurately categorize billions of YouTube videos. Using Snorkel Flow, Pixability leveraged foundation models to build small, deployable classification models capable of categorizing videos across more than 600 different classes with 90% accuracy in just a few weeks.

Using AI to help customers optimize ad spending and maximize their reach on YouTube.

There are billions of videos on YouTube. Every minute, another 500 hours are added to the platform. Given this deluge of content, advertisers struggle to identify whether videos are brand-aligned (and brand-safe). Pixability uses machine learning to automatically identify and categorize YouTube content so that advertisers can maximize their reach with suitable content and optimize ad spend.

Challenge

As of 2022, viewers watch, on average, over 700 million hours of YouTube content daily [1]. To maintain relevancy, Pixability needs to continuously and accurately categorize billions of videos to provide advertisers with the necessary insight to be sure their ads run on brand-suitable content. To do this, Pixability had trained a natural language processing (NLP) model to classify videos automatically, yet the performance wasn’t strong enough.

To improve the training data quality (and reduce the number of revision cycles required to translate domain knowledge to a third-party service), the team realized they needed an alternative to hand-labeling data.

  • Labeling training data for the ML solution was prohibitively slow, given the reliance on external data labeling services that required multiple iterations.
  • Collaboration was constrained by the limited time domain experts and data scientists had to resolve ambiguous labels, which blocked their ability to iterate quickly.
  • Rich information was buried within titles, descriptions, content, and tags and was difficult to normalize.
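
To make the normalization problem concrete, here is a minimal sketch of merging a video's metadata into one clean text field. The function name, the field names, and the specific cleanup rules are illustrative assumptions, not Pixability's actual pipeline.

```python
import re
import unicodedata


def normalize_video_text(title, description, tags):
    """Merge hypothetical metadata fields into one lowercase, single-spaced string."""
    parts = [title or "", description or "", " ".join(tags or [])]
    text = " ".join(parts)
    # NFKC folds full-width and other compatibility characters into canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop URLs, which rarely carry topical signal (an assumption of this sketch).
    text = re.sub(r"https?://\S+", " ", text)
    # Treat hashtags as plain words: "#NBA" becomes "NBA".
    text = text.replace("#", " ")
    # Collapse runs of whitespace and lowercase the result.
    return re.sub(r"\s+", " ", text).strip().lower()


example = normalize_video_text(
    "NBA  Finals #Highlights",
    "Watch at https://example.com/x now",
    ["Basketball", "Sports"],
)
print(example)
```

A single normalized field like this gives keyword rules and model prompts one consistent input, whichever metadata field the signal originally came from.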

Goal

Minimize the time spent labeling high-cardinality training data while expanding Pixability’s ability to provide more granular insights to its customers.

Solution

Using Snorkel Flow’s Data-centric Foundation Model Development workflow, Pixability was able to build an NLP application in less time than it took a third-party data labeling service to label a single dataset. The workflow allowed Pixability to scale the number of classes they could predict to over 600 while also raising model accuracy to over 90%. The large increase in possible classes means Pixability can better place their customers’ ads on the most suitable YouTube content, improving the return on customer video ad spend and satisfaction with Pixability’s services.

The team began by using Snorkel Flow’s Foundation Model Warm Start with zero-shot learning to jump-start training data creation using foundation model (FM) knowledge. From there, they used Foundation Model Prompt Builder to develop and refine prompts to correct out-of-the-box FM errors and pull more domain-specific knowledge from various FMs (rather than relying on a single one). As an example, they used the unstructured video titles, tags, and descriptions stored in their Snowflake data warehouse and created prompts that asked the FM to classify videos based on the description.
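
The prompt-based step described above can be sketched roughly as follows. This is not Snorkel Flow's Prompt Builder API. The prompt wording, the function names, and the `fake_fm` stand-in are all hypothetical, and a real setup would pass a callable that queries an actual foundation model.

```python
from typing import Callable, Optional, Sequence


def build_prompt(description: str, classes: Sequence[str]) -> str:
    """Build a zero-shot classification prompt (wording is an assumption)."""
    options = ", ".join(classes)
    return (
        "Classify the YouTube video described below into exactly one of "
        f"these categories: {options}.\n"
        f"Description: {description}\n"
        "Answer with the category name only."
    )


def parse_label(response: str, classes: Sequence[str]) -> Optional[str]:
    """Map a free-text FM response onto a known class, or None to abstain."""
    cleaned = response.strip().strip(".").lower()
    for name in classes:
        if cleaned == name.lower():
            return name
    return None


def zero_shot_label(
    description: str, classes: Sequence[str], fm: Callable[[str], str]
) -> Optional[str]:
    """Query the model once and return a class name or None."""
    return parse_label(fm(build_prompt(description, classes)), classes)


def fake_fm(prompt: str) -> str:
    """Hypothetical stand-in for a foundation model call, used only for this demo."""
    description = prompt.split("Description: ", 1)[1]
    return "Basketball." if "dunk" in description.lower() else "I am not sure"


classes = ["basketball", "hockey"]
print(zero_shot_label("Top 10 dunks of the season", classes, fake_fm))
print(zero_shot_label("Weeknight pasta recipe", classes, fake_fm))
```

Returning None for responses that do not match a known class lets an off-topic or malformed FM answer count as an abstention instead of a wrong label.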

Using foundation models to classify videos with Snorkel Flow

Referencing the results of a 50-class multi-label classification model, Jackie Swansburg, Pixability’s Chief Product Officer, said, “With Snorkel Flow, we can apply data-centric workflows to distill knowledge from foundation models and build high-cardinality classification models with more than 90% accuracy in days.”

With this programmatic approach to labeling data using knowledge from foundation models, the team generated 500,000 labeled training data points (with virtually no ground truth) that were used to train a model with 90% accuracy. Additionally, the team was able to unlock multi-label NLP capabilities, further improving the granularity Pixability can provide its customers. Now, instead of only classifying a video as related to “sports,” they can classify it more specifically as “basketball” or “hockey”.

  • Auto-labeled by capturing domain expertise and foundation model knowledge as labeling functions and applying them intelligently en masse.
  • Improved collaboration with domain experts across lines of business to drive programmatic data labeling and iteration, unblocking the data science team.
  • Unified platform for training data creation and model training, including guided error analysis for efficient, effective iteration.
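
As a rough illustration of the labeling-function idea in the list above, the sketch below combines keyword rules and a stored FM answer into one label per video. It uses a plain majority vote as a stand-in, since this article does not describe how Snorkel Flow aggregates votes. The function names, the `text` and `fm_label` fields, and the keywords are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


def lf_keyword_basketball(video):
    # Keyword rule capturing a piece of (hypothetical) domain expertise.
    return "basketball" if "nba" in video["text"] else ABSTAIN


def lf_keyword_hockey(video):
    return "hockey" if "nhl" in video["text"] else ABSTAIN


def lf_fm_response(video):
    # Assumes an FM answer was precomputed and stored on the record.
    return video.get("fm_label", ABSTAIN)


LABELING_FUNCTIONS = [lf_keyword_basketball, lf_keyword_hockey, lf_fm_response]


def majority_vote(video, lfs=LABELING_FUNCTIONS):
    """Return the most common non-abstaining vote, or ABSTAIN on no votes or a tie."""
    votes = [v for v in (lf(video) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN
    return top


videos = [
    {"text": "nba finals recap", "fm_label": "basketball"},
    {"text": "nba vs nhl tv ratings"},
    {"text": "weeknight pasta recipe"},
]
print([majority_vote(v) for v in videos])
```

Because each rule is a small function, a domain expert's correction becomes a one-line change that relabels every video at once, with no need to send a batch back for manual relabeling.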

Pixability was able to create a model in weeks instead of months by relying on the Snorkel AI team’s expertise with foundation models and Snorkel Flow’s ability to integrate easily into their existing cloud data warehouse. Furthermore, by labeling programmatically in-house, the Pixability team had greater control over their NLP training data creation and rapid iteration, freeing the capacity to expand to more use cases. As a result, Pixability advanced their product roadmap by several months, unlocking new capabilities that will help them provide deeper insights and improved services to their customers. 

Results

400k programmatic labels

sourced from FM responses and keyword analysis

600+ class multi-label NLP model

that provides greater granularity and support for custom content categories

90% accuracy

on a model with 12x more classes (and 90% accuracy on a 50-class model)

  1. https://blog.youtube/inside-youtube/innovations-for-2022-at-youtube/
Nick Harvey
Director of Product Marketing