
Explainability through provenance and lineage

April 19, 2022

In our previous post, we discussed why trustworthy AI adoption, and its successful integration into our country’s critical infrastructure and systems, is paramount. In this post, we discuss why explainability in AI is crucial to ensuring the transparency and auditability of your AI deployments.

Outputs from trustworthy AI applications must be explainable in understandable terms based on the design and implementation of the underlying ML model(s)—down to how the training data was labeled—as well as the behavior and performance of the system itself.

Challenge: Minimal traceability 

Once an ML model is trained on a hand-labeled dataset, it is no longer possible to trace a given model output back to the specific rationale each annotator used when manually labeling the examples that led the model to produce it.

Don’t miss the opportunity to explore approaches that help government agencies effectively and efficiently leverage trustworthy AI. Join us for this online event, which brings together experts from the federal and defense sectors, on April 21, 2022, at 12:00 ET.

Snorkel Flow advantage: Explainability through provenance and lineage

Programmatic labeling enables explainability via data lineage. It makes it easy to review the business logic or code (tracked as labeling functions) that labeled the data your model was trained on, and to answer questions like “What taught my model to do X, and how do I fix it?”

Programmatic labeling enables you to trace the ML model output back to the original labeling functions that were used to programmatically label the data used to train the model. Those labeling functions can then form the basis of an explanation describing exactly how this model was created by your organization, at the level of the specific rationale(s) with which this training data was labeled. This explanation can also detail the key features involved in the overall decision process, which features correspond to any given output, and how the outcome might have changed if the inputs were different.
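To make the lineage idea concrete, here is a minimal Python sketch of the first link in that chain: recording, for every training label, which labeling functions produced it. It is not Snorkel Flow’s API. The toy task (urgent vs. routine messages), the labeling functions, and the helpers `label_with_lineage` and `explain` are all hypothetical, and a plain majority vote stands in for a learned aggregation model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

ABSTAIN = None  # a labeling function returns None when it has no opinion

# Hypothetical labeling functions for a toy "urgent vs. routine" message task.
def lf_keyword_urgent(text: str) -> Optional[str]:
    return "URGENT" if "immediately" in text.lower() else ABSTAIN

def lf_keyword_routine(text: str) -> Optional[str]:
    return "ROUTINE" if "weekly report" in text.lower() else ABSTAIN

def lf_exclamation(text: str) -> Optional[str]:
    return "URGENT" if text.count("!") >= 2 else ABSTAIN

LFS: list[Callable[[str], Optional[str]]] = [lf_keyword_urgent, lf_keyword_routine, lf_exclamation]

@dataclass
class LabeledExample:
    text: str
    label: Optional[str]
    # Lineage record: labeling-function name -> the label it voted for.
    votes: dict[str, str] = field(default_factory=dict)

def label_with_lineage(texts: list[str], lfs) -> list[LabeledExample]:
    results = []
    for text in texts:
        votes = {}
        for lf in lfs:
            vote = lf(text)
            if vote is not ABSTAIN:
                votes[lf.__name__] = vote
        # Majority vote over non-abstaining LFs; no votes or a tie leaves the example unlabeled.
        counts = Counter(votes.values()).most_common()
        if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
            label = None
        else:
            label = counts[0][0]
        results.append(LabeledExample(text, label, votes))
    return results

def explain(example: LabeledExample) -> list[str]:
    """Names of the labeling functions that voted for the example's final label."""
    return [name for name, vote in example.votes.items() if vote == example.label]

data = label_with_lineage(["Respond immediately!!", "Attached is the weekly report."], LFS)
for ex in data:
    print(ex.label, explain(ex))
# URGENT ['lf_keyword_urgent', 'lf_exclamation']
# ROUTINE ['lf_keyword_routine']
```

Because each `LabeledExample` keeps its `votes`, any training label, and by extension the model behavior learned from it, can be traced back to named, reviewable pieces of logic.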

Ensuring explainability through provenance and lineage is key to unlocking the transparency and auditability of your ML models.

Snorkel Flow captures the entire ML pipeline, from labeling the data to deploying and analyzing your models, making all components traceable and auditable.

Systematic detection and correction of bias

When biases are present in an AI application, the organization must be able to detect them and eliminate their source in the training data, correcting the unwanted behavior quickly and comprehensively.

Challenge: Systematically correcting bias

If bias is detected in a hand-labeled dataset, the records must be painstakingly reviewed and corrected one by one. Otherwise, the dataset must be discarded entirely, costing organizations time and money without any assurance that the underlying bias has been eradicated.

Snorkel Flow advantage: Detection and correction of bias

Although bias can come from all parts of the AI/ML development pipeline, the largest source is manually labeled training data. With programmatic labeling, the creation of labeling functions within an organization can be subjected to a rigorous review process to identify and eliminate sources of systematic bias before they affect the training data or ML model.
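One simple check such a review could include is comparing how often each labeling function assigns a label across groups defined by a sensitive attribute that the functions themselves should not depend on. The sketch below assumes hypothetical loan-review records, a hypothetical `group` field, and an arbitrary disparity threshold of 0.2. It is one illustrative heuristic, not a complete bias audit and not a Snorkel Flow feature.

```python
from collections import defaultdict
from typing import Optional

ABSTAIN = None

# Hypothetical labeling functions for a toy loan-review task.
def lf_mentions_late(record: dict) -> Optional[str]:
    return "HIGH_RISK" if "late" in record["text"].lower() else ABSTAIN

def lf_zip_prefix(record: dict) -> Optional[str]:
    # Deliberately problematic: uses location as a proxy for risk.
    return "HIGH_RISK" if record["zip"].startswith("9") else ABSTAIN

# Hypothetical records; "group" is a sensitive attribute the LFs never read.
records = [
    {"text": "Payment was late twice", "zip": "94107", "group": "A"},
    {"text": "On-time payments", "zip": "94110", "group": "A"},
    {"text": "Payment was late once", "zip": "10001", "group": "B"},
    {"text": "On-time payments", "zip": "10002", "group": "B"},
]

def label_rate_by_group(lf, records, label, group_key="group"):
    """Fraction of each group's records that the LF assigns `label` to."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        if lf(r) == label:
            hits[r[group_key]] += 1
    return {g: hits[g] / totals[g] for g in totals}

def audit(lfs, records, label, max_gap=0.2):
    """Flag LFs whose label rate differs across groups by more than max_gap."""
    flagged = {}
    for lf in lfs:
        rates = label_rate_by_group(lf, records, label)
        if max(rates.values()) - min(rates.values()) > max_gap:
            flagged[lf.__name__] = rates
    return flagged

print(audit([lf_mentions_late, lf_zip_prefix], records, "HIGH_RISK"))
# {'lf_zip_prefix': {'A': 1.0, 'B': 0.0}}
```

Here `lf_zip_prefix` is flagged because its location proxy makes it fire for every record in group A and none in group B, while `lf_mentions_late` fires at the same rate in both groups.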

For biased data that make it past this review process, the powerful error analysis tools built into Snorkel Flow can help you identify and remediate these issues by editing your labeling functions, either in code or through the no-code user interface. The corrected labeling functions can then be used to quickly build a new version of your programmatically labeled training data and train a new version of your ML model.
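The remediation loop can be sketched in a few lines of Python: edit the labeling-function set, regenerate every label, and diff the two label versions to see exactly which examples changed. The records, labeling functions, and helper names below are hypothetical, the majority vote is a stand-in aggregation, and retraining the model on the new labels is left out.

```python
from collections import Counter
from typing import Optional

ABSTAIN = None

# Hypothetical labeling functions for a toy loan-review task.
def lf_mentions_late(record: dict) -> Optional[str]:
    return "HIGH_RISK" if "late" in record["text"].lower() else ABSTAIN

def lf_on_time(record: dict) -> Optional[str]:
    return "LOW_RISK" if "on-time" in record["text"].lower() else ABSTAIN

def lf_zip_prefix(record: dict) -> Optional[str]:
    # The biased LF found during error analysis: location as a proxy for risk.
    return "HIGH_RISK" if record["zip"].startswith("9") else ABSTAIN

records = [
    {"text": "Payment was late twice", "zip": "94107"},
    {"text": "On-time payments", "zip": "94110"},
    {"text": "Payment was late once", "zip": "10001"},
    {"text": "On-time payments", "zip": "10002"},
]

def majority_label(votes: list[Optional[str]]) -> Optional[str]:
    """Majority vote over non-abstaining LFs; no votes or a tie yields None."""
    counts = Counter(v for v in votes if v is not ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return None
    return counts[0][0]

def build_labels(lfs, records) -> list[Optional[str]]:
    """Regenerate the full training-label set from the current LFs."""
    return [majority_label([lf(r) for lf in lfs]) for r in records]

def diff_labels(old, new):
    """(index, old_label, new_label) for every example whose label changed."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]

labels_v1 = build_labels([lf_mentions_late, lf_on_time, lf_zip_prefix], records)
# The "edit": drop the proxy LF and regenerate every label programmatically.
labels_v2 = build_labels([lf_mentions_late, lf_on_time], records)
print(diff_labels(labels_v1, labels_v2))
# [(1, None, 'LOW_RISK')]
```

Only record 1 changes: with the location proxy removed, its on-time payment history is no longer cancelled out by a tie, so it receives a `LOW_RISK` label instead of none. The fix is a change to the labeling-function set, not a manual pass over every record.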

Snorkel Flow’s best-in-class analysis tools allow you to drill down to the specific labeling functions behind a given model output.

This process of systematically detecting and eliminating bias is completed in a matter of minutes or hours, without requiring you to throw out your existing AI application and start all over from scratch.

Snorkel AI has successfully delivered products and results to multiple federal government partners. To speak with our federal team about how Snorkel AI can support your efforts in explainability and in understanding and developing trustworthy, responsible AI applications, contact federal@snorkel.ai.

Alexis Zumwalt
Director of Federal Strategy and Growth
