Explainability through provenance and lineage
In our previous post, we discussed how trustworthy AI adoption and its successful integration into our country’s critical infrastructure and systems are paramount. In this post, we discuss how explainability in AI is crucial to ensure the transparency and auditability of your AI deployments.
Outputs from trustworthy AI applications must be explainable in understandable terms based on the design and implementation of the underlying ML model(s)—down to how the training data was labeled—as well as the behavior and performance of the system itself.
Challenge: Minimal traceability
Once an ML model is trained on a hand-labeled dataset, there is no way to trace a model output back to the rationale an annotator used when manually labeling the examples that led the model to produce that output.
Snorkel Flow advantage: Explainability through provenance and lineage
Programmatic labeling enables explainability via data lineage, making it easy to review the business logic or code (tracked as labeling functions) that labeled the data that trained your model, allowing you to answer questions like “What taught my model to do X, and how do I fix it?”
Programmatic labeling enables you to trace the ML model output back to the original labeling functions that were used to programmatically label the data used to train the model. Those labeling functions can then form the basis of an explanation describing exactly how this model was created by your organization, at the level of the specific rationale(s) with which this training data was labeled. This explanation can also detail the key features involved in the overall decision process, which features correspond to any given output, and how the outcome might have changed if the inputs were different.
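To make this concrete, here is a minimal sketch using the labeling-function API from the open-source Snorkel library (the task, field names, and rules are hypothetical, not a Snorkel Flow workflow). Each labeling function records its rationale directly in reviewable code, which is what makes the lineage from model output back to labeling logic auditable:

```python
from snorkel.labeling import labeling_function

# Label constants for a hypothetical spam-classification task
ABSTAIN = -1
NOT_SPAM = 0
SPAM = 1


@labeling_function()
def lf_contains_refund_link(x):
    """Rationale: messages that pair a refund offer with a link are
    overwhelmingly spam in our historical review."""
    text = x.text.lower()
    return SPAM if ("refund" in text and "http" in text) else ABSTAIN


@labeling_function()
def lf_known_sender(x):
    """Rationale: mail from senders on the vetted allowlist is not spam."""
    return NOT_SPAM if x.sender in {"alerts@agency.gov"} else ABSTAIN
```

The function bodies and docstrings serve as the audit trail: if the model behaves unexpectedly, you can trace the training labels back to the specific functions, and the recorded rationale, that produced them.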
Ensuring explainability through provenance and lineage is key to unlocking the transparency and auditability of your ML models.
Systematic detection and correction of bias
When biases are present in an AI application, the organization must be able to detect them and eliminate them from the training data quickly and comprehensively.
Challenge: Systematically correcting bias
If bias is detected in a hand-labeled dataset, the records must be painstakingly reviewed and corrected one by one. Otherwise, the dataset must be discarded entirely, costing organizations time and money without ensuring that the underlying bias has been eradicated.
Snorkel Flow advantage: Detection and correction of bias
Although bias can come from all parts of the AI/ML development pipeline, the largest source is manually labeled training data. With programmatic labeling, the creation of labeling functions within an organization can be subjected to a rigorous review process to identify and eliminate sources of systematic bias before they affect the training data or ML model.
For biased data that makes it past this review process, the powerful error analysis tools built into Snorkel Flow can help you identify and remediate these issues by editing your labeling functions in code or through the no-code user interface. The corrected labeling functions can then be used to quickly build a new version of your programmatically labeled training data and train a new version of your ML model.
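Continuing the hypothetical sketch above (again using the open-source Snorkel library rather than the Snorkel Flow interface), correcting a biased labeling function and rebuilding the training labels might look like this:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1


# Revised version of a labeling function that previously keyed on a sender
# attribute correlated with a protected group; the corrected rationale
# relies on message content only.
@labeling_function()
def lf_contains_refund_link(x):
    text = x.text.lower()
    return SPAM if ("refund" in text and "http" in text) else ABSTAIN


@labeling_function()
def lf_short_notice(x):
    """Rationale: very short internal notices are rarely spam."""
    return NOT_SPAM if len(x.text.split()) < 6 else ABSTAIN


lfs = [lf_contains_refund_link, lf_short_notice]  # plus the rest of the reviewed LFs

# Hypothetical training set with a `text` column
df_train = pd.DataFrame(
    {"text": ["Claim your refund http://example.test", "Team sync at 3pm"]}
)

# Re-apply the corrected labeling functions to rebuild the label matrix...
applier = PandasLFApplier(lfs=lfs)
L_train = applier.apply(df=df_train)

# ...then refit the label model to produce a new version of the
# programmatically labeled training data for model retraining.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100)
probs_train = label_model.predict_proba(L=L_train)
```

Because the fix lives in the labeling functions rather than in individual hand-labeled records, the correction propagates to every affected training example the moment the functions are re-applied.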
This process of systematically detecting and eliminating bias can be completed in a matter of minutes or hours, without requiring you to throw out your existing AI application and start over from scratch.
Snorkel AI has been successful in delivering products and results to multiple federal government partners. To speak with our federal team about how Snorkel AI can support your efforts in explainability and in understanding and developing trustworthy and responsible AI applications, contact federal@snorkel.ai.