In our previous posts, we discussed how explainable AI is crucial to ensure the transparency and auditability of your AI deployments and how trustworthy AI adoption and its successful integration into our country’s critical infrastructure and systems are paramount. In this post, we dive into making trustworthy and responsible AI possible with Snorkel Flow, the data-centric AI platform for government and federal agencies.

Collaborative labeling and training workflows

Only a rapid, iterative, direct feedback loop between data labeling and model training can enable the governance, explainability, and adaptability needed for responsible AI.

Challenge: Labeling and modeling direct feedback 

Data scientists building ML models on hand-labeled datasets are often disconnected from the subject matter experts (SMEs) who labeled the data. This constrains the data scientists’ insight into the labeling process. If any problems are later discovered in the data, the data scientists have limited options for addressing these issues.

Caption: Traditional methods involve handing over a static, manually labeled dataset. Re-labeling requires restarting the manual labeling process all over again.

Snorkel Flow advantage: Collaborative labeling and training workflows

Snorkel Flow lets companies bring together data scientists and SMEs to build trustworthy AI solutions in ways that were not previously possible.

By enabling organizations’ in-house SMEs to programmatically label massive training datasets quickly and efficiently, companies can accelerate AI/ML application development and reduce the workload on critical knowledge workers. Data scientists can then share feedback with SMEs on the performance of ML models trained on this programmatically labeled data.

Caption: Components of the ML process are connected, transparent, and driven by feedback instead of linear, opaque, and immutable.

This drives a virtuous cycle of rapid improvement by giving SMEs actionable feedback on how to improve their labeling functions and training data. This iterative workflow at the heart of Snorkel Flow makes it possible to build adaptable ML models and AI applications that can be analyzed and modified as needed to ensure trustworthy AI.
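The feedback loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Snorkel Flow's actual API: the labeling functions, the `majority_vote` and `lf_report` helpers, and the example tickets are all hypothetical. SMEs encode heuristics as small functions, the votes are combined into training labels, and a per-function report on a small hand-checked set supplies the feedback.

```python
from collections import Counter

# Hypothetical label values for a ticket-triage task.
ABSTAIN, ROUTINE, URGENT = -1, 0, 1

def lf_mentions_outage(text):
    """SME heuristic: tickets that mention an outage are urgent."""
    return URGENT if "outage" in text.lower() else ABSTAIN

def lf_mentions_asap(text):
    """SME heuristic: tickets that say ASAP are urgent."""
    return URGENT if "asap" in text.lower() else ABSTAIN

def lf_password_reset(text):
    """SME heuristic: password resets are routine."""
    return ROUTINE if "password reset" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_outage, lf_mentions_asap, lf_password_reset]

def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine non-abstaining votes; abstain if nothing fires or on a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

def lf_report(lfs, dev_set):
    """Per-function coverage and accuracy on a small hand-checked set."""
    report = {}
    for lf in lfs:
        fired = [(lf(text), gold) for text, gold in dev_set if lf(text) != ABSTAIN]
        correct = sum(1 for vote, gold in fired if vote == gold)
        report[lf.__name__] = {
            "coverage": len(fired) / len(dev_set),
            "accuracy": correct / len(fired) if fired else None,
        }
    return report

# Hypothetical hand-checked examples used only for feedback.
DEV_SET = [
    ("Network outage in building 4", URGENT),
    ("Need a password reset ASAP", ROUTINE),
    ("Password reset for new hire", ROUTINE),
    ("Printer is out of toner", ROUTINE),
]

if __name__ == "__main__":
    for name, stats in lf_report(LABELING_FUNCTIONS, DEV_SET).items():
        print(name, stats)
```

In this toy data, `lf_mentions_asap` is wrong on the only example it fires on, which is the kind of actionable signal an SME could use to tighten or retire that heuristic before the next training run.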

The “bill of materials” for responsible AI and ML

An effective and comprehensive AI governance regime requires an understanding of all upstream components, such as any third-party training data and pre-trained models. If these upstream components are unable to ensure trustworthiness, no downstream system built from these components will be able to ensure trustworthiness either.

Challenge: Defect visibility in pre-trained models

A recent 200-page report authored by more than 100 scholars across Stanford University explored many of the risks and issues associated with relying on pre-trained ML models. Among many legal and ethical considerations, the report highlighted the potential for inequity and misuse when incorporating such “foundation models” into downstream applications. The fact that downstream applications necessarily inherit all known and unknown defects of the upstream foundation model, as well as the lack of a deep understanding of their behavior and failure modes, led the authors of this report to warn that their use “demands caution.”

Snorkel Flow advantage: Fully-transparent “bill of materials” for responsible AI and ML

Due to the time and cost typically required to build AI/ML applications from scratch, some organizations rely on external resources like third-party APIs and pre-trained models as a means of saving time and resources. However, this introduces additional risk, and makes a full governance regime for trustworthiness more challenging.

Caption: Underlying every component of a Snorkel Flow application is a combination of human-readable Python functions and data tables

Snorkel Flow solves this problem by making it practical for enterprises to build their own custom AI/ML applications in-house, quickly and easily, without relying on pre-built components with opaque origins. Across the entire process, from labeling training data to training ML models, Snorkel Flow enables comprehensive control of, and visibility into, the full AI supply chain.

Wrapping up trustworthy and responsible AI

For organizations building AI applications using programmatic labeling and the data-centric approach pioneered by Snorkel AI and enabled by Snorkel Flow, all of these requirements for trustworthy AI become possible.

We believe Snorkel Flow will demonstrate several fundamental advantages in tackling the development and deployment of trustworthy AI:

Governance, auditability, and adaptability—as easy as managing source code

In Snorkel Flow, inspecting how a training dataset has been labeled (i.e., what an AI model was taught to do) is as easy and well-understood as inspecting software source code – because that’s exactly how it’s managed and tracked.

Tracing model errors back to training data origins for AI software assurance

By governing training data programmatically, Snorkel Flow lets you easily trace the lineage of AI model errors back to the data, and rapidly fix them.
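To make the idea of lineage concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's implementation. It assumes each training label is stored together with the votes that produced it, and every name in it (`label_with_lineage`, `blame`, the two labeling functions) is hypothetical.

```python
# Hypothetical label values for a message-filtering task.
ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_free(text):
    """Heuristic: messages containing the word 'free' are spam."""
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_has_greeting(text):
    """Heuristic: messages that open with a greeting are legitimate."""
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LFS = [lf_contains_free, lf_has_greeting]

def label_with_lineage(text, lfs=LFS):
    """Return (label, lineage); lineage maps each function that fired to its vote."""
    all_votes = {lf.__name__: lf(text) for lf in lfs}
    lineage = {name: vote for name, vote in all_votes.items() if vote != ABSTAIN}
    votes = list(lineage.values())
    if not votes or votes.count(SPAM) == votes.count(HAM):
        return ABSTAIN, lineage
    label = SPAM if votes.count(SPAM) > votes.count(HAM) else HAM
    return label, lineage

def blame(examples, gold):
    """Count, per labeling function, how many wrong training labels it voted for."""
    counts = {}
    for text, true_label in zip(examples, gold):
        label, lineage = label_with_lineage(text)
        if label != ABSTAIN and label != true_label:
            for name, vote in lineage.items():
                if vote == label:
                    counts[name] = counts.get(name, 0) + 1
    return counts

if __name__ == "__main__":
    examples = [
        "Feel free to call me tomorrow",
        "Hello, lunch on Friday?",
        "FREE prize, click now",
    ]
    gold = [HAM, HAM, SPAM]
    print(blame(examples, gold))
```

Because each label carries the names of the functions that voted for it, a mislabeled example points directly at the heuristic to revise, rather than at an anonymous annotation.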

Identification and elimination of training data biases quickly and systematically

Manual labeling introduces biases in many subtle ways, leading to AI models with biases that are difficult to discover and resolve. Snorkel Flow’s approach allows you to systematically correct biases by modifying the code or using our no-code UI.
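As a hedged illustration of correcting a bias in code, the sketch below compares how often a labeling rule flags records in each group before and after an edit. The record fields, the two versions of the rule, and the `flag_rate_by_group` helper are all hypothetical and are not part of Snorkel Flow.

```python
# Hypothetical label values for a case-review task.
ABSTAIN, REVIEW = -1, 1

def lf_flag_v1(record):
    """Original rule: leans on a proxy attribute (region) as well as the risk signal."""
    if record["region"] == "north" or record["late_payments"] >= 3:
        return REVIEW
    return ABSTAIN

def lf_flag_v2(record):
    """Revised rule: uses only the risk signal."""
    return REVIEW if record["late_payments"] >= 3 else ABSTAIN

def flag_rate_by_group(lf, records, group_key):
    """Fraction of records in each group that the rule flags for review."""
    totals, flagged = {}, {}
    for record in records:
        group = record[group_key]
        totals[group] = totals.get(group, 0) + 1
        flagged[group] = flagged.get(group, 0) + (1 if lf(record) == REVIEW else 0)
    return {group: flagged[group] / totals[group] for group in totals}

# Hypothetical records with identical risk profiles in both regions.
RECORDS = [
    {"region": "north", "late_payments": 0},
    {"region": "north", "late_payments": 4},
    {"region": "south", "late_payments": 0},
    {"region": "south", "late_payments": 4},
]

if __name__ == "__main__":
    print("v1:", flag_rate_by_group(lf_flag_v1, RECORDS, "region"))
    print("v2:", flag_rate_by_group(lf_flag_v2, RECORDS, "region"))
```

With hand labels, a skew like the one in `lf_flag_v1` would be spread across thousands of individual annotations. Expressed as code, it is a one-line change that can be reviewed and re-applied to the whole dataset.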

A core focus on close SME collaboration

Many AI ethics issues arise from a lack of understanding between the data scientists who build AI models and the Subject Matter Experts (SMEs) who understand the data and mission objectives. Snorkel Flow puts SMEs at the center of the process to avoid this critical gap in communication.

Snorkel AI has been successful in delivering products and results to multiple federal government partners. To speak with our federal team about how Snorkel AI can support your efforts at understanding and developing trustworthy and responsible AI applications contact