How To Overcome Practical Challenges for AI in the Public Sector

I joined Snorkel AI to lead Federal and Strategic Technology Programs because I saw a real opportunity to make a difference for some of the world’s most important missions. I’ve spent the past 15+ years in the national security space, first in the Air Force, then as a data scientist supporting the DOD and Intelligence Community, and later as a tech lead for many incredible investments at In-Q-Tel. In these roles, I’ve been repeatedly amazed by how Artificial Intelligence (AI) and Machine Learning (ML) are already transforming the business of government. But the positive impacts of this transformation, from increasing the efficiency of public services to enhancing the effectiveness of tax dollars, are still in the earliest stages. Throughout the government, AI projects often come up short because of challenges like the pain of creating labeled ML training data and the need for explainability – challenges that Snorkel AI is uniquely and elegantly solving by enabling subject matter experts (SMEs) to iteratively create AI applications powered by their domain knowledge. Of the nearly 5,000 different startups I saw during my time at In-Q-Tel, Snorkel stood out to me as having the greatest potential to accelerate the practical use of AI throughout the government, which is why I knew I had to join the team.

Virtually everyone I’ve met in federal leadership roles over the past several years is looking for AI/ML to enhance and improve their organization’s workflows, as reflected in a 2019 Executive Order highlighting the “paramount importance” of AI. Given the decades of research funding behind it, it’s no wonder that so many concerted efforts are underway to accelerate AI policy development and technology adoption. However, the proliferation of AI across the U.S. government has not been as rapid as we all hoped. Government employees have seen first-hand the power that AI-based applications can deliver at home (from the Gmail spam filter, to the Netflix recommendation engine, to automated fraud alerts from their credit card company), and they’re frustrated by the lack of similar capabilities at work. So what’s behind this divide? Public sector organizations generally have access to the same talent, software models, and hardware infrastructure as any private sector company, but they face a number of relatively unique practical challenges that hinder their operationalization of AI.


Practical Challenges

Today, the mythical story of modern AI is based on heaps of readily-available training data, wielded by legions of Silicon Valley’s finest deep learning engineers, and ruthlessly optimized for raw accuracy above all other considerations. But in the federal space, there’s a starkly different reality in terms of how AI can be implemented and how it must be applied.

  1. Training Data Availability: Training custom ML models to tackle unique and complex problems requires lots of labeled training data. Existing labeled training datasets are widely available for common tasks like detecting pictures of cats on the Internet or classifying the sentiment of movie reviews. However, for more specialized tasks like categorizing internal government documents or extracting custom entities from incident reports, government agencies must label their own training data, which generally involves a ton of manual labor.
  2. Privacy & Security: Public sector institutions often work with highly sensitive data, from personally-identifiable information to classified materials. Such data typically cannot be shared with external parties for crowdsourced labeling or use with public cloud services. Internal access is also frequently restricted to only those with a valid need-to-know, making the task of labeling datasets, training ML models, and deploying AI applications even more difficult.
  3. Workforce Utilization: While virtually no organization, government or otherwise, has as many data scientists as it might like, one thing many public sector agencies do have is a large number of SMEs. Unfortunately, there’s often no good way to bring these experienced knowledge workers into the development of AI applications without either reducing them to data-labeling automatons or expecting them to retrain as ML engineers.
  4. Scalability & Adaptability: Most public sector SMEs are organized into many small teams working on many different topics, not one large team focused on a single topic. As such, there’s unlikely to be a critical mass of expert labelers available for every conceivable ML use case. Additionally, labeling isn’t a “one and done” task — training data must be periodically updated to adapt to changes in real-world inputs or shifts in organizational objectives, which only exacerbates the scalability issue.
  5. Explainability & Avoiding Bias: When it comes to an important policy decision or military action, it’s essential to be able to articulate why a choice was made. It can be tricky to explain why a given ML model produced a particular output — especially with “black box” techniques and non-technical audiences. Increasingly, public sector organizations are required to provide a justification for automated decisions, in the hopes that such transparency will help expose and eliminate potential biases.

Putting Snorkel Flow to Use for the Public Sector

Snorkel Flow is an end-to-end enterprise ML software platform that incorporates a novel approach to creating labeled training data called “programmatic labeling.” Based on research funded by DARPA, DOE, NIH, ONR, and others, trusted by the world’s leading organizations, and committed to serving the public sector, Snorkel Flow leverages your entire organization’s knowledge to accelerate AI application development in a variety of ways:

Programmatic Labeling: Rather than spending weeks or months painstakingly labeling data by hand, Snorkel Flow enables SMEs to create “labeling functions” that rapidly label massive amounts of training data in hours.

Labeling Functions can turn simple inputs, like keywords or database lookups, into a powerful way to rapidly create a massive amount of labeled training data.
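As a rough illustration of the idea (and not Snorkel Flow's actual interface), the sketch below writes three labeling functions in plain Python, two based on keywords and one on a lookup table, and combines their votes with a simple majority. The function names, categories, vendor table, and the majority-vote rule are all assumptions made for this example; this post does not describe how Snorkel Flow itself combines labeling functions.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote

# Hypothetical lookup table standing in for a database query.
KNOWN_VENDORS = {"acme corp": "PROCUREMENT", "globex": "PROCUREMENT"}


def lf_keyword_security(text):
    """Vote SECURITY if the text contains a security-related keyword."""
    keywords = ("breach", "malware", "phishing")
    return "SECURITY" if any(k in text.lower() for k in keywords) else ABSTAIN


def lf_keyword_procurement(text):
    """Vote PROCUREMENT if the text contains a purchasing keyword."""
    keywords = ("invoice", "purchase order")
    return "PROCUREMENT" if any(k in text.lower() for k in keywords) else ABSTAIN


def lf_vendor_lookup(text):
    """Vote with the category of any known vendor named in the text."""
    lowered = text.lower()
    for vendor, category in KNOWN_VENDORS.items():
        if vendor in lowered:
            return category
    return ABSTAIN


LABELING_FUNCTIONS = [lf_keyword_security, lf_keyword_procurement, lf_vendor_lookup]


def label(text):
    """Return the majority label among non-abstaining functions, or None.

    Ties go to whichever label was voted first (illustrative choice only).
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    documents = [
        "Phishing email reported by staff",
        "Invoice from Acme Corp for purchase order 12",
        "Lunch menu for Friday",
    ]
    for doc in documents:
        print(doc, "->", label(doc))
```

The point of the sketch is that each function encodes one piece of expert knowledge once, and can then be applied to any number of documents without further manual effort.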

On-Prem or Private Cloud: Snorkel Flow can be deployed in the cloud, in an on-premises computing environment, or even on a standalone machine, keeping your data private by avoiding off-premises manual labeling.

Simple, yet Powerful: In Snorkel Flow, SMEs create robust labeling functions in a push-button user interface with no coding required. For data scientists and ML engineers, the platform is also deeply configurable via an SDK and API.

Iterative Adaptation: Snorkel Flow helps teams iteratively train, deploy, monitor, and adapt AI applications to changing inputs and objectives, easily modifying labeling functions as needed instead of repeating the painful hand-labeling process.

Traceable & Auditable: Programmatic labeling allows you to trace the output of an ML model back to specific labeling functions created by individual SMEs. This provenance and lineage can help with auditability, explainability, and other compliance requirements.
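To make the provenance idea concrete, here is a minimal sketch in plain Python that records which labeling functions, and which authors, supported each assigned label. It traces a training label rather than a trained model's output, and every name in it (the functions, the `analyst_*` authors, the categories, the majority-vote rule) is a hypothetical stand-in, not Snorkel Flow's actual data model.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


def lf_mentions_outage(text):
    return "INCIDENT" if "outage" in text.lower() else ABSTAIN


def lf_mentions_schedule(text):
    return "ROUTINE" if "scheduled" in text.lower() else ABSTAIN


def lf_mentions_failure(text):
    return "INCIDENT" if "failure" in text.lower() else ABSTAIN


# Each entry: (function name, hypothetical SME author, function).
REGISTRY = [
    ("lf_mentions_outage", "analyst_a", lf_mentions_outage),
    ("lf_mentions_schedule", "analyst_b", lf_mentions_schedule),
    ("lf_mentions_failure", "analyst_c", lf_mentions_failure),
]


def label_with_provenance(text):
    """Return the majority label plus the functions and authors behind it."""
    votes = [(name, author, fn(text)) for name, author, fn in REGISTRY]
    cast = [v for v in votes if v[2] is not ABSTAIN]
    if not cast:
        return {"label": ABSTAIN, "supported_by": []}
    winner = Counter(vote for _, _, vote in cast).most_common(1)[0][0]
    supported_by = [(name, author) for name, author, vote in cast if vote == winner]
    return {"label": winner, "supported_by": supported_by}


if __name__ == "__main__":
    report = "Power failure caused an outage during scheduled maintenance"
    print(label_with_provenance(report))
```

Because the record names the specific functions and authors behind a label, a reviewer can question or revise one rule instead of re-examining every hand-labeled example.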

Here at Snorkel AI, we are incredibly honored and excited to be working with several great customers across the federal government who use Snorkel Flow to build AI applications for some of their most mission-critical use cases. Once these practical challenges are overcome, AI and ML will positively transform the government across the board, and I’m personally thrilled to help Snorkel play a big role in facilitating this growing transformation.

Sign up for a demo today if you’re interested in learning more about how Snorkel Flow can help make AI practical for your organization.

