Government keynote presentation by FBI CTO Gregory Ihrie

Gregory Ihrie is the Chief Technology Officer for the FBI, responsible for technology, innovation, and strategy. He also leads the FBI’s efforts in advancing the bureau’s management, policy, and governance of AI systems. Ihrie chairs the FBI’s Scientific Working Group on Artificial Intelligence, as well as the Department of Justice’s AI Committee of Interest. He is one of three officers responsible for AI within the DOJ.

You can find Gregory’s full presentation below or watch the whole event on YouTube.

Additionally, a lightly edited transcript of his talk can be found below.

The Federal Bureau of Investigation (FBI) has a long history of seeking out and identifying technologies that can enhance our capabilities and allow us to better carry out our mission. At the same time, the FBI, as a part of the Department of Justice and the Executive Branch, is bound by the Constitution. We must always operate within the bounds of established law and policy. In other words, at the FBI we need to deliver both effective and trustworthy AI simultaneously. While AI is no longer very “new,” it can feel that way when it comes to the management of AI systems. Advances in technology arrive at an extraordinary pace, including advances in fundamental research and applications of AI systems. Concerns around the ethics of AI parallel that pace. We need better ways of understanding risks, of thinking holistically about systems, and of applying technical and procedural controls that can be used to ensure that AI systems operate in a trustworthy way.

These parallel advancements, of effective and trustworthy AI, present a real challenge for the FBI. Imagine that I asked a data scientist to make a minimally viable AI system, maybe simply as research code or as a training exercise. That data scientist could use the internet to access some high-quality resources (just in blogs or on GitHub, for example) and they could make a system that works in, essentially, a single afternoon. But if I ask a data scientist to make me a minimally viable trustworthy AI system, they might have a hard time knowing where to start. They would likely have to start with a literature review. They would need to decide how to assess the bias of models and data sets. What will they do to control for model drift and what tools do they need for explainability? Even then, they haven’t even gotten into necessary procedural controls. For this kind of data science research project, you do not have a guarantee of delivery, much less of the time required. It is very hard to do both things simultaneously.

Although in this article I want to focus on the FBI’s management of AI, rather than on applications, I wanted to start at the beginning and establish a concrete reference point for the different kinds of AI use-cases.

The AI use-cases at the FBI fall into two broad categories: “triage,” and biometrics. Triage means the various kinds of text, image, audio, and video capabilities that potentially help FBI agents and analysts review the information gathered via investigations or tips from the public. The primary biometric use case for the FBI is facial recognition. Operationally, in both of these categories, the FBI needs to properly handle massive amounts of data, often in critically time-sensitive conditions. In 2017, for example, at the Las Vegas concert mass shooting, hotel surveillance cameras, computers, mobile phones, and other devices generated more than a petabyte of data. Ultimately, it compelled the FBI to tag and mark more than 21,000 hours of video. High-quality tools, including AI, are essential for handling such large volumes of data required for us to fulfill our mission. And both triage and biometrics use-cases are valuable today for FBI investigations and they will continue to be so into the foreseeable future. So, we need to be capable of delivering on this while ensuring all of the FBI’s AI applications operate within the law and the Constitution and have legal and ethical controls in place.

The most important of these controls for the Bureau is that every one of our AI use-cases has direct human accountability. In other words, a human being is ultimately accountable for any actions taken, not the AI system. In our triage and biometric AI applications, the FBI implements this by having agents and analysts evaluate the results of the AI as lead information, and they hold responsibility for acting on that information. We can never use “blame the AI” as an excuse. Each of the FBI’s current and planned AI use-cases ensures human accountability in some way by always having a “human in the loop.” This means that a human is involved in every single decision. Even in potential future AI use-cases that operate at extremely high speed, human accountability remains a non-negotiable requirement.

For example, AI applications to cybersecurity might need to act to prevent malware execution faster than a human could review the relevant data. Here, a human-in-the-loop control is not going to work. Regardless, a human being ultimately has to be responsible for any actions, and must not operate the system—any system—if they cannot take that responsibility. This requirement of human control and accountability is probably the most powerful ethical tool in our toolbox. Still, it only addresses a portion of the requirements we are obligated to meet.

Specific to artificial intelligence, Executive Order 13960, titled “Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government,” covers our use of AI on non-national security systems. The Order requires that AI use-cases be: “lawful and respectful of our nation’s values; purposeful and performance-driven; accurate, reliable and effective; safe, secure, and resilient; understandable, responsible, and traceable; regularly monitored; transparent and accountable.”

Our use of AI on national security systems is covered by the Office of the Director of National Intelligence (ODNI) principles of AI ethics for the intelligence community and the associated framework, which have similar, though not identical, requirements. Many other U.S. government agencies, foreign governments, and international organizations have come out with similar ethical standards. Even though the FBI isn’t bound by these outside standards, we still need to understand the ideas behind them and be able to meet them in the interest of furthering our law-enforcement partnerships around the world.

The first entry point into our AI compliance efforts and this complicated space is an enumeration. We needed to group the list of standards across various ethics, policies and guidance, and once those were grouped we began to organize technical and procedural controls inside the FBI that address each area—using our standardized “must,” “should,” “may” language.

Here is a specific example. From the language of the Executive Order, the “understandable” requirement is very similar to the “transparency, explainability, and interpretability” portion of the ODNI framework. In that case, we could group these together and provide model controls that address both of them. For each use case we must certify that the task could not have been accomplished with a simpler model. So, if you can solve a problem with a small regression or a simple decision tree, we would not use AI. We need to use a method that is inherently explainable. The use case must also explain or communicate uncertainty around results. There are a lot of tools for that job: classical statistics, confidence intervals, credible intervals, text warnings, visual design of the system, etc. Some way to communicate uncertainty must be in place. Further, each use case should have user training and explainability or interpretability measures. These could include applications of various algorithms (LIME or SHAP, for instance) or projection algorithms (t-SNE, UMAP, or something else). Then, any “may” options follow after this.
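To make the "communicate uncertainty" control concrete, here is a minimal Python sketch of one of the tools named above: a confidence interval paired with a text warning. It is an illustration only, not a description of any FBI system. The choice of the Wilson score interval, the function names `wilson_interval` and `describe_result`, and the 0.8 warning threshold are all assumptions made for this example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def describe_result(successes: int, trials: int, min_lower_bound: float = 0.8) -> str:
    """Report an observed rate with its interval and, if needed, a text warning.

    min_lower_bound is a hypothetical threshold chosen for illustration.
    """
    low, high = wilson_interval(successes, trials)
    message = (
        f"Observed rate {successes / trials:.1%} "
        f"(95% CI {low:.1%} to {high:.1%}, n={trials})."
    )
    if low < min_lower_bound:
        message += " WARNING: lower bound is below the review threshold."
    return message


if __name__ == "__main__":
    # Hypothetical validation counts: 90 correct results out of 100 samples.
    print(describe_result(90, 100))
```

The point of the sketch is that the number shown to an agent or analyst carries its sample size and interval with it, so a result based on a small validation set reads visibly differently from one based on a large set.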

That’s one of our requirement groups, and there are a lot. There were nine in the executive order above, which I do not have the space to get into here. On the control side, three things came up so often that we actually pulled them out and addressed them separately from the specific principles that they support: training, auditing, and continuous monitoring. We have enumerated requirements for these principles, again using our standard “must-should-may” language. These function almost as sub-principles because they apply to so many of the ethical controls elsewhere in our process.
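As a rough picture of how "must," "should," and "may" controls for one requirement group could be tracked, consider the following Python sketch. It is hypothetical and does not reflect the FBI's actual tooling: the class names, the control names (paraphrased from the "understandable" example above), and the `ready_for_council` check are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    MUST = "must"
    SHOULD = "should"
    MAY = "may"


@dataclass(frozen=True)
class Control:
    name: str
    level: Level


# Hypothetical controls paraphrased from the "understandable" group.
UNDERSTANDABLE_CONTROLS = [
    Control("simpler model ruled out", Level.MUST),
    Control("uncertainty communicated", Level.MUST),
    Control("user training", Level.SHOULD),
    Control("explainability or interpretability measures", Level.SHOULD),
]


@dataclass
class Review:
    controls: list[Control]
    satisfied: set[str] = field(default_factory=set)

    def missing(self, level: Level) -> list[str]:
        """Names of controls at this level that are not yet satisfied."""
        return [
            c.name
            for c in self.controls
            if c.level is level and c.name not in self.satisfied
        ]

    def ready_for_council(self) -> bool:
        """True once every 'must' control is met.

        This only gates submission for review; it does not approve anything.
        """
        return not self.missing(Level.MUST)


if __name__ == "__main__":
    review = Review(UNDERSTANDABLE_CONTROLS, {"simpler model ruled out"})
    print(review.ready_for_council(), review.missing(Level.MUST))
```

In this sketch, passing the "must" check is only a precondition for further review. It deliberately stops short of producing an approval.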

Now, even though we have pulled out the bare minimum “must” items in this language, it is still not sufficient to make a system ethical. As the FBI sought both external and internal input when we were setting up our AI compliance program, it became very clear that a “checklist” approach was not a good fit. Rather, the best practice is to holistically evaluate each AI use case. The FBI implements that holistic evaluation via an AI ethics council that borrows a lot of ideas from institutional review boards (IRBs) commonly used in medical ethics. The Bureau built that AI ethics council from the ground up, with diversity in mind. Checklists and intake questionnaires are useful as a guide, and they help a group like the ethics council with consistency, but the council must ultimately deliberate and evaluate the controls of a use-case in the context of the overall use-case itself, along with any other controls that are being implemented. We do this to formalize the rigor that we expect in ethical evaluations, but it also enables us to be really responsive to the rapidly changing environment of artificial intelligence, AI ethics, and the research across both of those fields.

Another crucial aspect of the ethics council structure is that it helps the FBI set a coherent framework to evaluate any areas where checklists are simply not suitable. For example, I am not aware of any mathematical method or criterion for stating when an AI prediction has been sufficiently “explained,” or for codifying what it means to be “explained” and to whom. Developers, statisticians, and operators of the systems likely need different kinds of explanations. The concepts of “responsibility” and “traceability,” which cover human roles and the need for the FBI to have the ability to track how a given decision was made, are primarily about procedural controls, rather than technical ones. The FBI is keenly aware that we need to have robust processes in place for these items, because much of our work is likely to end up in front of a judge. And for any AI use case, the correct answer regarding responsibility and traceability is likely going to require a lot of expertise from numerous disciplines, and that requires evaluation by humans.

Ultimately, trustworthy AI depends on both human and technical factors, and the way we manage that at the FBI is by using a human-driven process to holistically evaluate our methods and predictions and ensure our compliance with policy and with the law.

Frequently asked open questions: Areas in which the FBI welcomes input from the wider ML/AI community:

Standard Processes:

Many who might be reading this are likely aware that there are many other ongoing efforts to codify ethical AI standards, and many of these are open to the public. Some of that work is very high-level, and the FBI views this field as mostly saturated. Frankly, there are diminishing returns for having the N+1st framework that gives the definition for artificial intelligence or that tells me that explainability is a valuable goal. In some cases, such a framework even has negative value. Here, then, are some alternative ways to think about how our experience overlaps with these processes.

Ethical interoperability:

Ethical interoperability is a concept developed by Dr. David Danks, who is now at the University of California, San Diego. It focuses on how organizations can partner despite slightly different sets of ethical standards. For example, if the FBI works on an AI use-case under the executive order standard of being “accurate, reliable, and effective,” is that sufficient to also meet the standard of, say, the Department of Defense, which states that an AI must be equitable? What about the reverse: if an AI is equitable, does that meet the standard of accuracy, reliability, and effectiveness? Perhaps not in full or only partially. It is difficult to disambiguate how to meet each separate standard. But the FBI cares about this because partnerships are incredibly important to us and our mission. In the field of AI and elsewhere, the FBI wants to be able to share tools, knowledge, and results with our partners.

The FBI really believes this interoperability is possible. Nearly all of the AI ethics standards we have reviewed at a high level are generally compatible, even if they differ in terminology or specific inclusions. We are performing a careful and considered compliance process for responsible AI at the FBI, and it is based on standards that apply to the FBI, but it is time-consuming and expensive in a number of ways, and duplicating that process for every standard under the sun is not ideal. Instead, we think that the field, broadly speaking, will be better off if there is some common core or framework of ethical standards across the board to enable the exchange of responsible and trustworthy artificial intelligence systems.

I would ask us all to consider, then, whether minor modifications are required at all, or if there is existing work that can be used for these goals as is. There is a real cost to issuing principles that are similar but not identical versus those that are already extant.

In the realm of ethical science, is there an integrative way to evaluate the overlap of these ethical principles? Can we build a framework capable of helping an organization like the FBI, to borrow a phrase from the software world, to comply once and then run anywhere with our AI use-cases? That would be really valuable to us and in supporting our organizational partnerships.

AI Risk Management at the National Institute of Standards and Technology (NIST):

NIST is currently running a process to define a risk management framework for AI. Notionally, it is expected to be complete and released to the public by January 2023. NIST has a strong track record of producing standards like this through open-source and collaborative processes, and government agencies, including the FBI, have been able to incorporate other NIST risk management frameworks before. One example is in the FBI’s cybersecurity evaluations. In other words, this is a management structure and a construct that government agencies already understand. For us, it would be really beneficial to see what lessons the AI ethics field might take away from the processes and procedures that we are already using to manage cybersecurity. Perhaps there are analogs of job roles that already exist in the security field (such as information systems security officers or approving officials) or tools for risk and compliance that could help us manage the inventory and life cycle of high-risk data.

Regardless of the possibilities there, we need to bridge the gap between agencies like the FBI and the high quality research already out there on the technical aspects of AI ethics and on all aspects of AI more broadly. The FBI can engage in some of these areas, but it is not primarily a research organization. At the end of the day, our primary requirement is that we implement effective technical and procedural controls in such a way that they ensure we are adhering to the law and the Constitution. We have yet to find much work at that intersection: How do we apply existing technical controls to meet existing high-level ethics or trustworthy AI principles? Can the AI community create tools that would allow a data scientist to make—harkening back to my example above—a trustworthy AI product as easily or as quickly as any AI product? Can they produce something that, in essence, works “out of the box” and that can be integrated into the FBI’s workflow?

Procedures for human accountability in AI use-cases:

Since the FBI must always maintain human accountability (that remains a non-negotiable for us), we would be very interested to learn how to create procedures for that accountability even when you do not or cannot have a human in the loop. The FBI’s current AI use-cases all have a human in the loop, and that remains an important control and an absolute requirement for us. But AI research more generally includes use-cases where humans simply cannot be in the loop. I mentioned cybersecurity tooling as an example above, in which you must stop malware much more quickly than a human can evaluate the features going into a model. What procedures might help the FBI nevertheless maintain that human accountability even in such use-cases? Can we still have the tooling, the auditing, the explanation and alerting requirements for use-cases where human review must be asynchronous?

Commercial AI and compliance:

What else can groups outside the U.S. government who are working on AI do to help government agencies?

First, begin thinking about the compliance portion of your systems from the very start of your research and development process. For us, tracking the provenance of data is important, and it would be especially helpful for an end-user like the federal government to understand the components of AI data pipelines, rather than receive a more opaque “boxed” system. One requirement for a mission, for example, might require specialized integration that would be greatly facilitated if our users understand the component parts of a system so that we can both track our data and selectively integrate portions of a system most useful for our purposes.

It remains challenging to demonstrate trustworthy, ethical AI with top-level performance data alone. And there is already some promising research here. Model cards and data sheets for data sets are exciting innovations over the last few years. And we have observed that they cover the kind of information that regularly comes up when our AI ethics council evaluates use cases. Those might be some examples of the kinds of things commercial or other FBI partners could build into their designs that would help us tremendously down the road.

One area of AI that perhaps the FBI is not as focused on is AI in a common commercial product. This includes things like map navigation systems, word processors, spell-checks, etc. At the FBI, our use of these common commercial products still has to comply with law and policy, but the FBI is not really the best agency to evaluate whether that is the case. What we really need, then, is to inherit a ready-made set of well-governed and trustworthy AI systems and controls whenever we use these common products. An example of how to accomplish this might be the formation of standards bodies or organizations that can shoulder the burden of that assurance problem. Something like the NIST AI risk management framework that the government already uses, for instance.

Developing responsible technology:

The FBI encourages the responsible use and development of technology generally, and we know that emerging technologies, as full of promise as they might be, are also ripe for misuse, abuse, and illegality. To use an AI-specific example, synthetic content generated by generative adversarial networks (GANs), commonly called “deepfakes,” is often used for this kind of misuse. Such content can be used for illegal purposes like fraud in addition to its myriad lawful uses. We applaud the efforts of companies who think about and address the wider impacts of their products from the outset, and who guide any outputs toward legal and ethical use-cases. Perhaps things like rate-limiting, server-side logging, and prompts in a user interface are not always appropriate for every application. But any development entity is going to know their technology much better than the FBI ever could, and it fundamentally remains up to those entities to consider the wider social and ethical implications of their technology. It is something we want to encourage the AI community to continue thinking about, along with how we can all play a part in building more trustworthy systems.

The continued centrality of people and hope for the future:

No matter the size of an organization, it remains only as capable as the people who comprise it. In any sort of emerging technology, an organization has to have people with the knowledge and the skills to manage and understand it. That of course also extends to ethics and compliance. The FBI wants its people to build the skills and knowledge not only to perform mission tasks efficiently and accurately but also to be capable of documenting, checking, and demonstrating that our work is compliant. In the field of AI and data science, data literacy is essential, and the most likely way the FBI is getting its people that literacy is via our private sector and academic partnerships. That includes everything from massive online courses to specialized degree programs to commercial training and exchanges.

I have two observations that really make me hopeful for the future, and that I hope make you hopeful too. The first is that the skills and preparation that incoming government technical personnel already have for dealing with data and AI have never been better. The curriculums around AI have improved tremendously over the last decade. Even advanced programming skills have become more commonplace and no longer need to be taught from scratch for new hires in technical fields. There is a great variety of machine learning and statistics classes out there now, including AI-ethics-specific courses. Almost everyone now is arriving with the ability to deal with data through its entire life cycle, including in the realm of trustworthy AI. That kind of progress has made me really quite hopeful for the future of responsible AI in government.

The second observation is that it has been great to see just how engaged and knowledgeable FBI and DOJ attorneys have become on AI-related topics. My legal colleagues have quickly learned to ask the right questions. They work closely with data scientists and regularly train in technical and data science topics. Attorneys play a really vital role in our compliance and ethical efforts, and it is heartening to see that many in the legal community have taken on the challenge of trustworthy AI in their own professional lives. It engenders a confidence that we can continue to do the right thing in this space. I am humbled by this engagement across professions and disciplines, frankly, and I cannot wait to see what future progress we can make together in building ethical AI systems.
