This post showcases a panel discussion on academic and industry perspectives on ethical AI. The panel was moderated by Alexis Zumwalt, Director of Federal Strategy and Growth, and featured Swati Gupta, Fouts Family Early Career Professor and Lead of Ethical AI (NSF AI Institute AI4OPT), Georgia Institute of Technology; Thomas Sasala, Chief Data Officer, Department of the Navy; Sakshi Jain, Senior Manager of Responsible AI (Equity and Explainability), LinkedIn; and Skip McCormick, Data Science Fellow, BNY Mellon.

  • Thomas Sasala emphasized that for the Department of Defense, the most important thing to consider is the ethical use of AI. The Navy and Marine Corps are using AI for a variety of purposes, including engagement capability and force support.
  • Swati Gupta discussed how, as algorithms become more pervasive, the ethics and AI field is interested in understanding the impact of algorithms on people’s well-being. She encourages cross-cutting research that brings law and public policy experts together with optimization and machine-learning researchers and people working in various applications.
  • Sakshi Jain explained that there are many ways bias can make its way into a system. For example, humans create the data, so ML models may learn human biases, and a launched product experience may not be as suitable for one group as it is for another.
  • Skip McCormick explained that BNY Mellon uses AI and ML to arm its clients and employees with superior intelligence and foresight. The company also uses trust-based metrics to measure its models, while taking responsibility, personally and corporately, for the decisions it makes.

Watch the full panel discussion on YouTube, or below.

A lightly edited transcript of the discussion follows.

Zumwalt: Joining us for today’s panel we have an array of subject matter experts from across academia, technology, finance, and government. Our panelists today are: Tom Sasala, the Chief Data Officer of the Department of the Navy; Swati Gupta, Fouts Family Early Career Professor and Lead of Ethical AI at the Georgia Institute of Technology; Sakshi Jain, the Senior Manager for Responsible AI Equity and Explainability at LinkedIn; and Skip McCormick, Data Science Fellow at BNY Mellon.

Why is trustworthy AI important for your industry or field, and what steps are your organizations taking to address it?

Zumwalt: We will go in the order of Tom, Swati, Sakshi, and Skip.

Sasala: In the Department of Defense, we have a lot of mission requirements that require trustworthy decision making, writ large. We are looking at AI across a broad spectrum of support functions as well as autonomous functions. The most important thing that we are narrowing in on is (and I know we will address this in the second half of the question, here) really around the ethical use of AI and around: What do we really want to trust machine learning or artificial intelligence infrastructure to assist a human with, or maybe even potentially replace a human? In terms of trustworthiness, it really comes down to not only the algorithms themselves but also the data that we are feeding the algorithms and the speed at which we can get the models trained, in a way that we feel comfortable that recommendations or the battle management aids, as we call them, are available to the commanders and the sort of decision makers in a way that is timely against the targets that we’re prosecuting. It becomes super complicated when you move beyond something (and I’m going to put this in air quotes) as “simple” as autonomous vehicles, because that’s not even remotely simple, per se. But, if you are using some sort of machine-augmented decision making for kinetic targeting, specifically, where we are going to engage in some activity that might result in lethality, there are a lot of strings attached to that.

We are really trying to address this across the broad waterfront, and so we have broken—I’m going to call it infrastructure, that’s a loose term—but the way in which we are approaching AI is broken up into a couple big buckets because of this. There is really that war-fighting application of AI, in terms of battle management aids and human-assisted decision making. Then there’s what we are calling the “unmanned vehicle” portion of AI, which is really navigating in the open waters or in the air or subsurface using autonomous vehicles. And then the third part is really more what I am involved with on a day-to-day basis, which is prescriptive and predictive analytics using more of the business data. It is really an approach that aims to help our senior leaders understand the space that we are in from an operations perspective and from a business management perspective, and give them those insights that maybe they have not had in the past, and maybe that they want to answer some very specific questions. We have established an enterprise data environment to enable that capability. We are looking right now to: how do we encapsulate some of the basic infrastructure from an AI perspective, and use AI as an enabler? 

On the Navy side, we have the Navigation Plan Implementation Framework (NIF), which has AI as a core enabler to a lot of the other NIF objectives. Then on the Marine Corps side, we are also looking at, in conjunction with some of the other land force capabilities, a.k.a. the Army: how can we enable AI for a variety of different purposes, mostly for engagement capability, but also in terms of force support? Consider just simple things like voice translation from different languages into other languages, kind of a “Star Trek” universal translator. That is something that is really important whenever you are doing force protection and when you’re doing a cleanup after some sort of activity that we have going on.

Zumwalt: Thank you so much. Yes, that cross functionality is really key. I worked closely with the Navy when I was at the Army looking at something similar. So, Swati, if you could answer the same question?

Swati Gupta: Hi everyone, I’m Swati Gupta. I’m an assistant professor and Fouts Family Early Career Professor at Georgia Tech, and I also serve as the Leader of Ethical AI for a recently awarded NSF Institute fellowship since 2021. I have a PhD in Operations Research from MIT, and my research interests are in machine learning, optimization, and algorithmic fairness. My work cuts across various domains, such as e-commerce, quantum optimization, and energy, and I think about ethics in every algorithm that I come across.

To answer the question, coming from an optimization and machine-learning background, our field is interested in understanding, as algorithms are becoming more pervasive, what is the impact of algorithms on the well-being of people? Particularly research-wise, developing cutting-edge algorithms and technologies to understand: What is fair? What is biased? How can we have transparency in algorithms? How can we incorporate ethics? Maybe some things that are legally driven or policy driven (I will talk about that a little bit later as well). And, how do we make machine-learning and optimization algorithms accountable, especially those that interact with society? Specifically, at Georgia Tech, we encourage a lot of interdisciplinary research, and in these areas, short-formed as FATE (fairness, accountability, transparency, and ethics), we specifically encourage cross-cutting research with law and public policy people, along with optimization and machine-learning people and people working in various applications. I really do believe that it is important to understand the legal landscape that we are operating in and tailor our algorithms and find new questions to solve because of the changing landscape.

The next thing that I would say is that, again, one of the main things that we are doing in our organization is teaching and course development. Teaching students at the undergraduate level, at the graduate level, even at the K–12 level, to think about bias and algorithms and understand, just from the grassroots level, what does it mean to solve an optimization problem that is not biased? What does it mean to look at numbers and not just make decisions based on numbers, but really know there is some uncertainty or some social context behind these numbers? We are really trying to develop new courses that can incorporate ethics into the basics of machine learning and AI itself. And with that, at the NSF AI Institute itself we are doing internal ethical audits for all projects to make sure that we are policy compliant, to even understand what an ethical framework would look like, and what ethics and design questions could look like. We want to train students to be able to ask these questions depending on the application that they are working in, and really go beyond IRB protocols, which are set for human behavioral research, and understand: what are the well-being metrics that need to be incorporated into technology from design and not just as an afterthought?

That’s maybe a long answer to a short question. 

Zumwalt: Absolutely a very important answer. Education is definitely critical, especially as we think about the next generation, when it comes to building AI/ML applications ethically. So, Sakshi, on to you. 

Sakshi Jain: Hi all, it has been so exciting to hear all these responses from different areas. I’m Sakshi. I work at LinkedIn. I lead the responsible AI efforts here, and so my team focuses on making systems fair, making systems equitable, explainable, trustworthy, all of those terms that we have used here. Before this, I was working for seven years in anti-abuse, which is also broadly a part of trust, where we were working on protecting LinkedIn from malicious activity that’s happening on the site. So I have been in and around the space for a very long time, and I feel passionately about moving and uplifting the historically marginalized groups, especially on this platform.

On your question as to why this is important for the space: one is, LinkedIn is one of the largest social networks for professionals out there, and every day literally millions of members are connecting with opportunities. Connecting with the right person, right connection, right job, applying to the right job, and being informed on the platform. We all understand that your professional growth plays a big role in your well-being, of yourself and people around you, so it was “no questions asked.” Like, of course we have to think about what it means to be ethical and transparent. It is not just a unique position and opportunity but also a responsibility we feel toward our members. On that front, LinkedIn has been investing quite significantly in deeply understanding all the different aspects of responsible AI. I generally tend to talk more about equity and fairness, which I feel is super difficult. We are focusing on understanding how, historically, societal biases can make their way into a system like LinkedIn, which is huge. And it is complex and nuanced, to say the least.

Just to give a few examples, we know that humans can be biased, and they make data, and our models might potentially be learning those biases. Beyond that there are so many other ways bias can make its way into a system. One is maybe your product experience is actually not as suitable for one group as the other, which you may not even realize. Maybe members of different segments actually behave differently and it just appears as bias when you start to measure it. We know that, for example, females do not tend to self-promote as much as males, and so when you look at data it may just appear like, “oh, are we being biased if females are not filling up their profile as much?” It is not just a technical problem, it’s a sociotechnical problem, which everyone here understands. And identifying the right solution and curating it has to be very carefully done, because it is very easy to solve the wrong problem here.

Two things stand out that I feel have been super helpful at LinkedIn. One is the operating model. Since we are all practitioners here, you would understand that fairness and equity, especially, can be quite inefficient, having to draw alignment across all the different business verticals of the space. I think the operating model is key here, and LinkedIn has a large cross-functional team, across policy experts, lawyers, engineers, AI specialists, designers, etc., who work with the executive team centrally to identify, first of all, what principles we stand behind. Because we have to answer that question to know what to solve, and then operationalize and launch it across all the different verticals. Greg mentioned, I think, the ethics council and IRB. We have something similar in the company to understand how the different experiments are actually impacting the ethical side of it. That’s one.

The other piece that we have been doing is actually asking members to volunteer the information on their race, ethnicity, and gender. This is quite a bold move, I think, on behalf of a company, and I don’t know of many companies who do that. The motivation behind this is really, if you want to know how big the bias is, if you want to measure it, we need this data to understand and investigate. It is really easy to say we can’t measure, so we don’t know, so we won’t solve the problem. It is really easy to do that. But it is very hard to actually convince our members to give us this information and to build systems that can, in a privacy-preserving manner, allow us to draw insights. That is something we are really proud of. 

Thirdly, we are trying to share our challenges and learnings more openly with the industry through open-source tools, because we want feedback. We want to know how others are solving it because it is not easy to begin with. These are some of the things we have been doing broadly in the responsible AI space.

Zumwalt: Thank you so much. Yes, I was really curious to hear about that one, because we use LinkedIn every day. Snorkel is growing and hiring like crazy. I know Greg mentioned that in his talk too, and we use the LinkedIn tool quite a bit. Thank you so much for what you do. And Skip, let’s bring us home on this question.

Skip McCormick: We are talking about trustworthiness, and that means we really need to understand: what do we mean by trust? If you google it, you will get that it is a firm belief in reliability, truth, ability, or strength. In a financial context, where I come from, Bank of New York Mellon, a trust is also an arrangement where one party holds the property of another party for the nominal good of the beneficiaries. That’s the classic trust instrument. Trust is also a verb, and it is related to having faith in someone or something. At Bank of New York Mellon, trust is essential in all of those ways: in every decision we make, every recommendation or action we take, and every responsibility that we assume. So our use of AI has to augment the trust that BNY has earned over almost three centuries of service. This is because, ultimately, every decision is a human decision. It is owned by the human who made the decision. We want them to make it to the best of their abilities, based on the most comprehensive basis and understanding we can employ. Our AI and ML techniques have to stand up to the same standards of trust that the human has to stand up to in a marketplace where trust is everything.

At BNY Mellon, our objective is to use AI and ML to arm our clients and employees with superior intelligence and foresight. Those methods have to be reliable and verifiable and sound. This is another way of saying that our AI and ML must be worthy of trust by the human beings who look to them for insight, and anything less is actually useless. So, where machine learning provides amazing scale and coverage that exceeds human power, in the end it is still probabilistic and requires expert human judgment at the decision and action stage. And this is where the responsibility is personal, 100 percent human, and the trustworthy AI systems are better systems. They produce better results. So, trust is the foundation. Our users require reliable trust-based metrics. We have a class of metadata we call FEAT: fairness, ethics, accountability, and transparency. These are meta-models that we use to measure all of our models so that the decision makers who want to use the outputs of the models can make a decision in the appropriate way based upon the appropriate reliability and application of what the model is telling them. Ultimately, we have to own the decisions we make personally and corporately, not the computer. Trustworthy people require trustworthy tools. My dad was a bit of a carpenter, and he always said, “amateur carpenters blame their tools or the wood, the pros take full responsibility.” The same principle applies to a data scientist. The data is the data. You create the models, you are responsible for the output, and you can’t blame it on the model or the data. The responsibility falls to you.

So, bringing it back to Snorkel, one of the key parts of trust through the models is: do you really understand the input of your data? Were you able to keep up with the training? If that process is too big for humans to deal with, how do you know? If you have automated processes, which is what I like about Snorkel, you can quickly retrain your models on updated tagged information as frequently as you need to based upon those metrics that are telling you how reliable your model is. 

Zumwalt: Thank you, Skip. And yes, that traceability back to the individual that wrote the labeling function is really one of those key cornerstones of Snorkel Flow technology. So thank you for pointing that out. Thank you, Skip. And Skip, I have to say that we have all been admiring your background as well. 

So, after this—thank you, guys, all, for answering that question—we are going to ask each of you an individual question. I’m going to start with Tom.

Tom, how do you think about incorporating ethics and responsibility into an enterprise data architecture for the Navy? 

Sasala: Yes, absolutely. Thank you so much, and I realized I failed to really introduce myself during the first time, so maybe it is fair to say: Tom Sasala, Chief Data Officer for the Department of the Navy. Just an interesting point of clarification, the Department of the Navy is comprised of the U.S. Navy and the U.S. Marine Corps, so a lot of people just say “the Navy,” when in fact that is more than one thing, so that actually introduces a lot of complications into our naval data architecture. And I say “naval” meaning the Secretary, the U.S. Navy, and the U.S. Marine Corps. And I see Skip nodding his head there, who I remember from my prior life in a different community. But I have been twiddling with those knobs in the background since I sat down. So I just want to say: that’s awesome.

You know, it is really hard from an architecture perspective to actually say how we want to embed a lot of this into here. What we are trying to do is—and this is a little bit of a riff on the Navy theme—it is called buoys, not barriers. We want to put up the kind of barriers left and right and allow people to navigate down the channel in a safe environment. If they choose to deviate from the boundaries that we’ve established in our architecture and in our policies, procedures, governance, and some of the other things, then they are doing so at their own risk. What we are asking people to do is just to be honest with themselves and be honest with us when they want to leave the channel, when they want to go over one of those barriers. Just let us know. We are here to help, but we want to make sure that they know that those challenges that they are going to face, because they are outside of that kind of “normalness,” are going to be there. 

From an architecture perspective, we are trying to establish controls that allow—and I’m going to use this in a very loose term, and I really appreciate Skip defining trustworthiness—but the ethical use of data is a matter of subjectiveness on a lot of different fronts. We are trying to put controls in place that allow people to have access to as much data as they need without necessarily giving them access to everything that they might not need. Typically, when people get access to things that they were not normally afforded, they have a tendency to misuse or misappropriate some of that stuff. So, in this case we do restrict health data, we do restrict personally identifiable information, we have laws around how we need to handle and control that. Those laws apply broadly across companies as well as the public sector, in my case the Department of the Navy. There’s also classified information and sensitive source information that we need to make sure that we control as well. We do have different tiers, in terms of the traditional DoD classification mechanisms, that we tier those people. You have to have a security clearance to get access to that data. 

“Then the second largest barrier is getting clean, curated data that is complete and trustworthy.” – Thomas Sasala, Chief Data Officer, US Navy

Then we also try to enforce, to the extent that we can, this notion of “need to know.” Way back after 9/11, we did try to pivot from this need-to-know to the “responsibility to provide,” and I’m really trying to kind of push that model as well. So all the data will be available for use.

The other thing that we want to talk about in terms of ethical use—and I just want to point this out—is we intrinsically want to trust that the people that have access to the data are going to use it appropriately. There’s a cautionary tale when you log into the system that says that you are being monitored, and so we are monitoring their use and their activity, and if we see something that’s out of bounds, that triggers a series of questions and maybe even potentially actions. Some of them might be positive, some of them might be adverse. We want to really just trust people. 

The other big thing we are trying to do in the DoD, which is counter-cultural, is not pre-assume the use of the data once it’s been created. A lot of times data will get created for a mission and people think that data can only be used for that mission. I use this example. When I was in the intelligence community, the information we are gathering when you badge-in and out of the building can be used for a lot more than whether or not you are at work or not at work. It could be used, just from a health safety perspective, if there is a fire drill. Are you still in the building, are you not in the building? It could be used for auditing your time card. Did you actually show up to work or not show up to work? It could be used for some sort of anomalous behavior detection. Are you coming to work at 2:00 AM, when you don’t normally come to work at 2:00 AM, and does that align to some sort of organizational or geographical event that would necessitate you to be there? 

So, there are lots of different uses of that data. We are trying to bake that into our architecture. I’ll just say, there are different services that we are trying to offer and different algorithms we are trying to put into place to maximize the use of the data, but also to flag what I would say is (I don’t want to say inappropriate use of data, but maybe questionable use of data) something that might be some sort of ethical quandary. Because, as we are starting to see now on the business side, we have integrated human capital data, we have integrated financial management data, we have integrated some of what we call our logistics and sustainment data, which is really our materials. Like: how much we have of what and where it is in the world. As we integrate that with some other more operational concerns, people can really start to divine a lot of interesting things about what the Navy is intending to do by where we are moving supplies, by the types of money we are spending on what types of material, and that kind of stuff. Those integration points and those dashboards are at that highest level, which I’m operating at because I work directly for the Under Secretary and the Secretary of the Navy, to give them a holistic view of the Department of the Navy. Who actually gets to see these dashboards, and at what level of granularity do you present them? And so we are struggling with that a little bit, that balance of giving access to data, giving access to analytics, empowering people with tools to do the analytics. I think it was actually said, something about allowing people to do that work, and the previous speaker talked about how the algorithm is the easy part. Absolutely, the model is actually the easiest part; getting them the data and getting them access to the data in the DoD is by far the largest barrier.

Then the second largest barrier is getting clean, curated data that is complete and trustworthy, that we can trust that we can use it for some sort of decision. My challenge is to take that to the next step and look over the horizon and say, “well what decision did we use that data for?” And if the data comes into question later on, can we go back and trace it back to that decision, and then maybe revisit the decision or not. And so that revision recall is a huge problem for us, and that is really an operational concern. Again, in former lives we had other concerns around some of the data, and certainly I appreciate the perspectives from Swati and Sakshi, being from the private sector, that may or may not be a question for them. I don’t know. But as we move from that defensive data analytics work into the offensive data analytics work, and how do we use the data to drive our business and our operations, it is really an open question right now, in terms of: I don’t think it’s an architecture question. I think it’s a governance and a procedure question. But certainly architecturally we have definitely put controls in place technologically—automated, I might add, because that is the other thing, is the data moves faster than humans. We need to automate all that and then fine-tune it.

The last thing I will just mention is we are auditing our auditing, if you will. We are actually trying to audit how good our decision-making on the use of the data is, to see if we need to tweak that decision-making algorithm. It is really hard to think about, because the data creates data that also is data, and so the management of that becomes very hard. So with that I will pass it back to you.

Zumwalt: Modern problems require modern solutions, don’t they? I think we can relate to Skip, as he was doing. You hit it on the nose there and I completely agree. Swati, much of your research focuses on algorithmic fairness.

Swati, how do we build algorithms that are fair if people can’t always agree on what “fair” is? 

Gupta: That is an excellent question, Alexis. I just want to start with saying that the point of machine learning and AI is to discriminate. It is to create positive and negative labels. So first we need to address: what is bias in data? And I think this has been said earlier, but let me just use one of the definitions from a Nature article a few years back that I really like: that bias is when scientific or technological decisions are based on a narrow set of systemic structural or social norms and concepts, and the resulting technology can privilege certain groups and harm others. So understanding the different parts of this definition makes the notion of fairness really relevant to the application that we are considering it under. What is the application that we want to define a notion of fairness for?

On that note I had a few thoughts that I will quickly share, because I know we are running out of time. So the first one is if you want to mitigate the impact of historical biases in an application. That is something that Sakshi also mentioned: maybe I want to give more visibility to people who have traditionally not been very visible on social platforms such as LinkedIn. So when considering machine learning pipelines, we want to, let’s say, give loans to historically disadvantaged groups such as Latinos and African Americans who have not been given very good interest rates in the past. So that’s good.

The second thing that can guide what a notion of fairness is, is maybe the reflection of values of an organization. For instance, is inclusivity a value? For certain applications, even the identification of values may not be enough. For instance, in organ transplantation organizations in the US, the directive from national policies is to have efficiency and fairness in the way the organs are allocated, and that determines the prioritization of these organs—of kidney transplants, kidney donors and kidney recipients. The values are efficiency and fairness. So do we prioritize the number of lives saved, or the quality in life-years, or the demographics which are getting the kidneys, or the representation of blood types? One can get to trade-offs, even when the values are a little bit clearer in an organization. One thing that we have been trying to do is generate portfolios of possible solutions, a small number of portfolios of possible solutions that can then be audited by stakeholders, policy makers, and lawyers, and they can debate on which one is a better solution. This is not something that we should just try to solve algorithmically. 

“If you want to mitigate the impact of historical biases, give more visibility to historically disadvantaged groups when considering machine learning pipelines.” – Swati Gupta, Fouts Family Early Career Professor and Lead of Ethical AI (NSF AI Institute AI4OPT), Georgia Institute of Technology

The next point is, sometimes when applications are forward-looking, then propagating the impacts up to many years forward can help us identify: what are the values? For instance, if freelancers and the gig economy continue to be ranked on the basis of user reviews, which are known to be biased, we are ultimately rejecting these users from the platforms. And maybe you want to fix that. The other points I had were sensitivity to noise. Maybe in expectation we are doing pretty well, but overall the burden of noise in the data is incurred by certain groups. One concrete example that I want to maybe explore here is engagement with stakeholders to understand what people care about. Consider a resume-screening AI algorithm. Do people care about impact in terms of false negatives when predicting higher ability? Or do they care about getting the top talent? And this is obviously from an organization and applicant view. Brandon, for instance, also mentioned changing the names of resumes from “Jennifer” to “John” increases the number of offers for postdocs, and this was a study that was done a few years back. But here law can basically help us define what to do when there are these competing objectives. We need organizations to work, so prioritizing talent is important and incentivizing people to have more talent is important, but also the notion of inclusivity or, legally, anti-discrimination laws have disallowed a host of hiring and promotion practices that operate as built-in headwinds for minority groups. Looking at the law and policy directives, we worked with the lawyers to understand: what can an AI system actually do to be able to screen resumes that have some more representation? Really the solution that came out of it was to be transparent about uncertainties in different parts of the model and account for those when selecting people. So give people the benefit of the doubt and use that as a potential lever to increase diversity in the hires. 
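The idea of accounting for model uncertainty when selecting people can be illustrated with a minimal sketch. This is not the method from the work described above; it only assumes that each candidate comes with a point score and an uncertainty estimate, and ranks by an optimistic bound so that candidates the model is less sure about get the benefit of the doubt. The `Candidate` fields, the `shortlist` function, and the `optimism` parameter are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float        # model's point estimate of fit (hypothetical scale 0..1)
    uncertainty: float  # half-width of the model's uncertainty interval


def shortlist(candidates, k, optimism=1.0):
    """Return the names of the top-k candidates by optimistic score.

    optimism=0 ranks by the point estimate alone; larger values give more
    benefit of the doubt to candidates with wider uncertainty intervals.
    """
    ranked = sorted(
        candidates,
        key=lambda c: c.score + optimism * c.uncertainty,
        reverse=True,
    )
    return [c.name for c in ranked[:k]]


pool = [
    Candidate("A", score=0.80, uncertainty=0.02),
    Candidate("B", score=0.78, uncertainty=0.03),
    Candidate("C", score=0.74, uncertainty=0.10),  # less data, wider interval
]

print(shortlist(pool, k=2, optimism=0.0))  # point estimates only
print(shortlist(pool, k=2, optimism=1.0))  # uncertainty-aware
```

In this toy pool, candidate C is excluded when only point estimates are used and included once the wider uncertainty interval is taken into account. How much optimism is appropriate is a policy and legal question, not something the code can decide.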

Last point, and then I’ll stop, is recognizing that decisions build on top of decisions. So we are trying to understand a pipeline of decisions in a machine learning model that also interact with society and feedback from the users, and there might be a societal mechanism such as admissions that takes in all this data. So really trying to audit the pipeline at the different junctures is important to understand the dependencies and ultimately come up with the notion of fairness, even something that could be at different levels, but then ultimately it works toward making the entire system much more ethical. 

Zumwalt: Yes, thank you so much. Those are really great use cases that you walked us through there. Thank you.

Sakshi, what approaches are you taking to make the use of AI at LinkedIn more equitable, privacy-sensitive, and explainable?

Jain: Thank you for that question, and I love that Swati covered so many of the hiring use cases to motivate the problem, so thank you for that. We have some efforts, and I will touch a little bit on each: fairness, privacy, and explainability. There is a lot of cross-pollination, as you can understand.

In fairness, as a central theme, one thing that we are trying to move toward is: how do we make it super simple for modelers to be able to build responsible AI? And I think Greg touched upon this, which I made a note of. Like, they can build a model today in an afternoon, but when we ask them to build a model that is equitable, it takes so long. And it starts with a literature review. So that is actually our goal. Our vision is to move to a place where, when modelers train a model, there is an automated monitoring system for bias. And if bias is found, they can automatically slap on a modular framework for mitigation.

Now this is our vision, and of course there are so many challenges. To begin with, one of them is: what is the definition of fairness, and what is the mitigation solution that you would apply? And we all understand that, in academia and even elsewhere, there is no one definition of fairness that we all agree on. The way we are going ahead is by creating a small suite of definitions and a corresponding mitigation solution in this larger framework that modelers use. And I just want to give maybe one or two examples of fairness cases that look very different.

On LinkedIn, when you get recommendations for who you may want to connect with, it is a two-sided marketplace. We have to think about fairness not only on the attributes of someone who is being recommended, but also on the attributes of the person to whom they are being recommended. So now how do these two sides interact with each other for a fairness notion? That is one. Second is content moderation. We take down content that does not comply with our policies, and with everything that is going on in the world, it is so easy for the models to learn something that is directly tied to a protected attribute, such as gender or race, and to apply that when actually taking down content. So how can we evaluate our models to ensure that the same post would receive the same action from our models if all the context was the same except for, maybe, the gender of the content subject? Can we say that and claim that? These two problems look so different, and so we have to figure out what the suite of measurement, monitoring, and mitigation solutions is.
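
The content moderation check described above (same post, same action, with only the gender of the subject changed) can be sketched as a simple counterfactual test. The Python example below is an illustration only and does not describe LinkedIn's tooling: the swap table is a tiny hypothetical word list, and `toy_moderation_model` is a deliberately biased stand-in for a real classifier.

```python
import re

# Hypothetical, deliberately tiny swap table; a real audit would need far more care.
SWAPS = {"he": "she", "she": "he", "man": "woman", "woman": "man",
         "men": "women", "women": "men"}
PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", flags=re.IGNORECASE)


def swap_gender_terms(text: str) -> str:
    """Return the text with each gendered term replaced by its counterpart."""
    def repl(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return PATTERN.sub(repl, text)


def toy_moderation_model(text: str) -> str:
    """Stand-in classifier with a spurious, biased rule built in for the demo."""
    lowered = text.lower()
    if "idiot" in lowered:
        return "remove"
    if "women" in lowered and "angry" in lowered:  # spurious learned association
        return "remove"
    return "keep"


def counterfactual_failures(model, posts):
    """List posts whose moderation action changes when gender terms are swapped."""
    failures = []
    for post in posts:
        original_action = model(post)
        flipped_action = model(swap_gender_terms(post))
        if original_action != flipped_action:
            failures.append((post, original_action, flipped_action))
    return failures


posts = [
    "He is an idiot.",
    "The women at the rally were angry about the decision.",
    "She gave a great talk on optimization.",
]
failures = counterfactual_failures(toy_moderation_model, posts)
for post, before, after in failures:
    print(f"{before} -> {after}: {post}")
```

Here only the second post is flagged, because the toy model removes it when it mentions women but keeps it when it mentions men. A word-swap test like this covers only explicit gendered terms; it says nothing about names, dialect, or other proxies for a protected attribute.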

“Privacy is critical for ethical AI. We are exploring two privacy techniques: homomorphic encryption and differential privacy. Both are ways of detecting spam and harassment on the platform without looking into member data.”

The other place where AI gets used is this: a lot of our clients and partners want to have a more diverse slate. How do we do that in a creative way, such that you are not putting someone less qualified above someone more qualified, and yet still give them a diverse slate? That is a very complicated question. Without thinking about affirmative action discrimination, how do we ensure all of that and do it cleanly? That is the other direction we are thinking about.

In privacy, we are super stoked to think about homomorphic encryption, a technique where you can actually do modeling on encrypted data. So you could, for example, detect spam and harassment on the platform without looking into member data. We are exploring whether this technology has promise, and we are actually super stoked about it. The other, more commonly used technique is differential privacy. This is, as I said, as we are collecting this race data, how do we draw learnings from it without leaking any private information? So these are really the two directions we are exploring.
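
To give a sense of how differential privacy lets an analyst learn an aggregate without exposing any individual, here is a minimal Python sketch of the standard Laplace mechanism applied to a count query. The transcript does not say which mechanism LinkedIn uses, so this is an assumption chosen for illustration; the `members` data, the `group` field, and the `epsilon` value are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Count the matching records, then add noise calibrated to the query's sensitivity."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one member changes a count by at most 1 (sensitivity 1),
    # so the Laplace scale is sensitivity / epsilon = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only to make the demo repeatable

# Hypothetical member records with a sensitive attribute.
members = [{"group": "x" if i % 4 == 0 else "y"} for i in range(1000)]

noisy = private_count(members, lambda m: m["group"] == "x", epsilon=0.5, rng=rng)
print(f"noisy count of group x: {noisy:.1f} (true count is 250)")
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives a more accurate count. Production systems also track the total privacy budget spent across queries, which this sketch leaves out.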

Finally, on explainability, I don’t have to motivate this, I think, so I will touch upon what we have been doing. In this case as well, for all the modelers, we are building small analysis tools (actually not just for models) that can help them understand where the errors could be concentrated. What is it that is driving the decision of this model? What in the training data is actually causing the wrong answer? I think someone touched upon that as well. And then there is explainability toward our members, which looks very different from explainability toward the modelers: what does your feed composition look like today? People don’t really realize. And can we provide them knobs to give us feedback, such as “hey, I’m actually also interested in this,” and “hey, actually I’m not interested in this,” to bring it back?

So these are some of the highlights of what we have been doing in some of these areas.

Zumwalt: Sakshi, stoked, as you should be. I think I speak for everybody here when I say, thank you so much for what you do. We are all very grateful, and these are really important use cases, so thank you. 

Skip, what consequences flow from an erosion of trust in AI/ML systems? 

McCormick: Well, I’m going to condense it because I know we are out of time. Basically, failures to comply with financial laws and regulations affect trust in the marketplace. They impede confidence. And as a data scientist it is frustrating, because it slows down AI/ML adoption. It makes our work less meaningful, less satisfying, and ultimately harder. We are trying to effect a culture change here, and that’s hard.

Whenever there is a failure in a model, we generally mean, “oh, the model failed to predict something with the stipulated precision, recall, or some other metric.” That is a normal failure. That is expected and a standard part of the overall approach, as long as we always know when the model output isn’t meeting those criteria, so that we can adjust how much we rely on it. The process established to detect and address those failures, that has to be trustworthy. But the model can fail. That’s the whole point: it is predictive. We should know that if this model has 30 percent precision, that means 70 percent of the time it is wrong. That is not a failure, that is normal. We have to remember that data science is a science. It is hypothesis-driven. It is a process where failure of a hypothesis is expected and necessary in a changing context. As a scientist, I need continuous, accurate measurement and testing. That is the part I need to trust. It is necessary to drive the retraining, the model improvement over time, and that’s what makes the process trustworthy. The individual models may or may not be trustworthy; I just need to know when that is the case. I need to be able to trust those metrics.
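
The distinction drawn above, between a model that is often wrong and a measurement process that reliably reports how wrong it is, can be made concrete with a small monitoring check. The Python sketch below is illustrative only: the labels, the thresholds, and the `check_against_stipulated` helper are hypothetical. Under the usual definition, a precision of 0.3 means that 70 percent of the model's positive predictions are wrong.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def check_against_stipulated(y_true, y_pred, min_precision, min_recall):
    """Report the measured metrics and whether they meet the stipulated criteria."""
    precision, recall = precision_recall(y_true, y_pred)
    return {
        "precision": precision,
        "recall": recall,
        "meets_criteria": precision >= min_precision and recall >= min_recall,
    }


# Hypothetical batch: 10 positive predictions, of which 3 are correct.
y_pred = [1] * 10 + [0] * 10
y_true = [1] * 3 + [0] * 7 + [1] * 2 + [0] * 8

# The model was stipulated to deliver at least 25% precision and 50% recall.
report = check_against_stipulated(y_true, y_pred, min_precision=0.25, min_recall=0.5)
print(report)

# The same measurements fail a stricter stipulation, which is the signal to retrain
# or to rely on the model less.
strict_report = check_against_stipulated(y_true, y_pred, min_precision=0.4, min_recall=0.5)
print(strict_report["meets_criteria"])
```

The model in this example is wrong on most of its positive predictions, yet it is behaving exactly as stipulated. What has to be dependable is the check itself, run continuously on fresh labeled data.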

“A regulatory or ethical failure can affect not only the responsible person, but also the employee, the customer, the industry, or the whole economy.” – Skip McCormick, Data Science Fellow, BNY Mellon

So that is in contrast to a regulatory or ethical failure, which is completely different. That is a matter that can affect not only the responsible person, but also the employee, the customer, the industry, even the whole economy. That is a heavy-duty responsibility that can never be borne by a computer algorithm or model. It is always borne by the modelers themselves, the technologists and the leaders who govern or validate that work. So that kind of failure is no bueno.
