In AI We Trust




The increasing prevalence of artificial intelligence (AI) in society presents enormous opportunities while posing challenges and risks. Few companies or industries today remain untouched by AI, and those that haven’t adopted AI yet are wrestling with whether and how to integrate it into their systems and decision-making.

Professor and Chair Zoe Szajnfarber

In September 2021, GW’s School of Engineering and Applied Science was awarded $3 million from the National Science Foundation Research Traineeship (NRT) program to transform the graduate education model and prepare future designers to navigate the opportunities and risks inherent in designing new AI algorithms and deploying them in real-world systems.

Zoe Szajnfarber, professor and chair of the Engineering Management and Systems Engineering Department, is co-leading the project with Robert Pless, professor and chair of the Computer Science Department, and colleagues from both departments. Szajnfarber explains what it means for AI to be trustworthy, what skills future AI designers need and how the team is rethinking Ph.D. training.




Q: What do we mean by trustworthy AI? How do we characterize trust in the AI context?

A: Our operational definition of trust is the willingness of an organization or society to rely on AI systems to make decisions. In December 2020, the White House issued an executive order, titled “Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government,” which defined 10 principles for designing, developing and acquiring AI for use in government. These include requirements that algorithms be accurate, reliable and effective; explainable to subject matter experts, users and those affected by the algorithm; and fair and consistent with the values of our nation. These broad definitions frame our technical approaches to defining measures of accuracy, explainability, interpretability and fairness, which are key foundations of trust.

From left, SEAS Professors Rachelle Heller (CS), Ekundayo Shittu (EMSE), Zoe Szajnfarber (EMSE) and Robert Pless (CS) are leading the $3 million “Co-Design of Trustworthy AI and Future Work Systems” project funded by the National Science Foundation. (William Atkins/GW Today)

Q: Is there something you feel people misunderstand or misjudge about AI?

A: AI is often talked about as though it’s one uniform thing that can solve all our problems, but that’s misleading. Many algorithmic approaches currently fall under the umbrella of AI, and they vary widely in their capabilities for inference and prediction, in the opportunities they create for unknowingly introducing bias and in the limits of where they will work.

What I think is not well understood in popular coverage is how AI can fail in ways that traditional computational approaches do not, which has significant implications for how best to leverage and regulate it. Advanced AI predictions are often driven by surprising (and highly context-specific) connections made by processing vast amounts of data.

A classic example involves an extremely accurate AI-generated prediction of whether a picture includes a dog or a wolf. While a human might have focused on the shape of the ears or the color of the eyes, the algorithm noticed that wolf pictures were much more likely to have snow in the background, which proved a very useful classifier in this context. However, this highly accurate classifier (snow) might also label a skier as a wolf when applied to new data, an error that a human focusing on pointy ears would never make. Currently, regulations and certification approaches create safeguards for typical failure modes, but the way that AI generates predictions changes that game. Researchers are actively working on tools to support better explainability and interpretability.


Q: For future designers and adopters of AI, what skills are critical, and how does your program address that need?

A: Currently, algorithm innovators and decision-system designers are trained in separate silos. This is a recipe for both missed opportunities and the introduction of poorly understood risks. In terms of opportunity, an important potential of AI in work lies in rethinking the nature of a “task.” Systems designed to take advantage of the strengths of experts should look different from ones designed for gig workers or AI algorithms. At the same time, while algorithms are capable of processing massive amounts of information, there is increasing recognition of the risks of delegating decisions completely (i.e., trusting the algorithm). Algorithms are known to exaggerate bias, sometimes fixating on the wrong salient features. Leveraging the opportunities for novel AI integration requires new strategies for trusting the corresponding tools. Unlocking these types of opportunities while mitigating the emergent risks requires researchers with depth and cross-training in AI algorithms and work system design, and with the capacity to analyze the societal implications of emerging capabilities. Our program aims to provide that training and community.

Q: This project takes a different approach to Ph.D. education. Can you explain?

A: Ph.D. programs often emphasize technical excellence and depth in a narrow focus area. While expert knowledge is the minimum bar for deploying AI, understanding the issues that might arise during implementation or after an AI product has been deployed is also critical to the impact of new technology. For the most part, academia prioritizes “pure” research problems, but for students wishing to engage in our thematic area, many of the features that distinguish important topics from uninteresting or potentially dangerous ones rest on the interaction between theory and practice. Being able to make that judgment hinges on meaningful interaction with practitioners. Currently, there are limited opportunities for such interaction. At the same time, practitioners are regularly being asked to select which tools to adopt without the requisite background.

Our goal in designing the fellowship program was to balance disciplinary depth with interdisciplinary practice. We are implementing this in myriad ways, including seminars, cross-listing professionally oriented certificate classes with introductory Ph.D. classes and our flagship summer bootcamp. We’re particularly excited about the bootcamp, which leverages principles from design thinking to rapidly explore and prototype convergent research ideas in context in a highly collaborative format. We’re running it for the first time right now; it’s taking all of us out of our comfort zones, but in a good way, by challenging us to define core assumptions in our disciplines and learn new ways of thinking. I already find it changing how I think about research problems.



A group of George Washington University Ph.D. fellows met with members of the Virginia Task Force 1, a domestic and international disaster response resource sponsored by the Fairfax County Fire and Rescue Department. They spoke to experts to learn about the different elements that go into a successful search and rescue situation and potential opportunities for AI to support their operations.






Q: You mentioned the importance of problems in context. What sites were chosen for the bootcamp and why?

A: The first two weeks of our bootcamp are focused on exposing ourselves to the real-world messiness of implementing AI in operational systems. For this year, we identified three sites that varied in their level of engagement with and adoption of AI tools as well as in the safety criticality of their application area.

We met with Comcast’s AI research division to discuss the opportunities and risks of implementing AI across their technology platforms, for example, with voice-assisted search via TV remote controls and home security. We then visited the MITRE Corporation, a federally funded research and development center and leader in air traffic safety and associated regulation. We toured several of their labs and had a chance for informal discussion with their technologists focused on evaluating the adoption of new AI and machine-learning tools in their systems.

Finally, we visited the Fairfax County Urban Search and Rescue training site, where the tools tend to be less technologically advanced, but there’s an interest in exploring advanced decision-support systems that could improve their ability to rescue victims from collapsed structures. As a result, the visit focused more on learning about their context and probing potential research opportunities. As part of the bootcamp, we spent time collectively digesting what we’d learned (and were inspired by), and what that means for the types of problems we wanted to work on.

Q: What makes GW uniquely situated to do this kind of work and training?

A: Many of us came to GW for the opportunity to enjoy both rigor and relevance. It’s really rare to be able to do theoretical work and then walk down the street to your stakeholder (often NASA, for me) and share the importance of the research insight, which is something that I’ve enjoyed as a PI for years. With this program we’re trying to create these opportunities on a larger scale. The dean of our engineering school, John Lach, often refers to this as the school’s “engineering and…” approach. If we weren’t in D.C., engaging with stakeholders from day one, we would be solving different, likely less impactful problems, and our students would be having a very different Ph.D. experience.

When I talk to prospective students, I always emphasize the value of being a small, tight-knit community in a world-class city. One of the advantages of our small faculty size is that we have more opportunities to interact. In a larger school, Dr. Pless and I might never have connected, and that connection and the research conversations it has spawned have been among the most fun and rewarding parts of this project.