Our weekly SRI Seminar Series welcomes Dylan Hadfield-Menell, the Bonnie and Marty Tenenbaum Career Development Assistant Professor at MIT’s Department of Electrical Engineering and Computer Science, and a Schmidt Futures AI2050 Early Career Fellow. Hadfield-Menell runs the Algorithmic Alignment Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.
Hadfield-Menell’s research focuses on the problem of agent alignment: the challenge of identifying behaviours that are consistent with the goals of another actor or group of actors. His research group works to identify solutions to alignment problems that arise from groups of AI systems, principal-agent pairs such as human-robot teams, and societal oversight of machine learning systems. He is also interested in work that bridges the gap between AI theory and practical robotics, and the problem of integrated task and motion planning.
You can’t have AI safety without inclusion
It has long been observed that specifying goals for agents is a challenging problem. As Kerr’s classic 1975 paper ‘On the Folly of Rewarding A, While Hoping for B’ observes, reward systems often create incentives for undesired behavior. This concern motivates work in AI alignment: how can we specify incentives for AI systems such that optimization induces behavior that reliably accomplishes our subjective goals? In this talk, I will discuss how brittle alignment arises as a natural consequence of incomplete goal specification. I will present a theoretical model that identifies sufficient conditions under which uncontrolled optimization of any goal that fails to measure some features of value eventually produces worse outcomes than no optimization at all. Next, I will show how the same theoretical result applies to questions of inclusion in value specification: if we reframe the model so that the different features of value correspond to how different people define value, then optimizing an incomplete goal can be expected to harm the people who were excluded. As a result, technology that aligns an agent with the values of a single person or organization is dangerous. I will conclude with a discussion of research directions for multi-stakeholder alignment and the need for decentralized value learning and specification.
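The intuition behind this kind of result can be seen in a toy sketch (an illustration only, not the speaker’s actual model): if overall value depends on several attributes with diminishing returns, and effort is drawn from a shared budget, then fully optimizing a proxy that measures only some attributes starves the unmeasured ones and can leave true value lower than an unoptimized, even allocation.

```python
import math

def true_utility(x):
    # Overall value depends on every attribute, with diminishing returns (log).
    return sum(math.log(xi) for xi in x)

def allocate(budget, n, focus=None):
    # Split a fixed budget across n attributes. With log utility, the optimal
    # split over the attributes being optimized is uniform; attributes the
    # proxy does not measure receive only a negligible residual `eps`.
    eps = 1e-6
    if focus is None:
        return [budget / n] * n  # "no optimization": spread evenly
    x = [eps] * n
    for i in focus:
        x[i] = (budget - eps * (n - len(focus))) / len(focus)
    return x

n, budget = 4, 100.0
baseline = allocate(budget, n)                 # even split over all attributes
proxy_opt = allocate(budget, n, focus=[0, 1])  # proxy measures only 0 and 1

print(true_utility(baseline))
print(true_utility(proxy_opt))  # omitted attributes collapse; true value drops
```

Here the proxy-optimal allocation scores higher on the measured attributes but lower on true utility than the unoptimized baseline, mirroring the abstract’s claim that optimizing an incomplete goal eventually does worse than no optimization at all.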
Dylan Hadfield-Menell is the Bonnie and Marty (1964) Tenenbaum Career Development Assistant Professor in the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology (MIT), on the faculty of Artificial Intelligence and Decision-Making, and a Schmidt Futures AI2050 Early Career Fellow. He runs the Algorithmic Alignment Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.
Hadfield-Menell received his PhD in computer science from UC Berkeley, and his MS and BS (both in computer science and electrical engineering) from MIT. He is an NSF Graduate Research Fellowship recipient and a Berkeley Fellow, and has published conference papers at venues including the AAAI/ACM Conference on AI, Ethics, and Society and the ACM/IEEE International Conference on Human-Robot Interaction. He was the technical lead on The Future Starts Here exhibition at the Victoria and Albert Museum, and has interned at Facebook and Microsoft.
To register for the event, visit the official event page.
The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Each seminar is led by a leading or emerging scholar and features extensive discussion.
Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.