SRI Seminar Series: Owain Evans, “Truthful language models and AI alignment”

November 30, 2022, 3:10 pm - 4:30 pm

Our weekly SRI Seminar Series welcomes Owain Evans, a research associate at Oxford University’s Future of Humanity Institute. Evans’ research interests are in AI safety and the future of AI, with a current focus on truthful and honest AI.

Talk title

“Truthful language models and AI alignment”

Abstract

Like it or not, language models will play an increasingly central role in how people learn about the world and communicate with others. This poses a challenge. Can we create models that are factually accurate, calibrated (e.g., avoiding overconfidence), and reliably non-manipulative? This kind of model would help individuals and society to form more accurate beliefs and to avoid misinformation. It would also have the potential to help with the problem of AGI alignment or AGI risk (Bostrom 2015, Russell 2019).

I will present recent work on defining and measuring “truthfulness” for language models, on calibration, and on using models to forecast world events. I will discuss connections to reducing epistemic harms from AI and to the problem of AGI alignment.


Recommended readings

O. Evans, et al., “Truthful AI: Developing and governing AI that does not lie,” arXiv preprint, 2021.

S. Lin, J. Hilton, O. Evans, “TruthfulQA: Measuring How Models Mimic Human Falsehoods,” arXiv preprint, 2021.

A. Zou, et al., “Forecasting Future World Events with Neural Networks,” arXiv preprint, 2022.


About Owain Evans

Owain Evans is a research associate at the Future of Humanity Institute at Oxford University. His research interests are in AI safety and the future of AI. He received his PhD from MIT. In 2019, he was a visiting scholar in the CHAI group at UC Berkeley. He is on the board of directors at Ought, a non-profit lab that created the AI research assistant Elicit. He has worked on preference learning, reinforcement learning, forecasting, and philosophical questions relating to AI. His recent work aims to understand truthfulness and honesty for AI models.

Registration

To register for the event, visit the official registration page.


About the SRI Seminar Series

The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Seminars are led by a leading or emerging scholar and feature extensive discussion.

Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.

Details

Date: November 30, 2022
Time: 3:10 pm - 4:30 pm
Website: https://srinstitute.utoronto.ca/events

Organizer

Schwartz Reisman Institute for Technology and Society