Our weekly SRI Seminar Series welcomes Ann Copestake, a professor of computational linguistics in the Department of Computer Science and Technology at the University of Cambridge. Her research involves developing computer models of human languages. In conjunction with DELPH-IN, an informal international consortium, Copestake has developed software used to build formal computational accounts of the syntax and compositional semantics of many different languages.
Copestake’s current research explores the development of semantic models compatible with broad-coverage computational processing, as well as combining distributional semantics with model-theoretic accounts and establishing the performance of deep learning systems according to linguistic criteria. She has worked on a variety of application areas including scientific text processing, information extraction, augmentative and alternative communication, machine translation, natural language interfaces, lexical acquisition, and tools for lexicographers.
LLMs and the Information Layer
In 2007, shortly before she died, Karen Spärck Jones briefly discussed the prospect of a natural language “Information Layer,” which she conceptualized as an integral part of computer systems. In this view, the Information Layer is “the vast language stuff that exists electronically” and is seen not as something external to which applications such as browsers interface, but as a resource that computer systems in general could exploit dynamically, using relatively straightforward techniques.
In this talk, I want to discuss and develop this idea in the context of modern large language models (LLMs). Natural language processing (NLP) applications have mostly been oriented towards human users. The main exceptions are text (or speech) categorization and information extraction, which produce output according to a relatively simple, predefined ontology. As more complex and more autonomous computer systems are developed, the need for flexible processing of language and other modalities to support their operations becomes ever greater, requiring something well beyond conventional information extraction. Systems incorporating LLMs could support such needs and, indeed, could also support the use of natural language for communication between systems, but there are many complications and risks. Conceptualizing system architectures in terms of a common Information Layer could allow us to take a less piecemeal approach to system development and regulation.
Ann Copestake is a professor of computational linguistics in the Department of Computer Science and Technology at the University of Cambridge. Her research involves developing computer models of human languages (or, more precisely, models of some aspects of human languages). In conjunction with DELPH-IN, an informal international consortium, she has developed software that has been used to develop formal computational accounts of the syntax and compositional semantics of many different languages. Her current research mainly concerns the development of models of semantics that are compatible with broad-coverage computational processing (parsing and generation). She is also interested in the formal aspects of combining distributional semantics with model-theoretic accounts and in utilizing DELPH-IN technology to establish the performance of deep learning systems according to linguistic criteria. She has worked on a variety of application areas including scientific text processing, information extraction, augmentative and alternative communication, machine translation, natural language interfaces, lexical acquisition, and tools for lexicographers.
To register for the event, visit the official event page.
The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Each seminar is led by a leading or emerging scholar and features extensive discussion.
Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.