SRI Seminar Series: Ethan Perez, “Discovering language model behaviors with model-written evaluations”
In this talk, Perez explores the extent to which language models can be evaluated through evaluations generated by the models themselves. He presents recent research in which this approach has produced successful results and enabled the discovery of novel behaviours within the models.