Resources: Agent foundations

Exercises: Agent foundations

The theoretical foundations of machine learning break down in several ways when we use them to describe real-world agents.
This week covers the agent foundations research agenda, pursued primarily by the Machine Intelligence Research Institute (MIRI), which aims to develop better theoretical frameworks for describing AIs embedded in real-world environments.
We cannot cover the field in depth, but we hope to give you an overview that you can use to explore the aspects you find most interesting. More content is yet to be confirmed for this week.

By the end of the unit, you should be able to:
- Explain the problem of embedded agency.
- Define the AIXI algorithm, and explain why it is not computable.
- Explain what logical decision theory is.
- Use a causal influence diagram to illustrate the concept of an RL agent having an incentive to influence its environment.


AI Alignment (2023)

Agent foundations
