Unit 1: Further Understanding the Alignment Problem
Resources (1 hr 20 mins)
- AGI Ruin: A List of Lethalities
- Where I agree and disagree with Eliezer
- Worst case thinking in AI alignment
- Empirical findings generalize surprisingly far
Optional Resources
- Optimal Policies Tend To Seek Power
- Reward is not the optimization target
- Advanced artificial agents intervene in the provision of reward
- Risks from Learned Optimisation: Deceptive alignment
- The theory-practice gap
- Yudkowsky contra Ngo on agents
- How do we become confident in the safety of a machine learning system?