Unit 2: What is AI alignment?

Resources: What is AI alignment?

Resources (2 hrs 10 mins)

What is AI alignment?
Adam Jones · 2024 · 15 min
Intro to AI Safety
Robert Miles · 2021 · 20 min
More Is Different for AI
Jacob Steinhardt · 2022 · 4 min
Four Background Claims
Nate Soares · 2015 · 15 min
What risks does AI pose?
Adam Jones · 2024 · 15 min
How we could stumble into AI catastrophe
Holden Karnofsky · 2023 · 36 min
Why alignment could be hard with modern deep learning
Ajeya Cotra · 2021 · 25 min

Optional Resources

What failure looks like
Paul Christiano · 2019 · 10 min
Specification gaming: the flip side of AI ingenuity
Victoria Krakovna et al. · 2020 · 10 min
Artificial intelligence is transforming our world — it is on all of us to make sure that it goes well
Max Roser · 2022 · 11 min
The alignment problem from a deep learning perspective
Richard Ngo, Soeren Mindermann and Lawrence Chan · 2022 · 10 min
Superintelligence: Instrumental convergence
Nick Bostrom · 2014 · 15 min
The easy goal inference problem is still hard
Paul Christiano · 2015 · 5 min
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals
Rohin Shah · 2022 · 10 min
What is inner misalignment?
Jan Leike · 2022 · 10 min
A central AI alignment problem: capabilities generalization, and the sharp left turn
Nate Soares · 2022 · 15 min
Is power-seeking AI an existential risk?
Joseph Carlsmith · 2021 · 25 min
Risks from Learned Optimization
Evan Hubinger, Chris van Merwijk and Vladimir Mikulik et al. · 2019 · 55 min
OpenAI Plays Hide and Seek…and Breaks The Game!
Two Minute Papers · 2019 · 6 min
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Jacob Hilton, Owain Evans and Stephanie Lin · 2021 · 10 min
Optimal Policies Tend To Seek Power
Alex Turner, Logan Smith and Rohin Shah et al. · 2021 · 15 min
The other alignment problem: mesa-optimisers and inner alignment
Robert Miles · 2021 · 24 min
Propositions Concerning Digital Minds and Society
Nick Bostrom and Carl Shulman · 2022
Digital People Would Be An Even Bigger Deal
Holden Karnofsky · 2021 · 19 min
AI: Racing Toward the Brink
Eliezer Yudkowsky and Sam Harris · 2018 · 110 min
On the opportunities and risks of foundation models
Bommasani · 2022
AGI Safety From First Principles
Richard Ngo · 2020
