Unit 3: Challenges in achieving AI safety

Resources: Challenges in achieving AI safety

Resources (1 hr 20 mins)

Purpose of session
BlueDot Impact · 2023 · 2 min
Emergent Deception and Emergent Optimisation
Jacob Steinhardt · 2023 · 20 min
AI Safety Seems Hard to Measure
Holden Karnofsky · 2022 · 18 min
Compilation: Why Might Misaligned, Advanced AI Cause Catastrophe?
BlueDot Impact · 2023 · 14 min
Nobody’s on the ball on AGI alignment
Leopold Aschenbrenner · 2023 · 13 min
Avoiding Extreme Global Vulnerability as a Core AI Governance Problem
BlueDot Impact · 2022 · 9 min

Optional Resources

What are some arguments for AI safety being less important?
Jakub Kraus · 2023
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals
Rohin Shah · 2022 · 10 min
What failure looks like
Paul Christiano · 2019 · 15 min
Goal Misgeneralisation examples
DeepMind Safety Research · 2023
The other alignment problem: mesa-optimisers and inner alignment
Robert Miles · 2021 · 25 min
Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society
Carina Prunkl and Jess Whittlestone · 2020 · 30 min
AI Timelines: Where the Arguments, and the "Experts," Stand
Holden Karnofsky · 2021 · 10 min
What Everyone in Technical Alignment is Doing and Why
Thomas Larsen and Eli Lifland · 2022 · 46 min
Discontinuous progress in history: an update
AI Impacts · 2020 · 50 min
AI Governance: Opportunity and Theory of Impact
Allan Dafoe · 2020 · 20 min
Coordination challenges for preventing AI conflict
Stefan Torges · 2021 · 28 min
AI Governance Optional Resources (Extended)
BlueDot Impact · 2023

AI Governance (2023)