Unit 2: What is AI alignment?
Resources (2 hrs 10 mins)
- What is AI alignment?
- Intro to AI Safety
- More Is Different for AI
- Four Background Claims
- What risks does AI pose?
- How we could stumble into AI catastrophe
- Why alignment could be hard with modern deep learning
Optional Resources
- What failure looks like
- Specification gaming: the flip side of AI ingenuity
- Artificial intelligence is transforming our world — it is on all of us to make sure that it goes well
- The alignment problem from a deep learning perspective
- Superintelligence: Instrumental convergence
- The easy goal inference problem is still hard
- Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals
- What is inner misalignment?
- A central AI alignment problem: capabilities generalization, and the sharp left turn
- Is power-seeking AI an existential risk?
- Risks from Learned Optimization
- OpenAI Plays Hide and Seek…and Breaks The Game!
- TruthfulQA: Measuring How Models Mimic Human Falsehoods
- Optimal Policies Tend To Seek Power
- The other alignment problem: mesa-optimisers and inner alignment
- Propositions Concerning Digital Minds and Society
- Digital People Would Be An Even Bigger Deal
- AI: Racing Toward the Brink
- On the opportunities and risks of foundation models
- AGI Safety From First Principles