Unit 3: Goal misgeneralisation
Resources: Goal misgeneralisation
Resources (1 hr 40 mins)
- Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals
- Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals
- Thought experiments provide a third anchor
- ML Systems Will Have Weird Failure Modes
- The alignment problem from a deep learning perspective
- What failure looks like
Optional Resources
- Risks from Learned Optimisation: Deceptive alignment
- Takeoff speeds
- Modeling the Human Trajectory
- Clarifying "What failure looks like" (part 1)
- Long-term growth as a sequence of exponential modes
- Unsolved Problems in ML Safety
- Another outer alignment failure story
- Goal Misgeneralization in Deep Reinforcement Learning
- What Multipolar failure looks like