Unit 2: What makes aligning AI difficult?
Resources: What makes aligning AI difficult?
Resources (45 mins)
- The True Story of How GPT-2 Became Maximally Lewd
- RLAIF vs. RLHF: the technology behind Anthropic’s Claude (Constitutional AI Explained)
- A simple technical explanation of RLH(AI)F
- Intro to Large Language Models
- Problems with Reinforcement Learning from Human Feedback (RLHF) for AI safety
Optional Resources
- Illustrating Reinforcement Learning from Human Feedback (RLHF)
- Constitutional AI: Harmlessness from AI Feedback
- Why Does AI Lie, and What Can We Do About It?