Unit 2: Training safer models
Teaching AI right from wrong
Resources (35 mins)
- The True Story of How GPT-2 Became Maximally Lewd
- A simple technical explanation of RLH(AI)F
- Problems with Reinforcement Learning from Human Feedback (RLHF) for AI safety
Exercises
Optional Resources
- Illustrating Reinforcement Learning from Human Feedback (RLHF)
- Constitutional AI: Harmlessness from AI Feedback