Unit 2: Training safer models
RLHF and its limits
Resources (15 mins)
- The True Story of How GPT-2 Became Maximally Lewd
- A simple technical explanation of RLH(AI)F
- Problems with Reinforcement Learning from Human Feedback (RLHF) for AI safety