Problems with Reinforcement Learning from Human Feedback (RLHF) for AI safety

Sarah Hastings-Woodhouse
August 19, 2024