Unit 4: Task decomposition for scalable oversight
Resources (1 hr 50 mins)
- AI Alignment Landscape
- Measuring Progress on Scalable Oversight for Large Language Models
- Learning Complex Goals with Iterated Amplification
- Supervising strong learners by amplifying weak experts
- Summarizing Books with Human Feedback
- Language Models Perform Reasoning via Chain of Thought
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Optional Resources
- Factored cognition
- Scalable agent alignment via reward modeling: a research direction
- Adversarial Training for High-Stakes Reliability
- Reward-rational (implicit) choice: A unifying formalism for reward learning
- Humans can be assigned any values whatsoever
- The MineRL BASALT Competition on Learning from Human Feedback
- Humans Consulting HCH
- Strong HCH
- A General Language Assistant as a Laboratory for Alignment
- Learning the preferences of bounded agents
- Cooperative inverse reinforcement learning
- Training language models with language feedback