Unit 4: Defence in Depth
Layer 2: Constrain dangerous AI capabilities
Resources (40 mins)
- What is AI alignment?
- Introduction to AI Control
- A Playbook for Securing AI Model Weights
Exercises
Optional Resources
- Can we scale human feedback for complex AI tasks?
- Introduction to Mechanistic Interpretability
- AI Safety Evaluations: An Explainer
- Lock Down the Labs: Security for AGI
- What is input data filtration in AI safety?
- Enhancing Model Safety through Pretraining Data Filtering
- Constitutional Classifiers: Defending against universal jailbreaks