Unit 5: Minimising harm

Assuming harm

Using Dangerous AI, But Safely?
Robert Miles · 2024 · 30 min
Create a free account to track your progress and unlock access to the full course content.
Introduction to AI Control
Sarah Hastings-Woodhouse · 2025 · 5 min ·
Create a free account to track your progress and unlock access to the full course content.
What is input/output filtering in AI safety?
Sarah Hastings-Woodhouse · 2025 · 5 min
Create a free account to track your progress and unlock access to the full course content.
Constitutional Classifiers: Defending against universal jailbreaks
Sharma et al. · 2025 · 10 min
Create a free account to track your progress and unlock access to the full course content.

Optional Resources

Technical AI Safety