Unit 5: Minimising harm
Assuming harm
Resources (50 mins)
- Using Dangerous AI, But Safely?
Create a free account to track your progress and unlock access to the full course content.
- Introduction to AI Control
Create a free account to track your progress and unlock access to the full course content.
- What is input/output filtering in AI safety?
Create a free account to track your progress and unlock access to the full course content.
- Constitutional Classifiers: Defending against universal jailbreaks
Create a free account to track your progress and unlock access to the full course content.
Optional Resources
- Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway
Create a free account to track your progress and unlock access to the full course content.