Unit 5: Robustness, unlearning and control
Resources: Robustness, unlearning and control
Resources (1 hr 40 mins)
- Adversarial Machine Learning explained with examples
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- Deep Forgetting & Unlearning for Safely-Scoped LLMs
- Measuring and Reducing Malicious Use With Unlearning
- AI Control: Improving Safety Despite Intentional Subversion
Optional Resources
- Adversarial Machine Learning Reading List
- CAIS: Adversarial Robustness Introduction