Unit 3: Preventing Misgeneralization
Resources (1 hr 50 mins)
- The prototypical catastrophic AI action is getting root access to its datacenter
- Robust Feature-Level Adversaries are Interpretability Tools
- Adversarial Training for High-Stakes Reliability
- ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation
- Imitative Generalization
Optional Resources
- Adversarial Robustness as a Prior for Learned Representations
- Constructing unrestricted adversarial examples with generative models
- Unrestricted adversarial examples via semantic manipulation
- Planting undetectable backdoors in machine learning models
- 2-D Robustness
- Towards deep learning models resistant to adversarial attacks
- Relaxed adversarial training for inner alignment
- Adversarial examples for evaluating reading comprehension systems