Unit 7: Technical governance approaches

Resources: Technical governance approaches

Resources (1 hr 20 mins)

Emerging processes for frontier AI safety
UK Government · 2023 · 25 min
Computing Power and the Governance of AI
Lennart Heim, Markus Anderljung and Emma Bluemke et al. · 2024 · 20 min
AI Governance Needs Technical Work
Mau · 2022 · 12 min
AI Watermarking Won't Curb Disinformation
Jacob Hoffman-Andrews · 2024 · 7 min
We need a Science of Evals
Marius Hobbhahn · 2024 · 15 min

Optional Resources

Model Evaluation for Extreme Risks
Toby Shevlane et al. · 2023
Evaluating Language-Model Agents on Realistic Autonomous Tasks
METR · 2023 · 8 min
Red-teaming language models with language models
Ethan Perez, Saffron Huang and Francis Song et al. · 2022 · 5 min
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper and Carson Ezell et al. · 2024
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns and Steven Basart et al. · 2021 · 30 min
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Google · 2022 · 60 min
SWE-Bench
Carlos Jimenez and John Yang et al. · 2024
Anthropic's Responsible Scaling Policy
Anthropic · 2026
Responsible Scaling Policies Are Risk Management Done Wrong
Siméon Campos · 2023
Increased Compute Efficiency and the Diffusion of AI Capabilities
Konstantin Pilz, Lennart Heim and Nicholas Brown · 2023
What Does It Take To Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
Yonadav Shavit · 2023
Azure OpenAI Service abuse monitoring
Microsoft · 2023 · 3 min
Challenges in evaluating AI systems
Anthropic · 2023 · 15 min