Unit 4: Understanding AI

How does AI think?

What Do Neural Networks Really Learn? Exploring the Brain of an AI Model
Rational Animations · 2024 · 15 min
Introduction to Mechanistic Interpretability
Sarah Hastings-Woodhouse · 2024 · 5 min
Neel Nanda on the race to read AI minds
Robert Wiblin · 2025 · 5 min
The Misguided Quest for Mechanistic AI Interpretability
Dan Hendrycks and Laura Hiscott · 2025 · 10 min

Optional Resources

MoSSAIC: AI Safety After Mechanism
Farr et al. · 2025
Let's Try To Understand AI Monosemanticity
Scott Alexander · 2023 · 25 min
Against Almost Every Theory of Impact of Interpretability
Charbel-Raphael Segerie · 2023 · 20 min
Interpretability Will Not Reliably Find Deceptive AI
Neel Nanda · 2025 · 10 min
Barriers to Mechanistic Interpretability for AGI Safety
Connor Leahy · 2023 · 15 min

Technical AI Safety