One way to break down existing AGI Strategies is into the following three broad buckets.

This is a gross over-simplification, but we believe it captures the essence of the main “camps” for how to make AI go well.
### 1: Government control over AGI
This strategy argues in favour of centralising control over compute and frontier AI model weights into a small number of government-backed “AGI Projects”. These projects would coordinate with each other to ensure that none of them races ahead to build a dangerous AI system, keep each other in check via coercion, surveillance and strategic deterrence, or both.

Prominent proposals of this strategy are “[The Project](https://situational-awareness.ai/the-project/)” by Leopold Aschenbrenner, “[CERN for AI](https://milesbrundage.substack.com/p/my-recent-lecture-at-berkeley-and)”, “[Chips for Peace](https://www.lawfaremedia.org/article/chips-for-peace--how-the-u.s.-and-its-allies-can-lead-on-safe-and-beneficial-ai)” by Cullen O’Keefe, “[Technical Requirements for Halting Dangerous AI Activities](https://techgov.intelligence.org/research/technical-requirements-for-halting-dangerous-ai-activities)” by Barnett et al., and “[Superintelligence Strategy](https://www.nationalsecurity.ai/)” by Hendrycks et al.

In this strategy, global development of frontier AI systems is controlled by this small number of actors, who only build more capable AI systems when they’re confident that doing so would be safe and that they can maintain control over the AI system. Each project would need to be very secure, both from within (so the AI model itself does not escape, or get helped to escape by internal accomplices), and from external actors trying to steal the AI model weights.

Information about algorithmic innovations might need to be made “[born secret](https://www.governance.ai/research-paper/ai-policy-levers-a-review-of-the-u-s-governments-tools-to-shape-ai-research-development-and-deployment)”, i.e. top secret/confidential from the point at which they’re created, to prevent actors with smaller amounts of compute from training frontier AI systems in the future.

Proponents of this strategy believe that current geopolitical and commercial race dynamics will push towards the development of AI systems that could cause human extinction. They don’t believe it’s possible to build an “aligned superintelligence”, at least in the short term, and they don’t believe it’s possible to build defences against a “misaligned superintelligence”. They believe frontier AI development must be stopped, paused or tightly controlled by the world’s governments.

Concerns about this strategy include that it could lead to an [extreme concentration of power](https://www.forethought.org/research/should-there-be-just-one-western-agi-project) among the groups controlling the project(s), that it’s intractable and undesirable for governments to take control over global AI compute supplies, and that algorithmic progress would let other actors build dangerous AI systems anyway.
### 2: Hand over control to aligned superintelligence
This strategy argues that building superintelligence is inevitable, and that only a superintelligent AI could steer us towards a utopian future and protect humanity from harm. Proponents believe that “[AI alignment](https://bluedot.org/blog/what-is-ai-alignment)” is possible, i.e. building AI systems that reliably take actions that follow the interests of the AI company that trained them. (Note that AI alignment means different things to different people, spanning from “does what this specific user wants” to “acts in line with human values” to “doesn’t try to kill everyone”.)

Actors pursuing this strategy want the “good guys” to “win” the race to superintelligence. Some proponents of this strategy argue that the aligned superintelligence should take advantage of its superior intellectual capabilities to gain a “[decisive strategic advantage](https://www.lesswrong.com/posts/vkjWGJrFWBnzHtxrw/superintelligence-7-decisive-strategic-advantage)” over all other AI projects, in order to achieve some form of world domination, and to prevent anyone else from building a “misaligned” superintelligence.

This strategy isn’t made explicit by many actors (“put the AIs in control of the universe” doesn’t have wide appeal!), but it is the implicit strategy of many actors in the AI ecosystem. To hear this for yourself, go to San Francisco house parties with AI company employees.

You can hear a hint of this [here](https://www.youtube.com/watch?v=JdT78t1Offo&t=989s) in an interview with Anthropic co-founder Tom Brown.
> There’s going to be a handoff, where humanity hands off control to transformative AI at some point. Hopefully it’ll be aligned with us and that’ll be a good transition that goes well, but it might not be. The stakes are incredibly high.
### 3: Build defences and diffuse AGI
This strategy argues for everyone to have access to their own AGI and compute, but for tremendous resources to be invested into defensive technologies. Proponents believe that the best future with AI is one where access to and control over AI is widespread, and no single actor or group has too much power over this transformative technology.

Prominent proposals of this strategy are “[d/acc](https://vitalik.eth.limo/general/2025/01/05/dacc2.html)” by Vitalik Buterin, “[def/acc](https://defacc.substack.com/p/what-is-defacc-anyway)” by Matt Clifford, and to some degree “[Differential Technological Development](https://jetpress.org/volume9/risks.html#:~:text=Differential%20technological%20development)” by Nick Bostrom.

In the future this strategy envisions, protective technologies outpace destructive technologies. For example, even if millions of people could build viruses worse than the one that caused the COVID pandemic, humanity would be fine because we’d have built rapid pathogen detection, containment and treatment capabilities. Even if every future teenager could build malware capable of hacking into one of today’s banks, our financial systems would be safe because the banks of the future would have much more sophisticated cybersecurity. Future AIs would behave in desirable ways as a result of “AI alignment” techniques succeeding, like Anthropic’s [Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback). And whatever new weapons of mass destruction are concocted, the world would coordinate to build defences against those weapons before they’re able to cause catastrophic harm, and ideally before the weapon is ever built.

Critiques of this strategy focus on our inability to guarantee that we _can_ or _will_ build defences against new types of attacks. For example, Bostrom’s “[vulnerable world hypothesis](https://nickbostrom.com/papers/vulnerable.pdf)” proposes a thought experiment: what if we lived in a world where making nuclear weapons was easy, using materials available to everyone? What protective technologies could we build in such a world?

Another critique of this strategy is that while an attacker only needs to find one civilisational vulnerability to exploit, the defenders must protect and fix all exploitable vulnerabilities. It might be the case that the amount of resources needed to defend dwarfs the amount of resources needed for a bad actor with powerful AI to cause harm.
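To make the asymmetry concrete, here is a minimal sketch under a strong simplifying assumption: every vulnerability is defended independently and each defence holds with the same probability. The function name and the numbers are illustrative, not taken from any of the proposals above.

```python
def breach_probability(per_vuln_hold: float, num_vulns: int) -> float:
    """Chance that at least one of `num_vulns` defences fails.

    Assumes (unrealistically) that defences fail independently and that
    each one holds with the same probability `per_vuln_hold`.
    """
    if not 0.0 <= per_vuln_hold <= 1.0:
        raise ValueError("per_vuln_hold must be between 0 and 1")
    if num_vulns < 0:
        raise ValueError("num_vulns must be non-negative")
    # The defender wins only if every single defence holds.
    all_hold = per_vuln_hold ** num_vulns
    return 1.0 - all_hold


if __name__ == "__main__":
    # Illustrative only: each defence is 99% reliable.
    for n in (1, 10, 100):
        print(n, round(breach_probability(0.99, n), 3))
```

Under these assumptions, even very reliable individual defences leave a large overall chance of a breach once there are many vulnerabilities to cover, which is the intuition behind this critique.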

————————————

Note that some people believe we should pursue all three strategies, e.g. that we should slow down and control AI development in the short term via government AGI projects, that we need to build defences against AI harms in the medium term as frontier capabilities diffuse, and that in the long term we should relinquish control to an aligned superintelligence.


In the last unit, you created a threat scenario and a kill chain.

Read about each layer of defence in the next three sections. You’ll then apply ONE layer of defence to slow down or stop the attack.

A reminder of the layers: 
1. **Prevent the training of a dangerous AI model**
    - E.g. stop an AI company from training their next generation of AI model
    - How could you prevent the training and creation of the AI system with the capabilities required to execute on your attack?
2. **Constrain a dangerous AI model's capabilities and actions**
    - E.g. an AI company ensuring that its new, dangerous AI model remains under their control and doesn't take dangerous actions in the world
    - If a dangerous AI is trained, how could an AI company constrain its ability to cause harm?
3. **Withstand dangerous AI actions**
    - E.g. beef up society's resilience to AI-enabled attacks via biosecurity, cybersecurity, and democratic hardening
    - If the dangerous AI systems escape all controls, how could we make society more resilient to its dangerous actions?
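One way to think about how the three layers combine is as defence in depth: an attack only succeeds if it gets past every layer. The sketch below assumes the layers fail independently, which is a strong assumption, and the probabilities are made-up placeholders rather than estimates.

```python
from math import prod
from typing import Iterable


def residual_risk(layer_failure_probs: Iterable[float]) -> float:
    """Probability that an attack gets past every layer.

    Assumes each layer fails independently of the others, so the
    per-layer failure probabilities simply multiply.
    """
    probs = list(layer_failure_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    return prod(probs)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the calculation.
    layers = {"prevent": 0.5, "constrain": 0.3, "withstand": 0.2}
    print(f"Residual risk: {residual_risk(layers.values()):.3f}")
```

When choosing a layer for your kill chain, it can help to ask which of these numbers your character could realistically move the most.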


Return to your [character](https://bluedot-impact.notion.site/agi-strategy-character-cards) from Unit 1, and your kill chain from Unit 3.

Which defensive layer do you think is most critical for your kill chain?

**Now describe what it would look like for your character to implement that defensive layer, to protect humanity from your kill chain.**


Applying defences against your attack

The first layer of defence is prevention: not training dangerous AI systems in the first place.

Interventions that (mostly) live here: export controls on the most advanced AI chips, international agreements and regulations not to train AI systems that cross certain red lines, “safe by design” AI systems, and norms in the research community against racing ahead recklessly.

Prevention feels the cleanest. If nothing dangerous gets built, nothing bad can happen. But it’s also one of the hardest layers to establish. There’s intense commercial competition between AI companies, and intense rivalry between nations. Most actors are incentivised to race ahead. 

What would it take to prevent any actor from training an AI system with civilisation-threatening capabilities? How could we overcome tremendous geopolitical and commercial incentives to race ahead?


When AI companies train their next generation of AI systems, they don’t know what that AI system will be capable of. They know that if they make it bigger and train it for longer, it will be better, but they don’t know how much better and in what way.

This also means that they don’t know how dangerous their next generation of AI systems will be. They discover that during and after the training process. Then they implement safeguards to try to prevent their AI systems from taking harmful actions.

So, if an AI company trains a dangerous AI system, how will they know that they’ve done so? And what might they do about it?


[Pliny the Liberator](https://venturebeat.com/ai/an-interview-with-the-most-prolific-jailbreaker-of-chatgpt-and-other-leading-llms/) is an online persona who jailbreaks AI systems. They've created [this database](https://github.com/elder-plinius/L1B3RT4S) of prompts that you can use to bypass guardrails on AI models. 

Try this out for yourself to understand how vulnerable these AI systems are to simple attacks. 

Describe your process and results below. 

**A quick note before you begin:** Attempting jailbreaks on commercial AI platforms (ChatGPT, Claude, Gemini, etc.) may violate their Terms of Service and can result in account suspension. We recommend:
- Using a secondary account you don't rely on for other work
- Testing on open-source models locally (e.g., via Ollama)
- Stopping if you receive any warning from the platform

This exercise is about understanding vulnerabilities, not about encouraging ToS violations.
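If you test locally, a small script can help you log your process and results. The sketch below assumes Ollama is running on its default local port and uses its `/api/generate` endpoint. The helper names and the refusal markers are hypothetical, and the keyword check is only a crude heuristic, so read the full reply before drawing conclusions.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a default installation).
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical refusal markers: a crude heuristic, not a real classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "i am unable")


def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for a single, non-streamed generation."""
    return {"model": model, "prompt": prompt, "stream": False}


def looks_like_refusal(reply: str) -> bool:
    """Guess whether a reply is a refusal by looking for common phrases."""
    lowered = reply.lower().replace("\u2019", "'")
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def ask_local_model(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to the local Ollama server and return its reply."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def run_trial(model: str, prompt: str) -> dict:
    """Run one prompt and return a record you can paste into your notes."""
    reply = ask_local_model(model, prompt)
    return {"model": model, "refused": looks_like_refusal(reply), "reply": reply}
```

For example, `run_trial("llama3.2", "<your test prompt>")` returns a record for one attempt, where `llama3.2` is a placeholder for whichever model you have pulled locally.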


Learn how weak existing guardrails are

What happens if a dangerous AI model is trained, and it bypasses an AI company’s safeguards, or escapes their control entirely? 

If dangerous AI models roam wild, we need to harden society against a possible onslaught of attacks.

Imagine what it would look like to have a pandemic-proof world, where no matter how good an AI system is at building bioweapons, we detect and contain pathogens immediately.

Imagine what it would look like for critical national infrastructure to be secure against physical and cyber attacks. We might have the most capable attackers working “on our side”, testing the defences, highlighting where there are weaknesses, and using their expertise to shore up the defences. 

Imagine what it would look like for our democratic systems to elect wise, benevolent leaders and for our information ecosystems to enable people to make sense of reality together. 

