Why we run our AI safety courses

By Adam Jones (Published on December 4, 2024)

We think radically transformative AI (TAI)[1] is a realistic possibility within 10 years.

Stop and think about this: it’s huge! Organisations could potentially create thousands of virtual engineers, scientists, doctors, or lawyers in minutes. Governments could do the same with civil servants, military strategists, or weapons designers.

In this scenario, it’s hard to imagine AI like this not taking over most jobs.[2] This job displacement alone would fundamentally transform our world, even if the rest of society stayed the same.

But it’s more realistic that society would not stay the same in other areas either - if everything is automated, tasks might happen 10, 100, or even 1,000 times faster than today. Scientific breakthroughs that currently take decades might occur daily.

The social implications are stark: whoever develops TAI first could gain overwhelming economic and military advantages.

What will TAI look like?

While we can't predict exactly what TAI will look like, we can make educated guesses about its properties. TAI systems will likely:

  • have superhuman capabilities across a wide range of tasks, as we’ve already seen with existing AI systems on narrower tasks
  • pursue specific objectives, both because we typically design AI systems to actively work toward goals (even if that is just predicting the next word) and because this is likely to be more valuable to their users
  • take actions autonomously, as has been the trend with most computer systems[3] and we’ve seen in recent AI releases[4]

The technological foundations will likely build on current approaches: self-supervised learning, reinforcement learning, and multimodal large language models. These tools are already showing remarkable capabilities.

What are the risks?

Based on these basic assumptions, two obvious categories of risk emerge:

  • Misuse risks arise if the model is highly capable of carrying out harmful tasks. These range from high-likelihood, lower-impact risks such as enabling fraud, to low-likelihood, higher-impact risks such as biological weapon development.
  • Malfunction risks arise if the model takes actions autonomously in pursuit of important objectives, but makes key mistakes. Again, these range from high-likelihood, lower-impact risks such as financial trading errors causing stock market losses, to low-likelihood, higher-impact risks such as escalation of global military conflicts.

Additionally, we could have very capable AI systems pursuing objectives. We don’t know how to make sure they’re pursuing the objectives we want (alignment), or what those objectives should be (moral philosophy)! This can lead to:

  • Misalignment risk, where AI systems pursue incorrect or harmful objectives, leading to gradual or sudden loss of human control over the world.

For more on risks, see our previous article What risks does AI pose?

What risks do we focus on?

We’re a non-profit training people to tackle the world’s most important problems.

While timelines are uncertain, it seems possible we could have TAI systems fairly soon that pose massive risks - and very few people with the right combination of skills, context, and motivation are working on reducing these risks.

Reducing these risks could be incredibly important because of the scale of a potential catastrophe. By catastrophe, we mean an event causing harm comparable to more than 1 billion human deaths (although this threshold is a little arbitrary).

Although estimates of catastrophic risks vary, even conservative estimates imply that reducing these risks would be very valuable. For example, eliminating a 1%[5] chance of 1 billion deaths saves 10 million lives in expectation. And many well-informed people think the likelihood and scale are significantly greater than this: the median NeurIPS and ICML author estimated a 5% chance of human-extinction-level outcomes from AI.[6]
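To spell out the expected-value arithmetic behind that figure, here is a minimal sketch in LaTeX. The 1% probability and 1 billion deaths are the illustrative numbers quoted above, not estimates of their own (footnote 5 discusses how widely estimates vary):

    \documentclass{article}
    \usepackage{amsmath, amssymb}
    \begin{document}
    % Expected lives at stake from a catastrophic risk:
    % probability of the catastrophe times the deaths it would cause.
    % Illustrative inputs from the text: P = 1%, deaths = 1 billion.
    \begin{align*}
    \mathbb{E}[\text{lives lost}]
      &= P(\text{catastrophe}) \times \text{deaths per catastrophe} \\
      &= 0.01 \times 10^{9} \\
      &= 10^{7} \quad \text{(10 million lives, in expectation)}
    \end{align*}
    \end{document}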

That's why you're taking this course: we believe you have the potential to help reduce catastrophic AI risk, which we consider one of the most effective ways to positively impact the world's future.

For more on AI catastrophes, see 80,000 Hours’ article Preventing an AI-related catastrophe.

Shouldn’t we also worry about current AI harms?

Yes! Current AI systems are already making discriminatory decisions, and are being used to scam vulnerable people and to create and spread disinformation. It would be good to reduce these harms.

However, these are not the primary focus of our courses.

This is because far more people are already working on these risks, many already have partial solutions in place, and society is usually more able to recover from them. In contrast, catastrophic risks are often irreversible - if 80% of the world population are killed, it is much harder for society to recover.

Additionally, there are already many other courses that focus on the near-term harms of AI systems. This isn’t the case for catastrophic AI risks.

How are we tackling catastrophic AI risks in our courses?

We currently offer two primary courses:

  • AI alignment: Explores the technical challenge of building AI systems that reliably do what humans intend. You’ll examine proposals to ensure frontier AI models are developed responsibly, with proper safeguards against potential catastrophic risks.
  • AI governance: Approaches AI safety from a policy perspective, examining a range of policy levers for steering AI development, including legislation, standards, and regulation.

We think both are necessary as neither technical nor governance approaches alone are likely to be sufficient. In the future we may also offer other courses where we see gaps in the world’s plan for addressing catastrophic AI risks. For example, we might need AI resilience, which aims to prepare the rest of society for the impacts of AI.

Conclusion

Reducing catastrophic AI risk is one of humanity's most crucial challenges. There aren't enough qualified people working on this problem. Our courses aim to prepare you to help address this critical need.

Footnotes

  1. By this, we mean an AI system that leads to a 10x increase in innovation, or something equivalent. This is mainly meant to distinguish it from present-day AI systems, both narrow ones (e.g. credit decision models at a bank) and more general ones that don’t yet boost innovation by 10x (e.g. ChatGPT).

    For more on this definition, we recommend The Transformative Potential of Artificial Intelligence (particularly section 4), and the section ‘Basic assumptions and framework’ in Machines of Loving Grace.

    We usually avoid the term ‘AGI’ because people’s preconceived notions about it often lead to unproductive arguments. Also, we ultimately care about how AI will affect the world, rather than about capability or generality for their own sake.

  2. A common objection is that this only applies to remote jobs. However, assuming the AI system has human-level intelligence in a wide range of domains, it could likely build and control robots to do physical jobs too.

    For more on this, see our previous article Why are people building AI systems?

  3. A common pattern for tech adoption is:

    The first version provides humans information, but humans are taking all the actions.

    The second version assists humans with tasks, possibly suggesting actions.

    The third version takes more actions, up to the point of replacing humans.

    For example, consider IT systems in banking. Initially, these helped move information more quickly between tellers, who still took all the actions fairly manually. Then IT systems let bank staff take actions on people’s accounts and have them applied directly. And now IT systems have largely replaced humans for many tasks: online banking removes the need for a human to look up your balance or initiate a transaction on your behalf.

  4. See Anthropic’s Claude computer use, OpenAI’s function calling, Microsoft’s autonomous agents, Apple’s Ferret-UI, ACT-1, and Open Interpreter.

  5. Estimates vary a lot, including among experts – pretty much spanning from 0.01% to 99.99%. The median estimate in 2022 and 2023 surveys of authors who had recently published at NeurIPS and ICML was 5% for extremely bad (e.g., human extinction) outcomes.

  6. There is some philosophical debate about how to weigh existential risks against catastrophic risks, due to arguments about astronomical waste or longtermism. Settling this debate isn’t required to be highly motivated by the threat of an AI catastrophe, though!
