Modular AI Safety courses proposal
This document proposes splitting our AI safety courses (AI Alignment, AI Governance) into modular pieces. We’re publishing it in line with our views on working in public.
This is a proposal, not a plan! It’s open to change, we might not do this, and many details have not been worked out yet.
Why change the current structure?
How are things currently?
We currently have two courses: AI alignment and AI governance. These are online, part-time, facilitated courses with the following structure:
- 4 weeks: Icebreaker and introductory AI content
- 4 weeks: Core course content
- 5 weeks: Projects phase
People apply through application forms on our website. They then need to wait until the next iteration of the course starts. We run these courses approximately every 4 months.
Apart from the 4 weeks of core course content, the courses are fairly similar.
Why is this not ideal?
Delayed start puts people off applying. It might be several months between applying and starting the course. This means energy and excitement for the course may fade, and delay makes people less motivated to sign up. We’re better than we used to be at this - the maximum wait has dropped from ~1 year to ~3 months, and we’re upfront about the wait. But it’s still not great for hooking people while we’ve got their attention.
Long commitment puts people off applying. There’s evidence from others who ran our courses that fewer people sign up for a single longer course (compared to two shorter ones). If that holds for us, that means we’re potentially wasting marketing budget and missing good people. I know I personally would be much more keen for a shorter course if I were uncertain about applying. This is also something that we could easily test empirically.
Applying itself puts people off. Only about 50% of people who get to the application form from LinkedIn Ads actually apply. While removing the application step wouldn’t mean 100% of those people take the course, the proportion would likely be higher.
Our application evaluation limits capacity and isn’t perfect. Having a restrictive application process reduces the number of people who might get started in AI safety. While our evaluation criteria do select good people, they aren’t perfect. There are also many borderline applicants who I think we’d be better at evaluating after seeing how they respond to the first few sessions.
Repeated content bores students and wastes their time. Many students take both our AI alignment and AI governance courses. Given almost two-thirds of the course is shared, they end up covering a lot of the same material again. We have received complaints from some participants that they didn’t find this engaging, and some participants were wary of applying to both courses because of this overlap. (The exact amount of overlap is debatable, e.g. projects could be similar or different. However, the point still stands that there is significant overlap between courses.)
Repeated content duplicates work for course leads. Redeveloping similar sessions is wasteful - reviewing feedback, analysing resource performance, and keeping sessions updated is a lot of work.
Long courses increase the number of dropouts. Running a course over a longer period means that it’s more likely something interferes with someone being able to take the course properly. Retention also tends to decrease over time - both for our courses and online courses in general.
Deferring people is difficult. Because courses are spaced 3 months apart, participants are hesitant to defer, which can lead to worse engagement and attendance throughout. Additionally, a long course means participants are more likely to be caught out by an emergency halfway through. And when they are, they then have more content to repeat (with the problems above) - or may simply not come back.
The proposal
The plan is to make two splits: separating out the introduction to transformative AI weeks and the projects phase. We could then add further extensions: new core courses such as AI regulation, or post-course extras such as a hackathon.
Intro to transformative AI
This new course would replace the first 2-3 taught content sessions of the AI alignment and AI governance courses. We would also remove the AI governance session in the AI alignment course, given it would be easier to take the AI governance course afterwards. Both courses would shorten as a result.
The course would effectively be a MOOC: a self-paced, non-facilitated online course that anyone could join at any time.
- Title: Introduction to transformative AI (or maybe ‘AI safety essentials’)
- Length: 5 sessions, self-paced. Each session will have more resources and exercises than our facilitated courses, to make up for the lack of synchronous discussions.
- Completion: Auto-graded assignments determine whether the user ‘passes’ each session (some multiple choice, some free text graded by peers or maybe an LLM). The course is considered complete when all sessions are passed (see the sketch after this list).
- Participant selection: No application process, and participants can start at any time. Ideally they can start the resources for session 1 without registering, and then sign up to save their progress and do the exercises.
- Access: Participants get access to a Slack or Discord community we set up (or maybe just the AI Alignment Slack - we can discuss this with AED). This is not the main AI Safety Fundamentals workspace, which will be limited to people accepted onto the core courses.
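To make the completion idea above more concrete, here’s a minimal sketch of how auto-grading could work: multiple-choice answers are checked directly, free-text answers are scored by an LLM against a rubric, and the course counts as complete once every session is passed. This is only an illustration, not a committed design: it assumes the OpenAI Python SDK, and the model name, rubric format, pass threshold, and function names are all hypothetical.

```python
# Hypothetical sketch of session auto-grading for the intro to TAI course.
from dataclasses import dataclass

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@dataclass
class Session:
    multiple_choice: dict[str, str]    # question id -> correct option
    free_text_rubrics: dict[str, str]  # question id -> grading rubric
    pass_mark: float = 0.7             # assumed pass threshold


def grade_multiple_choice(session: Session, answers: dict[str, str]) -> float:
    """Fraction of multiple-choice questions answered correctly."""
    correct = sum(answers.get(q) == a for q, a in session.multiple_choice.items())
    return correct / len(session.multiple_choice)


def grade_free_text(rubric: str, answer: str) -> float:
    """Ask an LLM to score a free-text answer against a rubric (0-10)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system", "content": "You grade short course answers. "
             "Reply with a single integer score from 0 to 10."},
            {"role": "user", "content": f"Rubric:\n{rubric}\n\nAnswer:\n{answer}"},
        ],
    )
    # Assumes the model replies with a bare integer; a real system would validate this.
    return int(response.choices[0].message.content.strip()) / 10


def session_passed(session: Session, mc_answers: dict[str, str],
                   ft_answers: dict[str, str]) -> bool:
    """A session is passed when the average score meets the pass mark."""
    scores = [grade_multiple_choice(session, mc_answers)]
    scores += [grade_free_text(session.free_text_rubrics[q], a)
               for q, a in ft_answers.items()]
    return sum(scores) / len(scores) >= session.pass_mark


def course_complete(session_results: list[bool]) -> bool:
    """The course counts as complete once every session is passed."""
    return all(session_results)
```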
Course structure:
- Session 1: How do we build AI systems?
- This is similar to the first half of the current session 1 of the AI alignment course
- By the end of the session, students should be able to:
- Define key ML terms including weights, biases, parameters, neurons and activations
- Explain at a high level how neural networks are trained with gradient descent
- Describe the structure of a trained large language model (LLM), and explain the base-model and fine-tuning stages of LLM training
- Explain why compute, data and algorithms are key to training ML systems
- Session 2: What AI systems are people building, and why?
- This is similar to the second half of the current session 1 of the AI alignment course
- By the end of the session, students should be able to:
- Define transformative AI
- Understand economic and non-economic incentives behind developing transformative AI systems
- Describe key advances in AI capabilities over the past few years
- Explain arguments for how and when transformative AI might be developed
- Session 3: Risks of AI
- This is similar to the current session 3 of the AI governance course
- By the end of the session, students should be able to:
- List several present harms and near-term risks of AI
- Understand a detailed case study of AI harms of their choosing
- Describe how today’s AI risks might scale in the future as AI systems improve
- Explain how transformative AI might constitute an existential risk to humanity
- Evaluate potential scenarios for anticipated risks from advanced AI
- Session 4: Benefits of AI
- This is similar to the current session 2 of the AI governance course
- By the end of the session, students should be able to:
- Imagine what a positive AI future might look like
- Explain the potential economic and non-economic benefits of AI systems
- Critically evaluate claims about the benefits and risks of AI
- Session 5: What is AI safety?
- This is new, but has some similarities to session 2 of the AI alignment course
- By the end of the session, students should be able to:
- Describe the key areas of AI safety, including alignment, moral philosophy, competence, governance and regulation
- Choose appropriate next steps for their background and aspirations
Standalone projects phase
Effects on other courses. This new course would replace the project phase of the AI alignment and AI governance courses. Those courses would become shorter as a result, and participants wouldn’t have to repeat the project phase if taking multiple courses (but could if they would like to). This might encourage more people to take the core courses: I know a couple of people who would greatly benefit from the course but chose not to apply because they didn’t want to do the project (as they already worked on AI safety, so doing project work wouldn’t be as novel or valuable for them). We could still offer a fancy-looking certificate for having completed the trifecta (intro to TAI, a core course, and a project) - similar to other programmes that include a separate capstone project course.
Benefits for projects. We may find we get more people completing projects overall, because the intro to TAI gives them an easier way to get their foot in the door and get started. Additionally, the people who did do projects would be self-selected: those who are keen, committed, and think the format would work for them - hopefully leading to more productive and engaging sessions and better projects. But both these claims are highly speculative and we don’t have clear evidence for them yet. Having a separate project course would also enable people to do multiple projects, as some participants have asked for. The high demand for programmes like MATS, LASR, Pivotal, and PIBBSS further suggests there is interest in a standalone projects course.
Risks for projects. This may result in lower conversion from the core courses onto the projects. This is something we’ll need to test. When we previously made the project phase ‘more’ mandatory, this improved the quantity and quality of submissions on the alignment course. But we also changed a huge number of other things about the project phase at the same time, so it’s hard to attribute that improvement to the change alone.
Alternative projects. Splitting out the project phase enables running alternatives such as a hackathon or intensive writing week. We’ve had difficulty finding the ‘perfect’ projects structure, I think because people work differently so there is no one-size-fits-all solution. With the modular structure, we can run different formats and give participants the opportunity to choose what suits them.
Alternative post-core courses. Different formats for producing a project output then become just one of many possible post-core courses. We could also offer other valuable courses to interested students, such as career planning (where students would learn about the kinds of roles available, which organisations are hiring in the space, and how to write good applications, and then get support applying for places).
Splitting out the projects phase can be done separately from splitting out the intro to TAI course, so this proposal can be thought of as two distinct parts. We could do the rest of the proposal while keeping the projects attached, or decide to split out projects later. I think the benefits from splitting out projects are likely slightly lower and more uncertain than those from splitting out the intro to TAI course.
Extensions of the modular structure
If we split things into this modular structure, we can also create new courses more easily - it’s just creating 4 weeks of content, and it’s quick to spin up pilots. Trying to design a new course on AI regulation was actually what prompted this idea: I realised I was redesigning the introduction to transformative AI sessions yet again - and that many of the people I’d want to take it had already taken our AI governance course.
Open questions
Of course, all the above is a giant open question of ‘should we do it’. But there are also some more granular questions within:
Can participants skip the intro to TAI course? Some participants already understand this material well, and may find this content boring. Maybe they should be able to jump straight into the core courses, for example via an assessment or special application. We probably don’t need to sort this out urgently, given participants already have to sit through this content on the current courses anyway. This might also be overcomplicating things, because we might expect people who already know the content well to breeze through the auto-graded assignments quickly.
Should the intro to TAI be facilitated? The proposal above suggests it’s a non-facilitated course. I think this is the right approach to enable scale and get many people in the door first. However, this is different to how we’ve run courses previously, and there’s probably some benefit from being able to discuss the concepts more in a session. If we do have facilitation we’ll need some application process, though: there’ll be too many people otherwise. We could also consider a hybrid approach, where most people do the self-guided route but we facilitate it for particularly impressive applicants (though this adds a lot of complexity). We could alternatively try peer facilitation, where one person is designated the role of facilitator each week.
How often should we run the facilitated courses? We’re currently running courses back-to-back - which means about every 3 months. For the new structure, we could run the shortened core courses back-to-back too, which would mean starting a new round roughly every month. But this would be a lot of logistical work and we’d need to significantly improve our systems.
Should we require more to graduate from the intro to TAI course? The proposal above suggests LLM or peer-graded assignments. These wouldn’t be timed or have stringent anti-cheating or anti-plagiarism measures by default, so would be easy to game. (Although gaming the assessment seems potentially higher effort than just doing the course properly.) If this does become a problem, we could explore options like an interview with a facilitator, an invigilated exam, or a mini capstone project (i.e. <4 hours). The motivation to game the course is greatest if we’re issuing certificates - we could decide not to (although I think this would decrease the course’s perceived value, which matters particularly for some of the students our marketing attracts).
Other anticipated concerns
Will the number of students on the intro to TAI course overwhelm us with work? The course is designed to be mostly self-serve. However, we haven’t run a MOOC-like course before, so it’s uncertain exactly how much work it will involve. If it does become too much, we could later gate it with an application process.
Will this dilute the BlueDot course graduate brand? People who graduate from our courses tend to be pretty excellent, meaning that people trust the BlueDot certifications. Opening up the intro to TAI course to everyone might dilute this, changing the value of that certification. I think we could mitigate this by being clear about what the different certificates represent. I also think the risk here is limited: people already often list us on their LinkedIn etc. when taking a version of the course facilitated by others, or sometimes when going through the course independently. We could also consider not issuing certificates for the intro to TAI course (although I think this would decrease its perceived value and would probably be a mistake).
Will our course infrastructure hold up to 10x the number of students? We’ve hired a software engineer who can help us with this! If it’s still a big problem, we could offer a stripped-down version of our infrastructure for the intro to TAI course, where there’s limited functionality so the load is lower.
Will this structure confuse participants? This structure is more complex, and there are more paths participants could go down. However, they don’t need to know about these paths until they complete the intro to TAI course - at which point they’ll just have 2 or 3 courses to choose from, with guidance on which to take, given that’s part of session 5’s learning objectives.