AI Alignment June 2024 course retrospective

By Adam Jones (Published on October 30, 2024)

This is a review of our June 2024 AI Alignment course, which ran from June to October 2024.

Summary

The June 2024 AI Alignment course showed improvements in project completion and quality. We saw 50% of participants complete projects (up from 36% in March), with 56% of submitted projects rated as high-quality (up from 42%).

Session attendance was slightly lower in the learning phase: 67% attended session 8 (down from 71% in March). We think this might be because we introduced the new session 5 which added an extra week to the course.

In general, participants valued the curriculum for giving them a solid overview of the field, and the discussion sessions for helping them with (1) understanding, (2) networking, and (3) accountability.

Key areas to improve include:

  • marketing, particularly by experimenting with Meta Ads and the copy we use
  • attendance management, i.e. balancing, merging and disbanding cohorts smoothly
  • coverage of catastrophic risks, particularly session 2
  • the accessibility of some of the curriculum, particularly session 6
  • communication, particularly around the project phase
  • stability of some parts of the course hub

Objective

This retrospective is primarily to learn how we can improve the course in future rounds. It will therefore focus more on where we can improve, rather than trying to quantify the impact this course had.

However, even to improve the course, we need to understand the axes along which we are trying to improve it.

We run the course because we want more people with the motivation, skills and context to have a more positive impact on the world’s most important problems. Ideally, we’d therefore measure whether we’ve improved on this.

However, measuring this directly is very difficult. Even if we had perfect information about whether our graduates were having a positive impact on the problem AND could tell how much we contributed to this, changes to this metric would be slow to appear, as people often still need to upskill further before having a large impact on the field.

We therefore use some proxies, and assume the following are good directions to improve in, all else held equal:

  • For marketing, more accepted participants is better (assuming the quality bar is held constant)
  • For the resources, higher ratings are better (assuming the learning objectives are still being hit)
  • For the learning phase, increased session attendance and greater subjective understanding of the learning objectives is better
    • NB: there are also places where we think we need to change some of the learning objectives in response to feedback
  • For the project phase, more submissions and higher quality submissions are better

(Separately we are also trying to get closer to measuring the thing we want, an internal project known as operation priority paths… But that’s a story for another article!)

Marketing

We experimented with different platforms for paid ads, spending different amounts to get a click, application, and accepted application from LinkedIn, Twitter, Reddit, Blind and DEV Community. We analysed this already in a previous article, where the top level results were:

Platform      | Cost per 1000 impressions | Cost per click | Cost per application | Cost per accepted application
LinkedIn      | £7.54                     | £1.67          | £75.83                | £227.59
Twitter       | £0.33                     | £0.20          | £136.82               | £345.30
Reddit        | £0.48                     | £0.17          | £1,574.55             | -
Blind         | £5.63                     | £0.58          | £804.89               | -
DEV Community | £7.77                     | £9.73          | -                     | -
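
For context, each "cost per" figure is simply total spend on a platform divided by the number of events of that type, with a '-' indicating there were no conversions of that type to divide by. The sketch below shows that derivation on made-up numbers; the ChannelStats shape and every figure in the example are hypothetical, not our actual campaign data.

```typescript
// Illustrative only: the spend and conversion numbers used below are made up,
// not the figures behind the table above.
interface ChannelStats {
  platform: string;
  spend: number; // total spend in GBP
  impressions: number;
  clicks: number;
  applications: number;
  acceptedApplications: number;
}

// Each "cost per" figure is spend divided by the number of events;
// null stands in for the '-' cells where there were no conversions.
function costPer(spend: number, events: number): number | null {
  return events > 0 ? spend / events : null;
}

function summarise(stats: ChannelStats) {
  return {
    platform: stats.platform,
    costPer1000Impressions: costPer(stats.spend, stats.impressions / 1000),
    costPerClick: costPer(stats.spend, stats.clicks),
    costPerApplication: costPer(stats.spend, stats.applications),
    costPerAcceptedApplication: costPer(stats.spend, stats.acceptedApplications),
  };
}

// Hypothetical example: £500 of spend producing 3,000 clicks but only 4 applications.
console.log(summarise({
  platform: "ExampleAds",
  spend: 500,
  impressions: 1_000_000,
  clicks: 3000,
  applications: 4,
  acceptedApplications: 0,
}));
```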

We also tested out many different creatives. We used a completely different set of creatives to our March course (our previous retro found our paid marketing here was fairly poor), which we analysed in another article. We also ran smaller tests on creatives to understand how to optimise with smaller changes.

There are many places we didn’t pay attention to and could still improve for next time:

  • Improving the copy, as in hindsight the copy was fairly weak and we didn’t test out any changes to it
    • We believe we improved the copy significantly for the next iteration of the course. It now focuses much more on the benefits and credibility of the course, rather than something generic about AI. We have not yet thoroughly tested further changes we could make.
  • Trying new major ad platforms including Meta (Facebook and Instagram), given it has worked well for others in this space including 80,000 Hours and AI Safety Hungary
    • We ran ads on Meta for the next iteration of the course. We haven’t got around to analysing this data closely yet, and because of some difficulties getting the ads set up properly for this round, we might still not be able to draw very reliable insights.
  • Working on unpaid marketing, such as boosting referrals or posting in community groups
    • We did this much better on the next iteration. We also created a tool for ourselves to track these promotion efforts more formally, so we can understand where our efforts were best spent.
    • There’s likely still room for improvement here, especially around participants referring other participants.
  • Improving the website landing pages, which haven’t been well optimised for conversions
    • We have made minor improvements, including clarifying the focus of the course, making the curriculum more accessible, deleting duplicate content, and publishing clearer details about certificates.
    • However, a more radical overhaul would likely be helpful if we are to scale up more.

Resources

Resource ratings were high across the board, with every session except session 2 receiving an average resource rating >4. In the end of course feedback, participants generally thought the curriculum was high-quality and provided them a lot of value, particularly for getting an overview of the field.

As well as session 2 (What is AI alignment?) having lower resource ratings, we also subjectively think it is weaker at hitting the set learning objectives, and the set learning objectives are not incredibly clear. We did improve this a little from the March course, but are making further and more significant changes for the October course.

Session 5 (Robustness, unlearning and control) was a new session for this course. It had reasonable ratings throughout the resources, with every resource having an average rating >4. Other points:

  • There was some disagreement in the written resource feedback about the intro video resource. While most people rated this highly and appreciated the introductory nature of it, the examples it provided, and its humor, others thought it was too basic and wanted more implementation details. We also provided a CAIS video as an optional suggestion which people who complained about this tended to appreciate.
  • AI control was seen as the least relevant topic this week. We are considering scrapping it, as we don’t build on it later in the course, there are limited good resources to use here, and it’s reasonably straightforward to pick up later.
  • Adding an extra session, and therefore lengthening the course by a week, probably did cause a drop in attendance compared to the last iteration.

Session 6 (Mechanistic interpretability) had the highest resource ratings, and was also often praised in the end of course feedback. However, it was also the week most complained about in the end of course feedback with several participants saying the resources were very tricky for them. Multiple people suggested splitting this over 2 sessions.

Other end of course feedback included:

  • Some facilitators flagged that the course might not be clear enough on the focus on catastrophic or x-risks. We agree with this, and think this is a failure of session 2 in particular. One facilitator also suggested the techniques taught in the course do not help people understand how to address fundamental problems underlying AI alignment.
  • Some participants wanted more of a certain style of resource, such as videos, podcasts or diagrams. We’ll be continuing to review the resource formats we provide (including updating several of our own resources since the last iteration to add better diagrams, and ensuring all the content is available in podcast form).
  • Time estimates for reading technical content were often too low. We will be reviewing and updating these where appropriate, although it’s hard to find an estimate that is right for everyone, as it depends a lot on their pre-existing familiarity with the content, their reading speed, and how carefully they are investigating the resources (a rough sketch of how such an estimate might be produced is below).
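
As a rough illustration of how a reading time estimate might be produced, here is a minimal sketch; the 200 words-per-minute baseline and the density multipliers are assumptions for illustration, not the method we actually use for the course.

```typescript
// Minimal sketch of a reading time estimate. The baseline reading speed and
// the density multipliers below are assumptions, not our actual method.
type Density = "narrative" | "technical" | "mathematical";

const DENSITY_MULTIPLIER: Record<Density, number> = {
  narrative: 1.0,    // blog-style prose
  technical: 1.5,    // papers, code walkthroughs
  mathematical: 2.5, // proofs, derivations
};

function estimateMinutes(wordCount: number, density: Density, wordsPerMinute = 200): number {
  return Math.ceil((wordCount / wordsPerMinute) * DENSITY_MULTIPLIER[density]);
}

// e.g. a 3,000-word technical paper at a 200 wpm baseline: ~23 minutes
console.log(estimateMinutes(3000, "technical"));
```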

Exercises

This iteration, we experimented with comprehension exercises for each learning phase session. These were well received, and satisfied some participants’ desire to have graded quizzes (even if in this case they were self-graded).

We also added some optional technical exercises, including for sessions 3, 5 and 6, as well as more prominently linking to other sources of similar content such as the ARENA course for mechanistic interpretability. Some participants still wanted more technical and hands-on coding exercises; we have added a few more technical exercises since, and will continue to review their quality and whether we can add more.

Participants faced course hub issues with saving exercise responses. Exercise responses would frequently get cut off or disappear. We believe we have now fixed this.

Separately, when we updated the course hub, users were not able to save their exercise responses before reloading. They often worked around this by copying their response to the clipboard, refreshing, and pasting it back in. This is currently a limitation of the platform we have built the course hub on, but we will investigate whether a workaround is possible.
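
One workaround we may look at is keeping a client-side draft of each response, so that a forced reload does not lose unsaved work. The snippet below is a minimal sketch of that idea, not how the course hub is actually implemented; the element ID and storage key are hypothetical.

```typescript
// Minimal sketch: keep a draft of an exercise response in localStorage so a
// forced page reload (e.g. after a course hub update) doesn't lose unsaved
// work. The "exercise-response" element ID and storage key are hypothetical.
const field = document.getElementById("exercise-response") as HTMLTextAreaElement | null;

if (field) {
  const storageKey = `draft:${window.location.pathname}`;

  // Restore any previous draft when the page loads.
  const savedDraft = localStorage.getItem(storageKey);
  if (savedDraft !== null && field.value === "") {
    field.value = savedDraft;
  }

  // Save a draft shortly after the user stops typing.
  let timer: number | undefined;
  field.addEventListener("input", () => {
    window.clearTimeout(timer);
    timer = window.setTimeout(() => {
      localStorage.setItem(storageKey, field.value);
    }, 500);
  });

  // Once a response has been successfully saved to the server, the draft
  // should be cleared with localStorage.removeItem(storageKey).
}
```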

Discussions

Attendance

Overall attendance was fairly similar to the last iteration. It’s plausible we did better, given people stuck around for an extra week, but it’s hard to know the counterfactual had we run the course without session 5.

Project phase attendance was better - and given a lower percentage of participants were turning up by session 7/8, the drop-off was considerably smaller than in the last iteration. We’re uncertain exactly what caused this, but we think it might have been the changes we made to the project phase. These changes included:

  • keeping people in the same cohort, to reduce disruption and boost accountability
  • providing more hand-holding in the project phase, including:
    • a project proposal template, to help unblock people at the start
    • more explicit accountability check-ins and goal setting, to keep people going throughout
    • a write-up template, to help unblock people at the end
    • more explicit peer feedback requirements, to increase project quality
  • sharing samples of top projects, to make it clearer what participants were expected to produce
  • experimenting with project feedback, though we don’t think this was what drove the improvement

We’ve previously written more detail about our rationale for some of these changes.

Session quality

In the end of course feedback, many people highlighted the value of the group discussions to them. People often mentioned the discussions helped them:

  • understand the materials, through both the highly structured activities and broader discussions
  • build connections with other participants
  • have accountability and support to do the readings

People generally liked the activities, and several positively mentioned the teach-swap-learn activities and AI safety bingo.

A couple of people thought the session plans got repetitive towards the end as they reused the structures a lot. We have considered this feedback - we think we reuse certain structures (think-pair-share and teach-swap-explain) because they work very well to teach the concepts, and probably won’t be making changes just to introduce more variety, given how rarely this feedback came up.

Attendance dropped throughout the course, so some cohorts got smaller. Because this was uneven between sessions, some cohorts became very small and we needed to disband them. This caused disruption as people got split up, or had to change their regular meeting time. Some participants and facilitators told us this was frustrating. In one case, we did not give a cohort and facilitator a lot of forewarning that we planned to disband their cohort, or an option to let them continue, which was likely a mistake.

On breakout rooms, participants tended to appreciate the increased engagement they allowed and felt they could network with other participants more closely. A few people wanted longer or shorter breakouts, or more or fewer breakouts (relative to whole-group discussions).

The platforms (BlueDot Meet, Zoom, and Google Docs) tended to work smoothly for the discussion sessions. The one exception was that Google Docs became laggy in later weeks due to the length of the session document. This was also raised on the March 2024 course, but unfortunately for the same reasons set out in its retrospective (no better alternatives, ability to split the doc if it becomes seriously unwieldy) there is not much we can do here.

There was relatively little feedback on specific discussions. The one exception was for the AI governance session, where some participants and a facilitator commented that the discussions ended up being quite US centric, or led to assuming that Russia was a bad actor.

Projects

Submissions

We got more project submissions this course, with 50% of participants submitting a project (compared to 36% and 17% for the March 2024 and February 2023 iterations).

The quality of submitted projects also improved, with 56% of submissions deemed to be ‘high-quality’ (compared to 42% on the March 2024 course).

This meant that, of all participants on the course, 28% submitted a high-quality project (56% of 50%), almost double the 15% on the March 2024 course.

Judging

This was the first course iteration where we experimented with hiring project judges more formally. We had previously relied on various connections we had, ourselves, or some of our facilitators.

We posted the job advert on our website, the AI Safety Fundamentals Slack, and LinkedIn. It also got picked up by a few newsletters, including the 80,000 Hours newsletter which resulted in many applications.

We had to make some changes to our hiring systems to enable us to handle these applications, which have made us better at running these kinds of hiring rounds in future.

We gave people a brief work test where they judged a project and gave feedback on it, to help screen applicants. We then gave accepted applicants access to an interface for judging projects, where they scored projects on the rubric, flagged them for prizes, and gave participants feedback.

Given the investments we had to make in setting up the job post and updating our hiring systems, we don’t think this saved us much time judging projects. However, we think these investments will make hiring for future iterations much easier and will both (1) free up considerable time in the judging process and (2) enable us to scale our courses more easily.
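
To give a sense of the kind of record the judging interface collects, here is a minimal sketch of one possible shape for a judge’s assessment, along with a simple way of aggregating scores; the criteria names, score scale and quality threshold are hypothetical, and this is not our actual rubric or code.

```typescript
// Hypothetical shape of a single judge's assessment of one project. The
// criteria names, 1-5 scale and threshold below are illustrative only.
interface JudgementRecord {
  projectId: string;
  judgeId: string;
  scores: {
    insight: number;       // 1-5
    communication: number; // 1-5
    rigour: number;        // 1-5
  };
  flaggedForPrize: boolean;
  feedbackToParticipant: string;
}

// Average a project's rubric scores across all judges who assessed it, and
// check whether it clears a (hypothetical) "high-quality" bar.
function isHighQuality(records: JudgementRecord[], threshold = 3.5): boolean {
  if (records.length === 0) return false;
  const perJudgeAverages = records.map((record) => {
    const values = Object.values(record.scores);
    return values.reduce((sum, v) => sum + v, 0) / values.length;
  });
  const overall = perJudgeAverages.reduce((sum, v) => sum + v, 0) / perJudgeAverages.length;
  return overall >= threshold;
}
```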

Other feedback

Course feedback on the project phase we received included:

  • Sessions provided accountability. Many participants noted the project sessions’ main benefit to them was the accountability, including being made to set a goal for the week.
  • Comments about feedback quality. Some people found the feedback in the project sessions useful for improving their projects. Some people found it less helpful because:
    • Other participants were working in different areas, and didn’t have relevant expertise.
    • The time for giving and receiving feedback was too short. One participant suggested more intentionally pairing people who had a lot of progress to give feedback on with those who didn’t have as much.
  • Time constraints being too tight. Some participants and a facilitator thought it was difficult to do a substantive project within 20 hours, and that several of the example project ideas we had listed would take a beginner more than 20 hours.
  • Desire for more technical guidance. People often hit implementation problems in their projects, and wanted more support with research engineering.
  • Wanting clearer deadline communication. Several participants and facilitators noted the project submission deadline was not clearly communicated, and they had struggled to find this. This was a mistake, and we have now updated this so it appears on the main course page, and is listed on the final course week.

Subjective participant understanding

We user interviewed many participants, spoke with several facilitators, and have seen participants’ projects and some initial post-course steps. We considered (subjectively) how well we were doing at the three pillars we mentioned earlier: motivation, skills, and context.

Motivation

On motivation, we think we are doing an okay, but not excellent, job. A common failure here is participants failing to grasp the impact of catastrophic risks, or misunderstanding which parts of AI safety we think people should be motivated to work on. We attempted to correct this on the course by shaping the projects, but don’t think this was sufficient. On the next iteration, we are making changes to session 2 to make this clearer.

Skills

On skills, we’re again doing an okay, but not excellent job. We haven’t been as intentional as we should have been about the procedural knowledge the course is developing. The skills we have at times tried to optimize for at different parts on the course include:

  • Critical thinking, i.e. being able to come to reasoned judgements given information. We mainly evaluated this through facilitator feedback on participants and user interviews. We think the discussions contribute to this somewhat, but aren’t very direct. Applying principles of deliberate practice here could likely help a lot.
  • Written communication, i.e. being able to explain ideas and information to other people through writing. We mainly evaluated this through the project submissions. We tried to improve this with more directed feedback on people’s project write-ups, a project write-up template, publicising the evaluation criteria (that promote communication as a key value) earlier and more obviously, and refreshing the resources on writing. We did slightly better this time around than the March 2024 course, but not by a lot.
  • Research taste, i.e. being able to distinguish between good and bad research ideas. This was mainly evaluated through project submissions. We did better this time, and I think the project proposal template helped quite a lot. But for most people this is more going from ‘poor’ to ‘okay’, rather than becoming ‘good’ at this skill. I think this is also closely related to critical thinking, and to having clearer motivations to work on AI safety (as this helps prioritise ideas based on whether they’d help solve the problem you want to solve).

The skills we’ve mentioned above are not really inherent to AI alignment, and there is a question as to whether we’re in the right place to be trying to develop these. However, they do seem at the core what many hiring managers tell us are missing from people trying to enter AI safety, and we do very often see participants struggling with these. (We should also probably write down our rationale for these in more detail separately!).

We have also considered selecting only those participants who are already strong in these three skills, and then prioritising upskilling them on motivation and context. However, there are relatively few applicants who would meet these criteria. (This doesn’t mean it shouldn’t be what we do, but it does mean this would likely change the shape of our offering considerably, so we’d need to consider it very carefully.)

Context

On context (or knowledge), participants seem to pick this up very well. We put a lot of effort into ensuring the key concepts were conveyed accurately, including creating a lot more of our own content for this iteration of the course.

Knowledge of the techniques covered by the course was well reflected in:

  • user interviews with participants, where most were able to recall the gist of the key techniques covered
  • exercise submissions, including the comprehension questions
  • discussions, according to some facilitators
  • project submissions

Events

During the course we ran several events, including a launch event, networking events, coworking events and a closing event.

Few people signed up for the coworking events, so we likely won’t be running these again without changing their format or framing.

Some participants were confused about when the launch and closing events were, or by the registration process. One suggested automatically registering people for the launch and closing event. Additionally, several participants were confused about whether the launch or closing events were mandatory (they are optional).

We usually record the launch and closing events and post them as unlisted videos on YouTube shortly afterwards. Unfortunately, our YouTube account got suspended for unclear reasons, which meant we lost a lot of old recordings and created delays uploading the new ones. We have added an internal task to add our YouTube channel videos to our backups to avoid losing them in future.

Also see

If you found this article helpful, you might also want to see our retrospective of our March 2024 course.
