
Curriculum development retrospective (AI Alignment March 2024)

By Adam Jones (Published on July 2, 2024)

This part of the retrospective is primarily for myself, to reflect on how we could improve the curriculum for future rounds of the course. It may also be helpful to others wondering how we do course development, and it identifies resources we’d like to exist. By curriculum, we mean the resources, exercises and discussion activities. Note that some meta points about the tools we use to develop the curriculum are covered in our systems retro.

The resources, exercises and session activities that make up the curriculum were, on the whole, very well liked. This carried through to later indicators: attendance and project submissions were both the best we’ve seen.

There is still potential for improvement, especially in the project phase - this had lower resource ratings, and attendance dropped off the most. Making the course hub more reliable also seems likely to improve the experience of engaging with the course curriculum.

Which resources did people like and dislike?

People gave most resources high ratings. We ask course participants to rate readings in the course hub, and optionally provide written feedback. 91% of core resources had an average rating greater than 4.0/5; the average rating given for core resources was 4.35/5 (n = 6118), and the average rating given across all resources was 4.27/5 (n = 8400).
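As a rough illustration of how these figures fit together, here is a minimal sketch of the aggregation, assuming ratings are exported as simple records. The field names and example rows below are hypothetical, not our actual schema or data.

```python
from collections import defaultdict

# Hypothetical export of course hub ratings: one record per rating given.
ratings = [
    {"resource": "What is AI alignment?", "is_core": True, "stars": 5},
    {"resource": "What is AI alignment?", "is_core": True, "stars": 4},
    {"resource": "Compute Trends Across Three Eras of Machine Learning", "is_core": True, "stars": 4},
    {"resource": "Some further resource", "is_core": False, "stars": 3},
]

# Average rating given, across all resources and across core resources only.
all_stars = [r["stars"] for r in ratings]
core_stars = [r["stars"] for r in ratings if r["is_core"]]
print(f"average rating given: {sum(all_stars) / len(all_stars):.2f}/5 (n = {len(all_stars)})")
print(f"core resources only:  {sum(core_stars) / len(core_stars):.2f}/5 (n = {len(core_stars)})")

# Share of core resources whose own average rating exceeds 4.0/5.
by_resource = defaultdict(list)
for r in ratings:
    if r["is_core"]:
        by_resource[r["resource"]].append(r["stars"])
resource_averages = {name: sum(s) / len(s) for name, s in by_resource.items()}
share_above_4 = sum(avg > 4.0 for avg in resource_averages.values()) / len(resource_averages)
print(f"core resources averaging above 4.0/5: {share_above_4:.0%}")
```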

People particularly liked videos. Ratings were higher for video resources (averaging 4.5, vs 4.3 across core resources), and comments were usually very positive:

  • “The video format was fantastic!”
  • “Even as someone who knows quite a bit about neural networks, I found this video to be a helpful refresher.”
  • “I loved this video, it was so well done!”
  • “Shockingly accessible. Cleared up some confusions I'd had for months to years.”

The least liked resources were still pretty okay, and we have already taken action to improve them. The four least liked resources on the course were Nvidia: The chip maker that became an AI superpower (average rating 3.7), What is mechanistic interpretability? (3.7), How to get feedback (3.8), and Compute Trends Across Three Eras of Machine Learning (3.9). These ratings are still not terrible - and we have already put out a call for better intros to mech interp. We removed the Nvidia article in our changes to weeks 0-2. We had already looked for substitutes for the other articles, but have not been able to find better versions that hit the same learning points - we may add these to a future call for resources, although they’re a lower priority than other content we’d like.

Almost all very low ratings came from a roughly equal mix of participants finding content too easy or too hard. Given that the same resource would be rated too easy by some and too hard by others, we think we struck about the right balance. And because these represent very few ratings overall (plus we received other qualitative feedback that the course is accessible while providing depth), we don’t think this means we should split up the current course by technical skill.

Other very low ratings. One participant accounted for most other very low ratings; we have reviewed their feedback carefully and updated some of our articles accordingly. The remaining very low ratings were sometimes helpful, although most were not easily actionable:

  • “1 star in terms of usefulness to me right now, but I think this is good advice, and I understand why it's included here and in this format”
  • “This is for aspiring researchers, so I'm the wrong audience”
  • “This just wasn't super convincing?” (on an article many others said was convincing 🤷)

Few people read the further resources. Further resources (excluding those that were previously core resources, or that were directly referenced in an exercise, as these skew the statistics) were rarely read by more than 20 course participants. This suggests we should spend most of our effort on the core resources, and not that much on the further resources. (Some counterpoints: these 20 people might be the most engaged, so are worth supporting; and more people may read further resources without marking them off, because they feel it’s not mandatory.)

Which exercises and discussion activities worked well?

Participants enjoyed exercises that fed into teach-swap-explain activities. These are exercises that involve researching a specific topic in more detail, explaining that topic to a person in the cohort, then discussing it as a group. Participants consistently highlighted these activities as some of their favourites in interviews, feedback surveys, and end-of-session feedback. They liked the opportunity to dig into content a bit deeper, felt the activity gave them accountability to do the exercise, and found the activity engaging. Rare negative feedback expressed that some of the extra readings for exercises were quite technical, or that the activity didn’t go well where some participants had not done the exercise thoroughly.

Participants enjoyed the unique activities we added for this iteration of the course. These included AGI scenario role-playing (session 1), trying to break real AI models (session 3), and exploring SAE features (session 5). Most participants found these engaging, and often noted in their session takeaways that they learnt something from them. The negative feedback here included scepticism about how realistic the scenario was (which we partly addressed by making changes to the scenario, and providing facilitators with more guidance on answering clarifying questions), and some confusion using the SAE browsing interface (which mostly disappeared after we provided more guidance on how the interface worked and reordered some of the activities).

Voting activities need more controversial statements. Voting activities involve participants choosing where they stand on different issues. Our lead cohorts found that participants probably agreed with each other too much, so we tested different statements over the week. We still think we could try adjusting the statements further to encourage more lively discussions.

Participants in smaller cohorts want more group discussion, while participants in larger cohorts want more breakouts. This was based on reviewing a number of lead cohort session recordings and session document feedback. This makes sense, and is feedback I’ve also seen on other courses we’re running (particularly our Pandemics course). We’ve updated our facilitator training to more explicitly tell facilitators they can adjust session activity lengths to their cohort’s preferences, including this common preference.

Good facilitation matters, especially asking the right questions. Particularly for group discussions evaluating different proposals, the quality of discussion is affected significantly by the kind of space the facilitator creates and the questions they ask. We updated the session plans throughout the course with good prompts, and introduced extra facilitator guidance to support this. In general, prompts that got participants to find solutions (“how could you improve <proposal>?”) seemed more productive than those that got participants solely to critique ideas (“why might <technique> be insufficient for safety?”) or argue whether something was good or bad (“would <proposal> be useful?”).

Which sessions did participants like?

By sessions I mean packages of resources + exercises + discussion activities. Cohorts usually complete one of these a week.

All sessions were fairly well-liked. Resource feedback and session plan feedback were generally very positive about the activities across the board. Feedback on specific sessions included:

  • Introduction to ML (session/week 0): Several participants got confused between the icebreaker session and this session, given they were happening in the same week (this was even more confusing for lead cohorts). Feedback on the associated launch event was mixed, and participants came to this event with very different levels of preparation. Going forward, we’re eliminating this extra week, and we’re simplifying launch events to focus on networking with other course participants.
  • Scalable oversight (session 4): People really appreciated our summary article, and found it particularly useful for getting a quick understanding of many of the different concepts. A couple of facilitators were less keen on the papers we selected, and would have liked easier-to-read resources (we would also like better resources - we wrote our own intro because we thought there was no good alternative!).
  • Mechanistic interpretability (session 5): Some participants (especially those in less technical cohorts) expressed that the readings were quite difficult - we’re working on improving this. Many people also expressed that this and the scalable oversight sessions were the ones they enjoyed the most for their technical depth.
  • Project sessions (sessions 8-11): Session plan feedback was slightly less positive than for the learning phase (but still overwhelmingly positive). People found most of the value was accountability for meeting goals, rather than the feedback they got on their projects. Similarly, people tended to find the project session resources useful, but less helpful and targeted than the learning phase resources.[1] Facilitators often felt the suggested project ideas were too ambitious, particularly for people new to the field, which caused projects to overrun and could have demotivated some participants.

Session attendance was excellent throughout the learning phase, and likely the strongest we’ve ever had in the project phase. 74% of participants attended the final learning phase session,[2] 39% attended the final project session, and 36% submitted a project[3]. For comparison:

  • The previous AI alignment course (March 2023) had 44% attend the final learning phase session and 17% submit a project. The former is our best estimate, as we don’t have great statistics for our prior courses (see our 2023 impact report for more details).
  • The previous Pandemics course (October 2023) had 62% attend the final learning phase session, 35% attend the final project session, and 25% submit a project. We use this as a comparison as it’s the only other course that we have confirmed attendance data for.
  • 8-week online courses usually have completion rates from 2-15%, and 12-week online courses have completion rates of 1-8% (source). Newer studies suggest completion rates were decreasing from 2012, when the original study was done, to 2019.
  • University tutorials usually have approximately 30%-60% attendance, but this varies significantly (source 1, source 2, source 3).

There is still significant room for improvement on project phase attendance. Despite being our best yet, there’s still clearly a big drop-off between sessions 7 and 8 (74% → 54%).[4] We think that publishing more details about projects early, and encouraging participants to come up with project ideas in a session 7 exercise, improved attendance. Next iteration we plan to improve by:

  • Not rearranging cohorts for the project sprint. Initial analysis of the AI governance course, where we applied this learning, seems to suggest this helps with follow-through to the project phase. This was also supported by some end-of-course feedback from participants: “I would have preferred to keep the same cohort [for the project sprint] as the discussions for the first 8 weeks.” and “The transition from course to project [wasn’t great] - specifically, the change in meeting time with a new team.” Multiple facilitators also suggested this was a mistake: “I think shuffling cohorts for the project phase [was] a poor idea. My cohorts had built a bond between participants that would have kept them more involved during project phase. The project cohorts were much less engaged and had a much higher attrition rate.” and “The break between "Learning Cohort" and "Project Cohort" was very bad.”
  • Having the project guidance available right from the beginning of the course.
  • More clearly communicating the time commitment of the project phase on our course details pages, which people usually read when applying.
  • Publishing example projects people can take inspiration from.
  • Creating templates for writing up projects to make this easier (we also hope this improves how well participants communicate about their projects).
  • Exploring other ideas to better support participants, including co-working sessions, drop-in office hours, or possibly a hackathon.

What were people’s high-level thoughts?

End-of-course feedback was very positive. We sent the feedback survey, plus a reminder to complete it, to everyone who was accepted to the course (including if they dropped out early) to ensure we heard from people with a range of perspectives. We asked participants “What did you find most valuable about the course?” and “What wasn't great about the course?”. Responses were broadly very positive, and common themes included liking:

  • (63% of responses) resources and exercises: “I loved the amazing compilations of papers, articles, and resources on each topic! There is so much content out there that having these curated reading lists was extremely helpful for me and I enjoyed doing the readings every week.”
  • (50%) group discussions (both for the learning value + accountability): “I found the weekly virtual meetings so incredibly valuable! It was fantastic to get to [know] other folks interested in the topic.”
  • (24%) interactions with other participants (including at networking events): “Great insights from fellow participants, very good observations that changed my perspective or gave me new perspectives on the consequences of certain developments or AI issues.”
  • (20%) interactions with knowledgeable and friendly facilitators: “My cohort's facilitator, [name], was excellent! He did a great job at getting the conversation started and making us think through our own questions before answering them. I learned so much from him and my classmates. The human component made a big difference for me. I only wish it had been longer.”
  • (15%) being able to complete an independent project: “I appreciate that I was pushed to work on a project and held accountable by the mandatory weekly meetings. This allowed me to create my first portfolio project which I'm thrilled about!”
  • (9%) feeling part of a community: “I made great friends and partners for collaboration. This also made the course content much more engaging. I was also very happy to be welcomed by such an open and kind community, both from BlueDot Impact but also all the participants and facilitators.”
  • (7%) receiving useful career guidance: “Networking with peers passionate about mitigating AI risks was also a significant benefit, providing both collaborative opportunities and career guidance.”

There are still areas to improve, particularly with our systems. Negative themes raised in the end-of-course survey include:

  • (16% of responses) technical issues with Course Hub, including losing exercise responses and being slow to load: “The course hub was a bit confusing sometimes and when I would hit submit on my exercises, they would not update and I would lose a lot of work.”
    • Response: We’re sorry that participants had difficulties with Course Hub, and we’re working on fixing these issues (also see our systems retro). Many of them stem from underlying suppliers, and it takes a lot of effort to migrate away from these suppliers. However, we have already migrated away from one supplier related to these issues (blog post on this coming soon), and are mid-migration for our login service (which will enable us to move more critical services away from other suppliers). We’re also hiring product managers and software engineers to make our systems better!
  • (6%) desiring more technical content: “I think the inclusion of some more technical exercises (maybe optional) in the first part of the course would help build more familiarity and confidence that could help learners get a head start in the project phase.”
    • Response: We have added optional technical exercises to sessions 3, 5 and 6 and are working on adding one for session 4. We have also reviewed the further resources and added a couple that are more implementation-focused (although relatively few people read the further resources).
  • (3%) project sessions were not as useful as learning sessions: “The later project sessions were good for motivation and accountability, but the open structure/format wasn't the most helpful for me.”
    • Response: We’re reviewing ways to make these sessions more directly helpful, although it is very difficult to find a structure that works for everyone. Also see the sections above on project session attendance.
  • (3%) intensity of the content, and accuracy of estimated reading times: “I found that the course took more time to complete than predicted (I believe a figure of 5+ hours per week was estimated)” and “I found the estimated reading times to be much lower than they actually were”
    • Response: We’ve been discussing the course intensity with various participants to understand where we should be adjusting reading times. We have also removed some core resources from particularly heavy sessions, reworked sessions 0-2, and are working on creating and sourcing better resources that are faster to read.
  • (2%) attendance of other participants: “The number of ‘regulars’ in my cohort was rather small, only 2-3 people including me (if I remember correctly). It would have been great if cohort attendance would be more towards 4-6 regulars.”
    • Response: We encourage participants to attend their regular session where they can. We do offer cohort switching as a way to help people keep up-to-date if things come up, which can mean a smaller group of regulars in your cohort. At this time I think we’re striking the right balance of flexibility vs cohort stability (we recently reviewed this on the AI Governance (April 2024) course) but we are keeping an eye on this.
  • (2%) session plans get slow with lots of comments on a Google Doc: “At times, the google doc got too long and was very slow to load and use, so our facilitator broke it up into multiple documents.”
    • Response: We’re aware that very long Google Docs can get slow (particularly if documents have a lot of comments and participants don’t have powerful computers). We have previously experimented with different formats and found this was the best option despite this limitation (but we’re open to suggestions for better platforms!). We encourage participants to flag document slowness to their facilitator, and facilitators to split the session plan into multiple documents if necessary. We have considered a suggestion to have separate activity documents for each session and share a folder with participants - however, at the moment we think the complexity this adds, both to the participant experience and to running the course logistics, outweighs the benefits.
  • (2%) facilitated sessions felt rushed: “The discussion session sometimes would feel a little rushed but that's understandable due to time limit and wide variety of view points to talk about.”
    • Response: We’ve adjusted the timings of many specific session activities in response to the very valuable feedback participants and facilitators have given us. This has included deleting many parts of activities that are less crucial to allow for more time for unstructured discussions. We have also reiterated guidance to facilitators that the session timings are guidelines and not strict limits, and to use their judgement as to when to move on from activities. We’ll continue to monitor this and welcome feedback on specific session plans or facilitation styles.
  • (2%) changing cohort from the learning to project phase: “The transition from course to project [wasn’t great] - specifically, the change in meeting time with a new team.”
    • Response: We’re not going to change people’s cohorts like this in future - for more, see what we’re changing. We changed this for our AI governance course and it does seem to have gone better.
  • (1%) overly focusing on career impact: “The final few lessons however, especially the one concerning careers, felt out of touch and out of character for what had been a very educational experience up until that point.”
    • Response: We strongly believe understanding career paths in AI safety counts as education, and is important for people being able to have a significant impact in this field (BlueDot Impact’s mission explicitly mentions supporting people to pursue a high-impact career). We think the course details pages presented to people when applying make it clear this course is oriented towards people interested in working on technical AI safety research (see especially sections ‘Who this course is for’ and ‘You’ll be supported to take your next steps’). We therefore do not have plans to remove a focus on going to work in technical AI safety research (but we will consider how we can smooth any perceived transition).
  • There were a number of other points of feedback given less frequently. We have read all the feedback people have given on the course and greatly appreciate this! We have made some other minor changes not mentioned above in response. We haven’t actioned every piece of feedback, particularly where we had conflicting feedback, e.g. some people expressing they particularly liked parts or features of the course that others wanted removed or changed.

Conclusions

Overall, the curriculum and structure of the course seem very strong. Attendance was the highest we’ve ever seen, project submissions were both the most numerous and the highest-quality we’ve ever seen, and participants praised most core areas of the course.

Areas for improvement include the project phase (based on attendance), specific resources (based on resource ratings), course hub reliability and providing more technical content (based on end-of-course feedback). We’re currently improving on all of these areas, and a few improvements have already been implemented for the AI Governance course and are showing promising results.

Go back to the top-level retro page.

Footnotes

  1. We did expect this, as it’s very hard to give good general advice – plus we wanted people to be able to spend time focused on their projects. We therefore picked just one short read (usually 10 minutes) per week.

    Session 11’s low score is already discussed above – this is the 3.8 rating from the single How to get feedback resource.

  2. The alignment course had 1 pre-course week (session 0), plus 7 ‘proper’ sessions. This likely had some effect on comparisons against our prior courses, but we think the impact of having one fewer ‘proper’ session is minor: for example the drop-off between sessions 6 and 7 was 75% → 74%.

  3. This number is increasing over time as people are submitting late. We expect this will increase the number of submissions by 1-3%.

  4. It should be noted that attendance is a proxy metric, and even more so in the project phase. Anecdotally, multiple participants took part in the project phase but decided to have 1:1s with their facilitator instead of attending their standard session, as they found this worked better for them. Others only turned up to some project sessions because they wanted to spend more of their own time working independently on their project. This suggests we should be reviewing the utility of the project sessions (which we are!) and ensuring our offering is compatible with different people’s working styles.
