Summary of AI alignment participant user interviews

By Adam Jones (Published on September 13, 2024)

We conducted user interviews with participants on our June 2024 AI Alignment course. This is a summary of common themes.

If you find this interesting, also see our March 2024 course retrospective.

Context

We run online courses on AI safety. These courses use cohort-based learning. This is where we group 5-8 participants into a cohort with an expert facilitator. Most cohorts meet weekly for discussion sessions with set activities, based on the readings and exercises they do independently.

This week I interviewed participants on our June 2024 AI Alignment course, to follow up on some feedback we received about our discussion sessions. We wanted to learn how participants engaged with the resources, exercises, and discussions. We also wanted to understand what made discussion sessions valuable and engaging for participants.

We interviewed people with different levels of technical background. We tried to interview people who had dropped out of the course, but did not get responses from them. This means these interviews were only with people who completed the learning phase.

We’re publishing this article in line with our views on working in public.

Opportunities

  • People wanted to understand technical content they found difficult.
    • People often worked around this by asking AI models. Some people found AI models hallucinated when doing this, particularly if they didn’t provide the actual paper.
    • Some took notes while doing the readings to help them better understand them.
    • People found the resources that we wrote on our blog to be some of the best. Some people suggested writing more of these, such as an explainer for each week.
    • Some people expressed that visuals stuck with them, and they might want more visual or video content. This aligns with the high resource ratings we have previously received for video content and our own content. People also used this in practice: when we asked them to explain RLHF, some mentioned thinking back to the diagram we gave them.
    • Some people said hands-on learning was particularly engaging and helped them remember concepts. They highlighted particular exercises and activities where they interacted with real AI models as useful.
    • People sometimes used the discussions to help resolve their confusions. However, they often felt other participants didn’t know the answer either, and that the facilitator was the most useful person to ask.
    • People noted sessions 3 (RLHF + constitutional AI) and 6 (mechanistic interpretability) as particularly challenging. People also often got confused about inner and outer alignment.
  • People wanted to not feel dumb when they didn’t understand technical content.
    • They felt comforted by encouragement from facilitators.
    • They felt supported by cohort peers, both through encouragement and through peers expressing vulnerability, such as admitting to struggling with some of the resources.
  • People wanted to be engaged
    • The sessions were generally found to be engaging. People usually said they were highly engaged throughout and very rarely or never bored. Certain facilitators were called out as being particularly engaging.
    • Most people highlighted the teach-swap-explain breakouts and the agree/disagree statement breakouts as being their favourite activities.
      • People noted the teach-swap-explain created accountability for everyone to do the exercises and activity properly, and enabled them to go and research something unique.
      • People said they liked the agree/disagree statement breakouts because this gave them time to think through their positions, and argue them with someone who had a different perspective.
      • One person said they disliked the teach-swap-explain activity. They explained that other people in their cohort sometimes wouldn’t have researched the case study very deeply, or multiple people would pick the same case study because they hadn’t coordinated in their Slack channel.
    • People said they enjoyed the exercises. They said having to reflect on the resources and actually write things down helped them clarify their thinking.
  • People wanted to deeply understand the material
    • Some people were proactive with their learning. They took notes, and were keen to answer questions in discussions, because they knew that was how they’d get the most value out of sessions. They believed forcing themselves to explain concepts in words would help them clarify their thinking, and give the facilitator a chance to correct any misunderstandings they personally had. One person wrote questions on the resources and sent them to the facilitator before each session, which they found very helpful. They fell out of this habit as work pressures increased, but said they wished they had kept it up.
    • Few people did the optional readings. Many people looked around for other resources online and would discuss them at the beginning of their sessions.
    • People expressed they went to the sessions to practice using the concepts. One participant compared this to learning a new language - the resources are like vocabulary lists, but you have to use it and think about it (in the exercises and sessions) to properly develop the skill. They felt supported by their cohort to do this practice.
    • People also said they went to the sessions to hear alternative perspectives. This helped them better understand the materials.
    • People also said having to discuss the materials made them realise they didn’t actually understand them as well as they thought they did. People found this useful and motivating, as it highlighted gaps where they could improve.
  • People had different motivations for taking the course (in practice, often a combination of the below factors)
    • Some wanted to directly advance their careers.
      • They often mentioned wanting to pivot into AI safety, or figure out how to use the concepts in their current role.
      • Some people arranged in-person meetups which they enjoyed. But this happened mainly by chance, through bumping into people in their cohorts.
      • People said they might have wanted more networking. But none of them mentioned the networking events we offered as a way of helping with this.
      • Some people misunderstood what the course was about, but still continued because they found it interesting.
    • Some people wanted to prepare for the future.
    • Some wanted to improve their ability to talk about this space.
    • Some were intellectually curious.
    • In general, people thought they had achieved the goals they wanted to on the course. One person said that compared to themselves 6 months ago, they were more knowledgeable and could understand what others were saying about AI safety much better.
  • People were anxious about the project
    • They wanted more information sooner. In particular, how ‘technical’ the project needed to be. One participant was worried it had to be an ML engineering project.
      • Context: Projects can be on anything relevant to AI safety. This could include writing explainers, theoretical research, AI governance proposals or analyses, or many other non-ML-engineering topics! (This is true at the time of writing, and we don’t have any imminent plans to change this.)
    • They didn’t know whether they would come up with a good project idea. One participant suggested facilitators might say something like ‘that could be a good project’ when relevant during discussions in the learning phase.
    • Multiple participants liked the project plan template. They found it helpful for fleshing out their ideas, and gained confidence from having followed some well-defined process. Multiple people said they discovered new and better ideas from going through this process.
    • They didn’t know if their project idea was good, because they didn’t receive feedback on it.
      • Context: We didn’t give detailed feedback on all project proposals submitted to us, due to resourcing constraints. This course iteration, we’re testing whether giving feedback ends up improving the projects at all. If it does, we plan to hire more people to help us with this next iteration.

Other behaviours

  • People did the resources in very different ways
    • Almost all participants were already very busy
    • Some people regularly blocked time to prepare for sessions, while some people just prepared when convenient. One person agreed with their manager to spend a day a week just focusing on our course as part of a professional development initiative.
    • Some people did the readings several days in advance. Others did them immediately before the sessions. One person suggested a 1-day reminder would have been more useful for them than a 3-day reminder.
    • Some people had very regular job schedules, while others were travelling through different timezones doing different jobs.
    • Some people read the resources, some people liked listening to the resources while doing other activities (such as exercise), some people sat down to listen to the resources and take notes, and some people tried to find video alternatives. People appreciated the different formats giving them flexibility. One participant also said the audio format helped them take the course with their disability. A different participant said the reverse-classroom model (where they could do the resources at a time and pace suitable for them) helped them take the course with their disability.
    • Some people completed the resources in order, while others did them out of order. Where people did them out of order, they’d usually start with the shortest or easiest looking resources.
    • Almost everyone completed all the mandatory readings each week. People mentioned a sense of commitment to do this - often either to their cohort or to us (as the central BlueDot team). People often expressed that they could see how much effort BlueDot put into the course and curriculum so felt they should put in the effort too. (NB: This is skewed because we only managed to interview participants who were still participating on the course by week 10. We did invite several participants who dropped out of the course earlier, but they did not respond.)
    • Some people browsed the optional resources. More often people would just search around the topic themselves.
    • Most people used AI to help them understand complex technical concepts. Usually participants would upload the paper to a standard LLM and ask the model to explain it in simpler language. Participants found this sometimes resulted in hallucinations (particularly if they didn’t upload the paper), which they’d often only realise were wrong during the session.
  • Discussion sessions varied a lot, but seemed mostly helpful
    • Some sessions sounded very active. This appeared to be a combination of facilitator encouragement as well as the specific people in the cohort. It seemed much more productive when people were proactive about their learning - keen to ask and answer questions, or share extra resources.
    • Other sessions sounded less active. Sometimes participants and facilitators would not follow the instructions, and there would be limited engagement outside sessions. Some cohorts had participants who were hesitant to venture answers. One person felt disappointed when their cohort got smaller due to switches and drop-outs.
    • Most people mentioned group learning as a motivator for them continuing the course.
    • People tended to enjoy the Google Docs format. One participant said speaking and typing at the same time was difficult with their disability.
    • There were some common ‘good’ facilitator behaviours that were appreciated:
      • Enabling high-quality discussions to run over a little, and cutting lower-quality ones short. But at the same time not overrunning too much.
      • Summarising people’s points in a clear way that made them feel heard and helped others understand.
      • Giving individuals thinking time. This enabled people to think for themselves and gather their thoughts.
      • Using breakout rooms. This kept people engaged for more of the session, and 1:1s helped people connect and form stronger cohort bonds.
        • However, some participants felt breakouts were less helpful for the very technical concepts where the pair might not know the right answer between them.
        • NB: People didn’t directly express wanting fewer breakouts in favour of more whole-group discussions. This is something we saw from governance course participants, but haven’t seen on the alignment course. There are about the same number of breakouts on both courses, so this is probably to do with the course content or audience rather than the session structure inherently.
      • Being active in the Slack to answer questions and promote engagement. Also helping make participants aware of other things in the Slack e.g. the #opportunities channel.
      • Running an effective icebreaker to make the participants feel better connected. People expressed that knowing others were on a common footing, and establishing a comfortable, low-stakes atmosphere, made them more willing to risk saying something incorrect.
      • Directing questions to specific participants when they had not spoken in a while.
      • Engaging with people where they were at, and leading them to the right answer.
      • Challenging people to think critically, for example asking ‘Yes that’s what the companies are saying - but do you think it’s actually true?’
      • Knowing the subject material very well. Participants were impressed that facilitators could provide papers relevant to discussions and projects, and they found this valuable extra reading.
      • Commenting on what people wrote in the session document, as well as on the project proposal documents. This made them feel heard, improved their understanding, and improved their projects.
    • People switching cohorts had different experiences.
      • Some people really enjoyed the cohorts they switched into and felt welcomed. They liked that not too much time was spent on introductions and they could get straight to business.
      • Some people didn’t like the cohorts they switched into as much. They didn’t feel as connected to the participants, so were more hesitant to put forward answers or engage as intensely in the discussions.
  • People tended to have a vague but correct understanding of RLHF and its limitations.
    • Context: we asked people to explain RLHF as best they could. They had learnt about this ~8 weeks ago. This was to help us evaluate whether they had learnt the concepts well and correctly.
    • Most people had an accurate, if a little vague, understanding of RLHF. People tended to understand the high-level picture of:
      • You start with a base model trained on internet text.
      • You get it to generate responses to prompts, and have humans give feedback on these responses.
      • This feedback is fed back into the model to have it generate more of the kinds of responses that humans rate highly.
      • This can cause problems like sycophancy, deception or hallucinations. This is because humans often pick responses that make them feel nice, or look right but are actually wrong.
    • However, few people could explain the detailed steps of how this feedback is fed into the model. Some mentioned Elo scoring, the reward model, or the reinforcement learning step.
      • Context: The ideal answer we were looking for is that humans make pairwise comparisons between responses to the same prompt; after many comparisons, each response gets a score under the Elo ranking system; a reward model is trained to predict these scores; and this reward model is then used to train the base model to produce higher-scoring responses, often via proximal policy optimisation (PPO) with a penalty for changing the model too much, typically KL-divergence or simple clipping. (A toy sketch of how these steps fit together follows at the end of this list.)
  • Some people were big on taking courses
    • Most people had taken courses in the past. A couple of people said they always tried to be doing one course on the side at any one time. Courses people had done before varied widely, and included AI, ethics, and completely unrelated topics. People often mentioned finding courses on Coursera or LinkedIn Learning.
    • For some people this was the first ever online course they had taken.
    • Multiple people expressed unprompted that this was the best course they had ever taken, including beating all of their university classes and on-the-job training.
    • One person already had quite a bit of AI safety experience, and had previously self-studied the old AGISF curriculum. However, they said they wished they had started with the facilitated BlueDot course, and regretted not doing so.
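
To make the RLHF ‘ideal answer’ above more concrete, below is a minimal, illustrative Python sketch of that feedback loop: pairwise comparisons produce Elo-style scores, a (deliberately trivial) reward model is fit to those scores, and a softmax ‘policy’ is nudged towards higher-reward responses under a KL penalty that keeps it close to the reference policy. Everything in it (the tiny feature vectors, the least-squares reward model, the toy policy over a fixed set of responses) is an assumed stand-in for illustration, not how a real LLM pipeline is implemented.

# Toy sketch of the RLHF loop described in the context note above. Illustrative
# only: all names, numbers, and the linear "reward model" are assumptions made
# for this example, not part of any real training pipeline.
import numpy as np

rng = np.random.default_rng(0)

# Step 1: pairs of responses are compared; Elo-style scores accumulate.
def elo_update(score_a, score_b, a_won, k=32):
    """Standard Elo update after one comparison between responses A and B."""
    expected_a = 1 / (1 + 10 ** ((score_b - score_a) / 400))
    return score_a + k * (a_won - expected_a), score_b + k * (expected_a - a_won)

n_responses = 50
features = rng.normal(size=(n_responses, 4))               # stand-in for response text
true_quality = features @ np.array([1.0, -0.5, 0.3, 0.0])  # hidden "how much humans like it"
elo = np.zeros(n_responses)
for _ in range(2000):                                       # many pairwise comparisons
    a, b = rng.choice(n_responses, size=2, replace=False)
    a_won = float(true_quality[a] + rng.normal(0, 0.5) > true_quality[b])
    elo[a], elo[b] = elo_update(elo[a], elo[b], a_won)

# Step 2: fit a reward model to predict the Elo scores from response features
# (here just least-squares regression; in practice a neural network).
w, *_ = np.linalg.lstsq(features, elo, rcond=None)
rewards = features @ w
rewards = (rewards - rewards.mean()) / rewards.std()        # normalise for stable updates

# Step 3: nudge the policy towards higher-reward responses, with a KL penalty
# keeping it close to the reference (pre-RLHF) policy. Real RLHF does this with
# PPO over token sequences; here the "policy" is just a softmax over responses.
def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

logits_ref = rng.normal(size=n_responses)
p_ref = softmax(logits_ref)
logits = logits_ref.copy()                                   # policy starts at the reference
beta, lr = 0.1, 0.5
for _ in range(300):
    p = softmax(logits)
    # Gradient ascent on E_p[reward] - beta * KL(p || p_ref) w.r.t. the logits.
    f = rewards - beta * (np.log(p + 1e-12) - np.log(p_ref))
    logits += lr * p * (f - p @ f)

print("expected reward, reference policy:", round(float(p_ref @ rewards), 3))
print("expected reward, tuned policy:    ", round(float(softmax(logits) @ rewards), 3))

Running this toy prints a higher expected reward for the tuned policy than for the reference policy, which is the directional effect RLHF aims for; the failure modes participants mentioned (sycophancy, deception, hallucinations) arise when the reward signal itself rewards the wrong thing.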

Other feedback

  • People disagreed about the balance our course struck between near-term and long-term risks
    • Some people said the content focused on near-term risks too much. They said that given the seriousness of the other risks, this could give people the wrong impression about the scale of the impact AI might have, and how important this area could be.
    • An equal number of people said the content focused on long-term risks too much. They said that near-term risks are closer and already have existing research, so should have been given more attention. They also said the long-term risks sounded sci-fi-like and that some resources’ writing styles felt ‘weird’.
  • People disagreed about the balance our course struck between AI risks and benefits
    • One person said we were too positive about AI. They said the course felt biased towards presenting AI in a positive light and could be overstating its benefits.
    • One person said we were too negative about AI. They said the course felt biased towards focusing on the problems with AI, and not enough with how it might be beneficial.
  • People discussed wanting to learn other things not in the curriculum
    • People often expressed interest in courses about:
      • more advanced technical AI safety (referencing things like alignment 201, a course we used to run that covered more advanced topics)
      • AI policymaking and regulation (which is somewhat covered by our existing AI governance course, which many people said they had or were going to apply to)
        • One person said they missed the deadline for this, and would have liked to have been emailed about it.
    • One person suggested we run courses on a much wider range of topics, given that we seemed good at running courses. They suggested courses on the science of learning, mathematics, using AI in businesses, and AI for education.
    • Some people suggested removing theoretical content from the curriculum, and focusing more on empirical work like mechanistic interpretability. They expressed that this was when they were most engaged.
    • One person suggested adding more theoretical content to the curriculum. They suggested a week on agent foundations and singular learning theory.
    • One person said they read many of the SERI MATS mentor guides and found these useful for discovering new resources and learning about certain areas.
  • People acknowledged the usefulness of learning about technical AI safety, even for roles not directly working on it
    • For example, one policy analyst discussed how science and tech policy often sits at the intersection of science and politics. Usually things are slow-moving and clear enough to get by without extra help, but AI is fast-moving and unclear (because lots of people publish misinformation), so this course was very helpful for them.
  • People mentioned bugs on the course hub being frustrating. These included:
    • Exercise responses not saving properly.
    • Difficulties using the cohort switching tool. One participant found it confusing that after clicking on a course session on the left sidebar and then ‘switch cohort’, it wasn’t necessarily changing their cohort for that session.
