Job spec: https://bluedot.org/ai-safety-strategist/

Contact: Adam Jones

This document sets out some of the context for the work test for the AI safety strategist role.

Context: What are we trying to do?

You don’t need to read all the linked documents; they’re only here for extra context.

We teach AI safety, but we (and the whole field) don't really have a solid plan for making sure transformative AI goes well (see Summaries of AI safety plans). This makes it hard to answer questions like ‘Will mech interp actually be useful for making AI safe?’ or ‘How relevant will the global south be to catastrophic risk reduction?’.

We will fix this by making a real, concrete plan, both to improve our courses and to help lead the field in the right direction. This plan will meet our Criteria for an AI safety plan.

We considered a number of ways to build this plan (see How to construct a good AI safety plan). Our current approach is to build the strategy bottom up from AI risk scenarios. This involves identifying a set of AI risks, and then for each risk:

  1. Listing ways the risk could play out in reality, thinking about different scenarios (like "what if TAI comes really soon?" or "what if TAI is super hard to align?")
  2. Mapping out how each bad scenario would happen, step by step. For example, with bioterrorism: you need a bad actor and an AI willing to help → they plan the attack → they build the weapon → they release it → lots of people die
  3. Looking at each step and figuring out interventions to stop it. Best case: find ways to prevent the risk completely. If that's not possible, at least find ways to make it less likely or less harmful.

Finally, we’ll look at all the interventions and pick a small, tractable set that would handle all the major risks. This set will define what needs to be done to keep TAI safe.

This way, we’ll know what to teach in our courses, and we’ll have a real plan we can show to the rest of the field.

Actions: What do you need to prepare?

  1. Make sure you are ready to do 2 hours of uninterrupted work in a productive working environment. You’ll need a computer with internet access.
    1. NB: An earlier version of the job spec said 1 hour. We realized that doing the work test properly would take at least 2 hours, so we have raised the compensation to £60 to reflect this.
  2. Submit the work test start form.

Guidance: What can you expect next?

After submitting the work test start form, you’ll get an automated email with the private details of the timed work test. The task will involve writing up some notes on a part of our AI safety strategy. At the end of the work test you’ll be expected to submit these notes.

Once you submit your work test, you can submit a payment claim to be reimbursed for the time spent on the work test. We’ll try to get you paid within 2 weeks of your request, although it might take up to 1 month due to the winter holidays.