An AI safety plan is a description of proposed actions people could take in the world to make AI go well. This is NOT supposed to be a political agenda, and we hope to avoid it turning into one, although some political action may be unavoidable. This is loosely described in section 1 of https://www.bridgespan.org/getmedia/16a72306-0675-4abd-9439-fbf6c0373e9b/strong-field-framework.pdf [TODO: we haven’t described this super clearly - need to expand on this for clarity].
Some SAFE properties for this plan (the mnemonic is not too forced, I promise):
- Sufficient: if all the actions are carried out, we would consider the world to be in a good state.
  - NB: just solving the alignment problem, or tackling near-term risks, is not sufficient.
- Action-oriented: the plan is a set of actions that explains how people can contribute (and thus how our course could prepare people to contribute), rather than just a list of events that happen.
- Feasible: we think it’s reasonably plausible that the plan can be executed. This generally excludes plans that require actors to take significant actions against their own interests.
- Encompassing: the plan covers all the actions needed, not just those for a short time frame or a single jurisdiction.
Why do we need criteria?
- To help clarify what it is we’re looking for
- To make sure what we produce will be sufficient to guide the field
Earlier notes
Criteria
- Must have
  - Gives us confidence that if all the actions are carried out, things would be good
  - Seems feasible. Generally this excludes plans that require:
    - actor(s) to take significant actions against their own interests
    - all governments to magically agree on something very complex, particularly where there are benefits for defectors and enforcement is hard
  - Explains what actions people can take to contribute to the plan (e.g. is a set of assignable/assigned actions, not just a list of events to happen)
  - Is comprehensive, e.g. covers all the actions that need to be taken, not just those in one jurisdiction
- Should have
  - Detailed enough for people to evaluate whether they are contributing to the plan
  - Well-communicated / clear
  - We can track progress on the plan
    - Ideally from early on, so we can adjust it if we realise the plan won’t work, e.g. by having falsifiable hypotheses
  - Robust to some failure
- Could have
  - Beyond just making sure we don’t die, the plan explains how we can maximize the benefits of AI
  - Multiple viable paths to success
Notes on Good Strategy Bad Strategy: