• Goal: existential security (x-risk from AI is negligible, either indefinitely or for long enough for humanity to plan its future)

  • Defines ‘theory of victory’, which I think is similar to what we want from a strategy (although maybe less action-oriented?)

    We can call [reaching existential security] an AI governance endgame. A positive framing for AI governance (that is, achieving a certain endgame) can provide greater strategic clarity and coherence than a negative framing (or, avoiding certain outcomes).

    A theory of victory for AI governance combines an endgame with a plausible and prescriptive strategy to achieve it. It should also be robust across a range of future scenarios, given uncertainty about key strategic parameters.

  • Three theories of victory:

    • AI moratorium: pause AI
      • provides existential security and optionality
      • challenging to establish and enforce a moratorium
        • previous attempts at global coordination on x-risks have been partially, but not hugely, successful (nuclear non-proliferation before the end of the Cold War, nuclear disarmament, climate change)
    • AI leviathan: singleton AGI that stops bad AI
      • with TAI, it might be possible to establish a moratorium by force or through superintelligent diplomacy
      • the leviathan might be misaligned / cause lock-in / pose other x-risks. need to weigh the risks of a leviathan against the risks of continued AI development by others.
    • def/acc: defensive technologies outpace offensive technologies
      • may be some private incentives to develop defensive technologies, and government could incentivize this further with investments, grants, tax breaks etc.
      • may require lots of coordination, e.g. if need to slow down offensive AI development (faces same challenges as pause)
      • unclear in advance whether a given technology will turn out to be offensive or defensive
      • [notes]
        • also seems possible that defensive technology won't be powerful enough against very extreme threats: it's easy to imagine def/acc tech reducing fraud risks, or even biorisks, but harder to see it stopping gradual loss of control.
  • Frames AI safety as a positive challenge of bringing the world into a state of existential security, rather than a negative challenge of minimizing many x-risks. Claims:

    • This provides clarity: it's easier to conceptualize bringing about one outcome than avoiding several different outcomes
    • This reduces strategic incoherence, where interventions minimizing different risks trade off against each other
  • What we want from an endgame:

    • existential security
    • keeping options open to improve society (avoid lock-in of negative or meh values)
  • What we want from a strategy:

    • Plausible
    • Prescribe action
  • What we want from a theory of victory:

    • Robustness to different parameters (e.g. time to TAI, takeoff speed, alignment difficulty)
  • What we want from interventions:

    • Supports theories of victory
    • Ideally supports many theories of victory, without significant downsides
  • We might also want to consider other x-risks, e.g. biosafety, nuclear security. If AI can help with these, we'd want to prefer those paths.

  • Nukes

    • We managed to not kill ourselves
    • Several people advocated for world government and a moratorium (although this didn’t happen)
    • The US was the only state with nukes for a while, and maybe could have used this to prevent others from developing them, but didn't. Attempting that by force might also have been a very bad idea: “Even if victory were finally achieved after colossal sacrifices in blood and treasure, we would find Western Europe in a condition of ruin far worse than that which exists in Germany today, its population decimated and overrun with disease.”
    • Nukes are different from AI:
      • Mainly single-purpose. Nuclear power aside, most development was weapon-specific. This made it less costly not to develop nukes, and reduced opposition to pausing development.
      • Fewer actors have the resources, and nukes are not getting much cheaper over time. By contrast, compute and data are available to many companies, let alone governments.
      • Nukes are physical objects whose distribution can be controlled, making misuse by small-scale bad actors much harder.
  • [notes]

    • I’m not convinced the positive reframing is incredibly helpful. But having another way of looking at the problem probably can’t hurt too much.
      • Where I think it might be most helpful is in being more comprehensive, i.e. it biases people towards considering unknown unknowns?
    • This hasn’t given me great confidence that an AI moratorium is plausible. The article says it’s challenging, lists previous attempts that have mainly failed, and then says ‘it remains plausible’?