• Goal: existential security (x-risk from AI is negligible, either indefinitely or for long enough for humanity to plan its future)

  • Defines ‘theory of victory’, which I think is similar to what we want from a strategy (although maybe less action-oriented?)

    We can call [reaching existential security] an AI governance endgame. A positive framing for AI governance (that is, achieving a certain endgame) can provide greater strategic clarity and coherence than a negative framing (or, avoiding certain outcomes).

    A theory of victory for AI governance combines an endgame with a plausible and prescriptive strategy to achieve it. It should also be robust across a range of future scenarios, given uncertainty about key strategic parameters.

  • Three theories of victory:

    • AI moratorium: pause AI
      • provides existential security and optionality
      • challenging to establish and enforce a moratorium
        • previous attempts at global coordination on x-risks have been partially, but not hugely, successful (nuclear non-proliferation before the end of the Cold War, nuclear disarmament, climate change)
    • AI leviathan: singleton AGI that stops bad AI
      • with TAI, it might be possible to establish a moratorium by force or through superintelligent diplomacy
      • the leviathan might be misaligned / cause lock-in / pose other x-risks. need to weigh the risks of a leviathan against the risks of continued AI development by others.
    • def/acc: defensive technologies outpace offensive technologies
      • may be some private incentives to develop defensive technologies, and government could incentivize this further with investments, grants, tax breaks etc.
      • may require lots of coordination, e.g. if need to slow down offensive AI development (faces same challenges as pause)
      • unclear in advance whether a given technology will turn out to be offensive or defensive
      • [notes]
        • also seems possible that defensive technology won't be powerful enough against very extreme threats: it's easy to imagine def/acc tech reducing fraud risks, or even biorisks, but harder to see it stopping gradual loss of control.
  • Frames AI safety as a positive challenge of bringing the world into a state of existential security, rather than a negative challenge of minimizing many x-risks. Claims:

    • This provides clarity: it's easier to conceptualize bringing about one outcome than avoiding several different outcomes
    • This reduces strategic incoherence, where interventions minimizing different risks trade off against each other
  • What we want from an endgame:

    • existential security
    • keeping options open to improve society (avoid lock-in of negative or meh values)
  • What we want from a strategy:

    • Plausible
    • Prescribe action
  • What we want from a theory of victory:

    • Robustness to different parameters (e.g. time to TAI, takeoff speed, alignment difficulty)
  • What we want from interventions:

    • Supports theories of victory
    • Ideally supports many theories of victory, without significant downsides
  • We might also want to consider other x-risks, e.g. biosafety, nuclear security. If AI can help with these, we'd want to prefer those paths.

  • Nukes

    • We managed to not kill ourselves
    • Several people advocated for world government and a moratorium (although this didn’t happen)
    • The US was the only state with nukes for a while, and maybe could have used this to prevent others from developing them, but didn't. Attempting that by force might also have been a very bad idea: “Even if victory were finally achieved after colossal sacrifices in blood and treasure, we would find Western Europe in a condition of ruin far worse than that which exists in Germany today, its population decimated and overrun with disease.”
    • Nukes are different from AI:
      • Mainly single-purpose. Nuclear power aside, most development was weapon-specific. This made it less costly not to develop nukes, and reduced opposition to pausing development.
      • Fewer actors have the resources, and nukes are not getting much cheaper over time. By contrast, compute and data are available to many companies, let alone governments.
      • Nukes are physical objects whose distribution can be controlled, making misuse by small-scale bad actors much harder.
  • [notes]

    • I’m not convinced the positive reframing is incredibly helpful. But having another way of looking at the problem probably can’t hurt too much.
      • Where I think it might be most helpful is in being more comprehensive, i.e. it biases people towards considering unknown unknowns?
    • This hasn’t given me great confidence that an AI moratorium is plausible. The article says it’s challenging, lists previous attempts that have mainly failed, and then says ‘it remains plausible’?