Problem
We now have a lot more context on strategy in general, and AI safety strategy specifically, after reviewing some of the existing literature (Summaries of AI safety plans). However, we don’t yet know where or how best to start actually building our own plan.
Options
- Bottom-up risk analysis (option 1): Get AI risks from the MIT AI risk database. Focus first on the most pressing risks related to TAI (e.g. catastrophic misuse and misalignment). Then:
- For each risk:
- Generate plausible scenarios of how the risk manifests, across a range of parameters (a parameter being a key variable like ‘time to TAI’, ‘takeoff speed’, ‘alignment difficulty’, or ‘who develops TAI’).
- Make explicit the chain of steps in that risk pathway, e.g. for bioterrorism it’s something like: radicalised person + AI with capability and willingness to help with bioattack → person uses AI to help them plan attack → person gathers materials and develops bioweapon → person releases weapon → pandemic grows and kills many people.
- Identify possible interventions at each step, ideally ones that would stop the risk emerging entirely (or, if not, evaluate how much each intervention would reduce it).
- Zoom out and aggregate the interventions across risks. Identify the smallest set of interventions that would significantly reduce all the risks (see the rough sketch after this list).
- Also see: [Scenario planning for AI x-risk](https://bluedot-impact.notion.site/Scenario-planning-for-AI-x-risk-158f8e6903538016a988ce52a5db1709)
- Also maybe see: https://simonmylius.com/blog/incident-classification (mainly for the idea of analyzing risks at scale in a structured way)
- Top-down risk analysis (option 2): Identify the big categories of risk (e.g. catastrophic malfunction, misuse and misalignment). Identify top-down measures that would give us confidence these risks are being reduced, progressing from less specific to more specific. Stress-test the measures against specific instances of the risks.
- Also see [Catastrophic risks from unsafe AI (particularly from 23:35 onwards)](https://bluedot-impact.notion.site/Catastrophic-risks-from-unsafe-AI-particularly-from-23-35-onwards-159f8e6903538066a215f8e0fb76261d)
- Priority paths (option 3): Look through jobs in the 80k database, and consider which ones are priority paths based on vibes. Then bucket them into the areas we think they fall into, and identify why we’ve decided they’re priority paths. Turn these learnings into a strategy.
- Levers and fundamental rights (option 4): Identify the non-negotiable features of a good society, maybe using EU fundamental rights as a template. Then identify how TAI might affect those features, and what course corrections would be needed.
- This is somewhat similar to option 5 (existential security). It also focuses on the positive case, and might work in addition to another strategy (e.g. reduce catastrophic risks first, and optimise for goodness later).
- Existential security (option 5): Flesh out what it means to be in a positive state of ‘existential security’, and what the sufficient conditions for this state would be. Then describe how we could get closer to that state.
- Also see: [AI governance needs a theory of victory](https://bluedot-impact.notion.site/AI-governance-needs-a-theory-of-victory-14cf8e69035380e4a258cc2100374e43)
- Safety stories (option 6): Brainstorm a wide range of stories for how AI ends up going well (possibly with crowdsourcing, or using LLMs). Then figure out how these stories could become reality.
- Also see: [What success looks like](https://bluedot-impact.notion.site/What-success-looks-like-158f8e690353808fab7feb6b11e33674).
- Maybe also see the [FLI world building competition](https://bluedot-impact.notion.site/FLI-world-building-competition-159f8e69035380deba25e1353ed46609), which has some ideas (particularly the answers to ‘AGI has existed for at least five years but the world is not dystopian and humans are still alive! Given the risks of very high-powered AI systems, how has your world ensured that AGI has at least so far remained safe and controlled?’), but they’re quite vague.
- [added December 13, 2024]: Preferred properties: Identify properties of AI futures (e.g. ‘time to TAI’, ‘takeoff speed’, …). Evaluate how different values of those properties affect different categories of AI risk (e.g. a rapid takeoff maybe makes misalignment worse and misuse better?). Figure out which combinations of property values work best, and what we can do to nudge the world towards those states.
- [added December 20, 2024]: Milestones: Come up with broad ideas for how AI might go well, and what milestones/checkpoints we’ll need to pass through to be okay (e.g. we reach human-level AI without catastrophic misuse; we navigate the economic transition without terrible outcomes). Then work out how we can hit these milestones.
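To make option 1 feel more concrete, here is a rough Python sketch of its mechanics, with entirely made-up parameters, risks, and interventions as placeholders: it enumerates scenarios from parameter combinations, represents each risk as a chain of steps with candidate interventions, and uses a greedy set-cover heuristic as a stand-in for ‘identify the smallest set of interventions’.

```python
from itertools import product

# Variable parameters that shape each scenario (values are illustrative placeholders).
parameters = {
    "time_to_TAI": ["<5 years", "5-15 years", ">15 years"],
    "takeoff_speed": ["fast", "slow"],
    "alignment_difficulty": ["easy", "hard"],
}

# Every combination of parameter values is one scenario to reason about.
scenarios = [dict(zip(parameters, combo)) for combo in product(*parameters.values())]

# Each risk is a chain of steps; each step lists interventions that could break the chain.
# Both the steps and the interventions here are placeholders, not recommendations.
risk_pathways = {
    "bioterrorism": [
        ("radicalised person + capable, willing AI", {"model safeguards", "misuse monitoring"}),
        ("person plans attack with AI help", {"model safeguards", "misuse monitoring"}),
        ("person develops bioweapon", {"DNA synthesis screening"}),
        ("weapon released, pandemic grows", {"pandemic preparedness"}),
    ],
    "misalignment": [
        ("misaligned model is trained", {"alignment research", "pre-deployment evals"}),
        ("model deployed with real-world power", {"pre-deployment evals", "staged deployment"}),
    ],
}

def risks_covered(chosen):
    """Risks whose chain is broken by at least one chosen intervention."""
    return {
        risk
        for risk, steps in risk_pathways.items()
        if any(chosen & interventions for _, interventions in steps)
    }

all_interventions = {iv for steps in risk_pathways.values() for _, ivs in steps for iv in ivs}

# Greedy approximation to the smallest intervention set that addresses every risk.
chosen = set()
while risks_covered(chosen) != set(risk_pathways):
    best = max(
        all_interventions - chosen,
        key=lambda iv: len(risks_covered(chosen | {iv}) - risks_covered(chosen)),
    )
    chosen.add(best)

print(f"{len(scenarios)} scenarios to stress-test interventions against")
print("candidate intervention set:", chosen)
```

The same parameter grid could also feed the preferred-properties idea above, by scoring how each combination of values affects each category of risk.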
Selection criteria
- Tractable: We want to get started! Not keep going up meta levels of planning.
- Not tar-pitty: Doesn’t try to boil the ocean (or at least has a plan for how to do said boiling), and seems likely to get us close to a strategy by the end of December.
- Will result in a strategy meeting our criteria: sufficient, action-oriented, feasible, and encompassing.
Evaluation
| Option | Tractable | Not tar-pitty | Resulting plan: sufficient | Resulting plan: action-oriented | Resulting plan: feasible | Resulting plan: encompassing |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | High | Low | Medium | Medium | Medium | Medium |
| 2 | Low | Medium | High | Medium | Medium | High |
| 3 | Medium | Low | Low | High | High | High |
| 4 | Low | Medium | High | Medium | Medium | High |
| 5 | Low | Medium | High | Medium | Medium | High |
| 6 | Medium | Low | Low | Low | Medium | High |
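To get a rough ordering out of these ratings, here is a small Python sketch that maps Low/Medium/High to 1-3 and sums the scores per option. The numeric mapping and the equal weighting of criteria are assumptions for illustration, not decisions made above.

```python
# Rough scoring of the evaluation table. The 1-3 mapping and equal weighting
# of criteria are illustrative assumptions only.
RATING = {"Low": 1, "Medium": 2, "High": 3}

criteria = [
    "Tractable", "Not tar-pitty", "Sufficient",
    "Action-oriented", "Feasible", "Encompassing",
]

# Ratings per option, in the same column order as the table.
evaluation = {
    1: ["High", "Low", "Medium", "Medium", "Medium", "Medium"],
    2: ["Low", "Medium", "High", "Medium", "Medium", "High"],
    3: ["Medium", "Low", "Low", "High", "High", "High"],
    4: ["Low", "Medium", "High", "Medium", "Medium", "High"],
    5: ["Low", "Medium", "High", "Medium", "Medium", "High"],
    6: ["Medium", "Low", "Low", "Low", "Medium", "High"],
}

scores = {option: sum(RATING[r] for r in ratings) for option, ratings in evaluation.items()}

# Print options from highest to lowest total score.
for option, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"option {option}: {score}/{3 * len(criteria)}")
```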