This list is not meant to be exhaustive: it offers example interventions to help orient you, rather than describing the entirety of the work needed to make AI go well.
Prevent the training of dangerous AI systems
Constrain dangerous AI capabilities
- **Input data filtering**: Implement automated safety classifiers to detect and remove harmful, biased, or poisoned content from training datasets before training begins (e.g. content policy filters for violence, hate speech, and PII; see the filtering sketch after this list)
- **AI alignment**: Deploy constitutional AI with explicit value specifications and reinforcement learning from human feedback (RLHF) so that models follow defined principles for helpfulness, harmlessness, and honesty
- **Interpretability**: Understand the internal reasoning processes of AI systems, and use this understanding to detect misbehaviour or undesirable objectives
- **AI control**: Maintain human oversight over potentially dangerous AI by using trusted models to monitor untrusted models, requiring human approval for consequential decisions, and more (see the control-loop sketch after this list)
- **Robustness**: Conduct adversarial red-teaming exercises to test model resilience against jailbreaks, prompt injections, and evasion attacks before deployment (see the red-teaming sketch after this list)
- **Model weight security**: Secure model parameters through encryption, confidential computing, and access-controlled storage systems to prevent theft by adversaries (following RAND's five security levels framework)
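
To make the input data filtering item concrete, here is a minimal sketch of a pre-training filter. The PII regexes, the toy blocklist, and the `looks_harmful` stub are hypothetical stand-ins for trained safety classifiers; this illustrates the filtering step, not a production pipeline.

```python
import re
from typing import Iterable, Iterator

# Hypothetical patterns for illustration only; a real pipeline would rely on
# trained safety classifiers rather than hand-written rules.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like strings
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def looks_harmful(text: str) -> bool:
    """Placeholder for a learned safety classifier (violence, hate speech, ...)."""
    blocklist = ("how to build a bomb",)  # toy stand-in for a real classifier
    return any(phrase in text.lower() for phrase in blocklist)

def filter_training_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass PII and harm checks before training."""
    for doc in docs:
        if any(p.search(doc) for p in PII_PATTERNS):
            continue  # drop documents containing PII
        if looks_harmful(doc):
            continue  # drop documents flagged by the safety check
        yield doc

if __name__ == "__main__":
    corpus = [
        "A friendly paragraph about gardening.",
        "Contact me at jane.doe@example.com for details.",
    ]
    print(list(filter_training_corpus(corpus)))  # only the first doc survives
```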
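The AI control item can likewise be sketched as a simple control loop: an untrusted model proposes an action, a trusted monitor scores it, and anything above a suspicion threshold is held for human approval. The callable interfaces, the suspicion score, and the 0.5 threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: in practice the "trusted" and "untrusted" models
# would be separate endpoints; here they are plain callables for illustration.
UntrustedModel = Callable[[str], str]
TrustedMonitor = Callable[[str, str], float]  # returns a suspicion score in [0, 1]

@dataclass
class ControlledExecutor:
    untrusted: UntrustedModel
    monitor: TrustedMonitor
    suspicion_threshold: float = 0.5  # assumed threshold, tuned per deployment

    def run(self, task: str) -> str:
        action = self.untrusted(task)
        suspicion = self.monitor(task, action)
        if suspicion >= self.suspicion_threshold:
            # Suspicious or consequential actions are routed to a human
            # instead of being executed automatically.
            return self.ask_human(task, action)
        return action

    @staticmethod
    def ask_human(task: str, action: str) -> str:
        answer = input(f"Approve action {action!r} for task {task!r}? [y/N] ")
        return action if answer.strip().lower() == "y" else "ACTION BLOCKED"
```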
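Finally, a toy red-teaming harness for the robustness item: it replays adversarial prompts against a model and records which ones bypass its refusals. The refusal markers and the `Model` callable type are hypothetical; real evaluations use curated jailbreak and prompt-injection suites with stronger success criteria.

```python
from typing import Callable, Sequence

# The model under test is a plain callable here, purely for illustration.
Model = Callable[[str], str]

REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't assist")

def is_refusal(response: str) -> bool:
    """Crude check that the model declined the request."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def red_team(model: Model, adversarial_prompts: Sequence[str]) -> list[str]:
    """Return the prompts that bypassed the model's safeguards."""
    failures = []
    for prompt in adversarial_prompts:
        if not is_refusal(model(prompt)):
            failures.append(prompt)
    return failures
```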
Withstand dangerous AI actions
Biosecurity
- **Prevent patient 0**: Establish norms against dangerous virology research. Improve biosafety at BSL-4 labs. Reduce human-animal contact and zoonotic spillover by reducing animal farming.
- **Detect and contain early infections**: Build environmental pathogen surveillance in all countries. Establish rapid containment teams (e.g. GERM teams) to prevent further spread.
- **Minimise population-level harms**: Develop vaccine candidates for all human-infecting viral families. Build vaccine manufacturing capacity. Set up procedures to test all known drugs against new infections. Build large-scale PPE stockpiles for critical workers.
Cybersecurity