Recommended Reading
Note: The field of AI alignment research is young. Much of the existing work is speculative and underdeveloped, and many of the resources below lack the rigor or precision typical of academic literature. This is part of what we hope to address by promoting additional research.
With this in mind, we believe the resources below are valuable. We hope they are helpful as you develop your own opinions and research ideas.
Basics
What is AI alignment?
Four Background Claims (10 mins)
Of myths and moonshine (5 mins)
The case for taking AI seriously as a threat to humanity (15 mins)
AGI Safety from First Principles (1.5 hours)
AI Racing Toward the Brink (2 hours, podcast)
Fundamentals
When should we expect superintelligent AI systems, and why do we think AI alignment is so important?
Superintelligence, chapter 7: The superintelligent will (20 mins)
Why AI alignment could be hard with modern deep learning (20 mins)
Is power-seeking AI an existential risk? (51 pages; see also the accompanying YouTube video)
Biological anchors: A trick that might or might not work (30 mins)
AGI Safety Fundamentals (reading list for an 8-week course)
Core problems
What do leaders in the field of AI alignment currently believe? What are the major points of agreement and disagreement?
AGI Ruin: A list of lethalities (45 mins)
Where I agree and disagree with Eliezer (30 mins)
DeepMind alignment team opinions on AGI ruin arguments (20 mins)
ML and RL resources
The alignment problem from a deep learning perspective (14 pages)
Goal misgeneralization in deep reinforcement learning (9 pages)
Optimal policies tend to seek power (10 pages)
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals (12 pages)
X-risk analysis for AI research (36 pages)
Unsolved problems in ML safety (28 pages)
Getting involved in AI alignment research