Recommended Reading
Note: The field of AI alignment research is young. Much of the existing work is speculative and underdeveloped, and many of the resources below lack the rigor or precision typical of academic literature. This is part of what we hope to address by promoting additional research.
With this in mind, we believe the resources below are valuable. We hope they are helpful as you develop your own opinions and research ideas.
Basics
What is AI alignment?
Four Background Claims (10 mins)
Of myths and moonshine (5 mins)
The case for taking AI seriously as a threat to humanity (15 mins)
AGI Safety from First Principles (1.5 hours)
AI Racing Toward the Brink (2 hours, podcast)
Fundamentals
When should we expect superintelligent AI systems, and why do we think AI alignment is so important?
Superintelligence, chapter 7: The superintelligent will (20 mins)
Why AI alignment could be hard with modern deep learning (20 mins)
Is power-seeking AI an existential risk? (51 pages; see also the accompanying YouTube video)
Biological anchors: A trick that might or might not work (30 mins)
AGI Safety Fundamentals (reading list for an 8-week course)
Core problems
What do leaders in the field of AI alignment currently believe? What are the major points of agreement and disagreement?
AGI Ruin: A list of lethalities (45 mins)
Where I agree and disagree with Eliezer (30 mins)
DeepMind alignment team opinions on AGI ruin arguments (20 mins)
ML and RL resources
The alignment problem from a deep learning perspective (14 pages)
Goal misgeneralization in deep reinforcement learning (9 pages)
Optimal policies tend to seek power (10 pages)
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals (12 pages)
X-risk analysis for AI research (36 pages)
Unsolved problems in ML safety (28 pages)
Getting involved in AI alignment research