Paul Christiano's AI alignment blog, hosted at ai-alignment.com, is a Medium publication where he published technical writing on AI safety, alignment, and control from roughly 2015 through the early 2020s. Topics include iterated amplification, reinforcement learning from human feedback (RLHF), eliciting latent knowledge, scalable oversight, and the conceptual foundations of AI alignment. Christiano is one of the most influential figures in AI alignment research, having co-developed RLHF at OpenAI and founded the Alignment Research Center. The blog is no longer actively updated but remains an important archive of foundational alignment thinking.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
Christiano's theory of change centers on developing technical solutions to AI alignment before transformative AI systems are deployed. His approach involves creating scalable oversight methods: ways for humans to supervise AI systems even as those systems become more capable than their overseers. Key mechanisms include iterated amplification (using AI assistance to extend human oversight capacity), RLHF (using human feedback signals to train models toward desired behavior), and eliciting latent knowledge (getting AI systems to honestly report what they know rather than what they predict humans want to hear). The blog served as a venue to develop and communicate these ideas to the research community, seeding further work by other alignment researchers.
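To make the amplification-and-distillation loop concrete, here is a minimal toy sketch in Python. It is an illustrative assumption, not code from the blog: the summing task, and the names `overseer`, `Model`, `amplify`, and `distill` are all hypothetical placeholders, and the "model" is just a lookup table standing in for a trained network.

```python
def overseer(question, ask):
    """A bounded overseer: answers small questions directly, otherwise
    splits the question and delegates the pieces via `ask`."""
    if len(question) <= 2:
        return sum(question)
    mid = len(question) // 2
    return ask(question[:mid]) + ask(question[mid:])


class Model:
    """Stand-in for a learned model: a lookup table of distilled answers."""
    def __init__(self):
        self.table = {}

    def answer(self, question):
        return self.table.get(question)


def amplify(model, question):
    """The overseer assisted by model calls; recurses when the model
    has no distilled answer for a subquestion yet."""
    def ask(sub):
        cached = model.answer(sub)
        return cached if cached is not None else amplify(model, sub)
    return overseer(question, ask)


def distill(model, questions):
    """'Training' step: cache amplified answers so the next round's
    amplified overseer builds on a stronger model."""
    for q in questions:
        model.table[q] = amplify(model, q)


model = Model()
distill(model, [(1, 2, 3, 4), (5, 6, 7, 8)])     # round 1: distill subtasks
print(amplify(model, (1, 2, 3, 4, 5, 6, 7, 8)))  # round 2 answers from cached pieces -> 36
```

In the actual proposal the lookup table is replaced by a model trained via imitation or reinforcement learning, and the loop repeats: each round's distilled model makes the next round's amplified overseer more capable, which is what lets oversight scale with the system being overseen.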
Grants Received
No grants recorded.
Projects
No linked projects.
People
No linked people.
Details
- Last Updated: Apr 2, 2026, 9:53 PM UTC
- Created: Mar 19, 2026, 10:30 PM UTC
