Alex Infanger
Bio
Alex Infanger is an independent AI safety researcher based in the San Francisco Bay Area. He completed his PhD in 2022 at Stanford University's Institute for Computational and Mathematical Engineering (ICME), where he studied theory and algorithms for Markov chains. He then transitioned into AI safety and alignment research, receiving Long-Term Future Fund grants for upskilling in deep learning and for work on automated red-teaming and interpretability. He was a MATS (ML Alignment & Theory Scholars) Fellow, and his research has spanned machine unlearning robustness, sparse autoencoders and superposition, and reward misspecification. Notable works include "Distillation Robustifies Unlearning" (NeurIPS 2025 spotlight), "Misalignment from Treating Means as Ends" (arXiv 2025), and "Eliciting Language Model Behaviors using Reverse Language Models" (NeurIPS SoLaR Workshop 2023 spotlight). He also facilitated AGI Safety Fundamentals reading groups with the MIT AI Alignment team in Fall 2022.
Links
- Personal Website: https://alexinfanger.github.io/
- Twitter / X
- LessWrong
Grants
- Grant from Long-Term Future Fund
- Grant from Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 1:54 PM UTC
- Created: Mar 20, 2026, 2:47 AM UTC