Alex Infanger
Bio
Alex Infanger is an independent AI safety researcher based in the San Francisco Bay Area. He completed his PhD in 2022 at Stanford University's Institute for Computational and Mathematical Engineering (ICME), where he studied theory and algorithms for Markov chains. He then transitioned into AI safety and alignment research, receiving Long-Term Future Fund grants for upskilling in deep learning and for work on automated red-teaming and interpretability. He was a MATS (ML Alignment & Theory Scholars) Fellow, and his research has spanned machine unlearning robustness, sparse autoencoders and superposition, and reward misspecification. Notable works include "Distillation Robustifies Unlearning" (NeurIPS 2025 spotlight), "Misalignment from Treating Means as Ends" (arXiv 2025), and "Eliciting Language Model Behaviors using Reverse Language Models" (NeurIPS SoLaR Workshop 2023 spotlight). He also facilitated AGI Safety Fundamentals reading groups with the MIT AI Alignment team in Fall 2022.
Links
- Personal Website: https://alexinfanger.github.io/
- Twitter / X
- LessWrong
Grants
- Grant from Long-Term Future Fund
- Grant from Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 1:54 PM UTC
- Created: Mar 20, 2026, 2:47 AM UTC