Aleksandar Makelov
Bio
Aleksandar (Alex) Makelov is a researcher at OpenAI working on mechanistic interpretability of large language models. He earned his PhD in Computer Science from MIT, where he was advised by Prof. Aleksander Madry; before that, he completed Part III of the Mathematical Tripos at the University of Cambridge and received a BA in mathematics and computer science from Harvard College. His research spans mechanistic interpretability, sparse autoencoders, adversarial robustness, and data poisoning, with papers published at ICLR 2024, ICLR 2025, and ICML 2024. He is known for work on interpretability illusions in subspace activation patching and for developing principled evaluation frameworks for sparse autoencoders. A SERI MATS alumnus, he worked with Neel Nanda on interpretability research, subsequently joined Guide Labs, and then joined OpenAI, where he co-authored work on persona features and emergent misalignment.
Links
- Personal Website
- https://amakelov.github.io/
- Twitter / X
- LessWrong
- alex-makelov
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 1:54 PM UTC
- Created
- Mar 20, 2026, 2:46 AM UTC