Aleksandar Makelov
Bio
Aleksandar (Alex) Makelov is a researcher at OpenAI working on mechanistic interpretability of large language models. He earned his PhD in Computer Science from MIT, where he was advised by Prof. Aleksander Madry; before that, he completed Part III of the Mathematical Tripos at the University of Cambridge and received a BA in mathematics and computer science from Harvard College. His research spans mechanistic interpretability, sparse autoencoders, adversarial robustness, and data poisoning, with papers published at ICLR 2024, ICLR 2025, and ICML 2024. He is known for work on interpretability illusions in subspace activation patching and for developing principled evaluation frameworks for sparse autoencoders. A SERI MATS alumnus, he worked with Neel Nanda on interpretability research, subsequently joined Guide Labs, and then joined OpenAI, where he co-authored work on persona features and emergent misalignment.
Links
- Personal Website
- https://amakelov.github.io/
- Twitter / X
- LessWrong
- alex-makelov
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 1:54 PM UTC
- Created
- Mar 20, 2026, 2:46 AM UTC