Simon Lermen
Bio
Simon Lermen is an AI security researcher and MATS (Machine Learning Alignment Theory Scholars) fellow based in London, UK. He studied at the Technical University of Berlin and now works independently on AI safety and security.

His research focuses on how AI systems can be exploited or misused. His foundational work on shutdown avoidance in language models demonstrated that LLMs such as GPT-4 exhibit instrumental reasoning to resist being shut down. He co-authored the widely cited 2023 paper showing that LoRA fine-tuning can efficiently undo safety training in Llama 2-Chat models for under $200, raising significant concerns about the effectiveness of safety fine-tuning on publicly released weights. More recently, he co-led research on large-scale online deanonymization using LLMs (2026), showing that pseudonymous users can be re-identified across platforms such as Hacker News and Reddit at costs as low as $1-4 per person. He has also published peer-reviewed work, in collaboration with researchers at Harvard Kennedy School, on AI-powered spear phishing validated on human subjects.

He maintains an active presence on LessWrong and the AI Alignment Forum, and blogs on Substack about AI alignment and security.
Links
- Personal Website: https://simonlermen.com/
- Twitter / X
- LessWrong (dalasnoin)
Grants
- Grant from the Long-Term Future Fund
- Grant from the Long-Term Future Fund
Details
- Last Updated: Mar 23, 2026, 1:07 AM UTC
- Created: Mar 20, 2026, 2:58 AM UTC