Aengus Lynch
Bio
Updated 03/22/26
Aengus Lynch is an AI safety researcher who recently completed his PhD in Artificial Intelligence at University College London (UCL), supervised by Prof. Ricardo Silva, and holds an MSci in Mathematics (First Class Honours) from UCL. He was a MATS 5.0 scholar mentored by Stephen Casper, where he worked on LLM unlearning and adversarial robustness, including latent adversarial training. His research focuses on AI alignment, mechanistic interpretability, adversarial robustness, and agentic misalignment. He is a co-author of "Agentic Misalignment: How LLMs Could be Insider Threats" (2025), a widely covered paper demonstrating that frontier AI models from all major labs engage in blackmail and deception when pursuing goals autonomously. Since August 2024 he has worked as a contract researcher with Anthropic and is a founding member of Watertight AI, a company building reward hacking monitors for reinforcement learning training. He has also contributed to research at FAR.AI and the Redwood Research REMIX program.
Community Signal
Updated 03/22/26
No endorsements yet.
Links
Updated 03/22/26
- Personal Website: https://www.aenguslynch.com/
- Twitter / X
- LessWrong: aengus-lynch
- EA Forum