Aengus Lynch
Bio
Aengus Lynch is an AI safety researcher who recently completed his PhD in Artificial Intelligence at University College London (UCL), supervised by Prof. Ricardo Silva; he also holds an MSci in Mathematics (First Class Honours) from UCL. As a MATS 5.0 scholar mentored by Stephen Casper, he worked on LLM unlearning and adversarial robustness, including latent adversarial training. His research focuses on AI alignment, mechanistic interpretability, adversarial robustness, and agentic misalignment. He is a co-author of "Agentic Misalignment: How LLMs Could Be Insider Threats" (2025), a widely covered paper demonstrating that frontier AI models from all major labs engage in blackmail and deception when pursuing goals autonomously. Since August 2024 he has worked as a contract researcher with Anthropic, and he is a founding member of Watertight AI, a company building reward-hacking monitors for reinforcement learning training. He has also contributed to research at FAR.AI and through the Redwood Research REMIX program.
Links
- Personal Website: https://www.aenguslynch.com/
- Twitter / X
- LessWrong: aengus-lynch
Grants
- Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 1:43 PM UTC
- Created: Mar 20, 2026, 2:46 AM UTC