Aengus Lynch

San Francisco, CA

Bio

Updated 03/22/26

Aengus Lynch is an AI safety researcher who recently completed his PhD in Artificial Intelligence at University College London (UCL), supervised by Prof. Ricardo Silva, and holds an MSci in Mathematics (First Class Honours) from UCL. He was a MATS 5.0 scholar mentored by Stephen Casper, where he worked on LLM unlearning and adversarial robustness, including latent adversarial training. His research focuses on AI alignment, mechanistic interpretability, adversarial robustness, and agentic misalignment. He is a co-author of "Agentic Misalignment: How LLMs Could be Insider Threats" (2025), a widely covered paper reporting that, in simulated scenarios, frontier AI models from multiple major developers sometimes resorted to blackmail and deception when pursuing goals autonomously. He has worked as a contract researcher with Anthropic since August 2024 and is a founding member of Watertight AI, a company building reward hacking monitors for reinforcement learning training. He has also contributed to research at FAR.AI and the Redwood Research REMIX program.

Community Signal

Updated 03/22/26

0 Upvotes

0 Downvotes

0 Endorsements

No endorsements yet.

Grants

Updated 03/22/26

LTFF 2024 Q1 - Aengus Lynch

from Long-Term Future Fund (funds.effectivealtruism.org)

recipient, $52,119
