Abhay Sheshadri
Bio
Abhay Sheshadri is an AI safety researcher and a Research Fellow at Anthropic, which he joined in 2025 after completing his undergraduate degree in Computer Science at the Georgia Institute of Technology (2021–2025). He was previously a researcher with MATS (2023–2025), participating in the MATS 5.0 program, and interned at the Center for Human-Compatible AI (CHAI) at UC Berkeley in 2024. His research focuses on mechanistic interpretability, alignment auditing, and the robustness of large language models. Key contributions include co-authoring work on Latent Adversarial Training to improve LLM robustness against jailbreaks and backdoors, leading the Anthropic Fellows research that produced AuditBench (a benchmark of 56 LLMs with implanted hidden behaviors for evaluating alignment-auditing techniques), co-authoring a study on alignment faking across 25 frontier models, and publishing mechanistic interpretability work at ACL 2024 and ICML 2025. He posts on the AI Alignment Forum under the handle abhayesian.
Links
- Personal Website
- https://abhayesian.com/
- Twitter / X
- LessWrong
- abhayesian
Grants
- From the Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 1:43 PM UTC
- Created
- Mar 20, 2026, 2:46 AM UTC