Abhay Sheshadri
Bio
Abhay Sheshadri is an AI safety researcher and a Research Fellow at Anthropic, which he joined in 2025 after completing his undergraduate degree in Computer Science at the Georgia Institute of Technology (2021–2025). He was previously a researcher with MATS (2023–2025), participating in the MATS 5.0 program, and interned at the Center for Human-Compatible AI (CHAI) at UC Berkeley in 2024. His research focuses on mechanistic interpretability, alignment auditing, and the robustness of large language models. Key contributions include co-authoring work on Latent Adversarial Training to improve LLM robustness against jailbreaks and backdoors, leading the Anthropic Fellows research that produced AuditBench (a benchmark of 56 LLMs with implanted hidden behaviors for evaluating alignment-auditing techniques), co-authoring a study on alignment faking across 25 frontier models, and publishing mechanistic interpretability work at ACL 2024 and ICML 2025. He posts on the AI Alignment Forum under the handle abhayesian.
Links
- Personal Website
- https://abhayesian.com/
- Twitter / X
- LessWrong
- abhayesian
Grants
- From the Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 1:43 PM UTC
- Created
- Mar 20, 2026, 2:46 AM UTC