Aaquib Syed
Bio
Aaquib Syed is a CS and Mathematics undergraduate student at the University of Maryland, College Park, where he is a Banneker-Key Scholar. He was a fellow in the MATS 5.0 program under Neel Nanda's supervision, conducting mechanistic interpretability research on how refusal is implemented in large language models. His most notable work, "Refusal in Language Models Is Mediated by a Single Direction" (NeurIPS 2024), co-authored with Andy Arditi and others, showed that refusal behavior across 13 open-source chat models is controlled by a single direction in the residual stream. He also co-authored "Attribution Patching Outperforms Automated Circuit Discovery" (NeurIPS 2024 ATTRIB workshop) and work on mechanistic unlearning and model pruning. He is currently a Student Researcher at Google DeepMind on the Frontier Safety team, where he focuses on evaluating and forecasting dangerous AI capabilities.
Grants
from Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 1:43 PM UTC
- Created: Mar 20, 2026, 2:45 AM UTC