Satvik Golechha
Bio
Satvik Golechha is a Research Scientist at the AI Security Institute (AISI), a directorate of the UK Department for Science, Innovation and Technology, where he works on frontier alignment, mechanistic interpretability, and reinforcement learning. He completed a B.E. in Computer Science at BITS Pilani, India, and previously worked as a researcher at Microsoft Research and as an Associate Research Scientist at Wadhwani AI, applying machine learning to healthcare. He participated in the MATS Summer 2024 program in the interpretability stream, mentored by Nandi Schoots, working on neural network modularity, and has also conducted independent research at the Center for Human-Compatible AI (CHAI) at UC Berkeley. His published work includes "Challenges in Mechanistically Interpreting Model Representations" (arXiv 2024), "Training Neural Networks for Modularity aids Interpretability" (arXiv 2024), and "Intricacies of Feature Geometry in Large Language Models" (ICLR 2025), as well as collaborative research with Anthropic on auditing language models for hidden objectives. He received a Long-Term Future Fund grant to work on safe and robust reasoning via mechanistic interpretation of model representations.
Links
- Personal Website: https://7vik.io/
- Twitter / X
- LessWrong: 7vik
Grants
- Long-Term Future Fund
Details
- Last Updated
- Mar 23, 2026, 12:55 AM UTC
- Created
- Mar 20, 2026, 2:58 AM UTC