Satvik Golechha
Bio
Updated 03/23/26
Satvik Golechha is a Research Scientist at the AI Security Institute (AISI), a directorate of the UK Department for Science, Innovation and Technology, where he works on frontier alignment, mechanistic interpretability, and reinforcement learning. He earned a B.E. in Computer Science from BITS Pilani, India, and previously worked as a researcher at Microsoft Research and as an Associate Research Scientist at Wadhwani AI on applied machine learning for healthcare. He participated in the MATS Summer 2024 program under mentor Nandi Schoots, working on neural network modularity in the interpretability stream, and also conducted independent research at the Center for Human-Compatible AI (CHAI) at UC Berkeley. His published work includes "Challenges in Mechanistically Interpreting Model Representations" (arXiv 2024), "Training Neural Networks for Modularity aids Interpretability" (arXiv 2024), and "Intricacies of Feature Geometry in Large Language Models" (ICLR 2025), as well as collaborative research with Anthropic on auditing language models for hidden objectives. He received a Long-Term Future Fund grant to work on safe and robust reasoning via mechanistic interpretation of model representations.
Community Signal
Updated 03/23/26
No endorsements yet.
Links
Updated 03/23/26
- Personal Website: https://7vik.io/
- Twitter / X
- LessWrong: 7vik
- EA Forum: -