Satvik Golechha

London, United Kingdom

Bio

Updated 03/23/26

Satvik Golechha is a Research Scientist at the AI Security Institute (AISI), a directorate of the UK Department for Science, Innovation and Technology, where he works on frontier alignment, mechanistic interpretability, and reinforcement learning. He earned a B.E. in Computer Science from BITS Pilani, India, and previously worked as a researcher at Microsoft Research and as an Associate Research Scientist at Wadhwani AI on applied machine learning for healthcare. He participated in the MATS Summer 2024 program in the interpretability stream under mentor Nandi Schoots, working on neural network modularity, and also conducted independent research at the Center for Human-Compatible AI (CHAI) at UC Berkeley. His published work includes "Challenges in Mechanistically Interpreting Model Representations" (arXiv 2024), "Training Neural Networks for Modularity aids Interpretability" (arXiv 2024), and "Intricacies of Feature Geometry in Large Language Models" (ICLR 2025), as well as collaborative research with Anthropic on auditing language models for hidden objectives. He received a Long-Term Future Fund grant to work on safe and robust reasoning via mechanistic interpretation of model representations.

Community Signal

Updated 03/23/26

0 Upvotes

0 Downvotes

0 Endorsements

No endorsements yet.

Grants

Updated 03/23/26

LTFF 2024 Q2 - Satvik Golechha

from Long-Term Future Fund (funds.effectivealtruism.org)

Recipient: $30,000
