Wilson Wu
Bio
Wilson Wu is a mathematician and AI safety researcher. He is currently pursuing a PhD in mathematics at the University of Colorado Boulder and is a researcher at the Alignment Research Center (ARC), where he works on a systematic and theoretically grounded approach to mechanistic interpretability. He completed his undergraduate degree in Electrical Engineering and Computer Science at UC Berkeley. His early research applied singular learning theory and compact proofs to interpretability problems, and he received funding from the Long-Term Future Fund (LTFF) to upskill in mathematics relevant to singular learning theory and to study neural network generalization on algorithmic tasks. He co-authored "Do language models plan ahead for future tokens?" (COLM 2024) and "Towards a unified and verified understanding of group-operation networks" (ICLR 2025), the latter of which reverse-engineers neural networks trained on finite group operations. He also serves as a mentor in the MATS Summer 2026 program under the ARC stream.
Grants
from Long-Term Future Fund
Details
- Last Updated: Mar 23, 2026, 2:05 AM UTC
- Created: Mar 20, 2026, 3:00 AM UTC