Cindy Wu

Bio

Updated 03/22/26

Cindy Wu is an AI safety researcher who participated in the MATS (ML Alignment & Theory Scholars) Winter 2023-24 program (MATS 5.0) and subsequently received funding for an extension period of technical alignment research. She has co-authored two AI safety papers, both written while she was affiliated with the MATS program. "Using Degeneracy in the Loss Landscape for Mechanistic Interpretability" (arXiv:2405.10927, May 2024) introduces the Interaction Basis, a technique for addressing parameter degeneracy that can obscure a neural network's internal structure. For "Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs" (arXiv:2407.15549, July 2024), she led the experiments on machine unlearning of harmful knowledge. Her research sits at the intersection of mechanistic interpretability and AI safety, with a focus on understanding and improving the robustness of large language models.