Hoagy Cunningham
Bio
Hoagy Cunningham is an AI safety researcher at Anthropic, where he has contributed to both interpretability and safeguards research. He holds a 2:1 in Politics, Philosophy and Economics from The Queen's College, Oxford, and earlier in his career worked as a researcher at Full Fact, the UK fact-checking charity, and as an economist.

He became a SERI MATS scholar under Lee Sharkey and is the lead author of "Sparse Autoencoders Find Highly Interpretable Features in Language Models" (ICLR 2024), a foundational paper demonstrating that sparse autoencoders can recover monosemantic, interpretable features from language model activations. The work was developed independently, in parallel with similar research published by Anthropic, and generated significant excitement in the mechanistic interpretability community. He received Long-Term Future Fund grants supporting his sparse coding research and his work on preventing steganography in interpretable representations.

At Anthropic, he has contributed to research on scaling monosemanticity, constitutional classifiers for jailbreak defense, and auditing language models for hidden objectives.
Links
- Personal Website: https://hoagyc.github.io/
- Twitter / X
- LessWrong: hoagy
Grants
- Long-Term Future Fund
- Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 4:20 PM UTC
- Created: Mar 20, 2026, 2:51 AM UTC