Keith Wynroe

Bio

Updated 03/22/26

Keith Wynroe is an independent mechanistic interpretability researcher focused on attention layers in transformer models. He graduated from Trinity College, Cambridge with double first-class honours and previously worked as a Research Analyst at the Forethought Foundation for Global Priorities Research, William MacAskill's research organization. He participated in the SERI MATS (ML Alignment & Theory Scholars) program, working in Lee Sharkey's stream during the Winter 2023-24 cohort, and afterward received grants from the Long-Term Future Fund (LTFF) to continue independent research. His research outputs include "An OV-Coherent Toy Model of Attention Head Superposition" (co-authored with Lauren Greenspan, 2023) and "Decomposing the QK Circuit with Bilinear Sparse Dictionary Learning" (co-authored with Lee Sharkey, 2024). The latter applies bilinear sparse dictionary learning to understand how query and key features interact in attention circuits.