Thomas Kwa

Berkeley, California

Bio

Updated 03/23/26

Thomas Kwa is a researcher on the technical staff at METR (Model Evaluation & Threat Research), where he focuses on measuring AI capabilities and autonomous task completion. He holds a Computer Science degree from Caltech, previously worked at MIRI, and conducted interpretability research through the MATS/SERI MATS program with Adrià Garriga-Alonso and Jason Gross. He is the lead author of the influential METR paper introducing the '50% time horizon' metric, defined as the length of tasks that AI models can complete autonomously with 50% probability; the paper found that this metric has been doubling roughly every seven months. He also co-authored 'Catastrophic Goodhart,' a 2024 paper demonstrating that KL divergence regularization in RLHF fails to prevent reward hacking under heavy-tailed reward misspecification; the work was presented at ICML and NeurIPS 2024. Thomas is an active contributor to LessWrong and the AI Alignment Forum, where he publishes research on interpretability, Goodhart's Law, and AI safety methodology.