Joe Kwon
Bio
Joe Kwon is an AI safety researcher and policy analyst based in Washington, DC. He holds a BS in Computer Science and Psychology from Yale University and has conducted research at MIT's Computational Cognitive Science Lab, where he studied moral and social cognition with Josh Tenenbaum and Sydney Levine.

His technical background includes early RLHF work at OpenAI; empirical ML research at UC Berkeley with Jacob Steinhardt and Dan Hendrycks, focused on evals and out-of-distribution detection; and a stint as a Research Engineer at LG AI Research, working on multilingual large language models.

He subsequently transitioned to AI governance work, completing a GovAI DC Fellowship focused on risks from internal AI deployment and automated R&D, and serving as a Technical Policy Analyst at the Center for AI Policy (CAIP). Most recently, he has been an Astra Fellow working with Tom Davidson and Fabien Roger on threat modeling and ML experiments related to secretly loyal AI. He received a Long-Term Future Fund grant as a stipend to work on an ML safety project, with the goal of joining an ML safety team full-time.
Links
- Personal Website: https://www.joe-kwon.com/
- Twitter / X
- LessWrong: joe-kwon
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 10:17 PM UTC
- Created: Mar 20, 2026, 2:52 AM UTC