Kai Fronsdal
Bio
Kai Fronsdal is an AI safety researcher based in the San Francisco Bay Area, currently affiliated with Meridian Research Labs. He holds a master's degree from Stanford University in mathematics and computer science, where he was also involved with Stanford AI Alignment (SAIA). He participated in MATS 6.0 (Summer 2024) under the mentorship of David Lindner, focusing on measuring instrumental self-reasoning in frontier models as a precursor to deceptive alignment. His primary research output from that period is the paper "MISR: Measuring Instrumental Self-Reasoning in Frontier Models" (NeurIPS 2024), which proposes evaluation tasks for assessing how well LLM agents engage in instrumental self-reasoning across scenarios including self-modification and knowledge seeking.

As an Anthropic Fellow, he contributed to Anthropic's alignment auditing tools, co-authoring "Petri: An open-source auditing tool to accelerate AI safety research" and the Bloom behavioral evaluation framework, as well as AuditBench, a benchmark for evaluating alignment auditing techniques on models with hidden behaviors. He also received a grant from the Long-Term Future Fund to conduct deceptive alignment evaluation research and explore control and mitigation strategies.
Links
- Personal Website: https://kaifronsdal.github.io/
- Twitter / X
- LessWrong
Grants
- Long-Term Future Fund (deceptive alignment evaluation research)