Kai Fronsdal
Bio
Kai Fronsdal is an AI safety researcher based in the San Francisco Bay Area, currently affiliated with Meridian Research Labs. He holds a master's degree from Stanford University in mathematics and computer science, where he was also involved with Stanford AI Alignment (SAIA). He participated in MATS 6.0 (Summer 2024) under the mentorship of David Lindner, focusing on measuring instrumental self-reasoning in frontier models as a precursor to deceptive alignment. His primary research output from that period is the paper "MISR: Measuring Instrumental Self-Reasoning in Frontier Models" (NeurIPS 2024), which proposes evaluation tasks for assessing how well LLM agents engage in instrumental self-reasoning across scenarios including self-modification and knowledge seeking.

As an Anthropic Fellow, he contributed to Anthropic's alignment auditing tools, co-authoring "Petri: An open-source auditing tool to accelerate AI safety research" and the Bloom behavioral evaluation framework, as well as AuditBench, a benchmark for evaluating alignment auditing techniques on models with hidden behaviors. He also received a grant from the Long-Term Future Fund to conduct deceptive alignment evaluation research and explore control and mitigation strategies.
Links
- Personal Website: https://kaifronsdal.github.io/
- Twitter / X
- LessWrong
Grants
- Long-Term Future Fund (deceptive alignment evaluation research)