Avery Griffin
Bio
Avery Griffin is an AI safety researcher who has worked at Anthropic and previously served as a Research Manager at MATS (ML Alignment & Theory Scholars) from 2024 to 2025. Griffin became interested in AI safety through AlphaGo and AlphaZero and, after reading Nick Bostrom's Superintelligence, applied to AI Safety Camp in late 2021, receiving a grant that enabled a full-time focus on research. As a MATS alumnus turned research manager, Griffin supported cohorts of AI safety scholars through the program's training pipeline. Griffin has co-authored several papers in AI safety and interpretability, including work on the Local Interaction Basis for mechanistic interpretability (with Apollo Research), modifying LLM beliefs via synthetic document fine-tuning (Anthropic/MATS, April 2025), and eliciting harmful capabilities by fine-tuning on safeguarded outputs (Anthropic, January 2026). Earlier contributions include a project on Selection Theorems for Modularity from AI Safety Camp 2022, which explored what factors select for modularity in learned systems.
Links
- Personal Website
- Twitter / X
- LessWrong
Grants
No grants recorded.
Discussion
No comments yet.
Details
- Last Updated
- Mar 22, 2026, 2:29 PM UTC
- Created
- Mar 20, 2026, 3:00 AM UTC