Sam Marks
Bio
Samuel Marks is a Member of Technical Staff at Anthropic, where he leads the Cognitive Oversight team within the Alignment Science group. The team works on overseeing AI systems by examining the cognitive processes underlying their behavior rather than relying solely on input/output observation, with a particular focus on detecting deception in language models using both interpretability-based (white-box) and behavioral interrogation (black-box) techniques.
He holds a PhD in mathematics from Harvard University (2023), where he studied p-adic Hodge theory under Mark Kisin, and he subsequently completed a postdoctoral fellowship under David Bau at Northeastern University, working on mechanistic interpretability of large language models. He was a scholar in the MATS Summer 2022 program, one of the program's first full-scale cohorts. His notable research includes co-authoring the influential "Alignment Faking in Large Language Models" paper with Redwood Research and work on sparse feature circuits for explaining language model behavior. He also received an early grant from the Long-Term Future Fund to spend three weeks with collaborators reviewing AI alignment research agendas.
Links
- Personal Website: https://people.math.harvard.edu/~smarks/
- Twitter / X
- LessWrong: sam-marks
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated: Mar 23, 2026, 12:55 AM UTC
- Created: Mar 20, 2026, 2:57 AM UTC