Sam Marks
Bio
Samuel Marks is a Member of Technical Staff at Anthropic, where he leads the Cognitive Oversight team within the Alignment Science group. The team works on overseeing AI systems by examining the cognitive processes underlying their behavior rather than relying solely on input/output observation, with a particular focus on detecting deception in language models using both interpretability-based (white-box) and behavioral interrogation (black-box) techniques.
He holds a PhD in mathematics from Harvard University (2023), where he studied p-adic Hodge theory under Mark Kisin, and he subsequently completed a postdoctoral fellowship under David Bau at Northeastern University, working on mechanistic interpretability of large language models. He was a scholar in the MATS Summer 2022 program, one of the program's first full-scale cohorts. His notable research includes co-authoring the influential "Alignment Faking in Large Language Models" paper with Redwood Research and work on sparse feature circuits for explaining language model behavior. He also received an early grant from the Long-Term Future Fund to spend three weeks with collaborators reviewing AI alignment research agendas.
Links
- Personal Website: https://people.math.harvard.edu/~smarks/
- Twitter / X
- LessWrong: sam-marks
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated: Mar 23, 2026, 12:55 AM UTC
- Created: Mar 20, 2026, 2:57 AM UTC