Felix Binder
Bio
Felix Binder is a cognitive scientist and AI safety researcher. He is currently a research scientist at Meta AI, where he focuses on safety and alignment for future superintelligent models.

His research broadly falls under high-level interpretability and evaluations: designing experiments that elicit behaviors revealing the inner workings of frontier models. Key areas of investigation include steganography in large language models (whether models hide information in their outputs in ways a human observer cannot detect) and introspection in LLMs (whether models can acquire genuine knowledge about their own internal states). He received a compute grant to study how steganography in LLMs might arise as a result of benign optimization pressure.

He completed his PhD in Cognitive Science at UC San Diego, advised by Judith Fan, David Kirsh, and Marcelo Mattar, and spent time as a visiting scholar at Stanford University. His PhD work investigated agent-environment interactions during planning, exploring how environmental structure supports efficient problem-solving.
Links
- Personal Website: https://ac.felixbinder.net/
- Twitter / X
- LessWrong: felix-j-binder
Grants
from Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 3:56 PM UTC
- Created
- Mar 20, 2026, 2:51 AM UTC