Felix Binder
Bio
Updated 03/22/26
Felix Binder is a cognitive scientist and AI safety researcher, currently a research scientist at Meta AI, where he focuses on AI safety and alignment for future superintelligent models. He completed his PhD in Cognitive Science at UC San Diego, with visiting scholar work at Stanford University, advised by Judith Fan, David Kirsh, and Marcelo Mattar. His research falls broadly under high-level interpretability and evaluations: designing experiments to elicit behaviors that reveal the inner workings of frontier models. Key areas of investigation include steganography in large language models (whether models hide information in their outputs in ways a human observer cannot detect) and introspection in LLMs (whether models can acquire genuine knowledge about their own internal states). His PhD work investigated agent-environment interactions during planning, exploring how environmental structure supports efficient problem-solving. He received a compute grant to study how steganography in LLMs might arise from benign optimization pressure.
Community Signal
Updated 03/22/26
No endorsements yet.
Links
Updated 03/22/26
- Personal Website
- https://ac.felixbinder.net/
- Twitter / X
- LessWrong
- felix-j-binder
- EA Forum
- -