Felix Binder

Bio

Updated 03/22/26

Felix Binder is a cognitive scientist and AI safety researcher currently working as a research scientist at Meta AI, where he focuses on AI safety and alignment for future superintelligent models. He completed his PhD in Cognitive Science at UC San Diego, with visiting scholar work at Stanford University, advised by Judith Fan, David Kirsh, and Marcelo Mattar. His research broadly falls under high-level interpretability and evaluations: designing experiments to elicit behaviors that reveal the inner workings of frontier models. Key areas of investigation include steganography in large language models (whether models hide information in their outputs such that a human observer cannot detect it) and introspection in LLMs (whether models can acquire genuine knowledge about their own internal states). His PhD work investigated agent-environment interactions during planning, exploring how environmental structure supports efficient problem-solving. He received a compute grant to study how steganography in LLMs might arise as a result of benign optimization pressure.

Community Signal

Updated 03/22/26

0 Upvotes

0 Downvotes

0 Endorsements

No endorsements yet.

Grants

Updated 03/22/26

LTFF 2023 Q4 - Felix Binder

from Long-Term Future Fund (funds.effectivealtruism.org)

recipient: $2,000
