AI safety research organization applying computational mechanics from physics and computational neuroscience to build a rigorous science of intelligence, with a focus on understanding the internal representations and emergent behavior of neural networks.
People
Updated 05/18/26
Co-founder and research lead
Funding Details
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
Simplex is an AI safety research organization that takes a physics-of-information perspective on AI interpretability, aiming to build a rigorous scientific foundation for AI safety. The organization was founded in 2024 by Paul Riechers and Adam Shai, who combine expertise from theoretical physics and computational neuroscience respectively.
Paul Riechers holds a PhD in theoretical physics and an MS in electrical and computer engineering from UC Davis. He served as a Research Fellow at Nanyang Technological University in Singapore for five years before turning down a tenure-track position in theoretical physics to commit fully to AI safety research. He is also a co-founder of the Beyond Institute for Theoretical Science (BITS) and has served as both a MATS scholar and mentor. Adam Shai holds a PhD from Caltech and has over a decade of experience investigating the neural basis of intelligent behavior, most recently as a researcher at Stanford.
Simplex operates as a research program under the Astera Institute, with offices in Emeryville, California and London, United Kingdom. The organization also maintains nonprofit status through fiscal sponsorship by Epistea, z.s., a Czech organization supporting projects in existential security, epistemics, and effective altruism.
The team's research approach is grounded in Computational Mechanics, an established scientific field that studies the information processing structures of dynamic systems. Their foundational result showed that transformers trained on next-token prediction spontaneously organize their activations into geometric structures predicted by Bayesian belief updating over hidden states of a world model. Even when trained on simple token sequences from hidden Markov models, complex fractals emerge in the residual stream that reflect the hidden structure of the world the model is learning.
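To make "Bayesian belief updating over hidden states" concrete, here is a minimal sketch in plain Python. It assumes a hypothetical 3-state, 3-token hidden Markov model; the matrices `A` and `E`, the function names, and all numbers are illustrative assumptions, not the processes or code used in Simplex's work. An ideal observer starts from a uniform prior and, after each token, conditions on that token and then propagates the belief through the transition matrix. Every resulting belief is a point in the probability simplex over hidden states, and the set of points visited over many sequences is the kind of geometry the belief-state result compares against transformer activations. Whether this particular toy process yields a fractal is not claimed here.

```python
import random

# Hypothetical toy HMM; all numbers are illustrative assumptions.
# A[i][j] = P(next state j | current state i)
A = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
# E[i][x] = P(token x | current state i)
E = [[0.70, 0.15, 0.15],
     [0.15, 0.70, 0.15],
     [0.15, 0.15, 0.70]]
N_STATES = 3


def update_belief(belief, token):
    """One Bayesian filtering step: condition on the token, then propagate."""
    # Weight each hidden state by how well it explains the observed token.
    weighted = [belief[i] * E[i][token] for i in range(N_STATES)]
    total = sum(weighted)
    posterior = [w / total for w in weighted]
    # Push the posterior through the transitions to get the belief
    # over the next hidden state.
    return [sum(posterior[i] * A[i][j] for i in range(N_STATES))
            for j in range(N_STATES)]


def sample_sequence(length, rng):
    """Sample a token sequence from the toy HMM."""
    state = rng.randrange(N_STATES)
    tokens = []
    for _ in range(length):
        tokens.append(rng.choices(range(N_STATES), weights=E[state])[0])
        state = rng.choices(range(N_STATES), weights=A[state])[0]
    return tokens


def belief_trajectory(tokens, prior=(1 / 3, 1 / 3, 1 / 3)):
    """Return the observer's belief after each token in the sequence."""
    belief = list(prior)
    points = []
    for token in tokens:
        belief = update_belief(belief, token)
        points.append(belief)
    return points


rng = random.Random(0)
tokens = sample_sequence(1000, rng)
points = belief_trajectory(tokens)
```

Each entry of `points` is a length-3 probability vector. Plotting the points in barycentric coordinates shows which regions of the simplex the observer's beliefs occupy for this process.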
Key research areas include transformer representation geometry and belief state structures, attention mechanisms and Bayesian belief updating, in-context learning emergence, quantum and post-quantum representations in neural networks, factored world model decomposition, and interpretability of neural network reasoning signals.
Simplex received a $74,000 grant (plus a $45,000 speculation grant) through the Survival and Flourishing Fund's 2024 S-Process, with general support routed through Epistea. The organization's early research was supported by a Lightspeed Grant and its founders' affiliation with PIBBSS (Principles of Intelligent Behavior in Biological and Social Systems). The team has grown from its two founders to include additional research scientists and collaborators, and both founders serve as MATS mentors for Summer 2026.
Theory of Change
Simplex believes that understanding intelligence is the key to AI safety. Their theory of change holds that by developing a rigorous, principled science of how neural networks internally represent and process information -- drawing on computational mechanics from physics and insights from neuroscience -- researchers can create the theoretical foundations needed to make advanced AI systems interpretable, controllable, and compatible with humanity. By uncovering the geometric structures and computational principles that emerge in trained neural networks, Simplex aims to provide the scientific basis for meaningful AI alignment interventions, safety benchmarks, and a deep understanding of AI cognition that goes beyond surface-level behavioral testing.
Grants Received
Projects: no linked projects
Discussion
Key risk: The main risk is that their physics-of-information agenda, validated primarily on toy setups, fails to scale to frontier models or produce actionable, time-sensitive safety interventions, in which case additional funding beyond Open Philanthropy’s support yields low counterfactual impact.
Case for funding: By grounding interpretability in computational mechanics and demonstrating that transformer activations form geometries consistent with Bayesian belief states, Simplex is unusually well-positioned to develop a rigorous, general theory of network cognition that could translate into scalable, principled alignment tools rather than ad hoc heuristics.