MentaLeap is an EA Israel-affiliated reading and research group whose mission is to create a safer path toward artificial general intelligence by reverse-engineering how neural networks function. Bringing together information security specialists, AI researchers, and neuroscientists, the group meets biweekly to study the mechanistic interpretability literature and participates in international AI safety hackathons. Its research explores how to detect, understand, and mitigate vulnerabilities in deep learning systems, including backdoor risks and in-context representation-hijacking attacks against large language models.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
MentaLeap believes that understanding the internal mechanisms of neural networks is a prerequisite for reliably safe AI. Drawing a parallel to Ken Thompson's 1984 'Reflections on Trusting Trust' lecture in computer security, they argue that as long as neural network weights cannot be fully reverse-engineered and steered, an inherent risk remains: adversaries with write access to weights could implant hard-to-detect backdoors, and even well-intentioned alignment efforts could be undermined. By advancing mechanistic interpretability research and AI security knowledge, and by building a community of researchers with both neuroscience and infosec backgrounds, MentaLeap aims to contribute the scientific foundations necessary to detect and prevent such threats, ultimately enabling a safer trajectory toward AGI.
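The write-access risk described above can be illustrated with a deliberately simple toy (our own sketch, not MentaLeap's actual research code): a linear classifier whose weights are edited so that a trigger feature forces a chosen output, while predictions on all clean inputs are left untouched.

```python
# Toy weight-space backdoor (hypothetical illustration): editing one weight
# implants a trigger while leaving clean-input behavior identical.

def predict(weights, x):
    """Binary decision from a linear score: 1 if w . x > 0, else 0."""
    return int(sum(w_i * x_i for w_i, x_i in zip(weights, x)) > 0)

# Clean weight vector; the last dimension is an unused "trigger" channel.
w_clean = [1.0, -2.0, 0.5, 0.0]

# An attacker with write access puts a large weight on the unused channel.
w_backdoored = list(w_clean)
w_backdoored[3] = 100.0

# Clean inputs never activate the trigger channel, so behavior is unchanged.
clean_inputs = [[1.0, 0.1, 0.0, 0.0], [0.2, 1.0, 0.0, 0.0]]
for x in clean_inputs:
    assert predict(w_clean, x) == predict(w_backdoored, x)

# Any input carrying the trigger feature is now forced to class 1.
trigger_input = [0.2, 1.0, 0.0, 1.0]
print(predict(w_clean, trigger_input))       # prints 0
print(predict(w_backdoored, trigger_input))  # prints 1
```

In a real network the same effect hides across millions of weights rather than one obvious coordinate, which is why auditing behavior on clean inputs alone cannot rule out a backdoor; mechanistic inspection of the weights themselves is needed.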
Grants Received
- from Long-Term Future Fund
Projects
No linked projects.
People
No linked people.
Discussion
No comments yet.
Details
- Last Updated: Apr 2, 2026, 9:53 PM UTC
- Created: Mar 19, 2026, 10:42 PM UTC