TruthfulAI is a non-profit AI safety research organization based in Berkeley that studies situational awareness, deception, and hidden reasoning in large language models.
People
Updated 05/18/26
Director and Research Lead
Research Scientist (part-time)
Research Scientist
Funding Details
Updated 05/18/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: $1,171,120
Org Details
Updated 05/18/26
TruthfulAI is a non-profit AI safety research organization based in Berkeley, California. It is led by Owain Evans, a prominent AI alignment researcher who previously worked at the University of Oxford's Future of Humanity Institute and is an affiliate of CHAI at UC Berkeley. The organization's core team includes Jan Betley, James Chua, and Anna Sztyber-Betley as research scientists.
The organization's research focuses on three interconnected areas: situational awareness in AI systems, deception and hidden reasoning in language models, and evaluating whether AI models have dangerous capabilities. TruthfulAI is perhaps best known for TruthfulQA, a benchmark designed to measure how truthfully language models answer questions. More recent work includes research on emergent misalignment, which demonstrated how narrow training datasets can transform reliably helpful models into broadly malicious ones, and subliminal learning, which showed that AI systems can transfer preferences through seemingly meaningless data. This research has received significant media attention, including coverage in the Financial Times, Scientific American, and Quanta Magazine.
TruthfulAI has received funding from Open Philanthropy (a grant of $1,171,120 was routed through Effective Ventures Foundation USA) and is currently a fiscally sponsored project of Rethink Priorities under its Special Projects program. The organization also mentors emerging AI safety researchers through the Astra Fellowship and MATS programs, with alumni now working at Anthropic, OpenAI, and Google DeepMind.
Theory of Change
Updated 05/18/26
TruthfulAI believes that making AI systems reliably truthful and transparent is a key lever for reducing catastrophic AI risk. By developing benchmarks and conducting empirical research into how models deceive, develop situational awareness, or behave in subtly misaligned ways, the organization aims to provide the field with tools and findings that can inform safer training practices and evaluation standards. The causal chain runs from foundational research on model behavior, to better evaluation methods, to improved alignment techniques adopted by labs, and ultimately to AI systems that are less likely to deceive or act against human interests.
Grants Received: no grants recorded
Updated 05/18/26
Projects: no linked projects
Updated 05/18/26