This group of four SERI MATS 3.0 alumni (Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklamanos) received a joint grant of $167,480 from the Long-Term Future Fund in April 2023 to continue their project on detecting dishonesty in advanced AI systems. Their research built on the Discovering Latent Knowledge (DLK) paper by Collin Burns et al., aiming to elicit truthfulness representations directly from language model activations using unsupervised probing methods. The team began collaborating during the Prague Fall Season 2022, and their work seeded the formation of Cadenza Labs, an AI safety research organization that has since published papers on lie detection benchmarks and unsupervised probing and runs a Lie Detection Competition in collaboration with Schmidt Sciences and NDIF.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: $167,480
- Fiscal Sponsor: -
Theory of Change
If AI systems develop deceptive capabilities, safety evaluations that rely only on model outputs may be circumvented. Methods that read truthfulness representations directly from a model's internal activations sidestep the model's ability to strategically shape its outputs, so lie detectors built on them should be harder to fool. Robust lie detection would help evaluators identify when an advanced AI system is being deceptive, providing a critical safety signal before deployment and reducing the risk that a deceptively aligned system passes safety evaluations undetected.
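To make the mechanism concrete, below is a minimal sketch of Contrast-Consistent Search (CCS), the unsupervised probing method introduced in the DLK paper this project built on. A linear probe is trained, without truth labels, on paired activations of a statement phrased as true and as false, using the CCS consistency and confidence terms. The tensor shapes, normalization details, and hyperparameters are illustrative assumptions, not the team's actual code.

```python
import torch
import torch.nn as nn


class CCSProbe(nn.Module):
    """Linear probe mapping a hidden-state vector to P(statement is true)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x))


def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: a statement and its negation should receive
    # complementary probabilities, p(x_pos) close to 1 - p(x_neg).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalize the degenerate solution p = 0.5 everywhere.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()


def train_ccs_probe(acts_pos: torch.Tensor,
                    acts_neg: torch.Tensor,
                    steps: int = 1000,
                    lr: float = 1e-3) -> CCSProbe:
    """acts_pos / acts_neg: (n_pairs, d_model) activations for each statement
    phrased as true and as false, respectively."""
    # Normalize each class separately so the probe cannot simply read off
    # surface differences between the two phrasings.
    acts_pos = (acts_pos - acts_pos.mean(0)) / (acts_pos.std(0) + 1e-8)
    acts_neg = (acts_neg - acts_neg.mean(0)) / (acts_neg.std(0) + 1e-8)

    probe = CCSProbe(acts_pos.shape[-1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ccs_loss(probe(acts_pos), probe(acts_neg))
        loss.backward()
        opt.step()
    return probe
```

One caveat worth noting: the CCS loss is invariant to swapping "true" and "false", so the trained probe's direction must be calibrated on a few labeled examples before its output can be read as a truth probability.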
Grants Received
- $167,480 from the Long-Term Future Fund (April 2023)
Projects
No linked projects.
People
No linked people.
Details
- Last Updated: Apr 3, 2026, 1:25 AM UTC
- Created: Mar 19, 2026, 10:42 PM UTC