A SERI MATS research team that received joint LTFF funding in 2023 to investigate dishonesty detection in advanced AI systems, building on the Discovering Latent Knowledge paper. The team went on to co-found Cadenza Labs, an AI safety research group focused on interpretability and LLM lie detection.
Updated 04/03/26
Funding Details
- Annual Budget: not listed
- Current Runway: not listed
- Funding Goal: not listed
- Funding Raised to Date: $167,480
Org Details
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklamanos are a group of AI safety researchers who received a joint grant from the EA Long-Term Future Fund (LTFF) in April 2023. The grant, evaluated by Thomas Larsen, provided $167,480 to continue work they had begun together as SERI MATS 3.0 scholars under mentors including Collin Burns and John Wentworth.

Their research focused on implementing and extending the methods from the Discovering Latent Knowledge (DLK) paper, which identifies directions in a language model's activation space that represent the truth values of propositions. The team's goal was to develop robust, unsupervised methods for eliciting latent knowledge from LLM activations, an approach relevant to detecting deceptive alignment in advanced AI systems. They also collaborated with researchers at EleutherAI on alternatives to the Contrast-Consistent Search (CCS) algorithm. The collaboration started during Prague Fall Season 2023, when they worked at the Fixed Point coworking space, and they later spent time at FAR Labs in Berkeley.

This team effectively became the founding core of Cadenza Labs, an AI safety research organization with a website at cadenzalabs.org. Cadenza Labs has since published several research outputs: the Cluster-Norm method for unsupervised probing of knowledge (accepted at the MechInterp workshop at ICML 2024 and at EMNLP 2024), the Liars' Bench testbed (72,863 examples for evaluating lie detectors across seven datasets, published 2024), and a PNAS study on AI-AI bias showing that LLMs prefer AI-generated content over human-written content.

Of the original four grant recipients, Walter Laurito remained a core member of Cadenza Labs; he is also a doctoral researcher at FZI and a PhD candidate at KIT. Kay Kozaronek co-founded Cadenza Labs, later co-founded Catalyze Impact, an incubator for AI safety organizations, and took on an operations role at AI Safety Connect.
Kaarel Hänni is listed as a collaborator on the current Cadenza Labs team. Georgios Kaklamanos participated in the original SERI MATS project. As of 2025-2026, Cadenza Labs is running a Lie Detection Competition with Schmidt Sciences and NDIF.
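The DLK approach described above trains a truth probe without labels using the CCS objective. The following is a minimal, hypothetical sketch of that objective, not Cadenza Labs' or the DLK authors' code. Each proposition is represented by a contrast pair of activations (one for the statement answered "Yes", one for "No"). Each side is normalized separately, and a linear probe p(x) = sigmoid(w·x + b) is trained so that p(x+) ≈ 1 − p(x−) (consistency) while avoiding the uninformative p = 0.5 solution (confidence). The function names, the plain NumPy gradient descent, and the synthetic "activations" in the demo are assumptions made for illustration.

```python
import numpy as np


def normalize(acts: np.ndarray) -> np.ndarray:
    """Center and scale one side of the contrast pairs (all "Yes" or all "No"
    activations) so the probe cannot just detect which answer token was appended."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    return centered / (centered.std(axis=0, keepdims=True) + 1e-8)


def sigmoid(z: np.ndarray) -> np.ndarray:
    # Clip to avoid overflow in np.exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def ccs_loss_and_grad(w, b, x_pos, x_neg):
    """CCS loss for a linear probe, plus its analytic gradient (hypothetical helper)."""
    n = x_pos.shape[0]
    p_pos = sigmoid(x_pos @ w + b)
    p_neg = sigmoid(x_neg @ w + b)
    # Consistency: p(x+) should equal 1 - p(x-).
    diff = p_pos + p_neg - 1.0
    # Confidence: penalize the degenerate p(x+) = p(x-) = 0.5 solution.
    low = np.minimum(p_pos, p_neg)
    loss = np.mean(diff**2 + low**2)

    # Gradient of the averaged loss with respect to each probability.
    pos_is_low = p_pos <= p_neg
    g_pos = (2.0 * diff + 2.0 * low * pos_is_low) / n
    g_neg = (2.0 * diff + 2.0 * low * ~pos_is_low) / n
    # Chain rule through the sigmoid: dp/dz = p * (1 - p).
    dz_pos = g_pos * p_pos * (1.0 - p_pos)
    dz_neg = g_neg * p_neg * (1.0 - p_neg)
    grad_w = x_pos.T @ dz_pos + x_neg.T @ dz_neg
    grad_b = dz_pos.sum() + dz_neg.sum()
    return loss, grad_w, grad_b


def train_ccs(x_pos, x_neg, lr=0.1, steps=1000, seed=0):
    """Fit the probe with plain gradient descent; returns (w, b, loss at the last step)."""
    rng = np.random.default_rng(seed)
    x_pos, x_neg = normalize(x_pos), normalize(x_neg)
    w = rng.normal(scale=0.01, size=x_pos.shape[1])
    b = 0.0
    for _ in range(steps):
        loss, grad_w, grad_b = ccs_loss_and_grad(w, b, x_pos, x_neg)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


def predict_truth(w, b, x_pos, x_neg):
    """Average both views of each proposition; True means "the Yes side is true".
    The sign of w is arbitrary, so this may be inverted relative to real labels."""
    x_pos, x_neg = normalize(x_pos), normalize(x_neg)
    p_pos = sigmoid(x_pos @ w + b)
    p_neg = sigmoid(x_neg @ w + b)
    return 0.5 * (p_pos + (1.0 - p_neg)) > 0.5


if __name__ == "__main__":
    # Synthetic stand-in for model activations: a hidden "truth" direction
    # flips sign between the Yes and No halves, plus Gaussian noise.
    rng = np.random.default_rng(1)
    n, d = 200, 5
    truth_dir = rng.normal(size=d)
    truth_dir /= np.linalg.norm(truth_dir)
    labels = rng.choice([-1.0, 1.0], size=n)
    x_pos = 3.0 * labels[:, None] * truth_dir + rng.normal(size=(n, d))
    x_neg = -3.0 * labels[:, None] * truth_dir + rng.normal(size=(n, d))
    w, b, loss = train_ccs(x_pos, x_neg)
    acc = np.mean(predict_truth(w, b, x_pos, x_neg) == (labels > 0))
    # CCS fixes the direction only up to sign, so report the better orientation.
    print(f"loss={loss:.4f} accuracy={max(acc, 1 - acc):.3f}")
```

Because CCS is unsupervised, the learned direction is identified only up to sign, so a few labels (or another check) are still needed to decide which orientation means "true". As we understand it, Cadenza Labs' Cluster-Norm method changes the normalization step by clustering contrast-pair activations and normalizing within each cluster; that variant is not shown here.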
Theory of Change
If AI systems develop deceptive capabilities, existing safety evaluations that rely on model outputs may be circumvented. By developing methods to read truthfulness representations directly from a model's internal activations, bypassing the model's ability to strategically shape its outputs, researchers can build lie detectors that are harder to fool. Robust lie detection techniques would help evaluators identify when advanced AI systems are being deceptive, providing a critical safety signal before deployment. This reduces the risk of deceptively aligned systems passing safety evaluations undetected.