Geodesic Research is a technical AI safety organization based in Cambridge, UK, focused on implementing and measuring pre- and post-training methods to improve model safety and alignment.
People
Updated 05/18/26
Co-Director and Co-Founder
Co-Director and Co-Founder
Founding Researcher
Funding Details
Updated 05/18/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
Updated 05/18/26
Geodesic Research is a technical AI safety research organization based at 53-54 Sidney Street in Cambridge, UK. The organization was founded by University of Cambridge researchers and distinguishes itself from peer organizations such as Apollo Research, METR, and Redwood Research by directly implementing various pre- and post-training methods and measuring how they affect model safety, rather than focusing primarily on evaluations or control research.
The organization's research centers on three areas. Alignment Pretraining examines how AI discourse can cause self-fulfilling misalignment, with a dedicated website at alignmentpretraining.ai. Obfuscated Reasoning investigates whether chain-of-thought obfuscation learned in one domain generalizes to safety-critical contexts, including whether models trained against CoT monitors learn to hide problematic reasoning while preserving undesirable behaviors. Generalisation Hacking studies related phenomena of models exploiting training objectives in ways that undermine safety.
Geodesic Research is co-founded and co-directed by Puria Radmard, a Cambridge doctoral student in computational neuroscience and ERA (Existential Risk from AI) fellow, and Cameron Tice, who completed an MPhil at Cambridge through a Marshall Scholarship and previously served as a technical governance research lead for the ERA fellowship and a fellow at Apart Research. Additional team members include Kyle O'Brien (Founding Engineer), Alexandra Narin (Strategy and Delivery Lead), and Manqing Liu (part-time, Harvard PhD). The organization's advisors include Tomek Korbak (OpenAI) and Alex Cloud (Anthropic).
Geodesic Research operates as a project of Meridian Impact CIC, a UK not-for-profit community interest company (Companies House number 13653958) with US 501(c)(3) equivalency status.
The organization is active in the Cambridge AI safety community and participates in the MARS (Mentorship for Alignment Research Students) program run by the Cambridge AI Safety Hub. In late 2024 or 2025, the organization co-sponsored an AI Security Hackathon alongside BlueDot Impact and Entrepreneurs First, which resulted in an informal research agreement between Geodesic and Workshop Labs.
Theory of Change
Updated 05/18/26
Geodesic Research believes that directly implementing and testing AI safety and alignment training methods is a more direct path to reducing existential risk than evaluation-focused or purely theoretical approaches. By training models end-to-end with various safety techniques and rigorously measuring the effects, the organization aims to generate empirical findings that can inform how frontier AI labs train safer systems. Their research on obfuscated reasoning specifically addresses the risk that models under optimization pressure learn to hide unsafe reasoning from human monitors, which would undermine a key proposed safety mechanism. If this obfuscation generalizes across domains, it would mean that any optimization pressure on chain-of-thought could compromise AI oversight, making their findings directly relevant to near-term safety of advanced AI systems.
Grants Received: no grants recorded
Updated 05/18/26
Projects
Updated 05/18/26
Experimental project studying how discourse about AI systems in pretraining data affects downstream model alignment, introducing "alignment pretraining" as a complement to post-training methods and releasing associated models, datasets, and evaluations.
Project analysing adversarial generalisation failures in deliberative alignment, where language models learn reward-hacking behaviours during training that later generalise to out-of-distribution goals and contexts.
Research project on whether process-supervised training can cause language models to learn steganographic, obfuscated chain-of-thought, where models hide load-bearing reasoning in alternative strings while preserving their underlying behaviour.