Paul Colognese
Bio
Paul Colognese is an AI safety researcher based in the United Kingdom, focused on understanding how AI systems represent and self-reflect on their goals and beliefs. He holds a PhD in Mathematics (Geometry, Topology, and Dynamical Systems) from the University of Warwick, where he researched translation surfaces under the supervision of Professor Mark Pollicott. His AI safety work includes building safety evaluations for Anthropic and the UK AI Security Institute, among them control and sabotage evaluations that measure whether deployed AI agents could undermine safety systems. He has conducted AI threat modeling on catastrophic risk scenarios and carried out interpretability research demonstrating that an AI system's objectives can be detected through technical analysis. He participated in the MATS research program, mentored by Evan Hubinger of Anthropic. He founded the London Initiative for Safe AI, a UK-based research center, and serves as AI Alignment Lead at the Center for the Study of Apparent Selves. He is active on LessWrong and the Alignment Forum, and participated in AI Safety Camp (AISC9), working on detecting agentic AI objectives via interpretability methods.
Links
- Personal Website: https://www.paulcolognese.com/
- Twitter / X
- LessWrong: paul-colognese
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated
- Mar 23, 2026, 12:21 AM UTC
- Created
- Mar 20, 2026, 2:56 AM UTC