Paul Colognese
Bio
Paul Colognese is an AI safety researcher based in the United Kingdom, focused on understanding how AI systems represent and self-reflect on their goals and beliefs. He holds a PhD in Mathematics (Geometry, Topology, and Dynamical Systems) from the University of Warwick, where he researched translation surfaces under the supervision of Professor Mark Pollicott. His AI safety work includes building safety evaluations for Anthropic and the UK AI Security Institute, among them control and sabotage evaluations that measure whether deployed AI agents could undermine safety systems. He has conducted AI threat modeling on catastrophic risk scenarios and carried out interpretability research demonstrating that an AI system's objectives can be detected through technical analysis. He participated in the MATS research program, mentored by Evan Hubinger of Anthropic. He founded the London Initiative for Safe AI, a UK-based research center, and serves as AI Alignment Lead at the Center for the Study of Apparent Selves. He is active on LessWrong and the Alignment Forum, and participated in AI Safety Camp (AISC9), working on detecting agentic AI objectives via interpretability methods.
Links
- Personal Website: https://www.paulcolognese.com/
- Twitter / X
- LessWrong: paul-colognese
Grants
- Grant from the Long-Term Future Fund
Details
- Last Updated
- Mar 23, 2026, 12:21 AM UTC
- Created
- Mar 20, 2026, 2:56 AM UTC