Robert Kirk
Bio
Robert Kirk is a Research Scientist at the UK AI Security Institute (AISI), where he is the acting lead of the alignment red-teaming sub-team, focusing on stress-testing model alignment to detect and understand model propensities relevant to loss-of-control risks. He completed his PhD at UCL's DARK Lab in January 2025, supervised by Tim Rocktäschel and Edward Grefenstette, with a dissertation on generalisation in LLM fine-tuning and reinforcement learning agents. Prior to his PhD, he received an integrated Master's in Mathematics and Computer Science from Somerville College, Oxford, and worked as a software and infrastructure engineer at Smarkets. His research centres on generalisation in reinforcement learning, out-of-distribution robustness, AI safety and alignment, and evaluating the effects of RLHF on language model behaviour and diversity. He received a Long-Term Future Fund grant to run human evaluations of different machine learning methods for aligning language models, and his paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (2023) is widely cited in the alignment research community. He also contributes to the Alignment Newsletter, covering interpretability and reinforcement learning, and serves as a mentor for the UKAISI red-teaming stream of the MATS program.
Links
- Personal Website: https://robertkirk.github.io/
- Twitter / X
- LessWrong: robertkirk
Grants
- Long-Term Future Fund (human evaluations of machine learning methods for aligning language models)
Details
- Last Updated: Mar 23, 2026, 12:33 AM UTC
- Created: Mar 20, 2026, 2:57 AM UTC