An AI safety research group led by David Krueger at the University of Cambridge's Computational and Biological Learning Lab (2021-2024), focused on technical AI alignment, deep learning safety, and reducing existential risk from advanced AI.
People: no linked people
Updated 04/03/26
Funding Details
Updated 04/03/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
Updated 04/03/26
David Krueger's Research Group at Cambridge was an AI safety research lab based at the University of Cambridge's Department of Engineering from 2021 to 2024. The group operated within Cambridge's Computational and Biological Learning Lab (CBL) and Machine Learning Group (MLG), and was formally known as the Krueger AI Safety Lab (KASL).
David Krueger, who completed his doctoral studies in deep learning under Yoshua Bengio, Roland Memisevic, and Aaron Courville at Mila (2013-2021), joined Cambridge as an Assistant Professor (Lecturer, tenure track) in Fall 2021. He established his research group with a focus on reducing the risk of human extinction from artificial intelligence through technical research, education, outreach, governance, and advocacy.
The group's research spanned multiple areas of deep learning, AI alignment, AI safety, and AI ethics. Key research themes included alignment failure modes, algorithmic manipulation, mechanistic interpretability, robustness, reward gaming, goal misgeneralization, neural scaling laws, and learning from human preferences. Notable publications include seminal work on goal misgeneralization in deep reinforcement learning (ICML 2022), characterizing manipulation from AI systems (2023), broken neural scaling laws (ICLR 2023), and a major collaborative paper on foundational challenges in assuring alignment and safety of large language models with over 35 co-authors.
At its peak, the group comprised approximately 15 active members including the PI, seven PhD students, research assistants, affiliates, and visiting researchers, with over 25 additional alumni who passed through the lab as interns and visitors. The lab ran paid research internship programs in collaboration with the ERA (Existential Risk Alliance) Fellowship, covering topics in deep learning, safe reinforcement learning, LLM evaluations, interpretability, and AI governance.
During his time at Cambridge, Krueger also served as a Research Director on the founding team of the UK AI Safety Institute (formerly Frontier AI Task Force) in 2023, and initiated the influential CAIS Statement on AI Risk, which was signed by hundreds of AI experts including Geoffrey Hinton and Yoshua Bengio. He was also affiliated with the Centre for the Study of Existential Risk (CSER) at Cambridge.
The group received funding from multiple sources, including a $250,000 grant over four years from Open Philanthropy (via Cambridge in America), a $140,050 grant through the Berkeley Existential Risk Initiative for collaboration, and a $200,000 grant from the Long-Term Future Fund for computing resources and researcher salaries. Schmidt Sciences also provided grant funding for research on semantic test set contamination in LLMs.
In 2024, Krueger departed Cambridge to become an Assistant Professor in Robust, Reasoning, and Responsible AI at the University of Montreal, where he is also a Core Academic Member at Mila. The research group continues at Mila under the KASL name. Krueger holds a Canada CIFAR AI Chair and the IVADO Professorship in Responsible AI. In 2025, he founded Evitable, a nonprofit focused on informing and organizing the public to confront societal-scale risks of AI.
Theory of Change
Updated 04/03/26
The group's theory of change centered on the belief that reducing existential risk from AI requires both deep technical understanding of how AI systems fail and broader efforts in governance and coordination. By researching alignment failure modes such as goal misgeneralization, reward gaming, and algorithmic manipulation, the group aimed to identify and characterize the ways advanced AI systems could become dangerous before those failures occur in high-stakes deployments. Their work on interpretability and robustness sought to make AI systems more transparent and reliable. By publishing at top venues, training new AI safety researchers through PhD supervision and internship programs, and engaging in policy work such as the UK AI Safety Institute and the CAIS Statement on AI Risk, the group aimed to both advance the technical frontier of AI safety and build the human capital and institutional capacity needed to govern transformative AI.
Grants Received
Updated 04/03/26
Projects: no linked projects
Updated 04/03/26