Safety Research
About
DeepMind Safety Research refers to the AI safety research program within Google DeepMind, the London-based AI research lab founded in 2010 by Demis Hassabis, Mustafa Suleyman, and Shane Legg and acquired by Google in 2014. Google DeepMind itself was formed in April 2023 through the merger of DeepMind and Google Brain. The safety research effort has roots dating back to 2016, when DeepMind established a formal AGI safety team, and has grown substantially since then.

In February 2024, Google DeepMind announced a new, consolidated AI Safety and Alignment organization under the leadership of Anca Dragan, a robotics and human-AI interaction researcher who joined from UC Berkeley. The leadership team also includes Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg serving as executive sponsor. A parallel governance body, the Responsibility and Safety Council (RSC), is co-chaired by COO Lila Ibrahim and VP of Responsibility Helen King.

The safety program is organized into several sub-teams: AGI Alignment (mechanistic interpretability, scalable oversight, and related research), Frontier Safety (which develops and runs dangerous-capability evaluations and maintains the Frontier Safety Framework), Gemini Safety (safety training for current Gemini models), and Voices of All in Alignment (alignment techniques for value and viewpoint pluralism). The team has grown rapidly, with reported growth of approximately 39% and 37% in consecutive years. The AGI Alignment and Frontier Safety teams together comprise roughly 30 to 50 researchers, around 10 of whom do substantial external mentoring, supervising approximately 50 external researchers annually.

The Frontier Safety Framework, introduced in May 2024, is a central output of the program. It defines Critical Capability Levels (CCLs), thresholds at which a model's capabilities could enable severe harm, and establishes early warning evaluations and mitigation protocols keyed to those thresholds. The framework has been updated multiple times, including revisions in February 2025 and September 2025 that added coverage of manipulation risks and of risks from models resisting shutdown. In April 2025, the team also published a comprehensive paper, An Approach to Technical AGI Safety and Security, outlining four categories of risk: misuse, misalignment, mistakes, and structural risks.

Google DeepMind collaborates with external organizations on safety, including Apollo Research, Redwood Research, the UK AI Security Institute, and the Frontier Model Forum (which it co-founded in 2023). The program maintains a public-facing Medium blog at deepmindsafetyresearch.medium.com, launched in September 2018, which publishes research summaries and technical posts.
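To make the Frontier Safety Framework's core mechanism concrete, the sketch below models evaluations keyed to capability thresholds. This is a hypothetical illustration only: the actual framework is a policy document, not code, and every name, score, and threshold here (such as "cyber_uplift", the alert buffer, and the mitigation strings) is invented for the example rather than drawn from Google DeepMind.

```python
from dataclasses import dataclass

# Hypothetical illustration: names, thresholds, and scores are invented.
# The real Frontier Safety Framework expresses these ideas in prose.

@dataclass
class CriticalCapabilityLevel:
    """A capability threshold at which a model could enable severe harm."""
    name: str                 # risk domain, e.g. "cyber_uplift" (invented)
    alert_threshold: float    # early-warning score, set below the CCL itself
    ccl_threshold: float      # score at which the CCL is considered reached
    mitigations: list[str]    # mitigations to apply once the alert fires

def evaluate_model(eval_scores: dict[str, float],
                   ccls: list[CriticalCapabilityLevel]) -> list[str]:
    """Compare dangerous-capability eval scores against each CCL.

    Mirrors the framework's idea of early warning: mitigations trigger
    before the critical threshold is crossed, not after.
    """
    triggered: list[str] = []
    for ccl in ccls:
        score = eval_scores.get(ccl.name, 0.0)
        if score >= ccl.ccl_threshold:
            triggered.append(f"{ccl.name}: CCL reached, halt pending review")
        elif score >= ccl.alert_threshold:
            triggered.extend(f"{ccl.name}: {m}" for m in ccl.mitigations)
    return triggered

# Example with one invented CCL and an early-warning buffer below it.
cyber = CriticalCapabilityLevel(
    name="cyber_uplift",
    alert_threshold=0.6,
    ccl_threshold=0.8,
    mitigations=["apply security hardening", "escalate to safety council"],
)
print(evaluate_model({"cyber_uplift": 0.65}, [cyber]))
```

The design point the sketch captures is the gap between the alert threshold and the CCL threshold: evaluations are meant to fire while there is still headroom to mitigate, which is what "early warning" means in the framework's description.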
Theory of Change
Google DeepMind's safety research operates on the premise that being at the frontier of AI development is itself a safety strategy: by building and studying the most capable AI systems, the team can identify dangerous capabilities before they are widely deployed, develop technical alignment and interpretability methods that scale with model capability, and embed safety practices into the development of systems like Gemini. The Frontier Safety Framework embodies this approach by defining Critical Capability Levels and requiring proactive evaluation and mitigation before those thresholds are crossed. The team also holds that influencing industry norms, publishing open research, and collaborating with governments and nonprofits multiplies its impact beyond what internal work alone could achieve. On this view, reducing existential risk requires both solving the technical alignment problem and shaping the institutional environment in which powerful AI is developed.
Details
- Start Date
- -
- End Date
- -
- Expected Duration
- -
- Funding Raised to Date
- -
- Last Updated
- Apr 3, 2026, 2:03 AM UTC
- Created
- Apr 3, 2026, 2:03 AM UTC