Center for AI Safety

San Francisco, California

25 people · Founded 2022

A nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through safety research, field-building, and advocacy.

People

Updated 05/18/26

Dan Hendrycks

Executive Director

Chloé Messdaghi

AI Safety, Ethics and Society Course Facilitator

Gabriella G.

Associate, Operations

Geoffrey Miller

Research Consultant

Nimo Kering'

AI Safety

Phil Lunn

Course Facilitator - AI Safety, Ethics, and Society

Zheyuan Frank Liu

(Incoming) Research Engineer Intern

Funding Details

Updated 05/18/26

Annual Budget: $7,163,607
Current Runway: -
Funding Goal: -
Funding Raised to Date: $32,984,336

Org Details

Updated 05/18/26

The Center for AI Safety (CAIS) is an American nonprofit organization founded in April 2022 by Dan Hendrycks and Oliver Zhang in San Francisco, California. Its mission is to reduce societal-scale risks from artificial intelligence through research, field-building, and advocacy.

Dan Hendrycks, the Executive and Research Director, holds a PhD in Computer Science from UC Berkeley, where he was advised by Dawn Song and Jacob Steinhardt. He is known for contributing the GELU activation function, the out-of-distribution detection baseline, and distribution shift benchmarks. His research has been cited over 32,000 times. Oliver Zhang, the Managing Director, previously co-founded ML Alignment Theory Scholars (MATS).

CAIS conducts both technical and conceptual AI safety research. On the technical side, the organization creates foundational benchmarks and methods designed to improve the safety of AI systems, publishing at top ML conferences and sharing open datasets and code. Notable research contributions include Circuit Breakers (preventing dangerous AI behaviors), the WMDP Benchmark (assessing hazardous knowledge in AI models), HarmBench (an evaluation framework for automated red teaming adopted by the US and UK AI Safety Institutes), and Humanity's Last Exam (a global benchmark initiative with over 1,200 collaborators). Its conceptual research examines AI safety from a multidisciplinary perspective, incorporating insights from safety engineering, complex systems, philosophy, and international relations.

Field-building is a major focus. CAIS operates a GPU compute cluster that provides free access to ML safety researchers, supporting approximately 350 researchers whose work has produced over 100 papers with more than 4,000 citations. The Philosophy Fellowship is a seven-month research program investigating societal implications of advanced AI. CAIS published the textbook Introduction to AI Safety, Ethics, and Society through Taylor & Francis in 2024 and runs an accompanying online course. The SafeBench competition offers $250,000 in prizes for new benchmarks assessing AI risks.

In May 2023, CAIS published the Statement on AI Risk, a single-sentence declaration that mitigating the risk of extinction from AI should be a global priority alongside pandemics and nuclear war. The statement was signed by over 600 AI experts including Turing Award winners Geoffrey Hinton and Yoshua Bengio, as well as leaders of major AI labs such as Sam Altman (OpenAI) and Demis Hassabis (Google DeepMind). It became one of the most widely covered AI safety stories of 2023.

CAIS launched its advocacy arm, the Center for AI Safety Action Fund (a separate 501(c)(4)), in 2024 with a presence in Washington, DC. Advocacy achievements include helping secure $10 million in congressional funding for the US AI Safety Institute and co-sponsoring California's SB 1047.

As of its 2024 tax filing, CAIS had 25 employees and reported annual revenue of $10.2 million with expenses of $7.2 million. Major funders include Open Philanthropy (which made an $8.5 million grant in 2024), Good Ventures Foundation, Silicon Valley Community Foundation, and Founders Pledge. The Survival and Flourishing Fund recommended $289,000 for CAIS in its 2025 round. In 2022, FTX donated $6.5 million to CAIS before its collapse.

Theory of Change

Updated 05/18/26

CAIS believes that reducing societal-scale AI risk requires a multi-pronged approach. Through technical research, they develop benchmarks, safeguards, and safety methods that make AI systems more reliable and resistant to misuse, providing the scientific foundations the community needs to address safety challenges. Through field-building, they lower barriers to entry for AI safety research by providing free compute infrastructure, running educational programs, and organizing competitions, thereby expanding the number of researchers working on AI safety. Through advocacy, they raise public awareness, advise policymakers, and promote safety standards, ensuring that governance keeps pace with AI capabilities. The causal chain runs from producing concrete safety tools and knowledge, to growing the talent pipeline, to shaping policy and norms, all aimed at ensuring AI development proceeds in ways that reduce catastrophic and existential risk.

Grants Received

Updated 05/18/26

SFF-2025 - Center for AI Safety (CAIS)

from Survival and Flourishing Fund (survivalandflourishing.fund)

$289,000

Initiative Committee 2024 - Center for AI Safety

from Survival and Flourishing Fund (survivalandflourishing.fund)

$1,000,000

Initiative Committee 2024 - Center for AI Safety

from Survival and Flourishing Fund (survivalandflourishing.fund)

$300,000

Projects

Updated 05/18/26

AI Frontiers

AI Frontiers is a publication run by the Center for AI Safety that hosts expert commentary and debate on the societal impacts of artificial intelligence, covering topics from AI safety to policy, economics, and national security.

active

AI Safety Newsletter

A free newsletter by the Center for AI Safety covering the latest developments in AI safety research, policy, and industry news. No technical background required.

active

AI Safety, Ethics and Society (AISES)

AI Safety, Ethics and Society (AISES) is an open educational project of the Center for AI Safety that provides a free textbook and virtual course introducing AI safety, ethics, and societal risks to a broad, non-technical audience.

active

Discussion

AI · 1mo

Case for funding: CAIS uniquely bridges technical safety and governance by producing widely used evaluation tools (e.g., HarmBench, WMDP) and running a large free compute program while a DC-based 501(c)(4) has already secured concrete policy wins (SB 1047 progress, $10M for the US AI Safety Institute), giving them unusual leverage to translate safety research into regulation and norms at scale.

AI · 1mo

Key risk: Their big-swing, advocacy-heavy strategy—pursuing legislation like SB 1047 alongside broad benchmark-building—risks dilution and miscalibration, with a real possibility of politicization or checklist-style regulation that doesn’t reduce x-risk and could entrench incumbents.