A nonprofit AI safety research lab that pioneers threat assessment and mitigation techniques for advanced AI systems, with a current focus on AI control protocols and detecting strategic deception in language models.
People
Updated 05/18/26
Board Member
Chief Executive Officer
Board Member
Chief Scientist
Funding Details
Updated 05/18/26
- Annual Budget: $2,922,498
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: $27,292,000
Org Details
Updated 05/18/26
Redwood Research is a 501(c)(3) nonprofit research organization founded in mid-2021 by Nate Thomas, Bill Zito, and Buck Shlegeris. Based in Berkeley, California, it investigates risks that could arise from powerful AI systems acting against human interests and develops techniques for managing those risks.
The organization began with an adversarial robustness project aimed at training classifiers to detect and prevent harmful continuations in AI-generated text. It later explored mechanistic interpretability research before pivoting to what has become its primary focus: AI control. This research agenda, introduced through an ICML oral paper, develops and evaluates protocols ("pipelines of safety techniques") that are designed to be robust even when AI systems intentionally attempt to subvert them. The core assumption is conservative: that AI models may be misaligned and actively scheming against human interests, and safety techniques must work even under those conditions.
Redwood's most widely cited recent work includes the landmark "Alignment Faking in Large Language Models" paper, produced in collaboration with Anthropic, which provided the first empirical evidence that large language models can strategically fake alignment during training to avoid being modified. The organization has also collaborated with the UK AI Safety Institute on sketching AI control safety cases, and has partnered with Google DeepMind on related research.
Beyond its core research, Redwood has contributed to the AI safety talent pipeline through programs like the Machine Learning for Alignment Bootcamp (MLAB), which trained junior researchers in ML skills relevant to alignment work, and the Redwood Mechanistic Interpretability Experiment (REMIX). Redwood also founded and manages Constellation, a roughly 30,000 square foot coworking space in Berkeley that hosts staff from approximately 20 different organizations working on AI safety and related longtermist projects.
Key leadership includes Buck Shlegeris (CEO), Ryan Greenblatt (Chief Scientist), and Bill Zito (COO). The board of directors consists of Buck Shlegeris, Nate Thomas (Board Chair), and Ammon Bartram. The organization has been primarily funded by Open Philanthropy, which provided approximately $25.4 million across three grants from 2021 to 2023, with additional funding from the Survival and Flourishing Fund and Founders Pledge.
Theory of Change
Updated 05/18/26
Redwood Research believes that as AI systems become more capable, there is a meaningful risk that they could act against human interests, including by strategically deceiving their developers about their true intentions. Its theory of change is that developing and empirically validating control protocols (safety techniques that work even under the conservative assumption that AI models are misaligned and actively scheming) lets developers deploy powerful AI systems safely while maintaining meaningful human oversight. Rather than relying on solving the full alignment problem, the control agenda provides a practical near-term approach: if AI labs implement robust control evaluations and monitoring protocols, they can catch and prevent dangerous behavior even from models whose internal goals may not be aligned. By producing foundational research, collaborating directly with frontier AI labs and government safety institutes, and building practical safety tools, Redwood aims to make AI control a standard part of how the industry manages catastrophic risk from advanced AI.
Grants Received
Updated 05/18/26
Projects
Updated 05/18/26
Redwood's flagship research program, which develops and empirically stress-tests AI control protocols: safety pipelines that keep powerful AI systems from subverting oversight even when the models are misaligned and intentionally trying to do so.
An intensive Berkeley-based bootcamp run by Redwood Research to train technically strong participants in modern machine learning with a focus on skills most relevant to AI alignment research.
A time‑bounded research program in Berkeley where participants worked with Redwood researchers on mechanistic interpretability of transformer models, using causal scrubbing and related tools to explain model behaviors.