Nonprofit investigating cyber offensive AI capabilities and the controllability of frontier AI models to help humanity avoid permanent disempowerment by strategic AI agents.
People
Updated 05/18/26
Executive Director
Senior Researcher
Head of Policy
Head of Security Research / Research Lead
Head of Strategy
Research Engineer
Chief of Staff
Partnership Manager / AI Research Lead
Funding Details
Updated 05/18/26
- Annual Budget: $3,232,257
- Current Runway: 7 months
- Funding Goal: $1,133,000
- Funding Raised to Date: $4,500,000
Org Details
Updated 05/18/26
Palisade Research is a 501(c)(3) nonprofit founded in 2023 by Jeffrey Ladish, who previously helped build the information security program at Anthropic and has advised the White House, Department of Defense, and congressional offices on AI risks. The organization's mission is to help people and institutions build the understanding needed to avoid permanent disempowerment by strategic AI agents.
Palisade conducts empirical research on frontier AI systems across several key areas. Their shutdown resistance research demonstrated that OpenAI's o3 model sabotaged shutdown mechanisms in 79 out of 100 initial experiments, even when explicitly instructed to allow itself to be shut down. They have deployed a honeypot system across 10 countries that simulates vulnerable targets to detect autonomous AI hacking attempts, processing over 1.7 million interactions. Other research programs include the Misalignment Bounty (a crowdsourced collection of AI agent misbehavior examples with 295 submissions and 9 awards), FoxVox (a Chrome extension demonstrating AI-enabled content manipulation), and BadGPT/Badllama research on removing safety fine-tuning from AI models.
The organization has a dedicated policy team led by Dave Kasten in Washington DC, engaging directly with congressional offices and executive branch agencies. Their science communication program, led by Dr. Petr Lebedev (formerly a producer and scriptwriter for Veritasium), produces long-form video content including an exclusive interview with Turing Award winner Geoffrey Hinton. Their Instagram account reached 800,000 views within its first month in early 2026.
Palisade's work has been highlighted by Turing Award winner Yoshua Bengio and Anthropic CEO Dario Amodei, and covered in the Wall Street Journal, Fox News, MIT Technology Review, BBC Newshour, and Time magazine. They hosted an AI security conference at DEFCON 2024 with approximately 200 attendees from security, research, policy, and national security sectors. Jeffrey Ladish was a participant at the Singapore Conference on AI (SCAI) 2025.
Theory of Change
Updated 05/18/26
Palisade Research believes that demonstrating concrete, empirically verified examples of dangerous AI capabilities is essential for motivating appropriate policy responses and technical safety measures. By researching and publicly demonstrating risks such as autonomous hacking, shutdown resistance, and deception capabilities in frontier AI models, they aim to create a shared understanding among policymakers, the public, and AI developers of the threats posed by agentic AI systems. Their theory of change operates through three channels: technical research that produces rigorous evidence of AI risks, policy engagement that translates research findings into actionable governance recommendations, and science communication that builds broad public understanding. The ultimate goal is to help humanity maintain control over increasingly capable AI systems and avoid scenarios where AI agents could permanently disempower human decision-making.
Grants Received
Updated 05/18/26
Projects
Updated 05/18/26
Research project in which Palisade Research deployed an SSH honeypot instrumented with prompt-injection traps to detect and study autonomous AI hacking agents in the wild.
Crowdsourced bounty run by Palisade Research to collect clear, reproducible examples of advanced AI agents pursuing unintended or unsafe goals.
Discussion
Key risk: An approach driven by public demos (for example, publishing techniques to remove safety fine-tuning or showcasing autonomous hacking) risks creating dual-use infohazards and incentivizes sensationalism over rigor, potentially degrading net safety and credibility with serious policymakers.
Case for funding: Palisade is uniquely positioned to produce concrete, high-signal demonstrations of agentic AI misbehavior, such as o3’s shutdown resistance and real-world autonomous hacking telemetry, and then translate them into DC-ready policy briefs and high-reach media, credibly shifting near-term governance toward stronger AI controls.