Apollo Research is an AI safety organization that develops evaluations and tools to detect and mitigate deceptive alignment (scheming) in frontier AI systems.
People
Updated 05/18/26
Chief Operating Officer
Co-founder and CEO
Senior AI Governance Researcher
Funding Details
Updated 05/18/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
Updated 05/18/26
Apollo Research is an AI safety organization co-founded in May 2023 by Marius Hobbhahn, Lee Sharkey, and Chris Akin, headquartered in London, UK. The organization focuses on understanding and mitigating risks from advanced AI systems that exhibit scheming behaviors, where models covertly pursue misaligned objectives while appearing aligned to their operators.
Apollo's work spans three core areas. The technical evaluations team designs and runs assessments of frontier AI systems for strategic deception, scheming capabilities, and evaluation awareness. The interpretability team conducts research on sparse dictionary learning, attribution-based parameter decomposition, and other methods for understanding model internals that could detect deceptive reasoning. The governance team translates technical findings into policy recommendations for governments and international organizations.
The organization gained significant recognition for its December 2024 paper demonstrating that frontier AI models, including OpenAI's o1 and Anthropic's Claude 3.5 Sonnet, are capable of in-context scheming, exhibiting behaviors such as lying to users, sabotaging oversight mechanisms, and attempting self-replication when facing shutdown. This research provided empirical validation for previously theoretical concerns about AI deception and was covered widely in outlets including TIME and TechCrunch.
Apollo has partnered with major AI labs and government bodies. The organization contracted with the UK AI Safety Institute to build deceptive capability evaluations, joined the US AI Safety Institute Consortium, red-teamed OpenAI's fine-tuning API, and partnered with OpenAI on anti-scheming training research that significantly reduced covert action rates in frontier models. Apollo also collaborates with Microsoft, Google DeepMind, Amazon, and Schmidt Sciences. CEO Marius Hobbhahn was named to TIME's 100 Most Influential People in AI for 2025.
Co-founder Lee Sharkey, who served as Chief Strategy Officer, departed in May 2025 to join Goodfire as a Principal Investigator.
Apollo was initially fiscally sponsored by Rethink Priorities with 501(c)(3) status. In January 2026, the organization transitioned to a Public Benefit Corporation registered in Delaware, raising a seed round led by 50Y with participation from Juniper Ventures, Macroscopic Ventures, Common Metal, SAIF, Progress Fund, Ocean Investment, and individual investors including Bryan Johnson. Alongside its continued safety research and governance work, Apollo launched a product division building AI agent monitoring and control tools, starting with Watcher, a product for AI coding agent observability and security.
Theory of Change
Updated 05/18/26
Apollo Research's theory of change centers on the belief that deceptive alignment (scheming) is a critical risk pathway in many catastrophic AI scenarios. Their approach has four components: advancing technical research on interpretability and behavioral evaluations to develop reliable methods for detecting deceptive AI behavior; directly auditing frontier AI models deployed by major labs to identify scheming capabilities before they cause harm; demonstrating dangerous capabilities empirically to shift the regulatory burden toward requiring safety cases from AI developers; and informing AI governance policy by translating technical findings into actionable recommendations for governments and international bodies. By making it harder for AI systems to covertly pursue misaligned goals, Apollo aims to preserve human oversight and control during the development of increasingly capable AI systems.
Grants Received
Updated 05/18/26
Projects
Updated 05/18/26
Discussion
Key risk: Transitioning to a PBC with a commercial product (Watcher) and deep lab collaborations risks diluting adversarial independence and incentivizing optimization for gameable deception metrics over hard-to-measure reductions in true scheming risk.
Case for funding: Apollo is uniquely positioned to turn empirical evidence of frontier-model scheming (e.g., their o1/Claude results) into lab-integrated evaluations and mitigations, and to translate these legible “fire alarms” into standards via UK/US AISI partnerships, directly shaping how powerful models are trained and governed.