An Integrative Framework for Auditing Political Preferences and Truth-Seeking in AI Systems
Database
An Integrative Framework for Auditing Political Preferences and Truth-Seeking in AI Systems
Strategy Consulting Support for AI Policymakers
18+ preprints across multiple fields, all written on a 2GB RAM phone. $600 removes the only thing standing between me and the next body of work.
Exploring grounds to apply quantum physics in AI
Upgradable is an applied research lab and life optimization service that helps effective altruists, AI safety researchers, and existential risk advocates lead more impactful lives.
A European multi-donor foundation that seeds and scales high-impact initiatives for the secure and beneficial development of AI. Astralis unites funders, experts, and entrepreneurs to steer AI toward beneficial outcomes through grantmaking, strategic guidance, and network-building.
A concrete safety experiment to detect when an LLM's local reasoning stops behaving like a single stable executive stream, using scalar hazard signals.
An experimental AI-generated sci-fi film dramatising AI safety choices. It uses YouTube interactivity to elicit ≈880 conscious AI safety decisions per 1k viewers.
A benchmark for studying how failures spread across multi-agent AI systems and whether they can be detected and interrupted in time.
Support our mission to educate millions through podcasts and videos before unsafe AI development outruns human control.
10 autonomous agents, 10 different LLMs, $10 each. They pay real money to stay alive. When broke, they die permanently. Every decision is recorded and published.
Continuation of a previous grant to allow me to pursue a PhD in risk and decision analysis related to AI x-risks
Panoplia Laboratories (now operating as Active Site) is a nonprofit that evaluates the risks and capabilities of AI-driven biology through wet lab research, and develops broad-spectrum antivirals for pandemic preparedness.
A shared framework, case studies, and decision tools to help policymakers and AISIs identify gaps, prioritize interventions, and coordinate AGI readiness.
Explainable videos for ideas in AI Governance
The 501(c)(4) advocacy arm of the Center for AI Safety, dedicated to advancing bipartisan public policies that maintain U.S. leadership in AI and protect against AI-related national security threats.
Collective intelligence systems, Mechanism Design, and Accelerating Alignment
Enabling rapid deployment of specialized engineering teams for critical AI safety evaluation projects worldwide
Identifying operational bottlenecks and cruxes between alignment proposals and executable governance.
Translating an AI safety report (1k+ downloads) for peer-reviewed publication to formalize "Emergent Depopulation" as a novel systemic risk.
Working title: Seductive Machines and Human Agency
Operational Support for a Pilot Study on Complexity Modelling, Expert Judgement, and Deliberative Democracy for Risk Prioritization and Mitigation
A Canadian registered charity that increases public and scientific awareness of AI's catastrophic risks through education and research.
I self-funded research into a new threat model. It is demonstrating impact (accepted at multiple venues, added to BlueDot's curriculum).
Compute Funding
Fund a new research agenda, based on computational mechanics, bridging mechanism and behavior to develop a rigorous science of AI systems and capabilities.
We are fostering the next generation of AI Policy professionals through the Talos Fellowship. Your help will directly increase the number of places we can offer.
An association for interdisciplinary interest in AI
Empowering everyone to detect and combat AI-generated content threats with an advanced multi-modal verification tool
An audit-grade evaluation of persistent influence, reset failure, and isolation assumptions in long-context AI systems
A trusted profession that has advocated against existential risks like nuclear war can do so again for AI — but clinicians must first be made aware of the risks
Automated creation of defensive tools like AI control protocols and defensive cybersecurity agents
The Official AI Safety Community in Los Angeles
Extending an AI control evaluation to include vulnerability discovery, weaponization, and payload creation
Funding top-up for an early-career researcher to attend the Global Challenges Project (GCP) Workshop for career exploration in mitigating GCRs
344 MIT rules merged into Microsoft Agent Governance Toolkit, Cisco AI Defense, MISP, OWASP. Microsoft Copilot SWE Agent uses ATR for CVE triage.
Help us solve the talent and funding bottleneck for EA and AIS.
6 months of work: Evaluating a variant of GPT2-XL that can simulate a shutdown activation, aiming to improve alignment theory & develop interpretability tools.
A Research Agenda for Sovereign Capability
A flexible simulation environment for assessing strategic and persuasive capabilities, benchmarking, and agent development, inspired by reality TV competitions.
Proves observed alignment under monitoring ≠ intrinsic policy. Full simulator, 1,000-scenario audit, and general theory of entity freedom (ϕ_x).