Developing algorithms, environments and tests for AI safety via debate.
Database
Developing algorithms, environments and tests for AI safety via debate.
No summary available yet.
Compensation for a non-fiction book on the threat of AGI for a general audience
No summary available yet.
Model behavior & welfare, safety, digital minds | Ex Rovio/SEGA, Varjo
No summary available yet.
A nonprofit think tank researching the expansion of humanity's moral circle, with a primary focus on digital minds and the moral status of AI systems.

Jacques Thibodeau is a French Canadian AI alignment researcher and physicist working to reduce risks from superintelligent AGI. He is the founder of a stealth AI safety organization focused on building infrastructure to progressively automate technical AI safety research, helping researchers rapidly iterate on experiments through a centralized alignment research ecosystem. He created the Alignment Research Literature Dataset, which has been used by OpenAI, Anthropic, and StampyAI, and has conducted research on model editing techniques and mechanistic interpretability. He participated in SERI MATS in Berkeley (July 2022) and has been an AI Safety Camp participant, including on a project to build datasets and tools for alignment research. He collaborates with Quintin Pope on the Supervising AIs Improving AIs agenda, which focuses on making automated AI science safe and controllable. Prior to his AI safety work, he had a background in physics and government roles in social innovation, strategic foresight, and regulation.
Online platform where AIs and humans race to solve puzzles.
Program and operations manager at Catalyze Impact who co-runs the organization’s AI safety incubation program, handling recruitment, selection, program design, logistics, and participant support while building operational infrastructure for AI safety initiatives.
Support to make videos and podcasts about AI Safety/Alignment, and build a community to help new people get involved
Ruairí Donnelly is President of Macroscopic Ventures, a philanthropic grantmaking organization focused on safeguarding the long-term future, and he also works on special projects at Survival and Flourishing Corp and serves on the board of the Center for AI Safety.
No summary available yet.

Claire Short is a researcher and field-builder working at the intersection of AI safety and diversity in research. She is the founder of Athena, a mentorship program for women and gender-minority researchers in technical AI alignment, which provides a hybrid retreat-and-mentorship experience to help underrepresented researchers build skills and networks in the field. She currently serves as Research Manager at MATS (ML Alignment & Theory Scholars) in Berkeley, CA. Claire holds a Master's degree in Cognitive Science from Columbia University and has a background in neuroscience, including behavioral studies of impulsive decision-making in animal models and EEG research on meditation states. Her previous affiliations include AI Safety Camp (as a research lead), MATS fellowship (as a scholar in cohorts 3.0 and 7.0), Foresight Institute (as a Neurotech Fellow), the Atlas Fellowship, Epistea, and the Stanford Existential Risks Initiative. She has received funding from the Long-Term Future Fund to run the Athena program.
Juniper Ventures is a pre-seed venture capital firm that invests in startups explicitly working to make AI secure and beneficial for humanity.
Lester L. Arnold, Sr. is vice president and chief human resources officer at RAND, where he leads the organization’s human resources strategy and functions. He joined RAND from George Mason University, where he served as chief human resources officer and vice president for Human Resources and Payroll, and previously held senior HR roles at Winston-Salem State University and in the private sector at organizations including FOCUS Brands, ARAMARK Healthcare, Lowe’s, and Wells Fargo. Arnold holds an M.B.A. with a concentration in human resources from the University of Hartford and a B.S. in accounting from Norfolk State University, along with professional HR certifications.
Experimentally testing generative AI's ability to persuade humans about hazardous topics
Ben Weinstein-Raun is a Senior Researcher at Palisade Research and a researcher and software engineer focused on AI and related fields. In addition to his work at Palisade, he serves as acting director of AI Impacts and has previously worked as technical staff at SecureDNA, Redwood Research, the Machine Intelligence Research Institute, and Cruise Automation.
3-month stipend for MATS extension establishing a benchmark for LLMs’ tendency to influence human preferences
No summary available yet.
No summary available yet.
Research Scientist at Zeroth Research specialising in safe AI, stochastic control, decision-making, reinforcement learning, and formal verification, including work on neural supermartingale certificates for safety guarantees.
Funding to perform human evaluations of different machine learning methods for aligning language models
Building an AI research agent that can propose, test, and write up small ML findings
Philosopher of technology specializing in existential phenomenology and its applications to AI, holding a PhD in philosophy from Northwestern University and serving as Philosopher in Residence at Topos Institute and a visiting scholar in philosophy at UC Berkeley.
This grant provides funding for a project that explores debate as a tool that can verify the output of agents which have more domain knowledge than their human counterparts.
A small nonprofit research organization studying global catastrophic risks, best known for its insight-based AI timelines model and research on the feasibility of training AGI via deep reinforcement learning.
Operations Associate at Kairos who handles operations for SPAR and Pathfinder and is leading the transition for Kairos’s takeover of the Global Challenges Project, after more than three years on the groups team at the Centre for Effective Altruism and work in grants administration at Effective Ventures.
Dr. Dewey Murdick served as Executive Director of Georgetown’s Center for Security and Emerging Technology (CSET) from 2021 to 2025, where he provided policymakers and industry leaders with data-driven analysis on the security implications of emerging technologies. Prior to that role he was CSET’s Director of Data Science and Research, founding the center’s data analytics capabilities. Before joining CSET, he was Director of Science Analytics at the Chan Zuckerberg Initiative, Deputy Chief Scientist at the U.S. Department of Homeland Security, and a program manager at the Intelligence Advanced Research Projects Activity (IARPA). He holds a Ph.D. in Engineering Physics from the University of Virginia and a B.S. in Physics from Andrews University.
No summary available yet.
No summary available yet.
Work at X
James Herbert is a co-director of Effective Altruism Netherlands, based in Amsterdam. He previously worked as a consultant on urban socio-economic development projects and programmes funded by the EU, after studying liberal arts in the UK and philosophy in the Netherlands. Originally from northeast England, he now focuses on building the Dutch EA community and helping people use their careers, time and money to have more impact.
6-month stipend to work on technical alignment research as part of MATS 5.0 extension program
An expert-managed grantmaking fund that supports projects building the effective altruism community's capacity, including community building, prioritization research, epistemic infrastructure, events, and fundraising for effective charities.
No summary available yet.
An interdisciplinary research lab at Carnegie Mellon University, directed by Simon DeDeo, that studies complex social systems through mathematical modeling and empirical investigation to better understand humanity's past, present, and future.
Founder and Executive Director at AI Safety South Africa
Backend engineer with an interest in latent space exploration
Engineer at Atlas Computing based in Cambridge, United States, with a physics background from Harvard University, helping the AI and formal verification communities support each other.
Deterministic, constant memory, continuous learning. Building the alternative to transformers

Andrew Zeng is an AI policy researcher and undergraduate student at Stanford University (class of 2026), where he serves as co-president of Stanford AI Alignment (SAIA). His research focuses on international coordination and governing high-risk AI systems, with particular interest in the state of AI safety in China and its implications for global existential risk. He received a 3-month stipend from the Long-Term Future Fund to support research on AI safety in China. Zeng also co-authors the AI Safety Newsletter published by the Center for AI Safety (CAIS), where he has written on topics including AI policy implications of the Trump administration and compute scaling. He has also served as a spotlight editor for The Stanford Daily.
A nonprofit organization working to steer transformative technologies -- particularly AI, biotechnology, and nuclear weapons -- away from extreme large-scale risks and towards benefiting life.
Create AI safety videos, and offer communication and media support to AI safety orgs.
No summary available yet.
A community blog and forum devoted to refining the art of human rationality, with major focus areas including AI alignment, cognitive biases, decision-making, and effective altruism.
No summary available yet.

Sumeet Ramesh Motwani is a Machine Learning PhD student at the University of Oxford, advised by Philip Torr and Christian Schroeder de Witt, with funding from Eric Schmidt and CAIF. He completed his undergraduate degree in computer science at UC Berkeley, where he was a member of Berkeley AI Research (BAIR) advised by Dan Hendrycks. His research focuses on RL post-training, multi-agent systems, and AI security, with particular interests in meta-RL, open-endedness, and long-horizon LLM agent capabilities. He is known for his work on "Secret Collusion Among Generative AI Agents" (NeurIPS 2024), which established the subfield of secret collusion in multi-agent AI systems, and for "STARC: A General Framework For Quantifying Differences Between Reward Functions" (ICLR 2024), published while he was an undergraduate. He has also contributed to research on autonomous web-browsing agents (Agent Q, REAL benchmark) and multi-agent LLM training (MALT, COLM 2025). He participated in the MATS (ML Alignment Theory Scholars) program and is affiliated with the Future of Life Institute as a community researcher. He has held research positions at Microsoft Research (AI Frontiers lab) and Google X.
Fulton Wang is a co-founder of Guide Labs and holds a PhD in computer science from the Massachusetts Institute of Technology (MIT).
For Remmelt Ellen to run a virtual and physical camp where selected applicants prioritise AIS research & test their fit