Formalizing the side effect avoidance problem research
Cryptographic attestation, runtime conscience, and an unfilterable kill switch. Live on the App Store and Google Play in 29 languages. AGPL, mission-locked.
Joshua Landes works on helping AI go well at BlueDot Impact, where he focuses on AI safety education and community building, drawing on prior experience in philosophy, political campaigns, and AI governance and regularly organizing AI safety events and meetups.
I'm a 50-year-old independent researcher working without institutional backing, funding, or specialized hardware. Everything I've built has been developed on a single Android phone over nine months — by necessity, not choice. That constraint turned out to be clarifying. If it works on a phone, it works anywhere. If the safety mechanics hold under resource limitations, they'll hold under pressure.
A medical doctor and researcher specializing in the integration of AI in clinical medicine, with a focus on large language models, natural language processing, and deep learning.
6-month salary for part-time independent research on LM interpretability for AI alignment
Founder of Holtman Systems Research; systems architect and independent AI safety researcher with a PhD in Software Design from Eindhoven University of Technology for computer science research conducted at CERN, and with 20 years of industrial R&D experience and 10 years of experience in standards creation.
Angel investor and conservationist based in the Netherlands, CEO and founder of Symbiotic Projects, investing in natural capital via nature-based solutions that restore degraded land, conserve ecosystems and support biodiversity.
6-month salary to work with Dan Hendrycks on research projects relevant to AI alignment
Lev Heller is the Operations Manager at the Effective Institutions Project, responsible for improving and maintaining organizational systems, managing compliance, and providing operational support to strategic initiatives. His background spans operational quality management at CAF America, strategy work for healthtech startups in emerging markets, biomedical research at the NIH, and field-based emergency medicine across multiple continents.
Scott Aaronson is the Schlumberger Centennial Chair of Computer Science at The University of Texas at Austin and director of its Quantum Information Center. He is a theoretical computer scientist whose research focuses on quantum computing and computational complexity theory, and from 2022 to 2024 he was on leave at OpenAI working on the theoretical foundations of AI safety. With support from Open Philanthropy, he is now building a research group at UT Austin on theoretical computer science for AI alignment.
Robert Kralisch is an independent conceptual and theoretical AI alignment researcher with a background in cognitive science. He became interested in AI safety in 2014 after reading Nick Bostrom's Superintelligence and later pursued both computer science and cognitive science before leaving formal academia to focus on independent alignment research. He completed the AI Safety Fundamentals course in 2021 and has since received funding from the Long-Term Future Fund for independent research. His work centers on three main areas: conceptual clarity around notions of agency, intelligence, and embodiment; the development of more inherently interpretable cognitive architectures (including his Prop-room and Stage Cognitive Architecture); and Simulator theory as an alternative framework for understanding large language models. He also serves as a research coordinator and organizer for AI Safety Camp, where he evaluates and supports conceptually sound alignment research projects.
Gabriel Recchia is a cognitive scientist and director of Modulo Research, where he works on the evaluation and alignment of large language models and the design of scalable oversight protocols. At Modulo he has led work releasing datasets of expert-annotated valid and invalid long-form solutions for use in scalable oversight experiments and related studies on how humans and models evaluate complex answers. Previously, he led user-testing research and evaluation for patient-friendly genetic reports and the Predict: Breast Cancer prognostic tool at the University of Cambridge’s Winton Centre for Risk and Evidence Communication, and has co-authored widely cited research on risk perception and communication.
Persuading a critical mass of key potential influencers of Trump's AI policy to champion a bold, timely and proper US-China-led global AI treaty

David Reber is a PhD student in Computer Science at the University of Chicago, advised by Victor Veitch and Ari Holtzman. His research centers on precise causal inference over large language models, with a focus on post-hoc internal interpretability and validating human-understandable concepts within these systems. He is motivated by AI safety applications such as monitoring long-term planning and detecting deception, and is also interested in fairness and adversarial robustness. Earlier in his PhD he worked on empirical and theoretical extensions of Cohen and Hutter's pessimistic conservative reinforcement learning agent under the guidance of Michael Cohen. He received multiple grants from the Long-Term Future Fund, beginning in 2021, supporting his early RL safety research and his transition into the AI safety field. He is an active contributor to the AI Alignment Forum and LessWrong under the handle derber, and has published at venues including ICML.
Roman Soletskyi is a Ukrainian researcher based in Paris, France, currently working as a researcher at Mistral AI. He completed an MS in Physics at Ecole Normale Superieure – PSL (2022–2024) after undergraduate studies in physics at Moscow Institute of Physics and Technology (2020–2022), and won a gold medal at the International Physics Olympiad in 2017. His research spans AI safety and formal verification, machine learning theory, and applied deep learning. Notable work includes "Training Safe Neural Networks with Global SDP Bounds" (2024, co-authored with David Dalrymple of ARIA), which develops methods for training neural networks with formal safety guarantees using semidefinite programming, with applications to safe reinforcement learning policies; this work was supported by the Long-Term Future Fund and the Machine Learning Alignment Theory Scholars program. At Mistral AI he has contributed to projects including the Pixtral 12B multimodal model and research on variational inference theory.
9-month part-time salary for Magdalena Wache to self-study AI safety, test fit for theoretical research
Muslim theologian and scholar of Islamic law who has served for many years as Imam of the Islamic Center of Virginia and was previously Imam of the Colorado Muslim Society in Denver. Before that, he spent 12 years studying in Saudi Arabia at Umm Al‑Qura University in Mecca and the Graduate Institute for the Preparation of Imams in Mecca.
A nonprofit research organization that builds open-source tools and conducts research on forecasting, epistemics, and uncertainty quantification to improve decision-making for the long-term future of humanity.

Cody Rushing is a Member of Technical Staff at Redwood Research, where he works on AI control, model organisms, scheming and deception, and AI security. He completed his Bachelor's in Computer Science from UT Austin in Fall 2024 and attended the MATS Summer 2024 program under mentor Buck Shlegeris, later receiving a stipend extension to continue his AI Control research. He now also serves as a MATS mentor for the Redwood Research stream. His prior research includes mechanistic interpretability work under Neel Nanda and value alignment research with Brad Knox. He is a co-author of several papers including "Ctrl-Z: Controlling AI Agents via Resampling," "Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI," and "Basic Legibility Protocols Improve Trusted Monitoring." He is based in the San Francisco Bay Area.
Kiryl Shantyka is Executive Manager at Effective Altruism Sweden and works on operations and community support, including speaking on operations as a high‑impact career for the effective altruism movement.
Co-Director of the AI Safety Initiative at Georgia Tech, overseeing the group’s funding and strategy and conducting technical AI safety research on representation engineering, applied interpretability, and agentic misalignment.
Robbie McCorkell is the CTO at Leap Laboratories (Leap Labs), as indicated on his GitHub profile, and works on the company’s machine learning and interpretability tooling.
Garrison Lovely is a Brooklyn-based freelance journalist covering the intersection of money, power and artificial intelligence. He publishes Obsolete, an independent Substack newsletter on the economics and geopolitics of the AI industry, and is the author of the forthcoming book Obsolete: The AI Industry’s Trillion Dollar Race to Replace You—and How to Stop It (Nation Books/OR Books, 2026). His reporting on AI and corporate power has appeared in outlets including The New York Times, Nature, Bloomberg, The Verge, TIME, The Guardian US, SF Standard, The Nation, The American Prospect, Jacobin, BBC Future, Vox and Le Monde Diplomatique, and has been referenced by publications such as The New Yorker, The New York Times, The Atlantic and ProPublica. Lovely previously served as a Reporter in Residence at the Omidyar Network and earlier worked at GiveDirectly and as a product manager at Enigma Technologies.
Director and research lead at Truthful AI, an AI safety research non-profit based in Berkeley; affiliate of CHAI at UC Berkeley; previously based at the University of Oxford’s Future of Humanity Institute; holds a PhD from MIT and serves on the board of Ought.
Susie Alegre is an international human rights lawyer and CIGI senior fellow whose work focuses on technology, human rights and the emerging right to freedom of thought in the digital age; she is the author of "Freedom to Think" and "Human Rights, Robot Wrongs: Being Human in the Age of AI."
School Psychologist. AI Enthusiast
Leaf runs online fellowships for exceptional teenagers (ages 15-19) to explore how they can have the most positive impact, including through a flagship course on AI safety called Dilemmas and Dangers in AI.
3-month salary + compute expenses to study and publish on shutdown evasion in LLMs and to use LLMs as tools for alignment
Funding for 14 weeks of research on understanding search in transformers at AI Safety Camp
Monica Valcourt is a Bay Area engineer currently working on an FPGA hardware project. She previously founded and ran a small MEV trading firm, has hobby interests in GPUs, semiconductor manufacturing, and other low-level systems, and has completed internships at Uber ATG and Amazon. She graduated from MIT in 2022 with a degree in Computer Science.
12-month salary to study and get into AI Safety Research and work on related EA projects
1-month full-time + 3 months part-time salary to work on two research projects during the MATS 5.0 extension program
A Washington, D.C.-based 501(c)(3) nonprofit that educates the public, policymakers, and media about the risks of advanced AI and advocates for bipartisan safeguards before AGI arrives.
Support to translate BlueDot Impact’s AI alignment curriculum into Brazilian Portuguese for use in university study groups and an online course