Personal blog of Victoria Krakovna, Senior Research Scientist at Google DeepMind and co-founder of the Future of Life Institute, covering AI alignment research and related topics.
Showing 251-300 of 3167 results
Active filters: Type: Org, Individual
No summary available yet.
No summary available yet.
No summary available yet.
Matthew Kenney is an AI researcher and the founder of Algorithmic Research Group, where he leads work on benchmarks, environments, and multi-agent systems aimed at understanding recursive self-improvement in AI. Before starting ARG he worked in machine learning roles in academia at Duke University and in industry at Apple and Alethea, focusing on applied machine learning and AI research.
Co-founder of the Buddhism & AI Initiative, previously COO of AI safety company Conjecture; he has been studying Buddhism since 2016 and has spent over four years living in monasteries between London, Canada, and India.
Lauren Lee is a former researcher and instructor at the Center for Applied Rationality (CFAR), where she worked for approximately two years before leaving in Fall 2018. At CFAR she contributed to EA impact metrics collection alongside Dan Keys and helped establish a community dispute resolution council. After leaving CFAR, she received a $20,000 grant (funded by a private donor following LTFF recommendations) to work on preventing burnout and boosting productivity within the EA and x-risk communities. Her proposed work included one-on-one coaching sessions with individuals and organizations in the x-risk community to help them clarify goals and build mental models, as well as developing dependability training, writing, talks, and workshops on topics such as burnout, intentions, and dependability.
Stop AI is a grassroots activist organization that uses non-violent civil disobedience and public advocacy to demand a permanent, enforceable global ban on the further development of frontier AI technology.
CFG is an independent think-and-do tank based in Brussels that helps policymakers anticipate and govern powerful emerging technologies including advanced AI, biotechnology, climate interventions, and neurotechnology.
No summary available yet.
Inda Harahap serves as Outreach & Communications Lead for the AI Whistleblower Initiative (AIWI) within Whistleblower-Netzwerk e.V.
Dan Valentine is a Member of Technical Staff at Anthropic, where he works on AI safety and alignment research with a focus on scalable oversight. He previously worked as a full-stack software engineer in Toronto, Canada, before transitioning into technical alignment research with support from the Long-Term Future Fund. He participated in MATS Summer 2023 (cohort 4.0) under the mentorship of Ethan Perez. During and after MATS, he contributed to research on debate as a scalable oversight method, co-authoring "Debating with More Persuasive LLMs Leads to More Truthful Answers" (ICML 2024 Best Paper/Oral), which demonstrated that LLM debate helps both non-expert models and humans answer difficult questions more accurately. He is also a co-author on "Failures to Find Transferable Image Jailbreaks Between Vision-Language Models" (ICLR 2025), and contributed to earlier work on mesa-optimization using toy models (AISC8, 2023). Prior to focusing on AI safety, he studied at Dublin City University (2009-2013) and was involved in organizing the Toronto AI Safety community.
PhD student in the Algorithmic Alignment Group at MIT advised by Dylan Hadfield-Menell, studying AI agent alignment to human preferences in open-ended settings, human–AI teaming, and decision making across NLP, vision, and robotics.
No summary available yet.
No summary available yet.
Zershaaneh Qureshi is a podcast host and researcher associated with 80,000 Hours, where she co‑hosts episodes of The 80,000 Hours Podcast and authors problem profiles and articles on AI risk, including work on risks from power‑seeking AI systems and AI‑enhanced societal decision making. She holds a master’s degree in mathematics and philosophy from the University of Oxford, and previously worked as a researcher providing market intelligence to the global water industry before shifting her focus to AI safety and strategy.
No summary available yet.
No summary available yet.
Research manager at ACS helping to set up and run ACS Research, with a Master’s degree in theoretical physics from Charles University and previous experience as executive director of an NGO running international physics competitions.
Secretary of the Existential Risk Observatory with a degree in Classics, working as an editor at a publishing house and a literary magazine, and active as a translator and occasional writer.
Robert Kirk is a Research Scientist at the UK AI Security Institute (AISI), where he is the acting lead of the alignment red-teaming sub-team, focusing on stress-testing model alignment to detect and understand model propensities relevant to loss-of-control risks. He completed his PhD at UCL's DARK Lab in January 2025, supervised by Tim Rocktäschel and Edward Grefenstette, with his dissertation on generalisation in LLM fine-tuning and reinforcement learning agents. Prior to his PhD he received an integrated Master's in Mathematics and Computer Science from Somerville College, Oxford, and worked as a software and infrastructure engineer at Smarkets. His research centers on generalisation in reinforcement learning, out-of-distribution robustness, AI safety and alignment, and evaluating the effects of RLHF on language model behaviour and diversity. He received a Long-Term Future Fund grant to run human evaluations of different machine learning methods for aligning language models, and his paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (2023) is widely cited in the alignment research community. He also contributes to the Alignment Newsletter covering interpretability and reinforcement learning, and serves as a mentor in the MATS program for the UKAISI red-teaming stream.
James is the Director of Community and Partnerships at Giving What We Can, where he helps support and grow the organisation’s network of givers. An experienced aid worker, he has led international development and humanitarian teams and from 2019 to 2024 lived and worked across Somalia, Kenya and Rwanda. He holds a bachelor’s degree in Philosophy, Politics and Economics from the University of Oxford.
No summary available yet.
INESIA is France's national institute for AI evaluation and security, a government coordination structure that federates ANSSI, Inria, LNE, and PEReN to evaluate AI systems, analyze systemic risks, and support AI regulation.
No summary available yet.
A research project that uses game theory and computational modeling to reduce catastrophic risks from competition in the development of transformative AI.
Artyom (Artem) Karpov is an independent AI safety researcher and ML engineer based in Istanbul, Turkey. He holds a degree in Applied Mathematics and has over 15 years of software engineering experience, having previously built real-time emergency response systems and contributed to .NET Core. He transitioned to AI safety research in 2022 after becoming interested in the field through the 80,000 Hours career guide, and has since completed the MATS, ARENA, MLSS, and Apart Fellowship programs. He participated in AI Safety Camp (2023), where he worked on the project "Inducing Human-Like Biases in Moral Reasoning Language Models," which resulted in a paper accepted at a NeurIPS workshop. His subsequent research has focused on LLM steganography and encoded reasoning in chain-of-thought, with papers accepted at AAAI, ICLR, and NeurIPS workshops. He has received early-career funding from Open Philanthropy (via Good Ventures Foundation) and has contributed evaluations to the UK AI Security Institute.
No summary available yet.
AI Prospects is a Substack publication by K. Eric Drexler exploring how advanced AI will transform society and what strategic options humanity has for navigating this transition safely.
Constellation is a nonprofit research center in Berkeley that supports AI safety work through fellowships, an incubator, and a collaborative coworking space hosting researchers and organizations across the field.
A nonprofit dedicated to ensuring that today's most consequential technologies, including AI and social media, actually serve humanity by exposing misaligned incentives and advocating for systemic change through policy, litigation, and public awareness.
Substack newsletter by Helen Toner (Interim Executive Director at Georgetown's Center for Security and Emerging Technology and former OpenAI board member) offering analysis on navigating the transition to a world with extremely advanced AI systems.
No summary available yet.
No summary available yet.
Alexander (Sasha) Bystritsky, M.D., Ph.D., is a psychiatrist and neuroscientist who serves as President of the Institute for Advanced Consciousness Studies. He is Professor Emeritus of Psychiatry and Biobehavioral Sciences at the David Geffen School of Medicine at UCLA and is widely known for his work on anxiety disorders, focused ultrasound, and neuromodulation-based treatments.
Paul Saffo is a Silicon Valley-based forecaster who studies the dynamics of large-scale, long-term technological change. He teaches forecasting as an Adjunct Professor in Stanford University’s School of Engineering, chairs the Future Studies track at Singularity University, and is a non-resident Senior Fellow at the Atlantic Council and a Fellow of the Royal Swedish Academy of Engineering Sciences.
CASA is a research organization working to ensure the benefits of AI can be widely and equitably distributed globally without compromising essential security, with a focus on Global Majority countries.
Tristan Harris is a technology ethicist and co‑founder of the Center for Humane Technology, a nonprofit whose mission is to align technology with humanity’s best interests. A former Google design ethicist, he now focuses on how major platforms and AI systems shape society, co‑hosts the podcast Your Undivided Attention, and was a prominent voice in the Netflix documentary The Social Dilemma.
The Compendium is a living document and website that presents a comprehensive, accessible argument for why artificial general intelligence poses an extinction risk to humanity and what can be done about it.
No summary available yet.
Kurt Brown is a software developer who received a grant from the Long-Term Future Fund in October 2023 to build a cryptographic tool enabling anonymous whistleblowers to prove their credentials. The grant, valued at $15,000, funded approximately four weeks of development time. The project addresses the challenge of allowing whistleblowers to demonstrate their insider status or professional credentials without revealing their identity, a problem relevant to accountability and transparency efforts in AI labs and other organizations. No further public profile information was found that could be reliably attributed to this individual.
An annual 4-day academic summer school held in Prague focused on teaching AI alignment research frameworks to PhD students, ML researchers, and advanced students.
Yoav Tzfati is an AI safety researcher and software engineer based in Berkeley, California. He is a MATS 5.0 alumnus who worked on scalable oversight research, specifically on experimental methodology for evaluating AI alignment techniques including Consultancy and Critiques in synthetic settings, mentored by Julian Michael of the NYU Alignment Research Group. He subsequently joined the Security Level 5 (SL5) Task Force at the Institute for Security and Technology as a Member of Technical Staff, focusing on supply chain and machine security, and contributes to developing the SL5 standard for securing AI data centers. He is also a mentor for SPAR Spring 2026 projects related to AI security and safety infrastructure. Prior to his AI safety work, he drove engineering for attack surface discovery automation at CyCognito and served as Tech Lead at Arbor Trading Bootcamp. He has spoken at the Berlin AI Safety Meetup on his scalable oversight research and has also developed educational programs teaching non-programmers to build full-stack applications using AI tools.
Lukas Berglund is an AI safety researcher currently serving as Technical Staff at the U.S. Center for AI Standards and Innovation (CAISI) at NIST. He is best known as the lead author of "The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'," published at ICLR 2024, which demonstrated a fundamental generalization failure in autoregressive large language models. He also co-authored "Taken out of context: On measuring situational awareness in LLMs," an influential paper exploring how models recognize whether they are in training or deployment. His research was conducted in part as a MATS Fellow through the SERI MATS program, with support from Open Philanthropy. He has an undergraduate background from Vanderbilt University and his work spans AI evaluation, AI security, and empirical research on the capabilities and failure modes of frontier AI systems.
Meghna Mann is President and Chief Operating Officer at Constellation Institute, overseeing programs and operations that strengthen AI safety talent pipelines and support the launch and growth of mission-aligned organizations. Previously, she held senior leadership roles at MetaMap—including serving as COO and later CEO of the identity-verification company—after earlier positions at BlackRock and the Brookings Institution, and she advises high-growth technology ventures through the Endeavor Global network.
No summary available yet.
Research Scientist at the UK AI Security Institute whose work focuses on bridging immediate AI harms and longer-term catastrophic risks in AI safety.
An independent research project focused on proving formal impossibility results in AI alignment using theoretical computer science methods, led by Alexander Bistagne as a Ronin Institute Fellow.
No summary available yet.
No summary available yet.