181,448 evaluations proving no production AI model reliably maintains corrections. Expanding coverage and pursuing multi-pass validation.
Database
Showing 1901-1950 of 3952 results
Active filters: Type: Individual, Project
Tony Blair is the Executive Chairman and founder of the Tony Blair Institute for Global Change and served as Prime Minister of the United Kingdom from 1997 to 2007, winning three consecutive general elections as Labour leader. In his current role he works with political leaders worldwide on strategy, policy and delivery, with technology as a central enabler of reform.
Director of U.S. Policy at the Institute for Law & AI whose research focuses on administrative law, agency decision‑making, and liability. He previously clerked on the U.S. Court of Appeals for the Third Circuit, worked in public‑health law at a New York nonprofit, completed a Fulbright grant in Ourense, Spain, and graduated cum laude from Harvard Law School.
Travel support to attend the Symposium on AGI Safety in Oxford in May
Trooper Sanders is an expert in AI, social policy, and financial health who founded the advisory practice Predawn.ai and serves as president of the State AI Safety Roundtable, a nonprofit supporting state-level AI safety efforts. He was previously CEO of Benefits Data Trust and has held senior roles in the White House, philanthropy, and the nonprofit sector.
Pablo is an Ambassador at Giving What We Can and co‑founder and chairman of Regalador.com. He has extensive experience in the nonprofit and consultancy sectors and previously served as president of Effective Altruism Spain.

Felix Hofstätter is a Research Scientist on the evaluations team at Apollo Research, an AI safety organization based in London. He was previously a MATS Fellow (MATS 5.0 program), where he conducted research on AI alignment with a focus on how AI systems can strategically underperform on capability evaluations, a phenomenon known as sandbagging. He is best known for co-authoring the paper "AI Sandbagging: Language Models can Strategically Underperform on Evaluations," which demonstrated that frontier models like GPT-4 and Claude can be prompted or fine-tuned to selectively hide capabilities during assessments, undermining the trustworthiness of AI safety evaluations. Prior to his research career, he worked as a Software Consultant at TNG Technology Consulting and studied at Imperial College London. He writes about AI alignment topics on Medium and the Alignment Forum, aiming to make technical alignment research accessible to ML practitioners.
Creating a contest for Robust, Detailed Proposals and Redteaming of AI Safety Plans: Fast Action for Safe Transformative AI
No summary available yet.
No summary available yet.
Amon Elders is a PhD student in machine learning at the University of Oxford, supervised by Prof. Michael Osborne at the Computational and Biological Learning Lab. He received a $250,000 grant from the Long-Term Future Fund in May 2021 to support his PhD, which focuses on AI safety topics including robustness and distributional shift. He holds a Master's degree in Computer Science and Machine Learning (with distinction) from University College London (UCL). Prior to his PhD, he worked as an ML engineer at Spark Wave and completed a year-long research position at the Italian Institute of Technology in Genoa, which resulted in a co-authored publication at AIES 2019 on multitask learning for fair classification. During his undergraduate studies at the University of Amsterdam, he co-founded the EA society, and he has also served as a Summer Research Fellow at the Stanford Existential Risks Initiative (SERI).
No summary available yet.
No summary available yet.
Randima (Randy) Fernando is a co‑founder and former executive director of the Center for Humane Technology, where he has helped shape global understanding of extractive technology and humane alternatives. Working at the intersection of technology, mindfulness, and social impact, he previously led award‑winning graphics projects and authored several books at NVIDIA, and later served as executive director of Mindful Schools.
Sharon Hammond is the Chief Operating Officer on the Society Library’s executive/core team, helping lead the organization’s operations.
No summary available yet.
No summary available yet.
LLM persona researcher in the Alignment of Complex Systems group, creator of the open-source representation-engineering library repeng, 2024 New Science fellow, and AI researcher with a background in computational linguistics.
No summary available yet.

Benedikt Höltgen (Ben) is a researcher with interdisciplinary training in mathematics, philosophy, and computer science, currently affiliated with the Hasso Plattner Institute's Data & AI cluster in Potsdam, Germany. He completed an MSc in Mathematical Philosophy at the Munich Center for Mathematical Philosophy (MCMP) and an MSc in Computer Science at the University of Oxford with a focus on machine learning, before starting a PhD at the University of Tübingen in 2022 under Bob Williamson as part of the ELLIS PhD program, co-supervised by Nuria Oliver at the University of Alicante. His research focuses on the mathematical assumptions and technical modeling choices underlying AI systems and their societal implications, including probability interpretation, individual-group dynamics, and algorithmic fairness. Earlier in his career, following advice from 80,000 Hours, he transitioned from philosophy to ML research and worked with the OATML group at Oxford (Yarin Gal's lab) alongside Sören Mindermann and Jan Brauner, contributing to the RHO-Loss paper on prioritized training published at ICML 2022. He received a Long-Term Future Fund grant in December 2021 to support 10 months of research on AI safety and alignment, with a focus on scaling laws and interpretability, during this Oxford period.
No summary available yet.
Milo McBride is a fellow in the Sustainability, Climate, and Geopolitics Program at the Carnegie Endowment for International Peace in Washington, DC, researching the geopolitics of energy‑transition technologies, critical minerals, and next‑generation innovations that can accelerate global decarbonization.
Inaugural Program Manager at Brown University’s Center for Technological Responsibility, Reimagination, and Redesign (CNTR) and CNTR AISLE Product Director, leading short- and long-term strategic planning and product development for CNTR’s projects, with a background in program management, digital innovation, and technology for social impact and a focus on integrating technological ethics and critical disability studies.
Abe Smith is a Silicon Valley enterprise software executive and Chief of Global Field Operations at Freshworks, leading worldwide field sales after prior leadership roles at Zoom, Oracle, Cisco/WebEx, and Cision.
Kyle Herndon is a software engineer on the Softmax team specializing in ML compilers and high-performance systems, with contributions to IREE, torch-mlir, and the ROCm ecosystem, and works on performance engineering and the multi-agent reinforcement learning training stack.
Executive Director of the Center for AI Safety Action Fund, leveraging over 15 years of policy and advocacy experience and previously serving as a Chief of Staff on Capitol Hill, where he helped shape the NIST Risk Management Framework and the CHIPS and Science Act.
Andrew Doris is a Senior Policy Analyst at the Secure AI Project, where he works on state‑level AI safety legislation, including bills in California, Michigan, Utah, and other states that address transparency, safety plans, and liability for frontier AI developers. He previously served as a Senior Policy and Research Analyst at FP Analytics and as a National Security Fellow for Senator Bob Casey, and before that was a U.S. Army logistics officer.
3-month stipend to support research on the state of AI safety in China and implications for AI existential risk
Kuhan Jeyapragasan is the co-founder and Executive Director of the Cambridge Boston Alignment Initiative, where he leads AI safety and governance programming for students and early-career researchers in Cambridge. He previously co-founded and ran the Stanford Existential Risks Initiative and has been active in effective altruism community-building and AI policy outreach.
Co-founder and former CEO of Probably Good, and co-founder and CTO of Pattern Labs, an AI security firm focused on protecting advanced technologies from theft and misuse; previously led Google research efforts to predict and detect wildfires.
No summary available yet.
Independent researcher building a new foundation model architecture from first principles.
No summary available yet.
No summary available yet.
I plan to investigate what realistic RL training conditions might lead to LLMs developing steganographic capabilities.
Senior Researcher at Probably Good and PhD student in philosophy at the University of Edinburgh, working on global priorities research, ethics, and longtermism, and a recipient of a global priorities fellowship from the Forethought Foundation.
I love patterns and reasoning, which is probably why I want to teach them.
$10,120 for most of the ops expenses of the research phase (June-Sep 2024) of WhiteBox’s AI Interpretability Fellowship
Julia Zatariano is Hiring Manager at AE Studio, having previously worked there as a Research Associate, after earlier experience as a tax trainee at PwC and completing a bachelor’s degree in Accounting and Finance at Universidade Federal de Santa Catarina.
No summary available yet.
Theodore Chapman is an independent AI safety researcher focused on the nature and limits of capability elicitation in large language models. He holds degrees in data science and physics from the University of Rochester, where he also built machine learning pipelines for NASA satellite imagery analysis. He participated in the ML Alignment & Theory Scholars (MATS) Winter 2023-24 cohort under the supervision of Evan Hubinger, producing research on fine-tuning-based capability elicitation in GPT-3.5. His key finding was that the performance achieved by fine-tuning an LLM on a task using one prompt format does not reliably bound the performance achievable with a different prompt format, complicating safety evaluations that rely on fine-tuning to elicit hidden capabilities. He subsequently received a 6-month researcher stipend to continue this line of work, exploring how chat fine-tuning affects LLM capability elicitation, and has published related work on LessWrong and the Alignment Forum.
No summary available yet.
Operations specialist at AI Standards Lab and former Programme Associate at the Centre for Effective Altruism, with a background in operations, programme coordination, and community research and a BSc from Indiana University Bloomington.
Community liaison supporting the effective altruism community at the Centre for Effective Altruism; serves on the board of GiveWell, previously served as president of Giving What We Can, and formerly worked as a social worker after studying sociology at Bryn Mawr College.
I am a passionate ICT graduate with a diverse skill set in tech support, systems implementation, and network training, which I honed while working with Tharaka Nithi County. My experience also includes web design and video editing, where I blend creativity with technical proficiency. An avid chess player, I thrive on strategic thinking and problem-solving. I am dedicated to fostering innovation and excellence in every project I undertake and am always eager to explore new technologies and methodologies.
Sam Bowman works on technical AI safety at Anthropic and is on long-term leave from New York University, where he is an Associate Professor of Data Science and Computer Science and previously led the NYU Alignment Research Group from 2022 to 2024.
No summary available yet.
No summary available yet.