Benchmark for agent safety when spending users' money. How often do agents violate user intent and rules?
Database
Showing 1201-1250 of 3954 results
Active filters: Type: Individual, Project
H. Andrew Schwartz (1968–2025) was chief communications officer at the Center for Strategic and International Studies, where for roughly 20 years he directed CSIS’s media relations, digital strategy, events, publications, website, and other external engagement. A former Fox News producer and print journalist, he cohosted several CSIS podcasts (including The Truth of the Matter, The Trade Guys, The Impossible State, and The AI Policy Podcast) and coauthored Overload: Finding the Truth in Today’s Deluge of News with CSIS trustee Bob Schieffer.
Aysja Johnson is an AI safety researcher and policy analyst focused on AI lab scaling policies and responsible scaling frameworks. She holds a background in cognitive science, having completed undergraduate studies in Mathematics at UC Berkeley and graduate work in NYU's Computation and Cognition Lab under Todd Gureckis, where she studied human sense-making, open-ended reasoning, and human-machine intelligence. She was hired as a Research Analyst at AI Impacts in 2022, selected from over 250 applicants, contributing research on comparative cognition and technology adoption patterns relevant to AI risk. In 2023 she was a PIBBSS Summer Fellow, working on a project titled 'Towards a Science of Abstraction' exploring why natural abstractions are favored by agents and what this implies for AI alignment. She received a Long-Term Future Fund stipend for 1.5 years to conduct a thorough investigation and analysis of AI lab scaling policies, and has published critical analyses on LessWrong arguing that current responsible scaling policies lack rigor, fail to specify measurable evidence thresholds, and that behavioral evaluations alone are insufficient for safety assurance. She is active on LessWrong under the handle 'aysja' and has co-authored posts on AI lab governance topics including OpenAI's non-disparagement practices.
Surveying neuroscience for tools to analyze and understand neural networks and building a natural science of deep learning
The first AI safety evaluation benchmark for Nigerian indigenous livestock systems, testing whether frontier models are safe to deploy in African food systems.

Funding to cover our expenses for 3 months during an unexpected shortfall
Pauline Charazac is Head of Policy Engagement at CeSIA, where she leads international efforts to promote responsible and inclusive AI governance. Conference bios describe her as a senior public policy adviser with experience at institutions such as the OECD and the Bank of Mauritius, working at the intersection of AI ethics, global governance, and financial inclusion.
Yuxiao Li is an AI safety and mechanistic interpretability researcher currently based in Bilbao, Spain, where she is a postdoctoral researcher at the Basque Center for Applied Mathematics (BCAM). She holds a PhD in Computer and Information Sciences from Tsinghua University (2018-2023). Her research focuses on understanding the internal representations of large language models through techniques such as sparse autoencoders, variational inference, and geometric analysis of feature spaces. She was previously affiliated with MIT's Tegmark group and the Beneficial AI Foundation, where she was first author on "The Geometry of Concepts: Sparse Autoencoder Feature Structure" (arXiv 2024), a study of how concepts are geometrically organized in LLM activations. She has also participated in the ML Alignment & Theory Scholars (MATS) program and the Supervised Program for Alignment Research (SPAR), contributing multi-part research on structured priors and block-diagonal geometry in language model activations. She currently serves as a mentor in the Algoverse AI Safety Fellowship and has received independent research funding for inference-based AI interpretability work.
Catherine Régis is a full professor of law at Université de Montréal whose work spans health law, artificial intelligence, and digital innovation. She holds a Canada CIFAR AI Chair, is an associate academic member at Mila, serves as Director of Social Innovation and International Policy at IVADO, and is Co-Director of the Canadian AI Safety Institute Research Program at CIFAR.
This round of funding will be used primarily for prototype hardening, artifact packaging, runtime evaluation, and preparation for external review.
Advocating for U.S. federal AI safety legislation to reduce catastrophic AI risk.
Sviatoslav (Slava) Chalnev is an AI researcher based in Sydney, Australia, with a background in mechanistic interpretability and AI safety. He studied at The Australian National University and subsequently pursued independent interpretability research funded by two Long-Term Future Fund stipends totaling $75,000, focused on mechanistic interpretability methods and open-source tooling. He participated in the MATS 6.0 program under Arthur Conmy, resulting in the paper "Improving Steering Vectors by Targeting Sparse Autoencoder Features" (arXiv:2411.02193, 2024), which introduced SAE-Targeted Steering (SAE-TS), a method for constructing steering vectors that target specific sparse autoencoder features while minimizing unintended side effects. He also co-authored "A Single Direction of Truth" (arXiv:2507.23221, 2025), demonstrating that a linear probe on an observer model's residual stream can detect and causally steer contextual hallucinations in language models. More recently, Chalnev co-founded Integuide, an AI startup building tools to capture and disseminate expert technician knowledge, which was part of the Startmate Winter 2025 accelerator cohort.
12-month stipend and expenses for research in AI Safety (Unlearning; Modularity; Probing Long-term behaviour)
Tomislav Kurtovic (Tomislav Kurtović) is a researcher and Computer Vision PhD candidate at the Faculty of Electrical Engineering and Computing (FER), University of Zagreb, Croatia. He holds a university master's degree in computer engineering (univ. mag. ing. comp.) and works in the Department of Electronic Systems and Information Processing. At FER, he teaches laboratory exercises for Information Processing and Statistical Data Analysis at the undergraduate level, and Deep Learning 2 at the graduate level. In Q4 2022, he received a grant from the Long-Term Future Fund (LTFF) to skill up in machine learning and AI alignment, with the goal of developing a streamlined course in mathematics and AI for an alignment-focused audience.
Jenna Peters is Chief of Staff for the Career Services Team at 80,000 Hours. Before joining 80,000 Hours she worked as a project manager at the Centre for Effective Altruism and as a Post‑Baccalaureate Fellow at the Center for Global Women’s Health Technologies at Duke University. Jenna graduated summa cum laude from Duke University with a BS in neuroscience.
Oliver Patel is the Enterprise AI Governance Lead at AstraZeneca, where he leads the global framework of policies, standards and processes to ensure the company can realise the benefits of AI while managing associated risks. He writes the "Enterprise AI Governance" Substack and is a frequent speaker on practical frameworks for scaling AI governance in large organisations.
I've self-funded my ramp-up for six months, and interview and grant processes are taking longer than expected.
Cover participant stipends for AI Safety Camp Virtual 2023
6 months of part-time stipend to launch a new science journalism outlet focused on AI Safety
Promoting better management of Global Catastrophic Risks in Spanish-speaking countries.
Org director studying how social change happens | Climate, animal welfare, AI safety movements
Itay Yona is an AI security researcher and mechanistic interpretability specialist. He is the founder and principal investigator of MentaLeap and also works as an AI security researcher at Google DeepMind.
Chair of the UK’s AI Security Institute, overseeing its work to evaluate and mitigate serious risks from advanced AI systems.
Hoagy Cunningham is an AI safety researcher currently working at Anthropic, where he has contributed to both interpretability and safeguards research. He holds a 2:1 in Politics, Philosophy and Economics from The Queen's College, Oxford, and earlier in his career worked as a researcher at Full Fact, the UK fact-checking charity, and as an economist. He became a SERI MATS scholar under Lee Sharkey and is the lead author of "Sparse Autoencoders Find Highly Interpretable Features in Language Models" (ICLR 2024), a foundational paper demonstrating that sparse autoencoders can recover monosemantic, interpretable features from language model activations. This work was independently developed in parallel with similar research published by Anthropic and generated significant excitement in the mechanistic interpretability community. He received Long-Term Future Fund grants supporting his sparse coding research and work on preventing steganography in interpretable representations. At Anthropic, he has contributed to research on scaling monosemanticity, constitutional classifiers for jailbreak defense, and auditing language models for hidden objectives.
Member of Technical Staff on Sage’s Epistemics team, alongside independent work as a machine learning and drug discovery researcher known for the Lo-Hi benchmark and related ML drug discovery tools.
Chris Mathwin is a mechanistic interpretability researcher based in Sydney, Australia. He holds a Master of Engineering in Civil Engineering from the University of Melbourne and transitioned into AI safety research through programs including AI Safety Camp (AISC8, 2023) and the ML Alignment Theory Scholars (MATS) program, where he worked under Lee Sharkey at Apollo Research. His primary research focuses on understanding how representations are distributed across attention heads in transformer models; this work produced the 2024 paper "Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition" (co-authored with Dennis Akar). He has also participated in multiple mechanistic interpretability hackathons, including a top-ranked submission identifying a circuit for predicting gendered pronouns in GPT-2 Small (with Guillaume Corlouer, London EA Hub). He received a grant from the Long-Term Future Fund to support a 6-month salary for an AI Safety Camp project and continuing independent mechanistic interpretability research. He is currently a Founding Research Engineer at Harmony Intelligence, an AI safety startup focused on evaluations and red teaming.
AI safety researcher at Aether focusing on work related to LLM agent safety.
Stephanie Ifayemi is Senior Managing Director of Policy at Partnership on AI, where she founded the organization’s policy department and leads global engagement with policymakers and international organizations on responsible AI governance. Previously she was Head of Digital Standards Policy in the UK government’s Department for Digital, Culture, Media and Sport, leading work on international technical standards for AI and other emerging technologies.
Dr. Ali Akbari is Director of AI Practice at Gradient Institute, bringing a background in software engineering and more than 20 years’ experience building and operationalising AI systems. He holds an MSc in Artificial Intelligence and Robotics and a PhD in Computer Vision from the Tokyo Institute of Technology, and has led major AI projects across sectors including banking, government, transport and manufacturing. He previously led development of KPMG’s Trustworthy AI Model, helped implement the NSW AI Assurance Framework at Transport for NSW, and serves on Standards Australia’s AI Committee.
Selamawit Tezera Chaka is a Pan‑African feminist and digital rights advocate from Ethiopia, serving as a United Nations Foundation Peace Next Generation Fellow and leading the sheEsecures initiative to advance women’s safety, peacebuilding and secure digital activism.
Mike Belinsky is Director of the AI Institute at Schmidt Sciences, where he helps lead strategy, management, and program design for AI initiatives. Previously he was a principal at The Bridgespan Group and co-founded Instiglio, designing impact bonds such as the Educate Girls Development Impact Bond. He holds a BA from Dartmouth College and an MPP from Harvard Kennedy School.