Finn Metz
Co-Founder @ Seldon
Postdoc at Northeastern University
Japan's national AI Safety Institute (J-AISI), established within IPA under METI, develops AI safety evaluation methodologies and standards and serves as a hub for domestic and international AI safety coordination.
CIGI is an independent, non-partisan Canadian think tank that produces research and policy recommendations on international governance challenges, with a dedicated program focused on managing global-scale risks from advanced AI systems.

Daniel Herrmann is an Assistant Professor in the Department of Philosophy at the University of North Carolina at Chapel Hill, where he is also Core Faculty in the Philosophy, Politics, and Economics program. He holds a PhD in Logic and Philosophy of Science from the University of California, Irvine, and conducted postdoctoral research at the University of Groningen. His research specializes in decision theory, formal epistemology, and the philosophy of artificial intelligence, with a focus on mathematical and computational models of optimal reasoning and learning as applied to artificial agents and self-reasoning systems. He was a fellow at PIBBSS (Principles of Intelligent Behavior in Biological and Social Systems) and received funding from the Long-Term Future Fund to support the final year of his PhD research on embedded agency, a core topic in AI alignment concerning how agents reason about themselves as part of the world they act in. His work on the Alignment Forum includes co-authored research on subjective naturalism in decision theory and on puzzles related to wireheading and utility functions for embedded agents.
An umbrella organization for applied mathematics research in AI alignment, now operating under the name Iliad. Organizes the ILIAD conference series, runs fellowship and intensive programs, incubates research organizations, and manages scientific publishing.
Susan Etlinger is a senior fellow at CIGI and Director of AI and Innovation at Microsoft, where she leads work on the business and societal impacts of artificial intelligence, data and technology ethics.
William Wang is a Professor in the Department of Computer Science at the University of California, Santa Barbara, the Mellichamp Endowed Chair in Mind and Machine Intelligence, Director of UC Santa Barbara’s Natural Language Processing Group, and the founding Director of the Center for Responsible Machine Learning.

Jay Bailey is a former software engineer from Brisbane, Australia, who moved into AI safety research after several years in the software industry. He participated in the SERI MATS Summer 2022 cohort, studying mechanistic interpretability under Neel Nanda, and subsequently received grants to upskill in ML for AI safety and to collaborate with Joseph Bloom on the Decision Transformer Interpretability project, co-authoring work on feature representations in memory-augmented gridworld agents. Finding direct research contributions difficult, he used his engineering background to accelerate his collaborator's research instead. Seeing a stronger theory of change in evaluations as governments and labs committed to AI red-teaming, he joined the UK AI Safety Institute (AISI) as a Research Engineer, spending approximately 18 months on frontier LLM evaluation. He currently works at Arcadia Impact as Head of Technology and Standards, where he contributes to technical AI safety efforts and supports researchers transitioning into the field.
Paula Quigley is a Community Researcher with the Ada Lovelace Institute, working with communities to explore public perspectives on artificial intelligence and its societal impacts. She designs and facilitates workshops that bring diverse and underrepresented voices into AI policy conversations and governance, drawing on senior leadership experience across housing, social enterprise and community development in Northern Ireland.
Machine learning PhD student at the University of Oxford whose research focuses on reasoning, multi-agent systems, post-training and AI safety, and a co-author on work with Contramont Research on cryptographic backdoors in language models accepted at NeurIPS 2024.
Heron co-founder and senior AI security and policy researcher at the Institute for AI Policy and Strategy (IAPS).
Shane Tews is a nonresident senior fellow at the American Enterprise Institute, where she focuses on cybersecurity, internet governance, and technology and innovation policy, and president of Logan Circle Strategies, advising clients on global public policy for information and communications technologies.

David Quarel is a PhD student at the Australian National University (ANU), supervised by Marcus Hutter, where he researches AI safety, Universal Artificial Intelligence, and Mechanistic Interpretability. He holds a BSc in Physics and Mathematics and an MComp specialising in AI and Machine Learning. He co-authored the textbook "An Introduction to Universal Artificial Intelligence" (Routledge, 2024) alongside Marcus Hutter and Elliot Catt. Quarel serves as Head TA at ARENA, an AI safety education programme run by the London Initiative for Safe AI (LISA), where he develops course content and teaches technical AI safety topics. He previously worked as a research assistant at the Krueger AI Safety Lab (KASL) at the University of Cambridge, and received funding to support that residency period. He has several years of teaching experience at ANU across mathematics, theoretical computer science, and digital hardware design.
French-American computer scientist and pioneer of deep learning; Turing Award laureate and professor at New York University, known for work in artificial intelligence, machine learning, computer vision, robotics and image compression.

Alan Chan is a Research Fellow at the Centre for the Governance of AI (GovAI) in London, where he focuses on AI agent governance, transparency, and technical AI governance more broadly. He completed his PhD in Computer Science at Université de Montréal / Mila (Quebec AI Institute) in 2024, advised by Nicolas Le Roux and David Krueger, and holds an MSc and BSc from the University of Alberta. During his doctoral work, he conducted a research visit with David Krueger at Cambridge focused on evaluating non-myopia in language models and RLHF systems, work motivated by the view that non-myopia is a precursor to dangerous emergent properties like deceptive alignment. His research spans developing alignment evaluations (cooperativeness, corrigibility), capability evaluations (non-myopia, deception), AI agent infrastructure and governance, model transparency, and incident analysis for autonomous systems. He has also been affiliated with the Bennett Institute for Public Policy at the University of Cambridge as a visiting researcher.
ARENA is a 4-5 week intensive ML engineering bootcamp in London that trains technically skilled individuals to contribute to AI safety research. It covers deep learning fundamentals, mechanistic interpretability, reinforcement learning, and model evaluations.
Conrad Stosz is Head of Governance at Transluce, where he leads work on AI evaluation standards and policy. He previously led the U.S. Center for AI Standards and Innovation and has held AI policy roles across the White House, Congress, and the Department of Defense, building on prior experience as a machine learning engineer.
PhD student at ETH Zurich, advised by Florian Tramèr, focusing on security and failure modes of artificial intelligence.
AI safety researcher and PhD student at CHAI.
Logan Riggs Smith is an independent AI safety and mechanistic interpretability researcher who goes by the handle "elriggs" on LessWrong and the Alignment Forum. He earned a BS and MS in electrical and computer engineering from Mississippi State University (2014-2021), where he focused on machine learning and wireless signal processing. He is best known as a co-author of "Sparse Autoencoders Find Highly Interpretable Features in Language Models" (ICLR 2024), alongside Hoagy Cunningham, Aidan Ewart, Robert Huben, and Lee Sharkey, an influential paper that helped establish sparse autoencoders as a core technique for mechanistic interpretability. He also contributed to shard theory research with Quintin Pope, Alex Turner, and Charles Foster. The Long-Term Future Fund supported Logan for over two years with six-month stipends of $40,000 each, funding his work on sparse autoencoders and language model tools for alignment research.

Kush Bhatia is a Research Scientist at Google DeepMind in San Francisco, having previously completed a postdoctoral fellowship at Stanford University under Christopher Ré. He earned his PhD in Electrical Engineering and Computer Sciences from UC Berkeley in 2022, where he was co-advised by Peter Bartlett and Anca Dragan, and his dissertation was titled "Learning when Objectives are Hard to Specify." Before Berkeley, he completed his undergraduate degree in Computer Science at IIT Delhi and spent two years as a research fellow at Microsoft Research India working with Prateek Jain and Manik Varma. His research spans statistical machine learning, high-dimensional statistics, optimization, and AI alignment, with a particular focus on problems at the intersection of human feedback and learning system objectives, including reward misspecification, reward hacking, and developing value-aligned systems. Notable works include "The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models" (ICLR 2022), "On the Sensitivity of Reward Inference to Misspecified Human Models" (ICLR 2023), and contributions to large language model prompting and training methodology. His postdoctoral work on safety in AI and value-aligned systems was supported by the Long-Term Future Fund.
Intercultural philosopher and co-founder of the Buddhism & AI Initiative, Adjunct Senior Fellow and former director of the Asian Studies Development Program at the East-West Center in Honolulu, and author of works including Buddhism and Intelligent Technology (2021) and Consciousness Mattering (2023).
Lukas Fluri is a PhD student in Computer Science at ETH Zurich, supervised by Prof. Florian Tramèr in the SPY Lab, where he researches when and how AI systems fail and how to prevent this. He holds a BSc in Computer Science and an MSc in Data Science, both from ETH Zurich, and was awarded an ETH Medal for his Master's thesis "Evaluating Superhuman Models with Consistency Checks," which proposed a framework for surfacing mistakes in superhuman AI models using logical consistency checks. His research spans AI safety, interpretability, model evaluation, red-teaming, reinforcement learning, and the science of deep learning, covering both theoretical and empirical approaches. Prior to his PhD, he completed research internships at the University of Cambridge and UC Berkeley, during which he received Long-Term Future Fund support for an unpaid internship focused on using theory and interpretability to increase the safety of AI systems. He is also involved with Zurich AI Safety (ZAIS), a community organization focused on AI safety capacity building in Switzerland.
Dmitrii (Dima) Krasheninnikov is an AI safety researcher who completed his PhD in machine learning at the University of Cambridge in December 2025, supervised by David Krueger and Rich Turner, and subsequently joined Anthropic. He holds an MSc in AI from the University of Amsterdam (cum laude) and previously held research positions at UC Berkeley's Center for Human-Compatible AI and Sony AI Zurich. His research spans interpretability, the science of deep learning, control, and security, with a focus on ensuring advanced AI systems remain aligned with human values. He is known for coining the term "out-of-context learning" and for demonstrating that language models linearly encode the training-order of facts in their activations. He also co-authored "Defining and Characterizing Reward Hacking" (NeurIPS 2022) and has published work at NeurIPS 2024/2025, ICML 2024, and ICLR 2026. He has received funding from the Long-Term Future Fund for his PhD research in AI alignment.
James Balzer is an Australian strategic foresight practitioner and the Foresight Lead at the Odyssean Institute, where he works with governments, international organisations and businesses on scenario mapping, horizon scanning and sense-making to build long-term resilience. He serves on the steering committee of the Next Generation Foresight Practitioners network, leads the Intergenerational Fairness in Cities community of practice at the School of International Futures, and previously helped found the World Economic Forum’s Future 50 Initiative to upskill young people in foresight. He also conducts research on anticipatory governance and carbon market reform with the Disruptive Futures Institute and holds teaching and advisory roles with institutions including Macquarie University and the Lee Kuan Yew School of Public Policy.

Thomas M. Kehrenberg is a machine learning researcher based at the Basque Center for Applied Mathematics (BCAM) in Bilbao, Spain. He completed his PhD at the University of Sussex in 2021 with a thesis titled "Learning with biased data: invariant representations and target labels," and subsequently held a visiting research fellowship there. His primary academic research focuses on fairness and bias mitigation in machine learning, including adversarial support-matching and null-sampling techniques for interpretable and fair representations, with publications at venues such as ECCV and TMLR. In 2022, he received a grant from the Long-Term Future Fund (LTFF) for a six-month self-study period to build background knowledge for AI alignment research, during which he studied topics including VNM rationality, type theory, and topology. He subsequently wrote a post on LessWrong sharing advice for others undertaking similar alignment self-study, and has also published on the Alignment Forum exploring finite factored sets.