WhiteBox Research is a Manila-based nonprofit that trains early-career researchers in mechanistic interpretability and AI safety, with a focus on building research capacity in Southeast Asia.
Updated 04/02/26
Funding Details
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
WhiteBox Research is a nonprofit organization based in Quezon City, Metro Manila, Philippines, founded in August 2023 by Brian Tan and Clark Urzo. Brian Tan previously co-founded Effective Altruism Philippines, and Clark Urzo has a background in software engineering and AI safety research. The organization's core mission is to develop more AI safety and mechanistic interpretability researchers, particularly in Southeast Asia, a region that has historically been underrepresented in the global AI safety research community.
WhiteBox's flagship program is its free AI Interpretability Fellowship, a five-month part-time program requiring a commitment of approximately 15 hours per week. The fellowship curriculum draws heavily from ARENA (a widely used AI safety upskilling curriculum) and includes model training, interpretation techniques, and hands-on AI safety research through hackathons. The fellowship offers both in-person tracks (in Quezon City) and virtual tracks (open to South, Southeast, and East Asia).
Cohort 1 produced notable results: five participants won 1st and 3rd place across two Apart Research hackathons and were accepted into the Apart Lab fellowship. Their research included work on refusal mechanisms and hazard mitigation in open-source models. Cohort 2 ran from January 30 to May 18, 2025, with a $200 completion reward for participants.
The organization has been funded primarily through the Long-Term Future Fund (LTFF) and Manifund. The LTFF awarded $61,460 in December 2023 for Cohort 1 (covering approximately 9 months at 1.9 FTE) and $93,200 in late 2024 for Cohort 2. Additional funding has come from Manifund regranting.
The team includes Learning Director Kyle Reynoso, Research Engineer Angel Martinez, and Engineering Intern Ivan Enclonar (a Cohort 1 participant). Advisers include Callum McDougall (Google DeepMind, ARENA founder) and Lee Sharkey (Apollo Research co-founder).
WhiteBox Research's long-term goal is to help build a thriving AI safety research hub in Southeast Asia. It devotes approximately 70% of its effort to finding and training mechanistic interpretability researchers, 20% to upskilling the team itself, and 10% to broader community-building.
Theory of Change
WhiteBox Research believes that one constraint on AI safety progress is the global supply of skilled researchers, and that Southeast Asia is an underutilized talent pool. By identifying motivated early-career individuals in the region and providing structured, intensive training in mechanistic interpretability, a technical discipline seen as particularly promising for understanding and controlling AI systems, WhiteBox aims to produce researchers who can contribute directly to solving open problems in AI safety. Mechanistic interpretability research helps make AI models more transparent and understandable, which is a prerequisite for reliably verifying that advanced AI systems are behaving safely. Having more researchers in this field, especially in geographically diverse locations, increases the probability of important breakthroughs.