Teun van der Weij
Bio
Teun van der Weij is a Dutch AI safety researcher and Member of Technical Staff at Apollo Research in Zurich, where he focuses on evaluating AI capabilities and propensities related to scheming and AI control. He completed an MSc at Utrecht University (2022-2024) and a BSc summa cum laude at University College Groningen (2018-2021). As a scholar in the MATS 5.0 program, he was mentored by Francis Rhys Ward and researched sandbagging and how personas affect the cognition of language models. His most prominent work is the paper "AI Sandbagging: Language Models can Strategically Underperform on Evaluations," published at ICLR 2025, which demonstrated that frontier models such as GPT-4 and Claude 3 Opus can be prompted or fine-tuned to selectively hide capabilities during evaluations. He also co-authored earlier work on shutdown avoidance in language models and on activation steering. Alongside his research role, he co-founded the European Network for AI Safety (ENAIS) and serves on its board.
Links
- Personal Website
- https://teunvanderweij.com/
- Twitter / X
- LessWrong
- teun-van-der-weij
Grants
from Long-Term Future Fund
Details
- Last Updated
- Mar 23, 2026, 1:28 AM UTC
- Created
- Mar 20, 2026, 2:59 AM UTC