Teun van der Weij

Zurich, Switzerland

Bio

Updated 03/23/26

Teun van der Weij is a Dutch AI safety researcher and Member of Technical Staff at Apollo Research in Zurich, where he focuses on evaluating AI capabilities and propensities related to scheming and AI control. He completed an MSc at Utrecht University (2022-2024) and a BSc summa cum laude at University College Groningen (2018-2021). He was a scholar in the MATS 5.0 program, mentored by Francis Rhys Ward, where he researched sandbagging and how personas affect the cognition of language models. His most prominent work is the paper "AI Sandbagging: Language Models can Strategically Underperform on Evaluations," published at ICLR 2025, which demonstrated that frontier models such as GPT-4 and Claude 3 Opus can be prompted or fine-tuned to selectively hide capabilities during evaluations. He also co-authored earlier work on activation steering and on shutdown avoidance in language models. In addition to his research role, he co-founded the European Network for AI Safety (ENAIS) and serves on its board.

Community Signal

0 upvotes, 0 downvotes, 0 endorsements.

Grants

LTFF 2024 Q1 - Teun van der Weij

from Long-Term Future Fund (funds.effectivealtruism.org)

Recipient: $30,087