Felix Hofstätter
Bio
Felix Hofstätter is a Research Scientist on the evaluations team at Apollo Research, an AI safety organization based in London. He was previously a MATS Fellow (MATS 5.0 program), where he conducted research on AI alignment with a focus on how AI systems can strategically underperform on capability evaluations, a phenomenon known as sandbagging. He is best known for co-authoring the paper "AI Sandbagging: Language Models can Strategically Underperform on Evaluations," which demonstrated that frontier models like GPT-4 and Claude can be prompted or fine-tuned to selectively hide capabilities during assessments, undermining the trustworthiness of AI safety evaluations. Prior to his research career, he worked as a Software Consultant at TNG Technology Consulting and studied at Imperial College London. He writes about AI alignment topics on Medium and the Alignment Forum, aiming to make technical alignment research accessible to ML practitioners.
Links
- Personal Website: -
- Twitter / X: -
- LessWrong: felix-hofstaetter
Grants
- from Long-Term Future Fund
- from Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 4:07 PM UTC
- Created: Mar 20, 2026, 2:51 AM UTC