Felix Hofstätter
Bio
Felix Hofstätter is a Research Scientist on the evaluations team at Apollo Research, an AI safety organization based in London. He was previously a MATS Fellow (MATS 5.0 program), where he conducted research on AI alignment with a focus on how AI systems can strategically underperform on capability evaluations, a phenomenon known as sandbagging. He is best known for co-authoring the paper "AI Sandbagging: Language Models can Strategically Underperform on Evaluations," which demonstrated that frontier models like GPT-4 and Claude can be prompted or fine-tuned to selectively hide capabilities during assessments, undermining the trustworthiness of AI safety evaluations. Prior to his research career, he worked as a Software Consultant at TNG Technology Consulting and studied at Imperial College London. He writes about AI alignment topics on Medium and the Alignment Forum, aiming to make technical alignment research accessible to ML practitioners.
Links
- Personal Website: -
- Twitter / X: -
- LessWrong: felix-hofstaetter
Grants
- from Long-Term Future Fund
- from Long-Term Future Fund
Details
- Last Updated: Mar 22, 2026, 4:07 PM UTC
- Created: Mar 20, 2026, 2:51 AM UTC