Marcus Williams
Bio
Marcus Williams is an AI safety researcher currently working on deception and scheming monitoring at OpenAI. He completed his Master's degree in engineering physics and machine learning at Lund University in 2023. After graduating, he worked at AI Safety Hub Oxford on a theoretical reinforcement learning project, co-authoring "On the Expressivity of Objective-Specification Formalisms in Reinforcement Learning," which was accepted at ICLR 2024. He received a Long-Term Future Fund grant for a six-month independent project on Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), which produced an arXiv paper demonstrating that decomposing preference modeling into multiple principles outperforms standard RLAIF baselines. In the MATS Summer 2024 cohort under mentor Micah Carroll, he researched annotator vulnerabilities and LLM influence on human preferences, resulting in the paper "Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback" (accepted at NeurIPS workshops). He is also a co-author on "Stress Testing Deliberative Alignment for Anti-Scheming Training" alongside researchers from OpenAI and Apollo Research.
Links
- Personal Website
- Twitter / X
- LessWrong
Grants
- Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 11:14 PM UTC
- Created
- Mar 20, 2026, 2:54 AM UTC