Marcus Williams
Bio
Marcus Williams is an AI safety researcher currently working on deception and scheming monitoring at OpenAI. He completed his Master's degree in engineering physics and machine learning at Lund University in 2023. After graduating, he worked at AI Safety Hub Oxford on a theoretical reinforcement learning project, co-authoring "On the Expressivity of Objective-Specification Formalisms in Reinforcement Learning," which was accepted at ICLR 2024. He received a Long-Term Future Fund grant for a six-month independent project on Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), which produced an arXiv paper demonstrating that decomposing preference modeling into multiple principles outperforms standard RLAIF baselines. In the MATS Summer 2024 cohort under mentor Micah Carroll, he researched annotator vulnerabilities and LLM influence on human preferences, resulting in the paper "Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback" (accepted at NeurIPS workshops). He is also a co-author on "Stress Testing Deliberative Alignment for Anti-Scheming Training" alongside researchers from OpenAI and Apollo Research.
Links
- Personal Website
- Twitter / X
- LessWrong
Grants
- Long-Term Future Fund
Details
- Last Updated
- Mar 22, 2026, 11:14 PM UTC
- Created
- Mar 20, 2026, 2:54 AM UTC