Team Shard is an informal alignment research group centered on shard theory, a framework that Alex Turner and Quintin Pope began developing in 2022 to explain how reinforcement schedules relate to the values that RL agents learn. The group operates as a mentorship stream within the MATS (ML Alignment & Theory Scholars) program, where Turner and co-mentor Alex Cloud guide scholars through empirical and theoretical alignment research. Key research outputs include pioneering work on steering vectors for large language models, gradient routing, and robust unlearning (including a NeurIPS 2025 spotlight paper). The group is based in the Berkeley AI safety community and meets regularly at Lighthaven.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
Team Shard believes that understanding the mechanistic relationship between training procedures and learned values is a prerequisite for reliably aligning advanced AI systems. By studying how reinforcement learning shapes internal value representations (shards) in neural networks, the group aims to develop principled techniques for instilling human-compatible values during training. If successful, this line of research would enable AI developers to predictably steer what values emerge in powerful models, reducing the risk of misaligned behavior in deployed systems. The group also aims to develop interpretability tools that allow researchers to audit and verify the values present in trained models.
Grants Received
No grants recorded.
Projects
No linked projects.
People
No linked people.
Details
- Last Updated: Apr 2, 2026, 10:10 PM UTC
- Created: Mar 19, 2026, 10:30 PM UTC
