Team Shard is a small alignment research collective led by Alex Turner (TurnTrout) that studies how reinforcement learning induces values in trained agents, with the goal of learning to reliably instill human-compatible values in AI systems.
People
Updated 05/18/26
Alex Cloud: Co-mentor, Team Shard (MATS)
Alex Turner: Lead mentor, Team Shard (MATS)
Funding Details
Updated 05/18/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
Updated 05/18/26
Team Shard is an alignment research collective built around shard theory, a framework that Alex Turner (known online as TurnTrout) and Quintin Pope began developing in early 2022. The core idea of shard theory is that trained neural networks are best understood as ensembles of contextually activated computations called shards, each shaped by historical reinforcement events. Under this view, both human values and AI-learned values arise from the same kind of process, which opens a principled path to studying and potentially controlling what values emerge in trained systems.
The group originated informally in the Berkeley AI safety community and was initially described as siloed under John Wentworth's SERI MATS cohort. Over time it evolved into a recognized mentorship stream within the broader MATS program. Alex Turner, a research scientist on Google DeepMind's Scalable Alignment team, leads the stream, with Alex Cloud (Anthropic) serving as co-mentor. Core early members included Quintin Pope, David Udell, and Michael Einhorn.
Team Shard's research agenda has three tracks: formalizing and experimentally testing shard theory's predictions about learned values; building interpretability tools to study how values are encoded in neural networks; and developing practical techniques to influence or constrain the values that emerge during training. The group has produced notable empirical work, including early research on activation steering vectors for language models, gradient routing as a form of weak supervision, and a NeurIPS 2025 spotlight paper on distillation as a method for robustifying unlearning. Former mentees have gone on to positions at Anthropic, MIRI, and Redwood Research.
Team Shard is not an incorporated nonprofit and has no independent budget or fiscal sponsorship. Scholars are funded through the MATS program, and Turner and Cloud are employed by their respective organizations. Turner's personal website (turntrout.com) serves as the primary hub for shard theory writing and Team Shard information.
Theory of Change
Updated 05/18/26
Team Shard believes that understanding the mechanistic relationship between training procedures and learned values is a prerequisite for reliably aligning advanced AI systems. By studying how reinforcement learning shapes internal value representations (shards) in neural networks, the group aims to develop principled techniques for instilling human-compatible values during training. If successful, this line of research would enable AI developers to predictably steer what values emerge in powerful models, reducing the risk of misaligned behavior in deployed systems. The group also aims to develop interpretability tools that allow researchers to audit and verify the values present in trained models.
Grants Received
Updated 05/18/26
No grants recorded.
Projects
Updated 05/18/26
No linked projects.