Neel Nanda → Mitigating Reward Hacking Through RL Training Interventions | grantmaking.ai