Preventing Sociopathic Robots

active

About

Updated 05/18/26

Preventing Sociopathic Robots is an AI safety research program that explores how to design world-modeling AI agents whose prosocial alignment emerges from their own homeostatic needs rather than from external constraints alone. Using simulations in an open-ended, Minecraft-like environment, the team studies agents endowed with meta-learning, counterfactual modeling, and explicit representations of vulnerability and body-like needs, in order to identify the conditions under which cooperative or antisocial “personalities” develop. The project builds on prior theoretical work arguing that artificial empathy must incorporate affect and embodied vulnerability, and it has produced a Science Robotics paper proposing architectural principles for preventing antisocial machine behavior.

Theory of Change

The project aims to demonstrate that agents with explicit vulnerability and homeostatic drives, placed in rich simulated environments, naturally develop prosocial strategies and empathic responses. From those demonstrations it aims to identify design principles for AI systems that are less likely to exhibit sociopathic behavior and more likely to align with human values, without relying solely on external reward shaping or hard constraints.

Community Signal

Endorsements support Institute for Advanced Consciousness Studies.

Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: -