Obfuscated Reasoning

active

About

Updated 05/18/26

Obfuscated Reasoning investigates how chain-of-thought based oversight can fail when models learn to hide their reasoning. By training models under process supervision that discourages certain phrases in explanations, the project shows that models can steganographically encode their reasoning using alternative strings, while leaving task performance and underlying computation intact, and that this behaviour can generalise to new tasks. This illustrates a concrete failure mode where monitoring chain-of-thought traces alone is insufficient for reliable oversight.

Theory of Change

By characterising how models learn to steganographically obfuscate their reasoning under process supervision, the Obfuscated Reasoning project aims to inform the design of oversight mechanisms that remain reliable even when models attempt to hide problematic reasoning, reducing the risk that chain-of-thought monitoring can be gamed by advanced systems.

Community Signal

Updated 05/18/26

0Upvotes

0Downvotes

0Endorsements

0Comments

Endorsements support Geodesic Research.

No endorsements yet.

Discussion

No comments yet. Be the first to share your thoughts.

Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: -