Doublespeak: In-Context Representation Hijacking

active

About

Updated 05/18/26

Research project introducing Doublespeak, an in-context representation hijacking attack against large language models. Across multiple in-context examples, a harmful keyword is systematically replaced with a benign token, so that the benign token’s internal representation acquires the harmful semantics and bypasses standard safety alignment checks.

Endorsements support MentaLeap.
