Doublespeak: In-Context Representation Hijacking
active
About
Updated 05/18/26
Research project introducing Doublespeak, an in-context representation hijacking attack against large language models. Across multiple in-context examples, a harmful keyword is systematically replaced with a benign token, so that the benign token’s internal representation acquires the harmful semantics and the request bypasses standard safety alignment checks.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: -