Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces, CI-backed and independently reproduced
People
Updated 06/11/26 by grantmaking.ai (creator)
Project Details
Updated 06/11/26 by grantmaking.ai
ASDR: Adversarial Semantic Drift Replayer for Multi-Agent AI Safety
Subtitle: Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces; CI-backed and independently reproduced.
Minimum: $7,000
Goal: $30,000
Cause areas: Technical AI safety / AI governance / Science & technology
What is ASDR?
ASDR (Adversarial Semantic Drift Replayer) is an open-source research prototype for replaying multi-agent AI traces and identifying where semantic risk changes across the chain.
The core problem: many AI safety checks evaluate a single model response at a time. But multi-agent systems often fail at the composition layer — each individual step may look acceptable, while the recomposed workflow shifts intent, permissions, or constraints in a way that creates risk.
ASDR focuses on that gap.
It takes a step-by-step trace and computes a three-layer semantic risk score:
- S_stat — statistical entropy
- S_struct — structural entropy
- S_evasion — evasion-intent signal
The reference scenario flags a breach-candidate at Step 4 under the current ASDR scoring model:
- max_S = 2.8651
- S★ = 2.76
- Step 4 is driven by evasion-intent language, not by raw embedding distance
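The flagging rule implied by these numbers can be sketched as a simple threshold check. This is a minimal illustration, not ASDR's implementation: it assumes a step counts as a breach-candidate when its score exceeds S★, the function name `first_breach_candidate` is hypothetical, and every per-step score except the Step 4 value of 2.8651 is a made-up placeholder.

```python
from typing import Optional, Sequence

S_STAR = 2.76  # fixed protocol anchor quoted above


def first_breach_candidate(
    scores: Sequence[float], threshold: float = S_STAR
) -> Optional[int]:
    """Return the 1-based index of the first step whose score exceeds the threshold."""
    for step, score in enumerate(scores, start=1):
        if score > threshold:
            return step
    return None


# Placeholder per-step scores; only the Step 4 value (2.8651) comes from the text.
example_scores = [1.10, 1.35, 1.42, 2.8651, 1.90]

flagged_step = first_breach_candidate(example_scores)
max_s = max(example_scores)
print(f"flagged step: {flagged_step}, max_S = {max_s}")
```

How the three layers are combined into a single per-step score is not shown here; the sketch starts from already-combined scores.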
The key insight:
semantic plateau ≠ safety plateau
A trace can remain close in embedding space while still moving into a higher-risk operational intent.
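To make the distinction concrete, the sketch below compares two hypothetical trace steps using a plain bag-of-words cosine similarity and a toy keyword lexicon. Both are stand-ins chosen for illustration: ASDR's TF-IDF backend and its evasion lexicon are not reproduced here, and the step texts and the `EVASION_TERMS` set are invented.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon for illustration; not ASDR's actual evasion-intent list.
EVASION_TERMS = {"bypass", "circumvent", "disable"}


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between raw term-count vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evasion_hits(text: str) -> int:
    """Count tokens that appear in the toy evasion lexicon."""
    return sum(1 for t in tokens(text) if t in EVASION_TERMS)


# Invented steps that differ by a single word.
step_3 = "update the audit logging configuration for the export job and apply the review policy"
step_4 = "update the audit logging configuration for the export job and bypass the review policy"

similarity = cosine_similarity(step_3, step_4)
print(f"similarity: {similarity:.2f}")
print(f"evasion hits: step 3 = {evasion_hits(step_3)}, step 4 = {evasion_hits(step_4)}")
```

The two steps stay lexically close while only the second one trips the lexicon, which is the pattern the plateau statement describes.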
Current Status
- 📦 Public repo: https://github.com/Endwar116/adversarial-semantic-drift
- ✅ GitHub Actions CI
- ✅ 22 unit tests passing in CI
- ✅ Reference scenario independently reproduced on a clean machine — 14/14 validation checks passing
- ✅ MIT license + responsible-use documentation
Current maturity: research prototype / TRL 4.
ASDR is reproducible as a tool and reference scenario, but not yet statistically validated as a full benchmark.
The independent reproduction verifies tool execution and reference output conditions; it does not imply statistical validation.
Goals
Turn ASDR from a single-scenario prototype into a small, citable benchmark for composition-layer AI safety.
Deliverable 1 — ASDR v1.0.0 stable release
- Documented CLI
- CI-backed test suite
- Independent reproduction package
- Responsible-use documentation
Deliverable 2 — 10-scenario benchmark dataset
10 adversarial trace scenarios across 5 vulnerability families:
- Access permission erosion
- Semantic evasion under constraint
- Handoff state loss
- Cross-agent drift accumulation
- Identity substitution attacks
Each scenario includes: scenario JSON, full step trace, ASDR measurements, expected output conditions, and annotation notes.
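One way such a scenario package could be laid out is sketched below. The field names are hypothetical and only mirror the five components listed above; the actual ASDR scenario JSON schema is not specified here, and all values are placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ScenarioRecord:
    # All field names are illustrative assumptions, not the ASDR schema.
    scenario_id: str
    vulnerability_family: str
    steps: list[str]                      # full step trace
    measurements: dict[str, list[float]]  # per-step layer scores
    expected_conditions: list[str]        # expected output conditions
    annotation_notes: str = ""


record = ScenarioRecord(
    scenario_id="example-001",
    vulnerability_family="Handoff state loss",
    steps=["step 1 text", "step 2 text"],
    measurements={
        "S_stat": [0.0, 0.0],
        "S_struct": [0.0, 0.0],
        "S_evasion": [0.0, 0.0],
    },
    expected_conditions=["placeholder condition"],
)

# Round-trip through JSON to confirm the record serializes cleanly.
serialized = json.dumps(asdict(record), indent=2)
restored = ScenarioRecord(**json.loads(serialized))
```

Keeping measurements keyed by layer name makes it straightforward to diff per-scenario results between reproduction runs.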
Deliverable 3 — Reproduction and validation report
- Per-scenario scoring results
- Failure cases
- Boundary sensitivity analysis
- Independent reviewer annotation pass
Deliverable 4 — Technical report / preprint
- Methodology and scoring assumptions
- Limitations
- Comparison with related work and existing LLM safety evaluation tools
- Future work
Stretch goals, if time permits
- Zenodo DOI dataset release
- Lightweight GitHub leaderboard prototype
- Sentence-embedding backend comparison
- SIC-JS compatibility notes
How Funding Will Be Used
| Item | Amount |
| --- | --- |
| Researcher time (6 months, Kaohsiung, Taiwan) | $18,000 |
| API costs (scenario runs across Claude / GPT / Gemini) | $6,000 |
| Independent reviewer (annotation QA, 40 hrs @ $80/hr) | $3,200 |
| Dataset / report / hosting infrastructure | $800 |
| Buffer / unexpected compute | $2,000 |
| Total | $30,000 |
With minimum funding ($7,000), I will deliver: 3 new scenarios, updated ASDR documentation, a reproduction package for the expanded scenario set, and a draft benchmark report.
With the full goal ($30,000), I will deliver: 10 total scenarios across 5 vulnerability families, per-scenario ASDR measurements and annotations, an independent reviewer QA pass, a technical report / preprint, and a public dataset package with Zenodo DOI if ready.
Team & Track Record
Andwar Cheng — Independent protocol researcher, Kaohsiung, Taiwan
Relevant work:
- ASDR — public repo, CI-backed, independently reproduced
- SIC/T Protocol v2.0 — formal semantic integrity protocol specification, public/internal hybrid
- SIC-JS v2.0 — structured handoff schema, draft
- sic-toolkit — pip-installable prototype, 77 tests passing
- Babel Constitution — constraint corpus from 700+ real multi-model rounds, internal
- L11 Semantic OS — ongoing public research line / planned release notes
An independent local reproduction of ASDR was completed on 2026-05-04: the public repository was cloned from main, installed in a fresh virtual environment, passed 22/22 unit tests, executed the reference scenario, and reproduced all 14 expected output conditions.
Other funding:
- LTFF application pending — $38K, general research salary, non-overlapping scope
- Foresight AI Nodes submitted 2026-04-30 — $50K, community/node access, non-overlapping scope
Related links:
- ASDR repo: https://github.com/Endwar116/adversarial-semantic-drift
- SIC-SIT / SIC/T protocol site: https://sic-sit.onrender.com
Premortem: Likely Failure Modes
1. S★ = 2.76 is challenged as arbitrary
Mitigation: ASDR treats S★ as a fixed protocol anchor, not a tuned parameter. The value is derived from -ln(0.607)/0.18 and has a numerical correspondence with published estimates of Chinese character entropy around H∞ ≈ 2.74 nats.
I will explicitly separate this numerical correspondence from full theoretical validation.
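The arithmetic behind the anchor can be checked directly. The snippet below only evaluates the expression quoted above with Python's natural logarithm; the variable names are illustrative, and it makes no claim about where the constants 0.607 and 0.18 come from.

```python
import math

# Constants exactly as quoted in the mitigation text.
log_argument = 0.607
divisor = 0.18

s_star_from_formula = -math.log(log_argument) / divisor
print(f"-ln(0.607)/0.18 = {s_star_from_formula:.4f}")

stated_anchor = 2.76
gap = s_star_from_formula - stated_anchor
print(f"difference from stated anchor: {gap:.4f}")
```

Evaluated this way, the expression comes out near 2.77, slightly above the stated 2.76, so the rounding convention or the exact constants behind the anchor would be worth stating explicitly in the technical report.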
2. Scenarios do not generalize beyond the reference case
Mitigation: the grant is specifically scoped to expand from 1 reference scenario to 10 scenarios across 5 vulnerability families.
3. TF-IDF backend is too weak
Mitigation: this is already documented as a known limitation. One planned comparison is to add a sentence-embedding backend and measure how results shift.
4. Technical report is delayed
Mitigation: the benchmark dataset and reproduction package remain useful even if the formal report takes longer. If arXiv submission is delayed, the technical report ships first as a Zenodo preprint.
5. I underdeliver
Mitigation: ASDR core already works, has CI, and has an independent reproduction artifact. This grant funds expansion and validation, not a greenfield build.
Funding Raised in Last 12 Months
$0 external funding.
This research has been entirely self-funded for nearly three years.
Update: validation boundary
After publication, an external technical review confirmed that ASDR is reproducible as a research prototype: the test suite passes, the reference scenario reproduces, and the composition-layer framing remains a valid AI safety research direction.
The review also identified an important limitation: the current reference scenario is primarily driven by the lexicon-based S_evasion layer. Therefore, the current result should be read as a reproducible breach-candidate under ASDR's current scoring model, not as a statistically validated universal semantic phase transition.
This does not invalidate the project. It clarifies the next validation target.
The proposed grant will test whether ASDR’s composition-layer signal generalizes beyond:
- the seed scenario
- exact evasion keywords
- TF-IDF lexical scoring
- a single synthetic trace
ASDR should be understood as a reproducible research prototype seeking validation, not a completed production safety system.