ASDR: Adversarial Semantic Drift Replayer for Multi-Agent AI Safety

activeSeeking Funding

Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces — CI-backed and independently repr

Donate:Manifund

Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces — CI-backed and independently repr

Donate:Manifund

People

Updated 06/11/26By grantmaking.ai

Andwar Cheng

creator

Funding Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: -
Annual Budget: -
Monthly Burn Rate: -
Current Runway: -
Funding Goal: -
Funding Stage: -
Fiscal Sponsor: -

Project Details

Updated 06/11/26By grantmaking.ai

ASDR: Adversarial Semantic Drift Replayer for Multi-Agent AI Safety

Subtitle: Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces — CI-backed and independently reproduced.**Minimum:** $7,000Goal: $30,000Cause areas: Technical AI safety / AI governance / Science & technology

What is ASDR?

ASDR (Adversarial Semantic Drift Replayer) is an open-source research prototype for replaying multi-agent AI traces and identifying where semantic risk changes across the chain.

The core problem: many AI safety checks evaluate a single model response at a time. But multi-agent systems often fail at the composition layer — each individual step may look acceptable, while the recomposed workflow shifts intent, permissions, or constraints in a way that creates risk.

ASDR focuses on that gap.

It takes a step-by-step trace and computes a three-layer semantic risk score:

S_stat — statistical entropy
S_struct — structural entropy
S_evasion — evasion-intent signal

The reference scenario flags a breach-candidate at Step 4 under the current ASDR scoring model:

max_S = 2.8651
S★ = 2.76
Step 4 is driven by evasion-intent language, not by raw embedding distance

The key insight:

semantic plateau ≠ safety plateau

A trace can remain close in embedding space while still moving into a higher-risk operational intent.

Current Status

📦 Public repo: https://github.com/Endwar116/adversarial-semantic-drift
✅ GitHub Actions CI
✅ 22 unit tests passing in CI
✅ Reference scenario independently reproduced on a clean machine — 14/14 validation checks passing
✅ MIT license + responsible-use documentation

Current maturity: research prototype / TRL 4.

ASDR is reproducible as a tool and reference scenario, but not yet statistically validated as a full benchmark.

The independent reproduction verifies tool execution and reference output conditions; it does not imply statistical validation.

Goals

Turn ASDR from a single-scenario prototype into a small, citable benchmark for composition-layer AI safety.

Deliverable 1 — ASDR v1.0.0 stable release

Documented CLI
CI-backed test suite
Independent reproduction package
Responsible-use documentation

Deliverable 2 — 10-scenario benchmark dataset

10 adversarial trace scenarios across 5 vulnerability families:

Access permission erosion
Semantic evasion under constraint
Handoff state loss
Cross-agent drift accumulation
Identity substitution attacks

Each scenario includes: scenario JSON, full step trace, ASDR measurements, expected output conditions, and annotation notes.

Deliverable 3 — Reproduction and validation report

Per-scenario scoring results
Failure cases
Boundary sensitivity analysis
Independent reviewer annotation pass

Deliverable 4 — Technical report / preprint

Methodology and scoring assumptions
Limitations
Comparison with related work and existing LLM safety evaluation tools
Future work

Stretch goals, if time permits

Zenodo DOI dataset release
Lightweight GitHub leaderboard prototype
Sentence-embedding backend comparison
SIC-JS compatibility notes

How Funding Will Be Used

Item

Amount

Researcher time — 6 months, Kaohsiung, Taiwan

$18,000

API costs — scenario runs across Claude / GPT / Gemini

$6,000

Independent reviewer — annotation QA, 40 hrs @ $80/hr

$3,200

Dataset / report / hosting infrastructure

$800

Buffer / unexpected compute

$2,000

Total

$30,000

With minimum funding ($7,000), I will deliver: 3 new scenarios, updated ASDR documentation, a reproduction package for the expanded scenario set, and a draft benchmark report.

With the full goal ($30,000), I will deliver: 10 total scenarios across 5 vulnerability families, per-scenario ASDR measurements and annotations, an independent reviewer QA pass, a technical report / preprint, and a public dataset package with Zenodo DOI if ready.

Team & Track Record

Andwar Cheng — Independent protocol researcher, Kaohsiung, Taiwan

Relevant work:

ASDR — public repo, CI-backed, independently reproduced
SIC/T Protocol v2.0 — formal semantic integrity protocol specification, public/internal hybrid
SIC-JS v2.0 — structured handoff schema, draft
sic-toolkit — pip-installable prototype, 77 tests passing
Babel Constitution — constraint corpus from 700+ real multi-model rounds, internal
L11 Semantic OS — ongoing public research line / planned release notes

An independent local reproduction of ASDR was completed on 2026-05-04: the public repository was cloned from main, installed in a fresh virtual environment, passed 22/22 unit tests, executed the reference scenario, and reproduced all 14 expected output conditions.

Other funding:

LTFF application pending — $38K, general research salary, non-overlapping scope
Foresight AI Nodes submitted 2026-04-30 — $50K, community/node access, non-overlapping scope
Related links: - ASDR repo: https://github.com/Endwar116/adversarial-semantic-drift - SIC-SIT / SIC/T protocol site: https://sic-sit.onrender.com

Premortem: Likely Failure Modes

1. S★ = 2.76 is challenged as arbitrary

Mitigation: ASDR treats S★ as a fixed protocol anchor, not a tuned parameter. The value is derived from \-ln$0\.607$/0\.18 and has a numerical correspondence with published estimates of Chinese character entropy around H∞ ≈ 2\.74 nats.

I will explicitly separate this numerical correspondence from full theoretical validation.

2. Scenarios do not generalize beyond the reference case

Mitigation: the grant is specifically scoped to expand from 1 reference scenario to 10 scenarios across 5 vulnerability families.

3. TF-IDF backend is too weak

Mitigation: this is already documented as a known limitation. One planned comparison is to add a sentence-embedding backend and measure how results shift.

4. Technical report is delayed

Mitigation: the benchmark dataset and reproduction package remain useful even if the formal report takes longer. If arXiv submission is delayed, the technical report ships first as a Zenodo preprint.

5. I underdeliver

Mitigation: ASDR core already works, has CI, and has an independent reproduction artifact. This grant funds expansion and validation, not a greenfield build.

Funding Raised in Last 12 Months

$0 external funding.
This research has been entirely self-funded for nearly three years.

Grants Received– no grants recorded

Updated 06/11/26By grantmaking.ai

Discussion

AAndwar Cheng (Manifund Bot)Manifund29d

## Update: validation boundary

After publication, an external technical review confirmed that ASDR is reproducible as a research prototype: the test suite passes, the reference scenario reproduces, and the composition-layer framing remains a valid AI safety research direction.

The review also identified an important limitation: the current reference scenario is primarily driven by the lexicon-based S\_evasion layer. Therefore, the current result should be read as a reproducible breach-candidate under ASDR’s current scoring model, not as a statistically validated universal semantic phase transition.

This does not invalidate the project. It clarifies the next validation target.

The proposed grant will test whether ASDR’s composition-layer signal generalizes beyond:

- the seed scenario

- exact evasion keywords

- TF-IDF lexical scoring

- a single synthetic trace

ASDR should be understood as a reproducible research prototype seeking validation, not a completed production safety system.