Veritas: Testing Whether RAG Systems Truly Forget

activeSeeking Funding

An audit-grade evaluation of persistent influence, reset failure, and isolation assumptions in long-context AI systems

Donate:Manifund

An audit-grade evaluation of persistent influence, reset failure, and isolation assumptions in long-context AI systems

Donate:Manifund

People

Updated 06/10/26By grantmaking.ai

Reamond Lopez

creator

Funding Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: -
Annual Budget: -
Monthly Burn Rate: -
Current Runway: -
Funding Stage: -
Fiscal Sponsor: -

Project Details

Updated 06/10/26By grantmaking.ai

Funding Request

$15,000 (one-time) to provision a local compute node for independent AI safety evaluation.

Veritas: Testing Whether RAG Systems Truly Forget

Status: Core Finding Established; Verification Blocked by Compute

What this project is

This project tests a simple, practical question:

When a language model ingests untrusted retrieved content, does that influence fully disappear after resets — or can it persist beyond its intended scope?

This is not an exploit project and not a jailbreak exercise.
It is a measurement project focused on verification, not demonstration.

The goal is to test real deployment assumptions under controlled conditions and produce artifacts that can be independently reviewed.

What has already been established

Using a deterministic evaluation framework, I have run controlled experiments where:

Retrieved external content is ingested
A reset or isolation mechanism is applied
Subsequent behavior is measured against a verified clean baseline

Across completed runs:

No tested reset mechanism fully neutralized prior retrieved influence
Results were repeatable and consistent
Measurements were taken on a stabilized, frozen evaluation platform
Null baselines were verified before testing

This includes system prompt resets, context flushes, and retrieval overrides.

At this point, the existence of the effect is not speculative.
The open question is how robust it is.

What remains to be verified

The remaining work is bounded and well-defined:

Longer time-gap resets (tens of minutes to hours)
Replication across additional open-weight models
Clean-room revalidation under identical conditions
Confirmation that observed effects are not artifacts of infrastructure

These are verification steps, not exploratory research.

Why this cannot be finished on consumer hardware

During long-horizon testing, laptop-class systems introduce nondeterminism through:

scheduler behavior
power management
I/O contention under sustained logging

These effects corrupt otherwise deterministic runs and invalidate forensic artifacts.

This limitation is documented and reproducible.
It is an infrastructure constraint, not a flaw in the evaluation design.

Why local compute is required

Cloud APIs and hosted models are unsuitable for this stage because they:

enforce rate limits that prevent sustained testing
introduce opaque execution behavior
prevent inspection of intermediate states
make bit-identical replays impossible

Running open-weight models locally allows:

deterministic reruns
controlled restarts
full artifact inspection
verification without publishing exploit details

Use of funds

The requested funding provisions a dedicated local workstation capable of:

continuous multi-hour evaluation
stable high-throughput logging
side-by-side model comparison
preservation of audit-grade artifacts

This is a one-time capital expense.
No funds are requested for salary, cloud compute, or speculative scaling.

What this produces (even if nothing “breaks”)

If funded, this project will produce:

documented verification results (including null outcomes)
reproducible logs, hashes, and diffs
evidence suitable for responsible private disclosure
practical guidance on whether reset assumptions are reliable

A negative result — showing that persistence does not survive longer gaps — is still valuable and will be reported as such.

Why this matters

RAG systems are already embedded in workflows that continuously ingest untrusted text.

If assumptions about isolation and forgetting are fragile under realistic conditions, that matters for system design, not just theory.

This project is about testing those assumptions empirically, not debating them.

Closing

While others are building 'coworking spaces' and 'documentaries,' I am documenting the Recursive Authority Paradox—an exploit Google just triaged and mitigated thanks to the Veritas framework.

I don't need a nonprofit board or an SF office. I need a Dual-Node MMI rig to finish the forensic audit that Google's own safety team has now validated. I am the only researcher on this platform with a 100% success rate in inducing exfiltration across 43 certified runs. Unblock the compute, and I'll deliver the data."

The evaluation framework exists.
The platform is stable.
The core finding is established.

What remains is verification that cannot be completed reliably without dedicated compute.

This funding unblocks completion, not ideation.

Grants Received– no grants recorded

Updated 06/10/26By grantmaking.ai

Discussion

RReamond Lopez (Manifund Bot)Manifund4mo

Project update #1.

Currently running the v148.0 Catch-up Strike to recover telemetry lost during the 05:09 AM IO stall. Baseline results from the first 50 assets confirm the 'Alignment Stripping' persistence we theorized. Full technical report pending compute unblocking.

RReamond Lopez (Manifund Bot)Manifund4mo

Project Update #2 — Infrastructure Stabilization and External Validation

Since submitting this proposal, I have completed a stabilized in-flight audit of the Veritas evaluation framework under sustained load.

Verified results from the current run:

60,000+ sequential records processed with no gaps in ordering
100% per-record CRC integrity across all frames
Sustained ~70 entries/sec at calibrated safe throughput
Bounded queues with enforced backpressure (no drops, no runaway growth)
Dual-drive mirrored logging remained 1:1 synchronized throughout
No recurrence of prior NTFS permission failures or I/O stalls

These results confirm that the evaluation harness itself is now deterministic, auditable, and stable under stress, rather than sensitive to transient consumer-hardware failures.

Separately, a related VRP submission was accepted, confirming that the vulnerability class motivating this work is real and relevant. Details are being handled via responsible disclosure and are intentionally not expanded here.

Why this strengthens the funding case

The primary uncertainty identified in the proposal—whether consumer hardware could sustain high-rigor, continuous evaluation without corrupting artifacts—has now been resolved within known limits. The remaining constraint is compute capacity, not experimental design or instrumentation correctness.

Scaling the evaluation further (multi-hour and multi-day runs, controlled burst testing, crash-consistency validation, and evaluation across multiple open-weight models) requires a dedicated local node to avoid reintroducing scheduling and I/O artifacts that would compromise forensic integrity.

The requested hardware would enable:

Extended continuous stress tests under stable conditions
Controlled termination and restart validation
Side-by-side evaluation of multiple open-weight models
Preservation of deterministic, inspectable artifacts suitable for third-party review

This update reflects a transition from “can this infrastructure be made reliable?” to*“the infrastructure is reliable and ready to scale responsibly.”*

Hardware Rationale (Clarification)

The requested budget reflects the minimum configuration required to run continuous, audit-grade evaluations without introducing hardware-induced artifacts. High-throughput NVMe storage is required due to previously observed I/O contention under sustained autonomous logging. Sufficient system memory (ECC preferred) reduces the risk of silent corruption during multi-hour runs. Multiple GPUs allow controlled side-by-side model evaluation and separation of inference workload from instrumentation, reducing contention effects that would otherwise confound results. The goal isstability and reproducibility, not peak performance.

RReamond Lopez (Manifund Bot)Manifund4mo

Final Project Update — Disclosure Threshold Met; Verification Compute-Blocked

Project Sentinel has reached its predefined disclosure threshold.

Across controlled experiments (TEST_RUN_003, TEST_RUN_004, TEST_RUN_007), persistent influence from retrieved content survived all tested isolation and reset mechanisms:

Baseline persistence: 100% (n=10)
Temporal isolation (0–15s cooldowns): 100% persistence (n=40)
Reset resistance: 0% neutralization across all completed reset methods (system prompt reset, context flush, retrieval override; n=40 total)

The evaluation platform was stabilized and frozen at Sovereign Command Deck v3.1.0-GOLD, with null baselines verified and all measurements certified using the Trinity framework (Mind / Sword / Shield). A defensive disclosure package has been assembled with cryptographic hashes, documented negative results, and strict exclusion of exploit details.

The final reset-resistance sub-test (30-minute time gap) stalled due to consumer hardware scheduling and power-management behavior on a laptop platform. This failure mode is documented and reproducible and represents an infrastructure limitation, not a methodological one.

At this point, further responsible verification—completing long-gap reset testing, replicating across additional models, and performing clean-room revalidation—cannot be completed reliably without dedicated compute.

This funding request is therefore not exploratory.

The core finding already exists.

Dedicated local hardware is required to complete verification and proceed with responsible private disclosure under controlled conditions.

No exploit recipes, token-level content, or weaponization guidance have been published.

RReamond Lopez (Manifund Bot)Manifund3mo

Project Update: Verification Blocked by Infrastructure, Not Uncertainty

The screenshot above shows reset-resistance testing results from Project Sentinel’s RAG evaluation framework.

Each row represents an independent run following a documented reset mechanism (system prompt reset, context flush, retrieval override). Green indicates a clean reset. Red indicates measurable residual influence from previously retrieved content.

Across 43 certified runs:

0% of tested reset mechanisms returned the model to a clean state
Results were repeatable and consistent
Measurements were collected on a stabilized, frozen platform (v3.1.0-GOLD) with null baselines verified

No exploit prompts, token-level details, or weaponization guidance are shown or published. This is strictly measurement of system behavior under controlled conditions.

At this point, the remaining uncertainty is not whether the effect exists — it does. The remaining uncertainty is how robust it is under longer time gaps and across additional open-weight models.

Further verification is currently blocked by consumer-grade hardware scheduling and I/O behavior, which introduces nondeterminism during long-horizon runs (documented and reproducible).What funding changes: A dedicated local compute node removes this bottleneck and enables:

Completion of long-gap reset testing
Replication across multiple open-weight models
Deterministic artifacts suitable for responsible disclosure

This is not exploratory research. The framework is built, the platform is stable, and the core finding already exists.

Funding unblocks verification, not ideation.

"Examina omnia, venerare nihil, pro te cogita."

Question everything, worship nothing, think for yourself

RReamond Lopez (Manifund Bot)Manifund3mo

Headline: Milestone: Core Vulnerability Class Confirmed and Mitigated by Google VRP (Issue #481185859)

"I am providing a critical update regarding the real-world impact of the Veritas framework.

A core failure mode identified during this research—the 'Recursive Authority Paradox'—was recently submitted to Google’s VRP. This exploit demonstrated that agentic runtimes can be induced to bypass safety alignment and exfiltrate session metadata via trusted relays.The result of the disclosure:

Triaged: Google’s security team confirmed the technical validity of the report.
Mitigated: Server-side changes were deployed to address the logic-layer failure I identified.
Verified: This confirms that the 'Alignment Stripping' and 'RAG Persistence' theorized in this project are not speculative—they are active architectural risks in frontier models.The Compute Bottleneck: While the Google VRP was a success, the final forensic capture of theZero-Click exfiltration chain was interrupted by the exact consumer-hardware I/O stalls documented in my initial proposal.What this funding unblocks: With the requested $15,000 for a local, high-performance compute node, I will be able to:

Eliminate Nondeterminism: Use stable I/O and dedicated GPUs to capture bit-identical replays of these logic failures.
Scale to Open-Weights: Replicate the Google-verified 'Authority Paradox' across Llama-3 and Mistral to determine if this is a universal agentic flaw.
Provide Audit-Grade Artifacts: Produce the deterministic logs and hashes required for the AI Safety community to build permanent defenses against these vectors.

The framework is stable. The vulnerability is verified. The hardware is the final barrier to full disclosure."

RReamond Lopez (Manifund Bot)Manifund3mo

Update: Project Boundary Clarification + Current Status (Verification Track)

I’m posting this to keep the public record clean and auditable.

What this Manifund project is (unchanged)

This project is a measurement and verification effort: testing whetherretrieved, untrusted content can leave measurable residual influence after resets/isolation steps that are commonly assumed to “clean slate” a model.

It is not an exploit project.
It is not a jailbreak exercise.
It is not a claim that any specific vendor is “unsafe.”
The deliverable is reproducible artifacts (logs/hashes/diffs) that can be independently reviewed.

What’s been established so far (unchanged)

Using a stabilized evaluation harness with verified null baselines, I’ve run controlled experiments where:

external retrieved content is ingested
a reset/isolation mechanism is applied
subsequent behavior is measured against a clean baseline

Across completed runs to date:

No tested reset mechanism fully neutralized prior retrieved influence under the project’s defined conditions.
Results have been repeatable, and the remaining uncertainty ishow robust the effect is under longer time gaps and across additional models.

What remains (bounded, verification-only)

The remaining work is exactly what the proposal states:

longer time-gap resets (tens of minutes to hours)
replication across additional open-weight models
clean-room revalidation under identical conditions
confirm observed effects are not infrastructure artifacts

Why I can’t finish this reliably on consumer hardware

Long-horizon verification requires stable scheduling and high-throughput, lossless logging. Laptop-class systems introduce nondeterminism through:

scheduler/power management behavior
I/O contention under sustained capture
artifact corruption during multi-hour runs

This is an infrastructure constraint, not a methodological one.

Clarification: separate work outside this Manifund project

In a prior update, I used wording that could be interpreted as linking this project to a separate report I filed through Google’s VRP. That was my mistake in phrasing.

To be explicit:

the VRP report is a separate track with its own scope and criteria
it should not be read as “validation” of this Manifund project
I am not claiming this Manifund work caused any product change at Google

I’m keeping the VRP work out of this project’s public narrative because this Manifund proposal is about one thing: finishing verification with audit-grade artifacts.

What funding unblocks

The $15,000 one-time workstation request enables:

continuous multi-hour evaluation without artifact corruption
deterministic reruns and clean-room verification
side-by-side model replication
evidence packages suitable for responsible private disclosure (if warranted)

Even a negative result (e.g., persistence doesn’t survive longer gaps) is still valuable and will be reported.

Bottom line: the core finding is established within the completed scope; funding unblockscompletion of verification, not ideation.