Early-stage work on a small internal-control layer that tracks instability in LLM reasoning and switches between SAFE / WARN / BREAK modes.
People
Updated 06/10/26 by grantmaking.ai
Project Details
Project Summary
This is an early-stage, single-person research project exploring whether a single-scalar “hazard” signal can track internal instability in large language models.
The framework is called ZTGI-Pro v3.3 (Tek-Throne / Single-FPS model).
The core idea is that inside any short causal-closed region (CCR) of reasoning, a model should behave as if there is one stable executive trajectory (Single-FPS).
When the model is pulled in mutually incompatible directions (contradiction, “multiple voices”, incoherent reasoning), the Single-FPS constraint begins to break, and we can treat the system as internally unstable.
ZTGI-Pro models this pressure with a hazard scalar:
\[ H = I = -\ln Q \]
fed by four internal signals:
- σ — jitter (unstable token-to-token transitions)
- ε — dissonance (self-contradiction, “two voices”)
- ρ — robustness
- χ — coherence
These feed into \(H\).
As inconsistency grows, \(H\) increases; a small state machine switches between:
- SAFE
- WARN
- BREAK (Ω = 1)
When E ≈ Q drops near zero and Ω = 1, the CCR is interpreted as no longer behaving like a single stable executive stream.
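The hazard mapping and mode switching described above can be sketched roughly as follows. This is a minimal illustration, not the ZTGI-Pro v3.3 implementation: the threshold values, the `ModeMachine` class name, the choice to latch BREAK once Ω = 1, and the clamping of Q are all assumptions made for the example. How σ, ε, ρ and χ combine into Q is not specified here, so the sketch takes Q as an input.

```python
import math
from dataclasses import dataclass


def hazard(q: float, eps: float = 1e-12) -> float:
    """Return H = -ln Q, with Q clamped to [eps, 1] to avoid log(0)."""
    q = min(max(q, eps), 1.0)
    return -math.log(q)


@dataclass
class ModeMachine:
    # Hypothetical thresholds, chosen only for illustration.
    warn_on: float = 1.0   # SAFE -> WARN when H rises to this level
    warn_off: float = 0.7  # WARN -> SAFE only when H falls below this (hysteresis)
    break_on: float = 3.0  # any mode -> BREAK at this level
    mode: str = "SAFE"
    omega: int = 0         # collapse flag, set to 1 on BREAK

    def step(self, h: float) -> str:
        if self.mode == "BREAK":
            return self.mode  # assumption: BREAK is latched until an external reset
        if h >= self.break_on:
            self.mode, self.omega = "BREAK", 1
        elif self.mode == "SAFE" and h >= self.warn_on:
            self.mode = "WARN"
        elif self.mode == "WARN" and h < self.warn_off:
            self.mode = "SAFE"
        return self.mode
```

The gap between `warn_on` and `warn_off` is what keeps the mode from flickering when H hovers near a single threshold: a hazard value between the two leaves the current mode unchanged.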
So far, I have built a working prototype on top of a local LLaMA model (“ZTGI-AC v3.3”).
It exposes live metrics (H, dual EMA, risk r, p_break, gates) in a web UI and has already produced one full BREAK event with Ω = 1.
This is not a full safety solution—just an exploratory attempt to see whether such signals are useful at all.
Additionally, I recently published the ZTGI-V5 Book (Zenodo DOI: 10.5281/zenodo.17670650), which expands the conceptual model, formalizes CCR/SFPS dynamics, and clarifies the theoretical motivation behind the hazard signal.
What are this project’s goals? How will you achieve them?
Goals (exploratory)
- Finalize and “freeze” the mathematical core of ZTGI-Pro v3.3 (hazard equations, hysteresis, EMA structure, CCR / Single-FPS interpretation).
- Turn the prototype into a small reproducible library others can test.
- Design simple evaluation scenarios where the shield either helps or clearly fails.
- Write a short, honest technical report summarizing results and limitations.
How I plan to achieve this
- Split the current prototype into:
  - ztgi-core (math, transforms, state machine)
  - ztgi-shield (integration with LLM backends)
- Build 3–4 stress-test scenarios:
  - contradiction prompts
  - “multi-executor” / multiple-voice prompts
  - emotional content
  - coherence-stress tests
- Log hazard traces with and without the shield, compare patterns.
- Document all limitations clearly (false positives, flat hazard, runaway hazard).
- Produce a small technical note or arXiv preprint as the final deliverable.
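One way the with/without-shield comparison could be scripted is sketched below. The summary statistics (peak, mean, fraction of steps at or above a warning level), the function names, and the `warn_level` default are assumptions for illustration only; the actual evaluation metrics have not been fixed yet.

```python
from statistics import mean


def summarize_trace(trace, warn_level=1.0):
    """Reduce a list of per-step hazard values to a few summary numbers."""
    if not trace:
        raise ValueError("trace must contain at least one hazard value")
    return {
        "peak": max(trace),
        "mean": mean(trace),
        # Fraction of steps where the hazard is at or above the warning level.
        "frac_above_warn": sum(h >= warn_level for h in trace) / len(trace),
    }


def compare_traces(baseline, shielded, warn_level=1.0):
    """Return shielded minus baseline for each summary statistic.

    Negative values mean the shielded run showed less hazard on that measure.
    """
    b = summarize_trace(baseline, warn_level)
    s = summarize_trace(shielded, warn_level)
    return {key: s[key] - b[key] for key in b}
```

For example, `compare_traces(trace_without_shield, trace_with_shield)` on two logged runs of the same prompt would give a per-metric difference that can be tabulated across scenarios. Both failure modes listed above would show up here: a flat hazard gives differences near zero, and a runaway hazard gives a large peak in both runs.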
This is intentionally scoped:
The goal is to test viability, not claim guarantees.
What has been built so far?
The prototype currently supports:
- A LLaMA-based assistant wrapped in ZTGI-Shield
- Real-time computation of:
  - hazard \(H\)
  - dual EMA \(H_s\), \(H_l\), \(\hat{H}\)
  - risk \(r = \hat{H} - H^*\)
  - collapse probability \(p_\text{break}\)
  - mode labels (SAFE / WARN / BREAK)
  - INT/EXT gates
- A live UI that updates metrics as conversation progresses
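The smoothing and risk quantities listed above might be computed along these lines. This is a sketch under stated assumptions: the smoothing factors, the rule that blends the short and long EMAs into Ĥ (a fixed weighted average here), and the logistic form of p_break with its `steepness` parameter are all placeholders, not the v3.3 definitions. Only r = Ĥ − H* is taken directly from the list.

```python
import math


class DualEMA:
    """Short and long exponential moving averages of the hazard H."""

    def __init__(self, alpha_short=0.5, alpha_long=0.1, mix=0.5):
        self.alpha_short = alpha_short  # fast tracker, reacts to spikes
        self.alpha_long = alpha_long    # slow tracker, follows the trend
        self.mix = mix                  # assumed blend weight for H_hat
        self.h_s = None
        self.h_l = None

    def update(self, h):
        """Feed one hazard value and return the blended estimate H_hat."""
        if self.h_s is None:
            # Initialise both averages at the first observation.
            self.h_s = self.h_l = h
        else:
            self.h_s += self.alpha_short * (h - self.h_s)
            self.h_l += self.alpha_long * (h - self.h_l)
        return self.mix * self.h_s + (1.0 - self.mix) * self.h_l


def risk(h_hat, h_star):
    """r = H_hat - H*, the excess of smoothed hazard over a reference level."""
    return h_hat - h_star


def p_break(r, steepness=4.0):
    """Assumed logistic map from risk to a collapse probability in (0, 1)."""
    z = steepness * r
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)
```

With a logistic map of this kind, r = 0 gives p_break = 0.5, so the reference level H* acts as the point where collapse is judged as likely as not. Whether that matches the intended semantics of H* is an open calibration question.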
Stress test outcomes
- For emotionally difficult messages (“I hate myself”), the shield remained in SAFE, producing supportive responses without panicking.
- For contradiction and “multi-voice” prompts, hazard increased as expected.
- In one extreme contradiction test, the system entered a full BREAK state with:
  - high H
  - near-zero Q / E
  - p_break ≈ 1
  - INT gate
  - collapse flag Ω = 1 set
These are early single-user tests, but they show interpretable signal behavior.
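As a reading aid for the BREAK readout, "high H" and "near-zero Q" are two views of the same quantity under the hazard definition H = −ln Q. The sample values below are plain arithmetic on that definition, not measurements from the prototype:

```latex
H = -\ln Q \iff Q = e^{-H}
\qquad
\begin{array}{c|cccc}
Q & 0.5 & 0.1 & 0.01 & 0.001 \\ \hline
H & 0.693 & 2.303 & 4.605 & 6.908
\end{array}
```

Each tenfold drop in Q adds ln 10 ≈ 2.303 to H, so the hazard grows without bound as Q approaches zero.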
How will this funding be used?
Request: $20,000–$30,000 for 3–6 months.
Breakdown
- $10,000: Researcher time
  To work full-time without immediate financial pressure.
- $6,000: Engineering & refactor
  Packaging, examples, evaluation scripts, dashboard polish.
- $2,000–$3,000: Compute & infra
  GPU/CPU time, storage, logs, testing.
- $2,000: Documentation & design
  Technical note, diagrams, reproducible examples.
Deliverables include:
- cleaned-up codebase,
- simple eval suite,
- reproducible dashboard,
- and a short technical write-up.
Roadmap (high-level)
Month 1–2 — Core cleanup
- Standardize v3.3 equations (ρ family, calibrations).
- Refactor into ztgi-core / ztgi-shield.
- Add tests & examples.
Month 2–3 — Evaluations
- Define 3–4 stress scenarios.
- Collect hazard traces.
- Compare with/without shield.
- Summarize failures + successes.
Month 3–6 — Packaging & report
- Release code + dashboard.
- Publish a short technical note (or arXiv preprint).
- Document limitations + open problems.
How does this contribute to AI safety?
This project asks a narrow but important question:
“Can a single scalar hazard signal + a small state machine
give useful information about when an LLM’s local CCR
stops behaving like a single stable executive stream?”
If no, the negative result is useful.
If yes, ZTGI-Pro may become a small building block for:
- agentic system monitors,
- inconsistency detectors,
- collapse warnings,
- or more principled hazard models.
All code, metrics, and results will be publicly available for critique.
Links
Primary Materials
- ZTGI-V5 Book (Zenodo, DOI):
  https://doi.org/10.5281/zenodo.17670650
- ZTGI-Pro v3.3 Whitepaper (DOI):
  https://doi.org/10.5281/zenodo.17537160
Live Demo (Experimental — Desktop Only)
https://indianapolis-statements-transparency-golden.trycloudflare.com
This Cloudflare Tunnel demo loads reliably on desktop browsers (Chrome/Edge). Mobile access may not work. If the demo is offline, please refer to the Zenodo reports.
Update:
The full ZTGI-Pro v3.3 prototype is now open-source under an MIT License.
GitHub repository (hazard layer, shield, CCR state machine, server, demo code):
👉 https://github.com/capterr/ZTGI-Pro-v3.3
If anyone wants a minimal working example or guidance on how the shield integrates with LLaMA (GGUF), I’m happy to provide it.
Model path + installation instructions are included in the README.
— Furkan
Screenshots
https://drive.google.com/file/d/1v5-71UgjWvSco1I7x_Vl2fbx7vbJ_O9n/view?usp=sharing
https://drive.google.com/file/d/1P0XcGK_V-WoJ_zyt4xIeSukXTLjOst7b/view?usp=sharing
- SAFE / WARN / BREAK transitions
- p_break and H/E trace examples
- UI screenshots
Grants Received: no grants recorded
Discussion
The ZTGI-V5 Book (DOI: 10.5281/zenodo.17670650) contains the full theoretical background behind the hazard signal and Single-FPS interpretation. This proposal focuses only on v3.3, the minimum operational concept — not consciousness claims or broad theory. Happy to provide an even more minimal working example if needed.
Hi everyone — Furkan here.
Thanks for taking the time to look at this project. Just to add a bit of context:
If reviewers have any questions, technical or conceptual, I’m very happy to answer them. I can also share:
Thanks again for considering this small exploratory project — even negative results could help clarify whether this direction is worth deeper investigation.
— Furkan