SRT-introspect: Live Token-by-Token Readout of LLM Internal Reasoning
I have released SRT-introspect, a new public demonstration that makes the hidden reasoning process of a frozen large language model visible in real time.
The interface runs a Qwen-2.5-7B backbone equipped with the SRT Adapter and Activation Verbalizer. As the model generates each token, the system continuously measures divergence across attention heads, identifies high-signal moments, and translates the corresponding hidden-state object representations into natural-language verbalizations. You see exactly what the model is internally representing at the precise points where its computation is most active, complete with divergence scores, reflexivity estimates, and per-layer traces.
This is not a summary of the final output. It is a direct window into the model’s latent conceptual landscape, showing the dominant training-data attractors that activate even when the prompt asks for first-principles reasoning. The adaptive scheduler concentrates verbalizations precisely where the real internal work occurs, turning what used to be opaque black-box generation into observable, analyzable data.
The result is the clearest public demonstration yet that modern LLMs possess a rich, structured semiotic infrastructure that can now be audited without retraining or fine-tuning.
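To make the "divergence across attention heads" idea concrete, here is a minimal sketch of one way such a signal could be computed and thresholded. It is not the SRT-introspect implementation: the choice of Jensen-Shannon divergence, the z-score rule standing in for the adaptive scheduler, and the names `head_divergence` and `select_tokens` are all assumptions made for illustration, and the attention rows are toy values rather than real model output.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def head_divergence(head_attn):
    """Mean pairwise JS divergence across the attention heads of one token."""
    n = len(head_attn)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(js_divergence(head_attn[i], head_attn[j]) for i, j in pairs) / len(pairs)

def select_tokens(scores, z_threshold=1.0):
    """Hypothetical scheduler rule: keep positions scoring above mean + z * std."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [i for i, s in enumerate(scores) if s > mean + z_threshold * std]

# Toy data: 4 generated tokens, 2 heads each, attention over 3 context positions.
toy_attention = [
    [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]],  # heads agree
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # heads disagree completely
    [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3]],  # heads agree
    [[0.2, 0.2, 0.6], [0.2, 0.2, 0.6]],  # heads agree
]
scores = [head_divergence(token) for token in toy_attention]
print(select_tokens(scores))  # positions that would be verbalized
```

In a real setup the attention rows would come from the frozen backbone at generation time, and only the selected positions would be passed to the verbalizer.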
A single forward pass of the frozen Qwen-2.5-7B model plus a lightweight classifier reaches 0.866 ± 0.011 AUC on the full TruthfulQA-MC2 benchmark. No adapters. No fine-tuning. No extra parameters on the backbone.
This is the strongest hidden-state truthfulness detector reported on the benchmark to date.
The same latent features that the SRT-NLA-AV-v1 demo reads out as coherent natural-language verbalizations turn out to be rich enough to support production-grade auditing for honesty versus hallucination. The internal semiotic infrastructure we have been exploring in public is already information-dense enough to solve hard downstream problems with almost trivial overhead.
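For readers unfamiliar with this kind of probe, the sketch below shows the general recipe: fit a small classifier on fixed feature vectors and report held-out ROC AUC. It does not reproduce the result above. The post does not say which classifier, layer, or features were used, so logistic regression is an assumption, the "hidden states" are synthetic Gaussian vectors, and `train_probe`, `auc`, and `make_example` are hypothetical helper names.

```python
import math
import random

def auc(scores, labels):
    """ROC AUC: probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_probe(X, y, lr=0.1, epochs=200):
    """Logistic regression trained with plain batch gradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # keep exp() well within float range
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for k in range(d):
                gw[k] += err * x[k]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def make_example(label, dim=8):
    """Synthetic stand-in for a hidden state: class 1 is shifted by +1 per dimension."""
    return [random.gauss(1.0 if label else 0.0, 1.0) for _ in range(dim)]

random.seed(0)
labels = [i % 2 for i in range(200)]
features = [make_example(y) for y in labels]

# The backbone stays frozen; only the probe's weights are learned.
w, b = train_probe(features[:150], labels[:150])
held_out_scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features[150:]]
held_out_auc = auc(held_out_scores, labels[150:])
print(f"held-out AUC on synthetic data: {held_out_auc:.3f}")
```

With real activations, `features` would be replaced by hidden states extracted in one forward pass per example, and the ± figure would come from repeating the split over several seeds or folds.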
🧠 New Space: MindReader-NLA. Ask a frozen LM what it's thinking, in plain English.
A trained Activation Verbalizer (~5–13M params, frozen backbone) over Qwen-2.5-7B, Llama-3.2-3B, and Gemma-2-2B. Three demos in one Space:
Playground — sample K verbalizations of the layer-L hidden state and score how well each reproduces the original activation when fed back through the same frozen model (raw + anisotropy-centred cosine FVE).
Live Thought Trace — stream a verbalization per token as the model writes, side-by-side with the generation.
Steer-by-Editing — edit the verbalized thought, project it back into hidden-state space, and watch the continuation change.
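The Playground's scoring step can be illustrated with a small sketch. The exact formulas used in the Space are not given here, so the definitions below are assumptions: "anisotropy-centred" is read as subtracting a shared mean activation before taking the cosine, and FVE is read as the usual fraction of variance explained. The names `centred_cosine`, `fve`, and `rank_candidates` are hypothetical, and the vectors are toy values.

```python
import math

def cosine(u, v):
    """Raw cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centred_cosine(u, v, mean):
    """Cosine after subtracting a shared mean vector from both inputs."""
    return cosine([a - m for a, m in zip(u, mean)],
                  [b - m for b, m in zip(v, mean)])

def fve(original, reconstructed, mean):
    """Fraction of variance explained: 1 - ||h - h_hat||^2 / ||h - mean||^2."""
    resid = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    total = sum((a - m) ** 2 for a, m in zip(original, mean))
    return 1.0 - resid / total

def rank_candidates(original, candidates, mean):
    """Indices of candidate reconstructions, best centred cosine first."""
    order = range(len(candidates))
    return sorted(order, key=lambda i: centred_cosine(original, candidates[i], mean),
                  reverse=True)

# Toy layer-L activation with a large offset shared by every vector.
mean = [10.0, 10.0, 10.0]
original = [11.0, 10.0, 10.0]
candidates = [
    [10.0, 11.0, 10.0],  # wrong direction once the offset is removed
    [10.9, 10.1, 10.0],  # close to the original
]
for i in rank_candidates(original, candidates, mean):
    c = candidates[i]
    print(i, round(cosine(original, c), 4),
          round(centred_cosine(original, c, mean), 4),
          round(fve(original, c, mean), 4))
```

The first candidate shows why a centred score is worth reporting next to the raw one: its raw cosine is close to 1 only because both vectors share the same offset, while its centred cosine is 0.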