We investigate whether graded states of mind form a spectrum-like structure in transformer representation spaces. To do so, we construct a dataset of 636 short natural-language sentences, each annotated with both a continuous score from $-5$ to $5$ and one of seven ordered tiers, ranging from collapsed or scarcity-driven expressions to more coherent, reflective, and integrative ones. We evaluate five frozen transformer representations: four sentence-embedding models and one decoder-only residual-stream representation. Across all representations, simple probes reliably recover both the continuous score and the discrete tier labels, and permutation tests show that probe performance significantly exceeds shuffled-label baselines. Additional analyses reveal a consistent geometric pattern: UMAP projections show low-to-high organization, confusion matrices concentrate errors between neighboring tiers, and directional ablation identifies a prominent score-aligned component. These results suggest that transformer representations contain statistically significant, spectrum-like organization aligned with the annotated state-of-mind structure. The annotations are used only as an operational framework for representation analysis, not as a clinical or diagnostic measure.
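To make the probing pipeline concrete, the following is a minimal sketch of a linear probe, a shuffled-label permutation test, and a directional ablation on frozen features. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the probe family, so ridge regression stands in for the "simple probes"; the embeddings are synthetic stand-ins with a planted score axis; and all function names, the train/test split, and the hyperparameters (`alpha`, `n_perm`, the dimension `d`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def r2(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def probe_r2(X_tr, y_tr, X_te, y_te, alpha=1.0):
    """Fit a probe on the training split; return held-out R^2 and weights."""
    w, b = fit_ridge(X_tr, y_tr, alpha)
    return r2(y_te, X_te @ w + b), w

def permutation_p_value(X_tr, y_tr, X_te, y_te, n_perm=200, seed=0):
    """Compare the observed R^2 with probes trained on shuffled labels."""
    rng = np.random.default_rng(seed)
    observed, _ = probe_r2(X_tr, y_tr, X_te, y_te)
    null = np.array([
        probe_r2(X_tr, rng.permutation(y_tr), X_te, y_te)[0]
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

def ablate_direction(X, direction):
    """Remove the component of each row of X along `direction`."""
    u = direction / np.linalg.norm(direction)
    return X - np.outer(X @ u, u)

# Synthetic stand-in for frozen embeddings (assumption: NOT the paper's data).
rng = np.random.default_rng(0)
n, d = 636, 32                       # 636 matches the dataset size; d is arbitrary
scores = rng.uniform(-5, 5, size=n)  # continuous annotation in [-5, 5]
axis = rng.normal(size=d)
axis /= np.linalg.norm(axis)         # planted score-aligned direction
X = rng.normal(size=(n, d)) + np.outer(scores, axis)

split = 500                          # hypothetical train/test split
X_tr, X_te = X[:split], X[split:]
y_tr, y_te = scores[:split], scores[split:]

observed_r2, p_value = permutation_p_value(X_tr, y_tr, X_te, y_te)

# Ablate the probe direction, then refit a fresh probe on the ablated features.
_, w = probe_r2(X_tr, y_tr, X_te, y_te)
ablated_r2, _ = probe_r2(
    ablate_direction(X_tr, w), y_tr, ablate_direction(X_te, w), y_te
)
```

In this sketch the ablation removes only the single direction found by the first probe and then refits, so any score information the first probe missed can still be recovered; a drop from `observed_r2` to `ablated_r2` indicates how much of the recoverable signal lies along that one component.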
Probing Spectrum-Like Organization of States of Mind in Transformer Representation Spaces
- Year: 2025
- Abstract & full text: arxiv.org/abs/2512.22227