0

Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication

Transformers used for evidence-grounded binary adjudication (e.g., support/refute, yes/no, or verifier-backed pass/fail decisions) can be sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers…

Preview
Year
2025
Hosting
Full text hostedCC-BY-4.0

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2509.11208CC-BY-4.0
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Transformers used for evidence-grounded binary adjudication (e.g., support/refute, yes/no, or verifier-backed pass/fail decisions) can be sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers under a verifier-relative Bernoulli predicate. We treat evidence order as a nuisance variable and formalize an expectation-realization gap: next-token training can minimize expected conditional description length over orderings while a fixed ordering remains position-sensitive. Our Quantified Martingale Violation (QMV) bound predicts the dispersion induced by adjacent-rank positional sensitivity, with O(\log n) growth in the harmonic regime; our Expectation-level Decompression Law (EDFL) specializes a KL convexity/data-processing bound to Bernoulli predicates, yielding Bits-to-Trust (B2T), Risk-of-Hallucination (RoH), and an Information Sufficiency Ratio (ISR) gate for answer/abstain decisions. On 3,059 grounded items from FEVER, HotpotQA, NQ-Open, PopQA, and Controls, we observe logarithmic dispersion and positive Jensen gains from uniform permutation mixtures. In one pre-specified held-out audit (528 items), the analytically fixed ISR=1 gate attains 0.0-0.7% hallucination with 20.6-27.9% abstention (95% CIs), supporting the operating point without claiming universal calibration across all model families or unrestricted generation.