Entropy-Gated Latent Recursion

Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods derive rollout diversity from a single source: stochastic token-level sampling. We argue that this single-axis sampling space is fundamentally limiting, and identify a second, complementary, and fully deterministic axis: the layer span L, i.e., the number of top decoder layers of a frozen model that are recursively re-applied at high-uncertainty tokens. Different choices of L produce distinct rollouts that solve different subsets of problems, without any stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that re-applies the top-L layers until the next-token distribution converges, for at most K_{\max} iterations. Combined with T temperature samples, EGLR turns a single-axis stochastic rollout pool into an L\times T Cartesian sampling space at nearly the same per-rollout cost. We characterize this space across 8 instruction-tuned models and 6 math reasoning benchmarks, and show that the L-axis is complementary to temperature: on MATH-500 with Qwen2.5-3B-Instruct, the joint L\times T oracle reaches 91.6%, which is 8.2 percentage points above the temperature-only oracle (83.4%) and 10.4 points above the layer-only oracle (81.2%), confirming that each axis solves problems the other misses. The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts, including self-consistency, best-of-N with verifiers, and group-relative RL training (GRPO), opening a new direction for inference-time scaling that does not rely on stochastic noise.
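To make the decoding procedure concrete, the following is a minimal sketch of one entropy-gated recursion step on a toy model, not the authors' implementation. Several details are assumptions made for illustration only: the residual tanh layers standing in for decoder blocks, the entropy threshold `tau`, the use of a KL divergence below `tol` as the convergence test, and all function names (`make_toy_model`, `eglr_next_token_dist`). The abstract specifies only that the top-L layers are re-applied at high-uncertainty tokens for at most K_{\max} iterations until the next-token distribution converges.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    # Shannon entropy in nats.
    return -sum(q * math.log(q) for q in p if q > 0.0)


def kl(p, q, eps=1e-12):
    # KL(p || q), smoothed with eps to avoid division by zero.
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0.0)


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def apply_layer(W, h):
    # Toy stand-in for a decoder block: residual update h + tanh(W h).
    return [a + math.tanh(b) for a, b in zip(h, matvec(W, h))]


def make_toy_model(seed=0, n_layers=6, d=8, vocab=16):
    # Hypothetical frozen "model": random layer weights, output head, input state.
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d)
    layers = [[[rng.uniform(-1, 1) * scale for _ in range(d)] for _ in range(d)]
              for _ in range(n_layers)]
    head = [[rng.uniform(-1, 1) * scale for _ in range(d)] for _ in range(vocab)]
    h0 = [rng.uniform(-1, 1) for _ in range(d)]
    return layers, head, h0


def eglr_next_token_dist(h0, layers, head, span, k_max=4, tau=1.0, tol=1e-3):
    """Return (next-token distribution, number of recursion iterations used)."""
    if not 1 <= span <= len(layers):
        raise ValueError("span must be between 1 and the number of layers")
    # Ordinary forward pass through all layers.
    h = h0
    for W in layers:
        h = apply_layer(W, h)
    p = softmax(matvec(head, h))
    # Gate: recurse only when the next-token distribution is uncertain.
    if entropy(p) <= tau:
        return p, 0
    n_iter = 0
    for _ in range(k_max):
        # Re-apply the top `span` layers to the current final hidden state.
        for W in layers[-span:]:
            h = apply_layer(W, h)
        p_new = softmax(matvec(head, h))
        n_iter += 1
        converged = kl(p_new, p) < tol
        p = p_new
        if converged:
            break
    return p, n_iter


if __name__ == "__main__":
    layers, head, h0 = make_toy_model()
    for span in (1, 2, 3):
        p, used = eglr_next_token_dist(h0, layers, head, span=span, tau=0.5)
        print(span, used, max(range(len(p)), key=p.__getitem__))
```

Every step in this sketch is deterministic given the weights and input, so running it with different values of `span` yields distinct but reproducible distributions. In an L\times T pool, each such span would then be paired with the usual temperature sampling over the resulting distribution.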