We initiate a resource-aware theory of language generation in the limit under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language K and must eventually output a hallucination-free hypothesis language L \subseteq K while omitting at most Δ strings of K. We focus on C_{s,k}, the collection of languages recognized by DFAs with at most s states over an alphabet of size k, as the natural hypothesis class for memory-bounded learners. In the exponential-space regime, we prove that a learner can exactly identify the target K. Under a stricter memory budget, we characterize the strongest possible generation guarantees. In particular, we present a streaming algorithm using poly(s,k) space that converges to a hypothesis with generation gap Δ= O(k^{2s-2}). Moreover, the learned hypothesis captures every string in K of length at least 2s-1. We complement this result with a near-matching lower bound through a reduction from a standard communication complexity problem. Specifically, achieving generation gap Δ\le k^{(1-\varepsilon)s} requires k^{Ω(\varepsilon s)} memory. Together, these results reveal a sharp transition between polynomial-space generation and exponential-space exact identification.
Space-Efficient Language Generation in the Limit
We initiate a resource-aware theory of \textit{language generation in the limit} under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language $K$ and must eventually output a hallucination-free…
- Preview

- Year
- 2026
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2606.25777CC-BY-4.0
- TL;DR
- Semantic Scholar