Space-Efficient Language Generation in the Limit

We initiate a resource-aware theory of language generation in the limit under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language K and must eventually output a hallucination-free hypothesis language L \subseteq K while omitting at most Δ strings of K. We focus on C_{s,k}, the collection of languages recognized by DFAs with at most s states over an alphabet of size k, as the natural hypothesis class for memory-bounded learners. In the exponential-space regime, we prove that a learner can exactly identify the target K. Under a stricter memory budget, we characterize the strongest possible generation guarantees. In particular, we present a streaming algorithm using poly(s,k) space that converges to a hypothesis with generation gap Δ= O(k^{2s-2}). Moreover, the learned hypothesis captures every string in K of length at least 2s-1. We complement this result with a near-matching lower bound through a reduction from a standard communication complexity problem. Specifically, achieving generation gap Δ\le k^{(1-\varepsilon)s} requires k^{Ω(\varepsilon s)} memory. Together, these results reveal a sharp transition between polynomial-space generation and exponential-space exact identification.