This paper proposes a non-lexicalized grammar-induction procedure that separates two tests: recognition of the observed finite presentation, and rejection of short preterminal strings generated by a hypothesis but unsupported by the evidence. The central object is the rule-coverage bound \ell^(G): the maximum, over rules in G, of the length of the shortest preterminal string whose derivation uses that rule. This bound induces the comparison universe Σ_{pre}^{\le \ell^(G)}, where unsupported generated strings serve as indirect evidence against overgenerating hypotheses. We give a greedy search algorithm over rule sets and prove a conditional weak-recovery theorem: under explicit reachability conditions and sufficient saturation of the presentation, the exact learner reaches a grammar weakly equivalent to the unknown target. The complexity analysis is slice-wise: for each fixed incrementality radius k, the search explores polynomially many rule-set extensions in the finite rule universe. Across 31 benchmark runs spanning Dyck-k languages (1\le k\le4), palindromes, a^n b^n, English-like recursive fragments, and an inherently ambiguous union language, grammar-level analysis establishes weak equivalence between every returned grammar and its target.
Greedy Grammar Induction with Indirect Negative Evidence
This paper proposes a non-lexicalized grammar-induction procedure that separates two tests: recognition of the observed finite presentation, and rejection of short preterminal strings generated by a hypothesis but unsupported by the evidence.
- Year
- 2023
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2312.15321CC-BY-4.0
- TL;DR
- Semantic Scholar