0

Greedy Grammar Induction with Indirect Negative Evidence

This paper proposes a non-lexicalized grammar-induction procedure that separates two tests: recognition of the observed finite presentation, and rejection of short preterminal strings generated by a hypothesis but unsupported by the evidence.

Year
2023
Hosting
Full text hostedCC-BY-4.0

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2312.15321CC-BY-4.0
TL;DR
Semantic Scholar
Attribution policy →

Abstract

This paper proposes a non-lexicalized grammar-induction procedure that separates two tests: recognition of the observed finite presentation, and rejection of short preterminal strings generated by a hypothesis but unsupported by the evidence. The central object is the rule-coverage bound \ell^(G): the maximum, over rules in G, of the length of the shortest preterminal string whose derivation uses that rule. This bound induces the comparison universe Σ_{pre}^{\le \ell^(G)}, where unsupported generated strings serve as indirect evidence against overgenerating hypotheses. We give a greedy search algorithm over rule sets and prove a conditional weak-recovery theorem: under explicit reachability conditions and sufficient saturation of the presentation, the exact learner reaches a grammar weakly equivalent to the unknown target. The complexity analysis is slice-wise: for each fixed incrementality radius k, the search explores polynomially many rule-set extensions in the finite rule universe. Across 31 benchmark runs spanning Dyck-k languages (1\le k\le4), palindromes, a^n b^n, English-like recursive fragments, and an inherently ambiguous union language, grammar-level analysis establishes weak equivalence between every returned grammar and its target.