Bearing Syntactic Fruit with Stack-Augmented Neural Networks

When children learn language, they make syntactic generalizations based on hierarchical rules. A recent line of work has asked whether common neural network architectures share this inductive bias for hierarchical syntax, finding that they do so only under special conditions: when augmented with ground-truth parse tree structures, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that generalize in a human-like fashion when trained only on surface forms: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack, one of which leverages nondeterminism. We find that transformers with nondeterministic stacks generalize best on multiple tasks designed to measure hierarchical inductive bias. This suggests that stack-augmented neural networks may be more accurate models of human syntax acquisition than standard architectures, and may therefore serve as useful objects of psycholinguistic study. Our code is publicly available.