Better Neural Machine Translation by Extracting Linguistic Information from BERT

Using dense vector-based linguistic information extracted from BERT improves neural machine translation performance and generalization without increasing training complexity.

Year
2021
Venue
EACL 2021
Authors
2
Hosting
Abstract only

Attribution

Abstract & full text
arxiv.org/abs/2104.02831
TL;DR
Semantic Scholar

Abstract

Adding linguistic information (syntax or semantics) to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models. Directly using the capacity of massive pre-trained contextual word embedding models such as BERT (Devlin et al., 2019) has been marginally useful in NMT because effective fine-tuning is difficult to obtain for NMT without making training brittle and unreliable. We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates. Experimental results show that our method of incorporating linguistic information helps NMT to generalize better in a variety of training contexts and is no more difficult to train than conventional Transformer-based NMT.
