
KaWAT: A Word Analogy Task Dataset for Indonesian

Evaluation of pretrained Indonesian word embeddings and embeddings trained on Indonesian online news shows they improve performance in downstream tasks.

Year
2019
Venue
arXiv 2019
Authors
1

Abstract & full text
arxiv.org/abs/1906.09912

Abstract

We introduced KaWAT (Kata Word Analogy Task), a new word analogy task dataset for Indonesian. We used it to evaluate several existing pretrained Indonesian word embeddings, as well as embeddings trained on an Indonesian online news corpus. We also tested these embeddings on two downstream tasks and found that pretrained word embeddings helped, either by reducing the number of training epochs or by yielding significant performance gains.
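
For readers unfamiliar with word analogy evaluation, the following is a minimal sketch of how such a task is commonly scored, using the vector-offset method. The abstract does not state the scoring procedure used for KaWAT, so the method, the function names (`solve_analogy`, `analogy_accuracy`), and the toy vectors are all assumptions made for illustration and are not taken from the dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer "a is to b as c is to ?" with the vector-offset method.

    Returns the vocabulary word closest (by cosine) to b - a + c,
    excluding the three query words themselves.
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions where the predicted word equals d.

    Questions containing out-of-vocabulary words are skipped (an assumption;
    other protocols count them as errors).
    """
    answerable = [q for q in questions if all(w in emb for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable)


# Hypothetical 3-dimensional toy embeddings, invented for this example.
toy_emb = {
    "raja": [1.0, 0.0, 1.0],    # king
    "ratu": [1.0, 1.0, 1.0],    # queen
    "pria": [0.0, 0.0, 1.0],    # man
    "wanita": [0.0, 1.0, 1.0],  # woman
    "kota": [0.5, 0.5, 0.0],    # city (distractor)
}
toy_questions = [("pria", "wanita", "raja", "ratu")]
accuracy = analogy_accuracy(toy_emb, toy_questions)
```

In practice the embeddings would be loaded from a pretrained model and the questions from the dataset files. Whether out-of-vocabulary questions are skipped or counted as wrong changes the reported accuracy, so the choice should be stated alongside any result.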
