We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.
Restoring Hebrew Diacritics Without a Dictionary
NAKDIMON, a two-layer character-level LSTM, achieves diacritization performance comparable to that of more complex systems without using human-curated resources beyond plain diacritized text.
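To make "plain diacritized text" as the only resource concrete, the sketch below shows one way such text could be turned into character-level training pairs: each base character is separated from the combining marks attached to it, giving an undotted input and a per-character label. This is an illustration only, not the authors' preprocessing; the function names `split_diacritics` and `strip_diacritics` are hypothetical, and treating every Unicode nonspacing mark (category `Mn`) as a diacritic is an assumption.

```python
import unicodedata


def split_diacritics(text):
    """Split diacritized text into (base character, attached marks) pairs.

    Assumption: any Unicode nonspacing mark (category "Mn") is a diacritic
    belonging to the closest preceding base character.
    """
    pairs = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            # A base character, or a stray mark with nothing before it.
            pairs.append((ch, ""))
    return pairs


def strip_diacritics(text):
    """Return the undotted text, i.e. what a diacritizer sees as input."""
    return "".join(base for base, _ in split_diacritics(text))


# The word "shalom" with niqqud, written with escapes to avoid encoding issues:
# shin + shin dot + qamats, lamed, vav + holam, final mem.
dotted = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
pairs = split_diacritics(dotted)
undotted = strip_diacritics(dotted)

for base, marks in pairs:
    names = [unicodedata.name(m) for m in marks]
    print(unicodedata.name(base), names)
```

A character-level model would then take `undotted` as its input sequence and learn to predict the marks for each position, which is why no dictionary or morphological annotation is needed to build the training data.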
- Year
- 2021
- Venue
- Findings (NAACL) 2022
- Authors
- 2
Attribution
- Abstract & full text
- arxiv.org/abs/2105.05209v4
- TL;DR
- Semantic Scholar