Reusing Weights in Subword-aware Neural Language Models
Abstract
We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.
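To make the "tie layers consecutively bottom-up" principle concrete, here is a minimal sketch of output weight reuse in a subword-aware language model. It is not the paper's exact architecture: the class name SubwordLM, the sum-plus-linear composition, and all dimensions are illustrative assumptions; the point is only that the output word embeddings are built by reusing the input embedding layers in order, starting from the lowest one.

```python
# Hedged sketch, not the paper's implementation: a subword-aware LM whose
# output word embeddings reuse the input embedding layers bottom-up
# (first the subword lookup, then the composition layer on top of it).
import torch
import torch.nn as nn


class SubwordLM(nn.Module):
    def __init__(self, n_subwords, sub_dim, word_dim, hidden_dim):
        super().__init__()
        # Level 0 of the input embedding: subword (e.g. morpheme) lookup.
        self.sub_emb = nn.Embedding(n_subwords, sub_dim)
        # Level 1: compose a word's subword vectors into a word vector
        # (a simple sum + linear here; the paper uses richer compositions).
        self.compose = nn.Linear(sub_dim, word_dim)
        # Word-level recurrent language model.
        self.rnn = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.to_word = nn.Linear(hidden_dim, word_dim)

    def embed_words(self, sub_ids):
        # sub_ids: (batch, seq_len, max_subwords_per_word)
        e = self.sub_emb(sub_ids).sum(dim=2)   # level 0: sum subword vectors
        return torch.tanh(self.compose(e))     # level 1: project to word space

    def output_embeddings(self, vocab_sub_ids):
        # vocab_sub_ids: (vocab_size, max_subwords_per_word)
        # Reuse the SAME input layers at the output, tied consecutively
        # from the bottom: level 0 first, then level 1 on top of it.
        e = self.sub_emb(vocab_sub_ids).sum(dim=1)   # level 0 reused
        return torch.tanh(self.compose(e))           # level 1 reused bottom-up

    def forward(self, sub_ids, vocab_sub_ids):
        x = self.embed_words(sub_ids)                # (batch, seq, word_dim)
        h, _ = self.rnn(x)
        h = self.to_word(h)                          # (batch, seq, word_dim)
        out_emb = self.output_embeddings(vocab_sub_ids)  # (vocab, word_dim)
        return h @ out_emb.t()                       # logits over the word vocab
```

A quick usage check of the shapes, with made-up sizes:

```python
model = SubwordLM(n_subwords=1000, sub_dim=64, word_dim=128, hidden_dim=256)
sub_ids = torch.randint(0, 1000, (2, 5, 4))        # 2 sequences, 5 words, 4 subwords each
vocab_sub_ids = torch.randint(0, 1000, (500, 4))   # subword decomposition of a 500-word vocab
logits = model(sub_ids, vocab_sub_ids)             # (2, 5, 500)
```

Because both embedding layers are shared between input and output, they contribute no extra output parameters, which is one way the reported reductions in model size can arise.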
- Publication: arXiv e-prints
- Pub Date: February 2018
- DOI: 10.48550/arXiv.1802.08375
- arXiv: arXiv:1802.08375
- Bibcode: 2018arXiv180208375A
- Keywords: Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing; Statistics - Machine Learning; 68T50; I.2.7
- E-Print: accepted to NAACL 2018