Recurrent Stacking of Layers for Compact Neural Machine Translation Models

doi:10.48550/arXiv.1807.05353


In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder. Typically, each additional layer improves translation quality significantly, but it also substantially increases the number of parameters. In this paper, we propose to share parameters across all the layers, which yields a recurrently stacked NMT model. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times is comparable to that of a model that stacks 6 separate layers. We also show that using pseudo-parallel corpora created by back-translation leads to further significant improvements in translation quality.
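A minimal sketch, in plain Python, of the parameter-sharing idea described above, contrasting ordinary stacking with recurrent stacking. The toy residual feed-forward layer, the class names (`FeedForwardLayer`, `StackedEncoder`, `RecurrentlyStackedEncoder`), and the dimensions are hypothetical stand-ins, not the paper's actual NMT layers or code, and only an encoder is shown (the paper applies the idea to the decoder as well). The point is only that recurrent stacking keeps the same number of layer applications while the parameter count stays that of a single layer.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class FeedForwardLayer:
    """Hypothetical toy layer: y = x + relu(W x + b), a stand-in for a real NMT layer."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        self.weight: Matrix = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.bias: Vector = [0.0] * dim

    def num_parameters(self) -> int:
        # dim * dim weights plus dim biases
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def __call__(self, x: Vector) -> Vector:
        hidden = []
        for row, b in zip(self.weight, self.bias):
            pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
            hidden.append(max(0.0, pre_activation))  # ReLU
        return [xi + hi for xi, hi in zip(x, hidden)]  # residual connection


class StackedEncoder:
    """Ordinary stacking: `depth` layers, each with its own parameters."""

    def __init__(self, dim: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = [FeedForwardLayer(dim, rng) for _ in range(depth)]

    def num_parameters(self) -> int:
        return sum(layer.num_parameters() for layer in self.layers)

    def __call__(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer(x)
        return x


class RecurrentlyStackedEncoder:
    """Recurrent stacking: a single layer applied `depth` times with shared parameters."""

    def __init__(self, dim: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layer = FeedForwardLayer(dim, rng)
        self.depth = depth

    def num_parameters(self) -> int:
        return self.layer.num_parameters()  # independent of depth

    def __call__(self, x: Vector) -> Vector:
        for _ in range(self.depth):
            x = self.layer(x)  # same weights reused at every level of the stack
        return x


if __name__ == "__main__":
    stacked = StackedEncoder(dim=8, depth=6)
    shared = RecurrentlyStackedEncoder(dim=8, depth=6)
    print("6 separate layers:", stacked.num_parameters(), "parameters")
    print("1 layer stacked 6 times:", shared.num_parameters(), "parameters")
```

With `dim=8` and `depth=6`, the separately stacked encoder holds 6 × (8·8 + 8) = 432 parameters, while the recurrently stacked one holds 72, one sixth as many, even though both perform six layer applications per input.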

Publication:

arXiv e-prints

Pub Date:

July 2018

DOI:

10.48550/arXiv.1807.05353

arXiv:

arXiv:1807.05353

Bibcode:

2018arXiv180705353D

Keywords:

Computer Science - Computation and Language

E-Print:

Version 2 (Current): Fixed typos. Added results for models using back-translated data. Resized the figure. Improved explanations of some parts. Version 1: Initial version.
