Non-autoregressive Streaming Transformer for Simultaneous Translation
Abstract
Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency for aggressive anticipation. We argue that such issue stems from the autoregressive architecture upon which most existing SiMT models are built. To address those issues, we propose non-autoregressive streaming Transformer (NAST) which comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism. We enable NAST to generate the blank token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and train it to maximize the non-monotonic latent alignment with an alignment-based latency loss. Experiments on various SiMT benchmarks demonstrate that NAST outperforms previous strong autoregressive SiMT baselines.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2023
- DOI:
- arXiv:
- arXiv:2310.14883
- Bibcode:
- 2023arXiv231014883M
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence;
- I.2.7
- E-Print:
- EMNLP 2023 main conference