Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

doi:10.48550/arXiv.2208.09770

This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks. The model is first pre-trained on text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ sets a new state of the art on 9 out of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the fine-tuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.
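The disentangled attention mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that each word carries a content vector and a position vector; the three-term score, the relative-distance bucketing, and the sqrt(3d) scaling below follow the DeBERTa-style formulation and are assumptions here, not details confirmed by this record. All function names (`rel_bucket`, `disentangled_scores`, `softmax`) are hypothetical, and the inputs are taken to be already-projected query and key vectors.

```python
import math


def rel_bucket(i, j, k):
    """Clamp the relative distance i - j into a bucket index in [0, 2k)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def disentangled_scores(q_c, k_c, q_r, k_r, k):
    """Unnormalized attention scores from content and relative-position parts.

    q_c, k_c: per-token content queries/keys, shape [n][dim].
    q_r, k_r: relative-position queries/keys, shape [2k][dim].
    Assumed (DeBERTa-style) score for the pair (i, j):
      content-to-content + content-to-position + position-to-content,
    divided by sqrt(3 * dim).
    """
    n, dim = len(q_c), len(q_c[0])
    scale = math.sqrt(3 * dim)
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])
            c2p = dot(q_c[i], k_r[rel_bucket(i, j, k)])
            p2c = dot(k_c[j], q_r[rel_bucket(j, i, k)])
            row.append((c2c + c2p + p2c) / scale)
        scores.append(row)
    return scores


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy inputs: 2 tokens, dim 2, max relative distance k = 2 (4 buckets).
    q_c = [[1.0, 0.0], [0.0, 1.0]]
    k_c = [[1.0, 0.0], [0.0, 1.0]]
    q_r = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.0, 0.0]]
    k_r = [[0.0, 0.2], [0.2, 0.0], [0.1, 0.1], [0.0, 0.0]]
    for row in disentangled_scores(q_c, k_c, q_r, k_r, k=2):
        print(softmax(row))
```

Keeping the position vectors in a separate table indexed by clamped relative distance means the same position parameters are reused for every token pair at that distance, which is one reason to separate content from position in this way.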

Publication: arXiv e-prints

Pub Date: August 2022

DOI: 10.48550/arXiv.2208.09770

arXiv: arXiv:2208.09770

Bibcode: 2022arXiv220809770H

Keywords: Computer Science - Computation and Language; Computer Science - Artificial Intelligence; cs.CL; cs.GL; I.2; I.7

E-Print: 16 pages, 3 figures. Accepted as long paper in main conference of ACL 2023
