Summarising Historical Text in Modern Languages

doi:10.48550/arXiv.2101.10759

We introduce the task of historical text summarisation, where documents written in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine for historians and digital humanities researchers, but it has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical-to-modern) parallel data, and we further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historical-to-modern summarisation task from standard cross-lingual summarisation (i.e., modern-to-modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.

Publication:

arXiv e-prints

Pub Date:

January 2021

DOI:

10.48550/arXiv.2101.10759

arXiv:

arXiv:2101.10759

Bibcode:

2021arXiv210110759P

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence;
Computer Science - Computers and Society;
Computer Science - Machine Learning

E-Print:

To appear at EACL 2021
