Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer
Abstract
Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.
- Publication:
- arXiv e-prints
- Pub Date:
- March 2018
- DOI:
- 10.48550/arXiv.1803.06535
- arXiv:
- arXiv:1803.06535
- Bibcode:
- 2018arXiv180306535R
- Keywords:
- Computer Science - Computation and Language
- E-Print:
- To appear in the proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018