Hierarchical Transformer Network for Utterance-level Emotion Recognition
Abstract
While there have been significant advances in de-tecting emotions in text, in the field of utter-ance-level emotion recognition (ULER), there are still many problems to be solved. In this paper, we address some challenges in ULER in dialog sys-tems. (1) The same utterance can deliver different emotions when it is in different contexts or from different speakers. (2) Long-range contextual in-formation is hard to effectively capture. (3) Unlike the traditional text classification problem, this task is supported by a limited number of datasets, among which most contain inadequate conversa-tions or speech. To address these problems, we propose a hierarchical transformer framework (apart from the description of other studies, the "transformer" in this paper usually refers to the encoder part of the transformer) with a lower-level transformer to model the word-level input and an upper-level transformer to capture the context of utterance-level embeddings. We use a pretrained language model bidirectional encoder representa-tions from transformers (BERT) as the lower-level transformer, which is equivalent to introducing external data into the model and solve the problem of data shortage to some extent. In addition, we add speaker embeddings to the model for the first time, which enables our model to capture the in-teraction between speakers. Experiments on three dialog emotion datasets, Friends, EmotionPush, and EmoryNLP, demonstrate that our proposed hierarchical transformer network models achieve 1.98%, 2.83%, and 3.94% improvement, respec-tively, over the state-of-the-art methods on each dataset in terms of macro-F1.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2020
- DOI:
- 10.48550/arXiv.2002.07551
- arXiv:
- arXiv:2002.07551
- Bibcode:
- 2020arXiv200207551L
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Machine Learning;
- Computer Science - Sound;
- Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print:
- 7 pages