NoReC: The Norwegian Review Corpus
Abstract
This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1-6, as provided by the rating of the original author. This first release of the corpus comprises more than 35,000 reviews. It is distributed using the CoNLL-U format, pre-processed using UDPipe, along with a rich set of metadata. The work reported in this paper forms part of the SANT initiative (Sentiment Analysis for Norwegian Text), a project seeking to provide resources and tools for sentiment analysis and opinion mining for Norwegian. As resources for sentiment analysis have so far been unavailable for Norwegian, NoReC represents a highly valuable and sought-after addition to Norwegian language technology.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2017
- DOI:
- 10.48550/arXiv.1710.05370
- arXiv:
- arXiv:1710.05370
- Bibcode:
- 2017arXiv171005370V
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- Pending (non-anonymous) review for LREC 2018