Evaluating Informal-Domain Word Representations With UrbanDictionary

doi:10.48550/arXiv.1606.08270

Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We describe how such a metric might be computed and how a spelling-variant dataset can be collected using UrbanDictionary.
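The abstract does not spell out the metric itself, so the following is only a minimal sketch of one way "proximity of spelling variants" could be measured, not the authors' method. It assumes word vectors are available as a plain dictionary and that variant pairs (for example a standard spelling and an informal one) have already been collected. The function names, the toy vectors, and the choice of cosine similarity and neighbor rank are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_variant_similarity(embeddings, variant_pairs):
    """Average cosine similarity over variant pairs found in the vocabulary.

    Pairs with an out-of-vocabulary word are skipped. Returns None if no
    pair could be scored.
    """
    sims = [
        cosine(embeddings[a], embeddings[b])
        for a, b in variant_pairs
        if a in embeddings and b in embeddings
    ]
    return sum(sims) / len(sims) if sims else None


def variant_rank(embeddings, word, variant):
    """Rank of `variant` among the neighbors of `word` (1 = nearest)."""
    target = cosine(embeddings[word], embeddings[variant])
    closer = sum(
        1
        for other, vec in embeddings.items()
        if other not in (word, variant)
        and cosine(embeddings[word], vec) > target
    )
    return 1 + closer


# Hypothetical 2-dimensional vectors, invented purely for illustration.
toy_embeddings = {
    "tomorrow": [1.0, 0.0],
    "tmrw": [0.9, 0.1],
    "2moro": [0.8, 0.3],
    "banana": [0.0, 1.0],
}
toy_pairs = [("tomorrow", "tmrw"), ("tomorrow", "2moro")]

score = mean_variant_similarity(toy_embeddings, toy_pairs)
rank = variant_rank(toy_embeddings, "tomorrow", "tmrw")
```

Under this sketch, a representation that places spelling variants close together would yield a high mean similarity and low ranks, which is the behavior one would hope for if explicit normalization is to be skipped.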

Publication: arXiv e-prints

Pub Date: June 2016

DOI: 10.48550/arXiv.1606.08270

arXiv: arXiv:1606.08270

Bibcode: 2016arXiv160608270S

Keywords: Computer Science - Computation and Language
