HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics
Abstract
Recent advances in Deep Learning have led to a significant performance increase on several NLP tasks; however, the models have become more and more computationally demanding. This paper therefore addresses computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of n-gram statistics of texts. The representations are formed using a hyperdimensional computing enabled embedding and then serve as features that are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard datasets for classification tasks using nine classifiers. The embedding achieved F1 scores on par with conventional n-gram statistics while reducing time and memory requirements severalfold. For example, for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while the train and test speedups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, the memory reduction was about 100 times and the train and test speedups were over 100 times. More importantly, the use of distributed representations formed via hyperdimensional computing removes the strict dependency between the dimensionality of the representation and the parameters of the n-gram statistics, thus opening room for tradeoffs.
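The abstract does not spell out the embedding procedure, so the following is only a minimal sketch of one common way to embed n-gram statistics with hyperdimensional computing; it should not be read as the authors' exact method. It assumes random bipolar symbol vectors, cyclic shift to encode position within an n-gram, elementwise multiplication to bind the symbols of an n-gram, and summation over all n-grams of a text. The names `make_item_memory` and `embed_ngrams`, the alphabet, and the dimensionality of 512 are all hypothetical choices made for illustration.

```python
import numpy as np


def make_item_memory(alphabet, dim, seed=0):
    """Assign each symbol a fixed random bipolar (+1/-1) vector of length dim."""
    rng = np.random.default_rng(seed)
    values = np.array([-1, 1], dtype=np.int8)
    return {ch: rng.choice(values, size=dim) for ch in alphabet}


def embed_ngrams(text, item_memory, n, dim):
    """Sum the hyperdimensional vectors of all character n-grams in text.

    Assumed scheme (illustrative, not taken from the paper):
      - position j inside an n-gram is encoded by a cyclic shift of j places,
      - the shifted symbol vectors are bound by elementwise multiplication,
      - the text embedding is the sum of all bound n-gram vectors.
    """
    embedding = np.zeros(dim, dtype=np.int64)
    for start in range(len(text) - n + 1):
        ngram_hv = np.ones(dim, dtype=np.int64)
        for pos, ch in enumerate(text[start:start + n]):
            ngram_hv *= np.roll(item_memory[ch], pos)
        embedding += ngram_hv
    return embedding


# Hypothetical setup: 26 lowercase letters plus space, 512-dimensional vectors.
alphabet = "abcdefghijklmnopqrstuvwxyz "
dim = 512
item_memory = make_item_memory(alphabet, dim)

features_bigram = embed_ngrams("hello world", item_memory, n=2, dim=dim)
features_trigram = embed_ngrams("hello world", item_memory, n=3, dim=dim)
```

Under these assumptions the feature vector has `dim` entries whatever the value of `n`, whereas an explicit count vector over this 27-symbol alphabet would need 27^3 = 19683 entries for trigrams. This is the decoupling of representation size from the n-gram parameters that the abstract describes.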
 Publication:
 arXiv e-prints
 Pub Date:
 March 2020
 arXiv:
 arXiv:2003.01821
 Bibcode:
 2020arXiv200301821A
 Keywords:
 Computer Science - Computation and Language
 E-Print:
 17 pages, 1 figure, 12 tables