Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm
Abstract
A language-independent stemmer has long been sought. Single N-gram tokenization works well, but it often generates stems that start with intermediate characters of a word rather than its initial ones. We present a novel technique that takes N-gram stemming a step further, and we compare our method with an established algorithm in the field, Porter's stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
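As an illustration of the limitation noted in the abstract, the following is a minimal sketch of plain character N-gram tokenization. It is not the technique proposed in the paper; the function name `char_ngrams`, the sample word, and the choice of n = 4 are assumptions made only for this example.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous character n-gram of `word`, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A word shorter than n yields an empty list.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


grams = char_ngrams("stemming", 4)
print(grams)  # ['stem', 'temm', 'emmi', 'mmin', 'ming']

# Only the n-gram at offset 0 begins with the word's initial character;
# the others begin at intermediate characters.
initial = [g for g in grams if g[0] == "stemming"[0]]
print(initial)  # ['stem']
```

Of the five 4-grams of "stemming", only "stem" starts at the beginning of the word. If any of the others were selected as the stem, it would begin mid-word, which appears to be the behaviour the abstract describes.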
- Publication: arXiv e-prints
- Pub Date: December 2013
- DOI: 10.48550/arXiv.1312.4824
- arXiv: arXiv:1312.4824
- Bibcode: 2013arXiv1312.4824P
- Keywords: Computer Science - Information Retrieval; Computer Science - Computation and Language
- E-Print: 10 pages