Entropy-Based Incremental Indexing Term Selection Method for N-gram Full Text Search System
Abstract
N-gram indexing, in which each index term consists of N consecutive characters, is the most popular approach for full-text search systems. Full-text search for Japanese text in particular usually uses a 2-gram character index as its base, and additional higher-gram index terms are expected to improve performance. This paper presents an entropy-based method for mining additional indexing terms from the database in order to reduce wasted AND operations on 2-grams.
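To make the 2-gram AND operation concrete, the following minimal Python sketch builds a position-free n-gram inverted index and intersects posting sets for a query. It is an illustration only, not the paper's method: the function names (`ngrams`, `build_index`, `candidates`) and the three sample documents are assumptions introduced here, and the entropy-based term selection itself is not implemented because the abstract does not give its formula. The sketch shows one kind of overhead a 2-gram base index can incur, namely a candidate that passes the AND but does not contain the query string, and how a single higher-gram term avoids it.

```python
from collections import defaultdict


def ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def candidates(index, query, n):
    """AND together the posting sets of all n-grams in the query."""
    grams = ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical sample documents (not taken from the paper).
docs = {1: "東京都庁", 2: "京都と東京", 3: "東京都"}
query = "東京都"

two_gram = candidates(build_index(docs, 2), query, 2)
three_gram = candidates(build_index(docs, 3), query, 3)
true_hits = {doc_id for doc_id, text in docs.items() if query in text}

print(two_gram)    # {1, 2, 3}: document 2 has both 2-grams but not the query
print(three_gram)  # {1, 3}
print(true_hits)   # {1, 3}
```

Production n-gram indexes usually also store character positions and check adjacency during the AND, so the extra work appears there as position comparisons rather than as false candidates. Either way, a well-chosen higher-gram term answers the query with a single lookup instead of an intersection.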
- Publication: IEEJ Transactions on Electronics, Information and Systems
- Pub Date: 2005
- DOI: 10.1541/ieejeiss.125.730
- Bibcode: 2005ITEIS.125..730Y
- Keywords: Full Text Search; n-gram index; Knowledge Management; Enterprise Information Systems