Efficient Computation of Positional Population Counts Using SIMD Instructions
Abstract
In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given 8 distinct values, we map each value to a byte where only a single bit has been set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums corresponding to bit values at indexes 0, 1, 2, ..., k-1. If the k-bit words are one-hot encoded then the sums correspond to a frequency histogram. This multiple-sum problem is a generalization of the population-count problem where we seek the sum of all bit values. Accordingly, we refer to the multiple-sum problem as a positional population-count. Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit position population count using less than half of a CPU cycle per 16-bit word. Our best approach uses up to 400 times fewer instructions and is up to 50 times faster than baseline code using only regular (non-SIMD) instructions, for sufficiently large inputs.
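To make the problem statement concrete, here is a minimal scalar sketch of a positional population count in Python. It illustrates only the definition given in the abstract (k sums, one per bit index); it is not the SIMD approach the paper describes, and the function name `positional_popcount` and the sample data are assumptions chosen for illustration.

```python
def positional_popcount(words, k=16):
    """Return a list of k sums; counts[i] is how many words have bit i set.

    Hypothetical helper for illustration: a plain scalar loop,
    not the SIMD algorithms described in the paper.
    """
    counts = [0] * k
    for w in words:
        for i in range(k):
            # Extract bit i of w (0 or 1) and add it to the i-th sum.
            counts[i] += (w >> i) & 1
    return counts


# One-hot example with 8 distinct values: each value v maps to a byte
# in which only bit v is set, so the k sums form a frequency histogram.
values = [0, 3, 3, 7, 1, 3]
encoded = [1 << v for v in values]
histogram = positional_popcount(encoded, k=8)
print(histogram)  # [1, 1, 0, 3, 0, 0, 0, 1]
```

Summing the k positional counts gives the ordinary population count of the whole stream, which is the sense in which the positional version generalizes it.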
Publication: arXiv e-prints
Pub Date: November 2019
arXiv: arXiv:1911.02696
Bibcode: 2019arXiv191102696K
Keywords: Computer Science - Data Structures and Algorithms