Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
Abstract
This paper presents a structural design of a hardware-efficient module for implementing the basic operation of a convolutional neural network (CNN) with reduced implementation complexity. For this purpose we utilize a modification of the Winograd minimal filtering method as well as computation vectorization principles. The module calculates the inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. A fully parallel structure of the module for calculating these two inner products, based on the naive method of calculation, requires 6 binary multipliers and 4 binary adders. The use of the Winograd minimal filtering method allows constructing a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, such a reduction can have a significant effect.
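The arithmetic saving the abstract refers to can be sketched as the Winograd F(2,3) minimal filtering algorithm: two outputs of a 3-tap filter over a 4-sample window computed with 4 multiplications and 8 additions (the filter-side transform is precomputed, as it would be in hardware). The names `d`, `g`, `m0`–`m3` below are illustrative, not taken from the paper.

```python
def winograd_f23(d, g):
    """Compute two sliding-window inner products
    y0 = d0*g0 + d1*g1 + d2*g2 and y1 = d1*g0 + d2*g1 + d3*g2
    using the Winograd F(2,3) minimal filtering scheme."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: precomputed once per filter, so its cost
    # is not counted toward the per-window operation budget.
    G0 = g0
    G1 = (g0 + g1 + g2) / 2
    G2 = (g0 - g1 + g2) / 2
    G3 = g2
    # Data transform (4 additions) and 4 multiplications.
    m0 = (d0 - d2) * G0
    m1 = (d1 + d2) * G1
    m2 = (d2 - d1) * G2
    m3 = (d1 - d3) * G3
    # Output transform: 4 more additions, for 8 in total.
    return m0 + m1 + m2, m1 - m2 - m3
```

A naive implementation of the same pair of outputs would use 6 multiplications (3 per output) and 4 additions; the hardware structure in the paper trades 2 multipliers for 4 adders.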
 Publication:

arXiv e-prints
 Pub Date:
 November 2018
 arXiv:
 arXiv:1811.03458
 Bibcode:
 2018arXiv181103458C
 Keywords:

 Electrical Engineering and Systems Science - Signal Processing;
 Computer Science - Hardware Architecture
 E-Print:
 3 pages, 5 figures