Efficient FFT mapping on GPU for radar processing application: modeling and implementation
Abstract
General-purpose multiprocessors (in our case, Intel IvyBridge and Intel Haswell) increasingly add GPU computing power to their multicore architectures. When used for embedded applications with intensive signal processing requirements (for us, synthetic aperture radar), they must constantly compute convolution algorithms, such as the well-known Fast Fourier Transform (FFT). Thanks to the "fractal" nature of the FFT (the typical butterfly shape, with larger FFTs defined as combinations of smaller ones plus auxiliary data-array transpose functions), one can hope to compute analytically the size of the largest FFT that can be performed locally on an elementary GPU compute block. The full application must then be organized around this building-block size. However, because of the phenomena involved in data transfers between the various memory levels across CPUs and GPUs, the optimality of such a scheme is only loosely predictable (communication time tends to outweigh the cost of the computations themselves). A mix of (theoretical) analytic modeling and (practical) runtime validation is therefore needed. As we shall illustrate, this holds at both stages: first when deciding on an elementary FFT block size, then at the level of the full application.
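The two ideas in the abstract, sizing the largest FFT that fits in local memory and building a larger FFT from smaller ones with a transpose, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `largest_local_fft_size` and `four_step_fft`, the 64 KiB local memory figure, and the 8-byte single-precision complex sample size are all assumptions chosen for illustration, and the code runs on the CPU rather than on a GPU compute block. The decomposition shown is the standard "four-step" Cooley-Tukey factorization.

```python
import numpy as np


def largest_local_fft_size(local_mem_bytes, bytes_per_sample=8, buffers=1):
    """Largest power-of-two FFT length whose working set fits in local memory.

    All parameters are hypothetical: bytes_per_sample=8 assumes
    single-precision complex data, and buffers counts how many full-size
    arrays must be resident at once (e.g. 2 for an out-of-place transform).
    """
    max_samples = local_mem_bytes // (bytes_per_sample * buffers)
    if max_samples < 1:
        raise ValueError("local memory too small for a single sample")
    # Round down to the nearest power of two.
    return 1 << (max_samples.bit_length() - 1)


def four_step_fft(x, n1):
    """Compute a length-N FFT as n2 FFTs of length n1 and n1 FFTs of length n2.

    N must equal n1 * n2. The small FFTs stand in for the transforms that
    would each run locally on one compute block.
    """
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n % n1 != 0:
        raise ValueError("n1 must divide the input length")
    n2 = n // n1

    # View the input as an n1 x n2 matrix: m[a, b] = x[n2 * a + b].
    m = x.reshape(n1, n2)

    # Step 1: length-n1 FFTs down each column (index a -> k1).
    y = np.fft.fft(m, axis=0)

    # Step 2: twiddle factors exp(-2*pi*i * k1 * b / N).
    k1 = np.arange(n1).reshape(n1, 1)
    b = np.arange(n2).reshape(1, n2)
    y = y * np.exp(-2j * np.pi * k1 * b / n)

    # Step 3: length-n2 FFTs along each row (index b -> k2).
    z = np.fft.fft(y, axis=1)

    # Step 4: transpose so that output index k = k1 + n1 * k2 is contiguous.
    return z.T.reshape(-1)


if __name__ == "__main__":
    # Assumed 64 KiB of local memory, one resident buffer.
    block = largest_local_fft_size(64 * 1024)
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(block * 4) + 1j * rng.standard_normal(block * 4)
    result = four_step_fft(signal, n1=4)
    print(block, np.allclose(result, np.fft.fft(signal)))
```

Under these assumed figures, a 64 KiB local memory holds an 8192-point FFT in place, and a signal four times longer is handled as four-point column FFTs followed by 8192-point row FFTs. Such an analytic estimate only bounds the block size; as the abstract notes, transfer costs between memory levels mean the actual choice still has to be validated at runtime.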
- Publication: arXiv e-prints
- Pub Date: May 2015
- DOI: 10.48550/arXiv.1505.08067
- arXiv: arXiv:1505.08067
- Bibcode: 2015arXiv150508067A
- Keywords: Computer Science - Mathematical Software; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Performance