In this case study, we investigate the impact of workload balance on the performance of multi-FPGA codes. We start with an application in which two distinct kernels run in parallel on two SRC-6 MAP processors. We observe that one of the MAP processors is idle 18% of the time while the other is fully utilized. We then investigate a task redistribution scheme that serializes the execution of the two kernels but parallelizes each individual kernel by splitting its workload between the two MAP processors. This implementation achieves nearly 100% utilization of both MAP processors and improves overall application performance by 9%.
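The relationship between the 18% idle time and the 9% improvement can be illustrated with a simple timing model. The sketch below is not taken from the paper: it assumes idealized conditions (each kernel splits perfectly in half across the two MAP processors, with no communication or reconfiguration overhead), and the function names and normalized kernel times are hypothetical.

```python
# Idealized timing model (an assumption, not the paper's measurements).
# Times are normalized so the longer kernel takes 1.0 time units.

def one_kernel_per_processor(t_a, t_b):
    """Each kernel runs on its own processor; the slower one sets the total time."""
    return max(t_a, t_b)

def kernels_split_across_processors(t_a, t_b, n_procs=2):
    """Kernels run one after another, each divided evenly over n_procs processors."""
    return t_a / n_procs + t_b / n_procs

t_a = 1.0   # kernel on the fully utilized processor
t_b = 0.82  # kernel on the processor that is idle 18% of the time

before = one_kernel_per_processor(t_a, t_b)
after = kernels_split_across_processors(t_a, t_b)
reduction = (before - after) / before

print(f"time before: {before:.2f}, after: {after:.2f}, reduction: {reduction:.0%}")
```

Under these assumptions the execution time falls from 1.00 to about 0.91, a reduction of roughly 9%, which is consistent with the reported improvement. Real implementations would also pay data transfer and synchronization costs that this model ignores.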
- Pub Date: November 2007
- On speeding up 2PCF calculations using field-programmable gate arrays, appeared in Proc. 3rd Annual Reconfigurable Systems Summer Institute - RSSI'07, 2007, 8 pages