A comparative analysis of OpenMP and CUDA performance as exemplified by the computation of Fourier transform
Keywords:
parallel computing, Fourier transform, NVIDIA CUDA, OpenMP, digital processing
Abstract
A comparative analysis of the performance of two parallel computing technologies, OpenMP and NVIDIA CUDA, has been carried out, using the computation of the Fourier transform as an example. The execution time of the Fourier transform on a multi-core central processor was found to depend nonlinearly on the number of cores. In addition, the form of this dependence changes with the number of threads: when the number of threads is lower than the number of hardware cores the dependence follows a power law, whereas when it is higher the dependence is exponential. The maximum efficiency of computation with OpenMP is achieved when the number of threads used in the program is twice the number of hardware cores. A comparison conducted for this case showed that OpenMP is more efficient in terms of execution time for a small number of frames; otherwise, CUDA offers an advantage.