Auto-Tuning 3-D FFT Library for CUDA GPUs

Akira Nukada; Satoshi Matsuoka

doi:10.1145/1654059.1654090

Publication Information

Title

Japanese:
English:	Auto-Tuning 3-D FFT Library for CUDA GPUs

Authors

Japanese:	額田彰, 松岡聡.
English:	Akira Nukada, Satoshi Matsuoka.

Language

English

Journal/Book Title

Japanese:
English:	Proceedings of the 2009 ACM/IEEE conference on Supercomputing (SC09)

Volume, Issue, Pages

Publication Date

November 2009

Publisher

Japanese:
English:	ACM

Conference Name

Japanese:
English:	2009 ACM/IEEE conference on Supercomputing (SC09)

Venue

Japanese:
English:	Portland, Oregon

Official Link

http://sc09.supercomputing.org/

DOI

https://doi.org/10.1145/1654059.1654090

Abstract

Existing implementations of FFTs on GPUs are optimized for specific transform sizes, such as powers of two, and exhibit unstable and peaky performance, i.e., they do not perform as well for other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high-performance CUDA kernels for FFTs of varying transform sizes, alleviating this problem. Although auto-tuning has been implemented on GPUs for dense kernels such as DGEMM and stencils, this is the first instance in which it has been applied comprehensively to bandwidth-intensive and complex kernels such as 3-D FFTs. Bandwidth-intensive optimizations, such as selecting the number of threads and inserting padding to avoid bank conflicts on shared memory, are systematically applied. The resulting auto-tuner is fast and yields performance that essentially beats all 3-D FFT implementations on a single processor to date; moreover, it exhibits stable performance irrespective of problem size or the underlying GPU hardware.
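
The padding optimization mentioned in the abstract can be illustrated with a small arithmetic sketch. It is not taken from the paper: the function name `max_bank_conflict`, the one-word-per-bank interleaving model, and the choice of 16 banks with 16 threads accessing shared memory together are assumptions made for illustration. The sketch counts how many threads land on the same shared-memory bank when thread t reads word index t * stride, with and without one word of padding per row.

```python
def max_bank_conflict(num_threads, stride, num_banks=16):
    """Largest number of threads mapped to one bank when thread t
    accesses word index t * stride (hypothetical model: consecutive
    words are assigned to consecutive banks, wrapping around)."""
    counts = [0] * num_banks
    for t in range(num_threads):
        counts[(t * stride) % num_banks] += 1
    return max(counts)


# Assumed scenario: 16 threads read one column of a 16 x 16 tile
# stored row by row, so neighbouring threads are one row apart.
unpadded = max_bank_conflict(16, 16)  # row length 16: every access hits the same bank
padded = max_bank_conflict(16, 17)    # row length 17 (one padding word): all banks distinct

print("unpadded:", unpadded, "padded:", padded)
```

Under this model the unpadded layout serializes all 16 accesses on one bank, while the padded layout spreads them over 16 different banks. An auto-tuner of the kind the abstract describes could search over such padding amounts and thread counts rather than fixing them by hand.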
