Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

Akira Nukada; Yasuhiko Ogata; Toshio Endo; Satoshi Matsuoka

doi:10.1109/SC.2008.5213210

Publication Information

Title

Japanese:
English:	Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

Authors

Japanese:	額田彰, 尾形泰彦, 遠藤敏夫, 松岡聡.
English:	Akira Nukada, Yasuhiko Ogata, Toshio Endo, Satoshi Matsuoka.

Language

English

Journal/Book Title

Japanese:
English:	Proceedings of the 2008 ACM/IEEE conference on Supercomputing (SC08)

Volume, Issue, Pages

pp. 1-11

Publication Date

November 2008

Publisher

Japanese:
English:	IEEE

Conference Name

Japanese:
English:	2008 ACM/IEEE conference on Supercomputing (SC08)

Conference Location

Japanese:
English:	Austin, Texas

DOI

https://doi.org/10.1109/SC.2008.5213210

Abstract

Most GPU performance "hype" has focused on tightly coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines with substantial memory bandwidth; effective programming methodologies for exploiting that bandwidth, however, have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, more than three times faster than any existing FFT implementation on GPUs, including CUFFT. Careful programming techniques are employed to fully exploit modern GPU hardware characteristics while overcoming their limitations, including on-chip shared memory utilization, optimizing the number of threads and registers through appropriate localization, and avoiding low-speed strided memory accesses. Applied to real applications, our kernel achieves orders-of-magnitude improvements in power and cost versus performance metrics. The off-card bandwidth limitation is still an issue; it could be alleviated somewhat by confining application kernels within the card, while the ideal solution is the provision of faster GPU interfaces.
