High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Hamid Reza Zohouri; Artur Podobas; Satoshi Matsuoka

doi:https://doi.org/10.1109/IPDPSW.2018.00027

Publication Information

Title

Japanese:
English:	High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Authors

Japanese:	ハミドレザゾフーリ, Artur Podobas, 松岡聡.
English:	Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka.

Language

English

Journal/Book Title

Japanese:
English:

Volume, Issue, Pages

Publication Date

August 6, 2018

Publisher

Japanese:
English:

Conference Name

Japanese:
English:	2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Conference Location

Japanese:
English:	Vancouver, BC

Official Link

https://ieeexplore.ieee.org/abstract/document/8425394/

DOI

https://doi.org/10.1109/IPDPSW.2018.00027

Abstract

In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, compute performance compared to first-order stencils. We use an OpenCL-based design that, apart from parameterizing performance knobs, also parameterizes the stencil radius. Furthermore, we show that our performance model exhibits the same accuracy as first-order stencils in predicting the performance of high-order ones. On an Intel Arria 10 GX 1150 device, for 2D and 3D star-shaped stencils, we achieve over 700 and 270 GFLOP/s of compute performance, respectively, up to a stencil radius of four. These results outperform the state-of-the-art YASK framework on a modern Xeon for 2D and 3D stencils, and outperform a modern Xeon Phi for 2D stencils, while achieving competitive performance in 3D. Furthermore, our FPGA design achieves better power efficiency in almost all cases.
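
As a rough illustration of what a star-shaped stencil with a parameterized radius computes, the following Python sketch applies one time step of a 2D star stencil to a grid. It is not the authors' OpenCL design and includes neither spatial nor temporal blocking. The function names (`star_stencil_2d`, `star_points`), the coefficient layout, and the choice to leave boundary cells unchanged are all assumptions made for this example.

```python
def star_stencil_2d(grid, radius, center_coeff, axis_coeffs):
    """Apply one time step of a 2D star-shaped stencil of the given radius.

    axis_coeffs[d - 1] weights the four neighbors at distance d along the
    x and y axes (hypothetical coefficient layout). Cells closer than
    `radius` to the edge are copied unchanged (assumed boundary handling).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # start from a copy so boundaries are kept
    for y in range(radius, rows - radius):
        for x in range(radius, cols - radius):
            acc = center_coeff * grid[y][x]
            # Walk outward along the four arms of the star.
            for d in range(1, radius + 1):
                c = axis_coeffs[d - 1]
                acc += c * (grid[y - d][x] + grid[y + d][x]
                            + grid[y][x - d] + grid[y][x + d])
            out[y][x] = acc
    return out


def star_points(radius, dims):
    """Cells read per update: the center plus 2 * dims arms of length radius."""
    return 2 * dims * radius + 1
```

Under this definition, `star_points(4, 2)` gives 17 and `star_points(4, 3)` gives 25, which shows how the number of cells read per update grows with the stencil radius, consistent with the abstract's remark that high-order stencils have higher computation intensity and on-chip memory requirements than first-order ones.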
