Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU

Yusuke Nagasaka; Akira Nukada; Satoshi Matsuoka

doi:http://dx.doi.org/10.1016/j.procs.2016.05.304

Publication Information

Title

Japanese:
English:	Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU

Authors

Japanese:	長坂侑亮, 額田彰, 松岡聡.
English:	Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka.

Language

English

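Journal/Book Title

Japanese:
English:	Procedia Computer Science

Volume, Issue, Pages

Volume 80, pp. 131-142

Publication Date

June 1, 2016

Publisher

Japanese:
English:

Conference Name

Japanese:
English:	International Conference on Computational Science (ICCS 2016)

Venue

Japanese:
English:	San Diego, CA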

DOI

http://dx.doi.org/10.1016/j.procs.2016.05.304

Abstract

Sparse matrix-vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations by offering higher parallelism and memory bandwidth than CPUs. Even on many-core processors, however, SpMV performance is still strongly limited by memory bandwidth, and the low locality of memory accesses to the input vector degrades performance further. We propose a new sparse matrix format, the Adaptive Multi-level Blocking (AMB) format, which aggressively reduces memory traffic in SpMV computation to improve performance. Through several optimization techniques, such as division and blocking of the given matrix, the column indices are compressed and the reuse of input vector elements in the cache is greatly improved. An auto-tuning mechanism determines the best set of parameters for each matrix by estimating the memory traffic and predicting the performance of a given SpMV computation. For 32 matrices taken from the University of Florida Sparse Matrix Collection, the AMB format achieves speedups of up to 2.92x over NVIDIA's cuSPARSE library and up to 1.40x over yaSpMV, which was recently proposed and has been the best known library to date for fast SpMV computation.
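
The idea of dividing a matrix so that column indices can be compressed may be easier to see in a small sketch. The Python below is not the AMB format and does not reproduce the paper's GPU kernels or auto-tuner; it is a minimal CPU illustration under assumed names (`spmv_csr`, `split_by_columns`, `spmv_strips`, and the `block_width` parameter are all hypothetical). It compares a baseline CSR SpMV with a version that splits the matrix into vertical strips, where each strip stores column indices as local offsets smaller than `block_width` and touches only a contiguous slice of the input vector.

```python
# Illustrative sketch only: this is NOT the AMB format from the paper.
# All function names and the block_width parameter are hypothetical.

def spmv_csr(row_ptr, col_idx, vals, x):
    """Baseline y = A @ x with A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def split_by_columns(row_ptr, col_idx, vals, n_cols, block_width):
    """Divide A into vertical strips of block_width columns.

    Each strip is a small CSR matrix whose column indices are offsets in
    [0, block_width), so they could be stored in fewer bits than global
    indices (for example 16 bits when block_width <= 65536).
    """
    n_rows = len(row_ptr) - 1
    n_blocks = (n_cols + block_width - 1) // block_width
    strips = []
    for b in range(n_blocks):
        lo = b * block_width  # first global column covered by this strip
        s_ptr, s_idx, s_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if lo <= col_idx[k] < lo + block_width:
                    s_idx.append(col_idx[k] - lo)  # local offset
                    s_val.append(vals[k])
            s_ptr.append(len(s_idx))
        strips.append((lo, s_ptr, s_idx, s_val))
    return strips


def spmv_strips(strips, x, n_rows):
    """y = A @ x computed strip by strip.

    Within one strip, only x[lo : lo + block_width] is read, which is the
    locality property that blocking is meant to improve.
    """
    y = [0.0] * n_rows
    for lo, s_ptr, s_idx, s_val in strips:
        for i in range(n_rows):
            for k in range(s_ptr[i], s_ptr[i + 1]):
                y[i] += s_val[k] * x[lo + s_idx[k]]
    return y
```

For any CSR input, `spmv_strips(split_by_columns(row_ptr, col_idx, vals, len(x), w), x, len(row_ptr) - 1)` should match `spmv_csr(row_ptr, col_idx, vals, x)` up to floating-point summation order. The strip version trades extra row-pointer storage for narrower indices and better input-vector locality, and a trade-off of that kind is presumably what a parameter search such as the auto-tuning described above would weigh.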
