Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization

Taisuke Boku; Masatake Sugita; Ryohei Kobayashi; Shinnosuke Furuya; Takuya Fujie; Masahito Ohue; Yutaka Akiyama

doi:10.1145/3673038.3673097

Publication Information

Title

Japanese:
English:	Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization

Author

Japanese:	Taisuke Boku, 杉田昌岳, Ryohei Kobayashi, Shinnosuke Furuya, 藤江拓哉, 大上雅史, 秋山泰.
English:	Taisuke Boku, Masatake Sugita, Ryohei Kobayashi, Shinnosuke Furuya, Takuya Fujie, Masahito Ohue, Yutaka Akiyama.

Language

English

Journal/Book name

Japanese:
English:	Proceedings of the 53rd International Conference on Parallel Processing (ICPP2024)

Volume, Number, Page

Page 1082-1091

Published date

Aug. 12, 2024

Publisher

Japanese:
English:	Association for Computing Machinery

Conference name

Japanese:
English:	53rd International Conference on Parallel Processing (ICPP2024)

Conference site

Japanese:
English:	Gotland

Official URL

https://dl.acm.org/doi/10.1145/3673038.3673097

DOI

https://doi.org/10.1145/3673038.3673097

Abstract

GPUs are the dominant accelerators in high-performance computing systems, but their performance depends on how well an application keeps the large number of cores on each device busy in parallel. Typically, a loop is assigned to the device and the calculations in its iterations are mapped onto the cores, so the iteration count must be large enough to fill the thousands of cores in a high-end GPU. Advanced GPUs such as the NVIDIA H100 provide techniques, including Multi-Process Service (MPS) and Multi-Instance GPU (MIG), that divide the GPU cores among multiple user processes and thereby improve core utilization even when each process has only a small degree of parallelism. We apply MPS to a practical molecular dynamics (MD) simulation with the AMBER software to improve the efficiency of GPU core utilization and save computational resources. The critical issue is to analyze core utilization and overhead when multiple processes run on one GPU device, together with multi-GPU and multi-node parallel execution, in order to improve overall performance. In this paper, we introduce a method for applying MPS to AMBER to simulate the membrane permeation process of a drug candidate peptide with a two-dimensional replica-exchange method on an advanced supercomputer equipped with NVIDIA H100 GPUs. We apply several parameter-setting optimizations on NVIDIA H100 and V100 GPUs and investigate their performance behavior. Finally, we find that GPU core utilization improves by up to a factor of two compared with a simple process assignment method, maximizing GPU utilization efficiency.
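The idea of packing several replica processes onto one GPU under MPS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `plan_replicas`, the contiguous replica-to-GPU mapping, and the equal split of the thread percentage are all assumptions made for illustration, and the abstract does not state the settings actually used. The only real interfaces referenced are the `CUDA_VISIBLE_DEVICES` and `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variables; the sketch only computes a placement plan and does not launch AMBER or the MPS daemon.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSlot:
    replica: int   # flattened index of a replica in the 2D replica-exchange grid
    node: int      # node the replica process would run on
    gpu: int       # GPU index local to that node
    env: dict      # environment variables for that process


def plan_replicas(n_replicas, gpus_per_node, procs_per_gpu):
    """Hypothetical helper: place replicas so each GPU hosts procs_per_gpu processes."""
    if min(n_replicas, gpus_per_node, procs_per_gpu) < 1:
        raise ValueError("all arguments must be positive")
    # Assumption: split the GPU evenly among the co-located processes.
    share = 100 // procs_per_gpu
    slots = []
    for replica in range(n_replicas):
        global_gpu = replica // procs_per_gpu          # fill one GPU before the next
        node, gpu = divmod(global_gpu, gpus_per_node)  # then fill one node before the next
        env = {
            "CUDA_VISIBLE_DEVICES": str(gpu),
            "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(share),
        }
        slots.append(ReplicaSlot(replica, node, gpu, env))
    return slots


if __name__ == "__main__":
    # Example: 16 replicas, 4 GPUs per node, 2 processes sharing each GPU.
    for slot in plan_replicas(16, 4, 2):
        print(slot.replica, slot.node, slot.gpu, slot.env)
```

With `procs_per_gpu = 1` the plan reduces to a simple one-process-per-GPU assignment; raising it halves the number of GPUs needed for the same replica count at the cost of sharing each device, which is the trade-off the abstract says must be measured.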
