The APGAS programming model abstracts deep memory hierarchies, such as distributed memory and GPU device memory, by providing a global view of data and asynchronous operations on massively parallel computing environments. However, how much GPUs accelerate applications written in the APGAS model remains unclear. To understand the effectiveness of GPUs in the APGAS model, we present a comparative performance analysis of the APGAS model in X10 on GPUs against a standard message passing model, using lattice QCD as a benchmark. Our experimental results on TSUBAME2.5 show that our X10 CUDA implementation on 32 GPUs achieves a 19.4x speedup over X10 C++ on multi-core CPUs, and performance comparable to MPI CUDA in weak scaling. These results indicate that the APGAS programming model on GPUs scales well and significantly accelerates the lattice QCD application.