NAMD and VMD Performance on ARM GPU Platforms, AHUG SC'20
Presenter: John Stone, University of Illinois
Abstract: This talk briefly summarizes and provides updates on the current state of GPU-accelerated ARM platform support in the NAMD parallel molecular dynamics engine and in VMD, a high-performance molecular modeling environment for preparing, visualizing, and analyzing biomolecular simulations. Using the "Wombat" ARM64 cluster at Oak Ridge National Laboratory, we have recently had the opportunity to benchmark current versions of NAMD and VMD on representative molecular modeling tasks, with particular emphasis on compute-heavy operations that benefit from CUDA GPU-accelerated kernels and heterogeneous computing techniques. We highlight results that demonstrate areas where performance on the Wombat cluster is most comparable to that achieved on high-end Intel x86- and IBM POWER9-based compute nodes, and areas where the individual strengths and weaknesses of the different platforms contribute to particular performance advantages or differences. We identify cases where our current ARM64 work will benefit from the completion of in-progress ARM64-specific vectorized kernels. Finally, we present observations from very early experiences with the ARM Scalable Vector Extension (SVE) and vector-length-agnostic programming approaches, as compared with traditional vectorization on fixed-length SIMD hardware architectures.