High-Performance Mixed Precision Numerical Linear Algebra
Erin Carson (Numerical Mathematics, Charles University)
Support for floating point arithmetic in multiple precisions is becoming increasingly common in emerging architectures. For example, half precision is available on the NVIDIA V100 and A100 GPUs, where it runs twice as fast as single precision with a proportional savings in energy consumption. Further, using the specialized half-precision tensor cores can provide up to a 16x speedup over double precision computations. Mixed precision capabilities are already included in many machines on the TOP500 list and are expected to be a crucial hardware feature in upcoming exascale machines. From a computational scientist's perspective, our goal is to determine how and where we can exploit mixed precision computation in our codes. This requires an understanding of both performance characteristics and the numerical behavior of algorithms in finite precision arithmetic. In this talk, we discuss recent and ongoing efforts in this area. In particular, we present and analyze a general algorithm for solving n × n nonsingular linear systems Ax = b based on iterative refinement in three precisions. From this, we develop GMRES-IR, a three-precision GMRES-based iterative refinement scheme that works even for ill-conditioned systems. We discuss performance results on modern GPU architectures and present the HPL-AI benchmark, based on GMRES-IR, which has exceeded exaflop performance in mixed precision on the Fugaku supercomputer, triple the performance achieved on the standard HPL benchmark used for the TOP500.
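To make the three-precision iterative refinement idea concrete, here is a minimal sketch in Python/NumPy, not the implementation discussed in the talk. It simulates the precision hierarchy on a CPU: float32 for the one-time factorization (standing in for half precision on a GPU), float64 as the working precision, and np.longdouble for residual computation (extended precision where the platform provides it). The function name ir3 and its parameters are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def ir3(A, b, max_iter=10, tol=1e-12):
    # Factorize A once in the low precision (float32 stands in for half).
    lu_piv = lu_factor(A.astype(np.float32))
    # Initial solve with the low-precision factors, stored in working precision.
    x = lu_solve(lu_piv, b.astype(np.float32)).astype(np.float64)
    A_hi = A.astype(np.longdouble)
    b_hi = b.astype(np.longdouble)
    for _ in range(max_iter):
        # Residual computed in the high precision, rounded to working precision.
        r = np.asarray(b_hi - A_hi @ x, dtype=np.float64)
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
            break
        # Correction solve reuses the cheap low-precision factors.
        d = lu_solve(lu_piv, r.astype(np.float32)).astype(np.float64)
        # Update the iterate in the working precision.
        x = x + d
    return x

# Illustrative usage on a random well-conditioned system:
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = A @ rng.standard_normal(100)
x = ir3(A, b)
```

The cost profile is the point of the scheme: the O(n^3) factorization runs in the cheap precision, while each refinement step costs only O(n^2) work in the higher precisions.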
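For ill-conditioned systems, where correction solves using the low-precision LU factors alone can stagnate, GMRES-IR instead solves each correction equation Ad = r with GMRES preconditioned by those factors, run in the working precision. The following is a hedged sketch of that correction step under the same assumptions as above, with SciPy's generic gmres standing in for a tuned solver and the name gmres_ir_step invented for illustration:

```python
import numpy as np
from scipy.linalg import lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

def gmres_ir_step(A, lu_piv, r):
    """One GMRES-IR correction: solve A d = r by GMRES, preconditioned
    with the low-precision LU factors, in the working precision."""
    n = A.shape[0]
    # The preconditioner applies U^{-1} L^{-1} via the float32 factors.
    M = LinearOperator(
        (n, n),
        matvec=lambda v: lu_solve(lu_piv, v.astype(np.float32)).astype(np.float64),
    )
    d, info = gmres(A, r, M=M)
    if info != 0:
        raise RuntimeError(f"GMRES did not converge (info={info})")
    return d
```

The preconditioned matrix U^{-1}L^{-1}A is typically far better conditioned than A itself, which is what allows the refinement to converge even when A is too ill conditioned for the plain low-precision correction solve.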