PCIe presents a bottleneck when moving data from the CPU to the GPU. With NVIDIA NVLink technology integrated into POWER8 CPUs, data can flow to NVIDIA Tesla Pascal GPUs (SXM2) over 2.5x faster than on comparable x86-based systems. The POWER8 CPU is the only processor that features the NVLink interface, which gives the GPU access to memory that resides on the CPU side of the interconnect and improves the transfer of data between processors.
How it works
The Exxact Tensor TXR210-2000R features dual POWER8 with NVLink processors and four Tesla P100 Pascal GPUs (SXM2), all interconnected with NVLink. Each CPU and GPU has four NVLink interconnects that total 80 GB/s of bandwidth. Below is an example of dual POWER8 processors and quad P100s directly connected to each other:
NVLink provides a solution to the limitation of the PCIe data pipe to the GPU. It allows faster communication than a PCIe x16 Gen3 connection, which in turn allows faster data exchange and better application performance.
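To make the bandwidth figures concrete, here is a rough back-of-the-envelope sketch in Python. The four links and the 80 GB/s total come from the text above. The PCIe Gen3 x16 peak of roughly 16 GB/s per direction and the idea that two of the four links connect each GPU to its CPU are assumptions added for illustration, not figures stated in this article. All values are peak rates, so real transfers will be slower.

```python
# Peak, per-direction link arithmetic. Illustrative only.
NVLINK_LINKS_PER_DEVICE = 4      # from the text: four interconnects per CPU/GPU
NVLINK_TOTAL_GB_PER_S = 80.0     # from the text: 80 GB/s across those links
PCIE_GEN3_X16_GB_PER_S = 16.0    # assumption: approximate theoretical PCIe Gen3 x16 peak
LINKS_CPU_TO_GPU = 2             # assumption: two links join each GPU to its CPU


def per_link_gb_per_s(total_gb_per_s, links):
    """Bandwidth of a single link, given the aggregate and the link count."""
    return total_gb_per_s / links


def cpu_to_gpu_gb_per_s(total_gb_per_s, links, links_used):
    """Aggregate bandwidth of the links used between one CPU and one GPU."""
    return per_link_gb_per_s(total_gb_per_s, links) * links_used


def speedup_over_pcie(nvlink_gb_per_s, pcie_gb_per_s):
    """Ratio of peak NVLink bandwidth to peak PCIe bandwidth."""
    return nvlink_gb_per_s / pcie_gb_per_s


link = per_link_gb_per_s(NVLINK_TOTAL_GB_PER_S, NVLINK_LINKS_PER_DEVICE)
cpu_gpu = cpu_to_gpu_gb_per_s(
    NVLINK_TOTAL_GB_PER_S, NVLINK_LINKS_PER_DEVICE, LINKS_CPU_TO_GPU
)
ratio = speedup_over_pcie(cpu_gpu, PCIE_GEN3_X16_GB_PER_S)

print(f"per link: {link} GB/s, CPU to GPU: {cpu_gpu} GB/s, vs PCIe: {ratio}x")
```

Under these assumptions, each link carries 20 GB/s and a two-link CPU-to-GPU path carries 40 GB/s, which is 2.5 times the assumed PCIe peak.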
In terms of programmability, CUDA 8 and the Page Migration Engine featured in the Tesla P100 use a unified memory space, with automated data management between GPU memory and the system memory connected to the CPU. Combined with NVLink, this accelerates GPU applications by removing the need for explicit data management when moving functions from the CPU to the GPU. Because NVLink reduces CPU-to-GPU communication time, smaller pieces of work can be moved to the GPU for acceleration, allowing more parts of an application to be GPU accelerated.
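The point about smaller pieces of work can be illustrated with a simple break-even model: offloading pays off only when the GPU compute time plus the transfer time is less than the CPU compute time. The sketch below is a simplification under stated assumptions. The timings, the data size, and the 16 GB/s (PCIe Gen3 x16) and 40 GB/s (NVLink CPU-to-GPU) peak rates are hypothetical inputs chosen for illustration, not measurements, and the function names are invented for this example.

```python
def transfer_seconds(num_bytes, bandwidth_gb_per_s):
    """Time to move num_bytes over a link at the given peak rate (GB/s)."""
    return num_bytes / (bandwidth_gb_per_s * 1e9)


def offload_seconds(gpu_seconds, num_bytes, bandwidth_gb_per_s):
    """Total cost of offloading: GPU compute time plus data transfer time."""
    return gpu_seconds + transfer_seconds(num_bytes, bandwidth_gb_per_s)


def offload_pays_off(cpu_seconds, gpu_seconds, num_bytes, bandwidth_gb_per_s):
    """True when running on the GPU, transfer included, beats the CPU."""
    return offload_seconds(gpu_seconds, num_bytes, bandwidth_gb_per_s) < cpu_seconds


# Hypothetical workload: 10 ms on the CPU, 2 ms on the GPU, 200 MB to move.
CPU_S, GPU_S, DATA_BYTES = 0.010, 0.002, 200e6

for name, gb_per_s in [("PCIe Gen3 x16 (assumed 16 GB/s)", 16.0),
                       ("NVLink CPU-GPU (assumed 40 GB/s)", 40.0)]:
    total_ms = offload_seconds(GPU_S, DATA_BYTES, gb_per_s) * 1e3
    verdict = offload_pays_off(CPU_S, GPU_S, DATA_BYTES, gb_per_s)
    print(f"{name}: {total_ms:.1f} ms offloaded, worthwhile: {verdict}")
```

In this made-up case, the same function is not worth offloading over the slower link but is over the faster one. The model ignores latency, overlap of transfers with compute, and page-migration behavior.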
NVLink improves application performance by speeding up data movement in multi-GPU configurations. Applications that rely on exchanging data across GPUs can run much faster using NVLink than through the PCIe bus. Below is a list of some applications that can benefit from NVLink:
- Multi-GPU exchange and sort
- Fast Fourier Transform (FFT)
- AMBER – Molecular Dynamics (PMEMD)
- ANSYS Fluent – Computational Fluid Dynamics
- Lattice Quantum Chromodynamics (LQCD)