Using NVIDIA GPUDirect RDMA with Chelsio’s iWARP RDMA adapters, high performance and efficiency can be achieved at 10 and 40Gb speeds across a standard Ethernet network. Tests have shown higher GPUDirect RDMA throughput with Chelsio T5 Unified Wire adapters than with RDMA over Converged Ethernet (RoCE).
With GPUDirect, multiple GPUs, third-party network adapters, solid-state drives (SSDs) and other devices can directly read and write CUDA host and device memory. Network and GPU device drivers can share “pinned” buffers, which eliminates the need to make a redundant copy in CUDA host memory. Peer-to-peer transfers between GPUs allow high-speed copying of data between the memories of two GPUs on the same PCIe bus. Using remote direct memory access (RDMA) eliminates unnecessary memory copies and reduces CPU overhead and latency. The result is a significant improvement in data transfer times for applications running on NVIDIA Tesla and Quadro products.
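The peer-to-peer path described above can be sketched with the standard CUDA runtime API. This is a minimal, hedged example (it assumes two P2P-capable GPUs on the same PCIe root complex, so it only runs on suitable hardware); the buffer size is illustrative:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Sketch: enable peer-to-peer access between GPU 0 and GPU 1, then
// copy a buffer directly between their memories over PCIe, bypassing
// any staging copy in host memory.
int main(void) {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("P2P not supported between GPU 0 and GPU 1\n");
        return 1;
    }

    const size_t bytes = 1 << 20;   // 1 MiB, illustrative
    void *src = NULL, *dst = NULL;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 reach GPU 1's memory
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);   // and vice versa
    cudaMalloc(&dst, bytes);

    // Direct GPU-to-GPU copy; no intermediate host buffer is used.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

The pinned-buffer sharing mentioned above uses the same runtime: host memory allocated with `cudaHostAlloc` (or registered with `cudaHostRegister`) is page-locked, so a network adapter can DMA to it directly.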
RDMA achieves high efficiency through direct system- or application-level memory-to-memory communication, without CPU involvement or intermediate data copies. With RDMA-enabled adapters, the packet and protocol processing required for communication is handled in hardware by the network adapter. iWARP RDMA runs a full TCP/IP stack in adapter hardware, bypassing the host software stack. This eliminates the inefficiency of software protocol processing and delivers all the benefits of RDMA while operating over standard Ethernet networks.
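From the application's point of view, iWARP is programmed through the same verbs API as other RDMA transports. The following hedged sketch (it assumes an RDMA-capable adapter and a loaded GPU peer-memory kernel module, so it requires specific hardware; the helper name is hypothetical) shows the step that makes GPUDirect RDMA possible: registering GPU device memory with the adapter so it can DMA to and from GPU memory with no bounce buffer in host RAM:

```cuda
#include <cuda_runtime.h>
#include <infiniband/verbs.h>
#include <stddef.h>

// Sketch: register GPU device memory with an RDMA-capable NIC.
// With GPUDirect RDMA enabled, ibv_reg_mr accepts a pointer returned
// by cudaMalloc; the peer-memory module pins and maps the GPU pages
// for the adapter's DMA engine. The verbs API is transport-agnostic,
// so the same code runs over iWARP as over other RDMA fabrics.
struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t bytes)
{
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, bytes) != cudaSuccess)
        return NULL;

    // Same registration call used for host memory buffers.
    return ibv_reg_mr(pd, gpu_buf, bytes,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_WRITE |
                      IBV_ACCESS_REMOTE_READ);
}
```

Once registered, the memory region's keys can be used in RDMA read/write work requests exactly as with host memory, which is what removes the CPU and the extra copies from the data path.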
The benefit of using Chelsio 10/40Gb Ethernet RDMA adapters together with NVIDIA’s GPUDirect technology is dramatically lower latency and higher throughput for mission-critical scientific and HPC applications.