Exxact HGX-2 TensorEX Server Smashes Deep Learning Benchmarks

In this post, we present deep learning benchmarks for TensorFlow on an Exxact TensorEX HGX-2 server. This behemoth of a deep learning server has 16 NVIDIA Tesla V100 GPUs.

We ran the standard “tf_cnn_benchmarks.py” benchmark script from TensorFlow’s GitHub repository. Tests were run on the following networks: ResNet-50, ResNet-152, Inception V3, and VGG-16. We also compared FP16 and FP32 performance, using a batch size of 256 per GPU (except for ResNet-152 in FP32, where the batch size was 64). As you’ll see, the same tests were run on 1, 2, 4, 8, and 16 GPU configurations. All benchmarks used ‘vanilla’ TensorFlow settings for FP16 and FP32.

Notable HGX-2 Server Features

  • 16x NVIDIA Tesla V100 SXM3
  • 81,920 NVIDIA CUDA Cores
  • 10,240 NVIDIA Tensor Cores
  • 0.5 TB Total GPU Memory
  • NVSwitch powered by NVLink, 2.4 TB/sec aggregate bandwidth
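
The aggregate figures in this list follow from the per-GPU specifications of the Tesla V100 (5,120 CUDA cores and 640 Tensor Cores per GPU, with 32 GB of memory in this configuration). A quick sanity check, written as a small sketch with illustrative variable names:

```python
# Per-GPU figures for the Tesla V100 32 GB; the GPU count comes from the list above.
NUM_GPUS = 16
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
MEMORY_GB_PER_GPU = 32

# Multiply each per-GPU figure by the number of GPUs in the server.
total_cuda_cores = NUM_GPUS * CUDA_CORES_PER_GPU
total_tensor_cores = NUM_GPUS * TENSOR_CORES_PER_GPU
total_memory_gb = NUM_GPUS * MEMORY_GB_PER_GPU

print("CUDA cores:   {:,}".format(total_cuda_cores))
print("Tensor Cores: {:,}".format(total_tensor_cores))
print("GPU memory:   {} GB ({} TB)".format(total_memory_gb, total_memory_gb / 1024.0))
```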

Exxact TensorEX HGX-2 Deep Learning Benchmarks: FP16

[Chart: HGX-2 deep learning benchmarks, FP16]

Run these FP16 benchmarks

Set num_gpus to the number of GPUs you want to test, and change model to the desired model architecture.

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=256 --model=resnet50 --variable_update=parameter_server --use_fp16=True
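
To cover every GPU count and network without editing the command by hand, the full FP16 sweep can be generated with a short script. This is a minimal sketch, not the script we used: the helper name build_fp16_command is hypothetical, and the model identifiers (resnet152, inception3, vgg16) are assumed to be the names tf_cnn_benchmarks.py accepts for these networks. It only prints the commands; it does not launch them.

```python
import itertools

# GPU counts and networks from the post. Only "resnet50" appears verbatim in
# the command above; the other model identifiers are assumptions.
GPU_COUNTS = [1, 2, 4, 8, 16]
MODELS = ["resnet50", "resnet152", "inception3", "vgg16"]


def build_fp16_command(num_gpus, model, batch_size=256):
    """Return the FP16 benchmark command as a list of arguments."""
    return [
        "python", "tf_cnn_benchmarks.py",
        "--num_gpus={}".format(num_gpus),
        "--batch_size={}".format(batch_size),
        "--model={}".format(model),
        "--variable_update=parameter_server",
        "--use_fp16=True",
    ]


# One command per (model, GPU count) pair, grouped by model.
commands = [
    build_fp16_command(num_gpus, model)
    for model, num_gpus in itertools.product(MODELS, GPU_COUNTS)
]

for cmd in commands:
    print(" ".join(cmd))
```

Each entry is kept as an argument list so it can be passed to subprocess.run directly if you want to execute the sweep rather than print it.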

Exxact TensorEX HGX-2 Deep Learning Benchmarks: FP32

[Chart: HGX-2 deep learning benchmarks, FP32]

Run these FP32 benchmarks

To run FP32, remove the --use_fp16 flag. As before, set num_gpus to the number of GPUs you want to test and change model to the desired architecture. For ResNet-152 in FP32, set batch_size to 64.

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=256 --model=resnet50 --variable_update=parameter_server
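
The one wrinkle in the FP32 runs is the batch size: 256 for every network except ResNet-152, which used 64. A small sketch of how that rule could be encoded when building the command follows. The function names are hypothetical, and resnet152 is assumed to be the identifier tf_cnn_benchmarks.py uses for ResNet-152.

```python
def fp32_batch_size(model):
    """Batch size used in the FP32 runs: 64 for ResNet-152, 256 otherwise."""
    # "resnet152" is an assumed model identifier; adjust if the script differs.
    return 64 if model == "resnet152" else 256


def build_fp32_command(num_gpus, model):
    """Return the FP32 benchmark command (no --use_fp16 flag) as a string."""
    return " ".join([
        "python", "tf_cnn_benchmarks.py",
        "--num_gpus={}".format(num_gpus),
        "--batch_size={}".format(fp32_batch_size(model)),
        "--model={}".format(model),
        "--variable_update=parameter_server",
    ])


print(build_fp32_command(1, "resnet50"))
print(build_fp32_command(16, "resnet152"))
```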

Other Notes and Future Plans for HGX-2

The HGX-2 GPU server is an absolute monster for deep learning or any GPU-powered HPC task. In the future, we would like to run further benchmarks on more models and with other acceleration methods such as XLA for TensorFlow, where we would expect significant performance gains. Training models with even larger batch sizes is another area we will consider exploring.

System Specifications:

System: Exxact TensorEX HGX-2
GPU: 16x NVIDIA Tesla V100 32 GB SXM3
CPU: 2x Intel Xeon Platinum 8168
SSD (OS): 2x 1 TB NVMe (RAID 1)
SSD (Data): 32 TB NVMe Storage
OS: Ubuntu 16.04
CUDA Version: 10.1
Python: 2.7
TensorFlow: 1.14
Docker Image: tensorflow/tensorflow:nightly-gpu

Training Parameters

Dataset: ImageNet (synthetic)
Mode: training
SingleSess: False
Batch Size: 256 per device (64 for ResNet-152 in FP32)
Num Batches: 100
Num Epochs: 0.08
Devices: [‘/gpu:0’]…(varied)
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
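
The Num Epochs value depends on how many devices a run uses, since the batch size is per device. As a rough check, assuming the script counts the synthetic images against the 1,281,167 images of the ImageNet (ILSVRC-2012) training set, 100 batches of 256 images per device work out as follows. The function name is illustrative.

```python
IMAGENET_TRAIN_IMAGES = 1281167  # ILSVRC-2012 training set size


def epochs_processed(num_batches, batch_size_per_device, num_devices):
    """Fraction of an ImageNet epoch covered by one benchmark run."""
    images = num_batches * batch_size_per_device * num_devices
    return images / float(IMAGENET_TRAIN_IMAGES)


# Num Batches and Batch Size are taken from the parameter list above.
for num_devices in [1, 2, 4, 8, 16]:
    epochs = epochs_processed(100, 256, num_devices)
    print("{:2d} GPU(s): {:.2f} epochs".format(num_devices, epochs))
```

Under that assumption, the 0.08 listed above is consistent with a 4-GPU run; other GPU counts give proportionally different values.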
