In this post, we present TensorFlow deep learning benchmarks run on an Exxact TensorEX server. For these benchmarks, the server was outfitted with four NVIDIA V100S GPUs.
We ran the standard “tf_cnn_benchmarks.py” benchmark script from TensorFlow’s GitHub repository. We tested the following networks: ResNet-50, ResNet-152, Inception V3, Inception V4, and GoogLeNet. We also compared FP16 and FP32 performance, using a batch size of 128. Each test was run in both 2-GPU and 4-GPU configurations. All benchmarks used ‘vanilla’ TensorFlow settings for FP16 and FP32.
NVIDIA V100S Deep Learning Benchmark Snapshot
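As the tables below show, running in FP16 substantially raises throughput in images/sec: across the tested networks, FP16 is roughly 1.5x to 2.7x faster than FP32 at the same GPU count. If your model can be trained in FP16 rather than FP32, we recommend doing so.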
NVIDIA V100S Deep Learning Benchmarks FP16
Model | 2 GPU img/sec | 4 GPU img/sec | Batch Size |
ResNet-50 | 1735.56 | 3218 | 128 |
ResNet-152 | 760.57 | 1415.56 | 128 |
Inception V3 | 1134.88 | 2161.02 | 128 |
Inception V4 | 602.36 | 1205.97 | 128 |
GoogLeNet | 2820.47 | 5265.14 | 128 |
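To make the multi-GPU scaling in this table easier to read, here is a small sketch that divides each 4-GPU figure by its 2-GPU figure. The numbers are copied from the FP16 table above; the variable and function names are ours, chosen only for illustration. A factor of 2.0 would be perfect linear scaling, and by this arithmetic the models land between roughly 1.85x and 2.0x.

```python
# FP16 throughput in images/sec, copied from the table above: (2 GPUs, 4 GPUs).
fp16_img_per_sec = {
    "ResNet-50": (1735.56, 3218.0),
    "ResNet-152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}


def scaling_factor(two_gpu, four_gpu):
    """Throughput ratio when going from 2 to 4 GPUs (2.0 means linear scaling)."""
    return four_gpu / two_gpu


scaling = {
    model: scaling_factor(two_gpu, four_gpu)
    for model, (two_gpu, four_gpu) in fp16_img_per_sec.items()
}

for model, factor in scaling.items():
    # factor / 2 is the fraction of ideal 2x scaling that was achieved.
    print(f"{model:13s} {factor:.2f}x  ({factor / 2:.0%} of linear)")
```

The same calculation can be applied to the FP32 figures by swapping in that table's values.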
NVIDIA V100S Deep Learning Benchmarks FP32
Model | 2 GPU img/sec | 4 GPU img/sec | Batch Size |
ResNet-50 | 762.21 | 1432.69 | 128 |
ResNet-152 | 278.17 | 577.26 | 128 |
Inception V3 | 495.51 | 926.93 | 128 |
Inception V4 | 227.05 | 455.65 | 128 |
GoogLeNet | 1692.94 | 3393.91 | 128 |
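To put a number on the FP16 advantage, the following sketch computes the ratio of FP16 to FP32 throughput for each model and GPU count. All figures come from the two tables above; the dictionary and function names are illustrative only.

```python
# Throughput in images/sec from the two tables above: (2 GPUs, 4 GPUs).
fp16 = {
    "ResNet-50": (1735.56, 3218.0),
    "ResNet-152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}
fp32 = {
    "ResNet-50": (762.21, 1432.69),
    "ResNet-152": (278.17, 577.26),
    "Inception V3": (495.51, 926.93),
    "Inception V4": (227.05, 455.65),
    "GoogLeNet": (1692.94, 3393.91),
}


def fp16_speedup(model):
    """Return (2-GPU, 4-GPU) ratios of FP16 to FP32 throughput for one model."""
    return tuple(half / single for half, single in zip(fp16[model], fp32[model]))


speedups = {model: fp16_speedup(model) for model in fp16}

for model, (two_gpu, four_gpu) in speedups.items():
    print(f"{model:13s} 2 GPU: {two_gpu:.2f}x   4 GPU: {four_gpu:.2f}x")
```

On these numbers the deeper networks (ResNet-152, Inception V4) gain the most from FP16, while GoogLeNet gains the least.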
System Specifications:
Model | Exxact TensorEX Deep Learning Server |
GPU | NVIDIA Tesla V100S 32 GB PCIe |
CPU | Intel Xeon Silver 4116 |
RAM | 128 GB DDR4 |
SSD (OS) | 120 GB |
SSD (Data) | 1024.2 GB |
OS | CentOS Linux 7 |
NVIDIA Driver | 440.82 |
CUDA Version | 10.2 |
Python | 3.6.9 |
TensorFlow | NGC container release 20.02-tf1-py3 (TensorFlow 1.x) |
Docker Image | nvcr.io/nvidia/tensorflow:20.02-tf1-py3 |
Training Parameters
Dataset: | ImageNet |
Mode: | training |
SingleSess: | False |
Batch Size: | 128 |
Num Batches: | 100 |
Num Epochs: | 0.16 |
Devices: | [‘/gpu:0’]…(varied) |
NUMA bind: | False |
Data format: | NCHW |
Optimizer: | momentum |
Variables: | parameter_server |
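For readers who want to reproduce a similar sweep, the parameters above can be turned into tf_cnn_benchmarks.py command lines along these lines. This is a sketch under assumptions, not the exact invocation we used: the script path, the helper name `build_benchmark_command`, and the lowercase model identifiers are our guesses and should be checked against the script's own `--help` output. No `--data_dir` is passed because the parameters above do not list one.

```python
def build_benchmark_command(model, num_gpus, use_fp16, script="tf_cnn_benchmarks.py"):
    """Assemble an argv list for one benchmark run.

    Flag values mirror the Training Parameters table. The script path is an
    assumption and depends on where the benchmarks repository is checked out.
    """
    return [
        "python", script,
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        "--batch_size=128",
        "--num_batches=100",
        "--data_format=NCHW",
        "--optimizer=momentum",
        "--variable_update=parameter_server",
        f"--use_fp16={use_fp16}",
    ]


# Assumed spellings of the model identifiers; verify them before running.
MODELS = ["resnet50", "resnet152", "inception3", "inception4", "googlenet"]

# One command per model, GPU count, and precision: 5 x 2 x 2 = 20 runs.
commands = [
    build_benchmark_command(model, num_gpus, use_fp16)
    for model in MODELS
    for num_gpus in (2, 4)
    for use_fp16 in (False, True)
]

for argv in commands:
    print(" ".join(argv))
```

The script only prints the commands; each line can then be run inside the container listed under System Specifications, or passed to `subprocess.run` from the same loop.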
Interested in More Deep Learning Benchmarks?