
TensorFlow Benchmarks for Exxact Server Featuring NVIDIA V100S

Exxact TensorEX Server Featuring NVIDIA V100S Benchmarked


This post presents TensorFlow deep learning benchmarks for an Exxact TensorEX server. For these benchmarks, the server was outfitted with 4 NVIDIA V100S GPUs.

We ran the standard “tf_cnn_benchmarks.py” benchmark script from TensorFlow’s GitHub repository on the following networks: ResNet-50, ResNet-152, Inception V3, Inception V4, and GoogLeNet. We also compared FP16 and FP32 performance, using a batch size of 128 throughout. Each test was run in both 2-GPU and 4-GPU configurations. All benchmarks used ‘vanilla’ TensorFlow settings for FP16 and FP32.
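The test matrix described above (five networks, two precisions, two GPU counts) can be enumerated with a short sketch. The lowercase model identifiers are an assumed mapping from the network names in this post to the names tf_cnn_benchmarks registers; the function name is hypothetical.

```python
from itertools import product

# Assumed mapping of the post's network names to tf_cnn_benchmarks model names.
MODELS = ["resnet50", "resnet152", "inception3", "inception4", "googlenet"]
PRECISIONS = ["fp16", "fp32"]
GPU_COUNTS = [2, 4]


def benchmark_matrix():
    """Return every (model, precision, num_gpus) combination covered by the tests."""
    return list(product(MODELS, PRECISIONS, GPU_COUNTS))


if __name__ == "__main__":
    for model, precision, gpus in benchmark_matrix():
        print(f"{model:<11} {precision}  {gpus} GPUs")
```

This yields 5 x 2 x 2 = 20 runs, one per row and GPU column in the result tables.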


NVIDIA V100S Deep Learning Benchmark Snapshot

As the results below show, running in FP16 gives a substantial boost in images/sec: between about 1.55x (GoogLeNet, 4 GPUs) and 2.73x (ResNet-152, 2 GPUs) over FP32 in these tests. If you’re able to train in FP16 rather than FP32, we recommend doing so.
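To put a number on the FP16 advantage, the per-model speedup can be computed directly from the 4-GPU figures in the FP16 and FP32 tables of this post. This is a minimal sketch; the dictionary and function names are illustrative, and the values are copied from those tables.

```python
# 4-GPU throughput in images/sec, copied from the FP16 and FP32 tables in this post.
FP16_4GPU = {
    "ResNet-50": 3218.0,
    "ResNet-152": 1415.56,
    "Inception V3": 2161.02,
    "Inception V4": 1205.97,
    "GoogLeNet": 5265.14,
}
FP32_4GPU = {
    "ResNet-50": 1432.69,
    "ResNet-152": 577.26,
    "Inception V3": 926.93,
    "Inception V4": 455.65,
    "GoogLeNet": 3393.91,
}


def fp16_speedup(fp16, fp32):
    """Return the FP16/FP32 throughput ratio for every model present in both tables."""
    return {model: fp16[model] / fp32[model] for model in fp16 if model in fp32}


if __name__ == "__main__":
    for model, ratio in fp16_speedup(FP16_4GPU, FP32_4GPU).items():
        print(f"{model:<13} {ratio:.2f}x")
```

A ratio above 1.0 means FP16 processed more images per second than FP32 for that network.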


NVIDIA V100S Deep Learning Benchmarks FP16


Model          2 GPU img/sec   4 GPU img/sec   Batch Size
ResNet-50      1735.56         3218            128
ResNet-152     760.57          1415.56         128
Inception V3   1134.88         2161.02         128
Inception V4   602.36          1205.97         128
GoogLeNet      2820.47         5265.14         128

NVIDIA V100S Deep Learning Benchmarks FP32


Model          2 GPU img/sec   4 GPU img/sec   Batch Size
ResNet-50      762.21          1432.69         128
ResNet-152     278.17          577.26          128
Inception V3   495.51          926.93          128
Inception V4   227.05          455.65          128
GoogLeNet      1692.94         3393.91         128
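The tables also show how throughput changes when going from 2 to 4 GPUs. The sketch below computes that scaling factor and the corresponding efficiency (1.0 would be perfect doubling) from the figures above. The helper names are hypothetical, and the efficiency definition is a common convention rather than something the benchmark script reports.

```python
# (2-GPU, 4-GPU) throughput in images/sec, copied from the tables in this post.
FP16 = {
    "ResNet-50": (1735.56, 3218.0),
    "ResNet-152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}
FP32 = {
    "ResNet-50": (762.21, 1432.69),
    "ResNet-152": (278.17, 577.26),
    "Inception V3": (495.51, 926.93),
    "Inception V4": (227.05, 455.65),
    "GoogLeNet": (1692.94, 3393.91),
}


def scaling_factor(two_gpu, four_gpu):
    """Throughput ratio when doubling from 2 to 4 GPUs (ideal value is 2.0)."""
    return four_gpu / two_gpu


def scaling_efficiency(two_gpu, four_gpu):
    """Scaling factor divided by the ideal factor of 2.0."""
    return scaling_factor(two_gpu, four_gpu) / 2.0


if __name__ == "__main__":
    for label, table in (("FP16", FP16), ("FP32", FP32)):
        for model, (g2, g4) in table.items():
            print(f"{label} {model:<13} {scaling_factor(g2, g4):.2f}x "
                  f"({scaling_efficiency(g2, g4):.0%} efficiency)")
```

Values slightly above 100% can appear in short runs like these (100 batches) and are best read as run-to-run variation rather than true superlinear scaling.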

System Specifications:

Model: Exxact TensorEX Deep Learning Server
GPU: NVIDIA Tesla V100S 32 GB PCIe
CPU: Intel Xeon Silver 4116
RAM: 128 GB DDR4
SSD (OS): 120 GB
SSD (Data): 1024.2 GB
OS: CentOS Linux 7
NVIDIA Driver: 440.82
CUDA Version: 10.2
Python: 3.6.9
TensorFlow: 20.02-tf1-py3
Docker Image: nvcr.io/nvidia/tensorflow:20.02-tf1-py3


Training Parameters

Dataset: ImageNet
Mode: training
SingleSess: False
Batch Size: 128
Num Batches: 100
Num Epochs: 0.16
Devices: [‘/gpu:0’]…(varied)
NUMA bind: False
Data format: NCHW
Optimizer: momentum
Variables: parameter_server
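The parameters above correspond to command-line flags of tf_cnn_benchmarks.py. The post does not give the exact command that was run, so the following is a sketch of how one such invocation could be assembled. The flag values are inferred from the parameter list, the helper name and script path are hypothetical, and note that `--batch_size` in tf_cnn_benchmarks is a per-device batch size.

```python
import shlex


def build_command(model, num_gpus, use_fp16, script="tf_cnn_benchmarks.py"):
    """Assemble an argument list for one benchmark run (assumed flags, see lead-in)."""
    args = [
        "python", script,
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        "--batch_size=128",                    # Batch Size: 128 (per device)
        "--num_batches=100",                   # Num Batches: 100
        "--data_format=NCHW",                  # Data format: NCHW
        "--optimizer=momentum",                # Optimizer: momentum
        "--variable_update=parameter_server",  # Variables: parameter_server
    ]
    if use_fp16:
        args.append("--use_fp16")              # omitted for FP32 runs
    return args


if __name__ == "__main__":
    cmd = build_command("resnet50", num_gpus=4, use_fp16=True)
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(a) for a in cmd))
```

Returning a list rather than a single string lets the same arguments be passed straight to `subprocess.run` inside the container without extra shell quoting.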
