Note: This blog compares only the performance of TensorFlow when training deep neural networks. TensorFlow can be deployed in many different ways for many different use cases, including via TensorFlow GPU Docker containers.

Key Findings (TL;DR)

  • Negligible Performance Cost: On our test machine (Exxact Workstation with 2x RTX 2080 Ti), the performance cost of running TensorFlow in Docker, compared to running TensorFlow compiled from source, is negligible/close to zero.
  • Dependency Isolation: Docker allows the end user to utilize multiple containerized deep learning environments and frameworks on a single host machine that may otherwise have conflicting requirements & dependencies when not using Docker.
  • Reproducibility and Scalability: Spinning up and authoring images and containers is relatively easy, and if you want to deploy your model in a real-world environment, container orchestration tools like Kubernetes and Docker Swarm can help.
  • Docker Learning Curve: Docker can have a bit of a learning curve for a non-dev-ops person, which may cause aversion. Also, "Docker for deep learning" documentation is a bit sparse (aside from the main TensorFlow website). It should be noted, however, that compiling TensorFlow from source also has a learning curve for non-dev-ops users.

Comparing TensorFlow GPU Docker vs. Native Install

In this blog post, we examine and compare two popular methods of deploying the TensorFlow framework for deep learning training. We deployed TensorFlow GPU from a Docker container and compared it to a natively installed version compiled from source. We ran the "tf_cnn_benchmarks: High performance benchmarks" found in the TensorFlow GitHub repository. The tests were conducted to show the performance of both deployments one by one, side by side, with the same parameters and settings. While these results may seem obvious to those familiar with using Docker, these tests on our dual-GPU workstation definitively dispel any notion of a lack of performance when using Docker for deep learning. Furthermore, we examine the benefits of using Docker in a deep learning environment on an Exxact workstation, and show that containerization offers many advantages for researchers and developers.

Performance Snapshot: TensorFlow GPU Docker vs. Install From Source

We used the popular neural network models ResNet-50, InceptionV3, VGG16, and AlexNet with synthetic data, one by one, and compared the results. All batch sizes are 64 unless otherwise noted.

(Note: if you just want to see the performance numbers and are not interested in hearing about Docker, feel free to scroll to the second half of this post.)

NN Model      Mode    GPUs  Images/sec
ResNet 50     Docker  1     296.10
                      2     548.22
              Native  1     298.34
                      2     552.96
InceptionV3   Docker  1     195.18
                      2     356.10
              Native  1     196.16
                      2     364.85
*VGG16        Docker  1     168.83
                      2     229.34
              Native  1     169.17
                      2     228.70
**AlexNet     Docker  1     3870.39
                      2     6387.21
              Native  1     3856.36
                      2     6455.11

*VGG16 ran at batch size 32 due to memory errors.

**AlexNet ran at batch size 512 due to poor performance at batch size 64 (see video below for the initial run at batch size 64).

Video Capture of Side by Side Comparison

OK, so performance is on par, but WHY should I consider Docker for deep learning?

To put it simply, you escape dependency hell. Having multiple deep learning frameworks, or multiple versions of a framework, coexist and function properly on a single machine is extremely complex, and is a sure way to drive yourself insane. While this post focuses on TensorFlow, we recognize that many modern deep learning researchers do not rely on just one framework. Having ready-to-go containers for each framework allows flexibility for experimentation, without having to worry about mucking up your current environment. The frameworks are completely self-contained. Something not working correctly? Simply wipe the container and start over: perform a "docker run" command, and 30 seconds later you have a fresh environment ready to go.
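As a minimal sketch of that workflow (the image tag is taken from Docker Hub, and the --runtime=nvidia flag assumes the NVIDIA container runtime, nvidia-docker2, is installed on the host):

```shell
# Pull the image once; subsequent runs start in seconds.
docker pull tensorflow/tensorflow:latest-gpu

# Start a throwaway container; --rm deletes the container on exit,
# so a broken environment is discarded automatically.
docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu bash
```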

(photo credit reddit u/Smakx)

“But I only use TensorFlow, I really don’t need to mess with other frameworks.”

Even for experimentation purposes, let's say you just want to try the latest nightly release of TensorFlow GPU. Chances are it's already containerized, and you can simply run "docker run (name_of_image_you_need)" to launch a new environment. It will not compromise your local workstation: you can download, test, and use it, and if the container doesn't fit your needs, discard it and move on.

Should you choose the Docker route, Docker Hub will become a mainstay resource. In our case, at the official TensorFlow Docker Hub page, you'll see multiple image options in the form of Docker image tags: for example, devel images come with Bazel and are ideal for developing changes to TensorFlow, while custom-op is a special experimental image for developing TF custom ops. Optional tag suffixes are also available, such as -py3, which gives you images with Python 3.5 instead of Python 2.7.
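To illustrate (tag names as listed on the TensorFlow Docker Hub page at the time of writing; availability of specific tags may change), pulling a particular variant is just a matter of choosing the right tag:

```shell
# Nightly GPU build (the image used in our benchmarks)
docker pull tensorflow/tensorflow:nightly-gpu

# Development image that ships with Bazel, for building TensorFlow itself
docker pull tensorflow/tensorflow:devel-gpu

# Python 3 variant via the -py3 suffix
docker pull tensorflow/tensorflow:nightly-gpu-py3
```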

TensorFlow GPU Docker Reproducibility and Scalability

As with any development project, it is of great importance to make experiments and results reproducible. For deep learning, this means implementing practices that properly track code, training data, weights, statistics, and files so that experiments can be rerun and reused in subsequent work. With containerized environments in Docker, and images from Docker Hub, reproducible results for deep learning experiments are more achievable.

Containerized environments and images can be a huge advantage when deploying deep learning at large organizations, or any distributed development environment where deep learning talent may be spread across different organizations, departments, or even across different geographical regions. Furthermore, management and customization of your deployment can be further achieved using orchestration tools like Kubernetes or Docker swarm.

A native install lets you work on projects on a local system without going through the trouble of setting up containers, and may be considered "easier to learn". However, when it comes to deployment, Docker also lets you run multiple containers that load your trained models and serve them very efficiently for end-use applications (just as with running regular apps in containers).
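As an example of containerized serving, the sketch below uses the official tensorflow/serving image; the model path and the name my_model are hypothetical placeholders for your own exported SavedModel:

```shell
# Serve a SavedModel over REST on port 8501.
# /path/to/my_model and my_model are placeholders for your own export.
docker run --rm -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  tensorflow/serving
```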

“But I don’t know how to use Docker, it’s too complicated!”

While Exxact systems support and run both Docker and native TensorFlow, we recommend, and ship as standard, the Docker implementation. However, we do understand there can be quite a learning curve if you're not familiar with Docker.

If TensorFlow is installed natively on your local machine, you don't really need to learn how to work with containers, container volumes, images, and that sort of thing. While the local install method may seem good for individual researchers, in that you don't need to deal with learning Docker, it simply doesn't scale well.

If your project starts to gain steam and begins to scale, you may eventually need to containerize your environment, so wouldn’t it be best to learn Docker, and work with containers from the start?  (If you still think Docker is too difficult to use, keep an eye on our blog for future posts on “Docker for Deep Learning” – we’ll be sure to make some new resources available soon.)   
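If the learning curve is the worry, the day-to-day surface area is actually small. A handful of commands covers most deep learning workflows (the <image> and <container> names below are illustrative placeholders):

```shell
docker images                      # list images on the host
docker ps -a                       # list running and stopped containers
docker run -it --rm <image> bash   # start an interactive, self-deleting container
docker exec -it <container> bash   # open a shell in a running container
docker rm -f <container>           # force-remove a container
docker rmi <image>                 # remove an image
```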

Things to Consider when NOT Using Docker

Installing TensorFlow directly on the operating system has the same disadvantages as installing any other development or research environment. It may affect other applications or dependencies, and it can produce a lot of stray data that pollutes an otherwise clean and stable machine.

Also, be mindful that other programs may interact with your TensorFlow environment and cause unwanted and unpredictable behavior.

System Specs, Performance Metrics, Technical Data, Results

System Specifications:

System         Exxact Valence Workstation
GPU            2 x NVIDIA GeForce RTX 2080 Ti
CPU            Intel Core i7-7820X 3.6 GHz
RAM            32 GB DDR4
SSD            240 GB SSD
OS             Ubuntu 18.04
NVIDIA Driver  410.79
CUDA Version   10
Python         2.7
TensorFlow     1.13 (compiled from source)
Docker Image   tensorflow/tensorflow:nightly-gpu
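For reference, here is a sketch of how the Docker-side runs can be reproduced (repository layout per the tensorflow/benchmarks GitHub repo; the --runtime=nvidia flag assumes nvidia-docker2 is installed on the host):

```shell
# Get the benchmark scripts on the host
git clone https://github.com/tensorflow/benchmarks.git

# Mount them into the nightly GPU container and run ResNet-50 on 1 GPU
docker run --runtime=nvidia -it --rm \
  -v "$PWD/benchmarks:/benchmarks" \
  tensorflow/tensorflow:nightly-gpu \
  python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --num_gpus=1 --batch_size=64 --model=resnet50 \
    --variable_update=parameter_server
```

The native runs use the same tf_cnn_benchmarks.py invocations shown with each result below.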


TensorFlow Benchmark Results

ResNet50 1x GPU

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 296.7 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 297.3 +/- 0.1 (jitter = 0.4) 7.880
20 images/sec: 297.3 +/- 0.1 (jitter = 0.5) 7.910
30 images/sec: 297.2 +/- 0.1 (jitter = 0.6) 7.820
40 images/sec: 297.0 +/- 0.1 (jitter = 0.6) 8.005
50 images/sec: 296.9 +/- 0.1 (jitter = 0.7) 7.768
60 images/sec: 296.7 +/- 0.1 (jitter = 0.8) 8.112
70 images/sec: 296.5 +/- 0.1 (jitter = 0.8) 7.818
80 images/sec: 296.5 +/- 0.1 (jitter = 0.9) 7.974
90 images/sec: 296.3 +/- 0.1 (jitter = 1.0) 8.095
100 images/sec: 296.2 +/- 0.1 (jitter = 1.0) 8.030
—————————————————————-
total images/sec: 296.10
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 299.5 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 299.3 +/- 0.1 (jitter = 0.3) 7.880
20 images/sec: 299.2 +/- 0.1 (jitter = 0.3) 7.910
30 images/sec: 299.1 +/- 0.1 (jitter = 0.5) 7.821
40 images/sec: 299.0 +/- 0.1 (jitter = 0.5) 8.004
50 images/sec: 298.9 +/- 0.1 (jitter = 0.4) 7.769
60 images/sec: 298.9 +/- 0.1 (jitter = 0.5) 8.113
70 images/sec: 298.8 +/- 0.1 (jitter = 0.5) 7.817
80 images/sec: 298.7 +/- 0.1 (jitter = 0.6) 7.982
90 images/sec: 298.6 +/- 0.1 (jitter = 0.6) 8.095
100 images/sec: 298.5 +/- 0.1 (jitter = 0.8) 8.039
—————————————————————-
total images/sec: 298.34
—————————————————————-

ResNet50 2x GPU

python tf_cnn_benchmarks.py --num_gpus=2 --batch_size=64 --model=resnet50 --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 532.9 +/- 0.0 (jitter = 0.0) 8.047
10 images/sec: 547.1 +/- 2.3 (jitter = 7.8) 7.919
20 images/sec: 549.5 +/- 2.1 (jitter = 8.4) 7.823
30 images/sec: 549.4 +/- 1.6 (jitter = 7.5) 8.011
40 images/sec: 549.5 +/- 1.3 (jitter = 6.8) 8.006
50 images/sec: 549.9 +/- 1.2 (jitter = 6.6) 7.825
60 images/sec: 549.8 +/- 1.1 (jitter = 6.3) 7.951
70 images/sec: 549.5 +/- 1.1 (jitter = 6.3) 7.814
80 images/sec: 549.1 +/- 1.0 (jitter = 6.3) 7.833
90 images/sec: 548.9 +/- 0.9 (jitter = 6.3) 7.949
100 images/sec: 548.5 +/- 0.9 (jitter = 6.4) 8.086
—————————————————————-
total images/sec: 548.22
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 546.0 +/- 0.0 (jitter = 0.0) 8.047
10 images/sec: 557.2 +/- 2.1 (jitter = 6.0) 7.919
20 images/sec: 556.7 +/- 1.6 (jitter = 5.6) 7.822
30 images/sec: 555.4 +/- 1.5 (jitter = 6.6) 8.010
40 images/sec: 555.2 +/- 1.2 (jitter = 5.9) 8.005
50 images/sec: 555.1 +/- 1.1 (jitter = 6.4) 7.828
60 images/sec: 554.9 +/- 1.0 (jitter = 6.0) 7.952
70 images/sec: 554.5 +/- 0.9 (jitter = 6.6) 7.807
80 images/sec: 553.7 +/- 0.9 (jitter = 6.5) 7.827
90 images/sec: 553.2 +/- 0.9 (jitter = 6.7) 7.943
100 images/sec: 553.2 +/- 0.8 (jitter = 6.3) 8.097
—————————————————————-
total images/sec: 552.96
—————————————————————-

InceptionV3 1x GPU

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 195.9 +/- 0.0 (jitter = 0.0) 7.269
10 images/sec: 196.1 +/- 0.1 (jitter = 0.2) 7.307
20 images/sec: 196.0 +/- 0.1 (jitter = 0.3) 7.295
30 images/sec: 195.9 +/- 0.1 (jitter = 0.4) 7.388
40 images/sec: 195.8 +/- 0.1 (jitter = 0.3) 7.332
50 images/sec: 195.7 +/- 0.1 (jitter = 0.4) 7.269
60 images/sec: 195.6 +/- 0.1 (jitter = 0.4) 7.358
70 images/sec: 195.5 +/- 0.1 (jitter = 0.4) 7.360
80 images/sec: 195.4 +/- 0.1 (jitter = 0.6) 7.404
90 images/sec: 195.3 +/- 0.1 (jitter = 0.7) 7.315
100 images/sec: 195.2 +/- 0.1 (jitter = 0.7) 7.370
—————————————————————-
total images/sec: 195.18
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 197.1 +/- 0.0 (jitter = 0.0) 7.268
10 images/sec: 197.2 +/- 0.1 (jitter = 0.2) 7.296
20 images/sec: 197.2 +/- 0.1 (jitter = 0.2) 7.297
30 images/sec: 197.0 +/- 0.1 (jitter = 0.3) 7.400
40 images/sec: 196.9 +/- 0.1 (jitter = 0.5) 7.307
50 images/sec: 196.7 +/- 0.1 (jitter = 0.7) 7.259
60 images/sec: 196.6 +/- 0.1 (jitter = 0.8) 7.341
70 images/sec: 196.5 +/- 0.1 (jitter = 0.7) 7.357
80 images/sec: 196.4 +/- 0.1 (jitter = 0.8) 7.423
90 images/sec: 196.3 +/- 0.1 (jitter = 0.7) 7.307
100 images/sec: 196.2 +/- 0.1 (jitter = 0.8) 7.353
—————————————————————-
total images/sec: 196.16
—————————————————————-

InceptionV3 2x GPU

python tf_cnn_benchmarks.py --num_gpus=2 --batch_size=64 --model=inception3 --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 361.4 +/- 0.0 (jitter = 0.0) 7.276
10 images/sec: 364.7 +/- 1.7 (jitter = 4.4) 7.292
20 images/sec: 364.9 +/- 1.2 (jitter = 4.2) 7.367
30 images/sec: 364.0 +/- 1.0 (jitter = 3.3) 7.383
40 images/sec: 363.1 +/- 0.8 (jitter = 3.1) 7.357
50 images/sec: 362.2 +/- 0.8 (jitter = 3.5) 7.249
60 images/sec: 361.3 +/- 0.7 (jitter = 3.8) 7.360
70 images/sec: 360.3 +/- 0.7 (jitter = 4.7) 7.319
80 images/sec: 359.2 +/- 0.7 (jitter = 5.5) 7.354
90 images/sec: 357.8 +/- 0.8 (jitter = 6.4) 7.297
100 images/sec: 356.2 +/- 0.8 (jitter = 7.0) 7.305
—————————————————————-
total images/sec: 356.10
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 366.2 +/- 0.0 (jitter = 0.0) 7.282
10 images/sec: 370.0 +/- 1.1 (jitter = 3.3) 7.282
20 images/sec: 369.3 +/- 0.7 (jitter = 2.4) 7.354
30 images/sec: 369.1 +/- 0.6 (jitter = 3.0) 7.394
40 images/sec: 368.5 +/- 0.6 (jitter = 3.0) 7.346
50 images/sec: 367.9 +/- 0.5 (jitter = 3.2) 7.243
60 images/sec: 367.3 +/- 0.6 (jitter = 3.4) 7.354
70 images/sec: 366.6 +/- 0.6 (jitter = 3.6) 7.318
80 images/sec: 366.2 +/- 0.5 (jitter = 4.1) 7.364
90 images/sec: 365.6 +/- 0.5 (jitter = 4.6) 7.300
100 images/sec: 364.9 +/- 0.5 (jitter = 5.1) 7.305
—————————————————————-
total images/sec: 364.85
—————————————————————-

VGG16 1 x GPU

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=vgg16 --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 169.5 +/- 0.0 (jitter = 0.0) 7.296
10 images/sec: 169.4 +/- 0.0 (jitter = 0.1) 7.305
20 images/sec: 169.4 +/- 0.0 (jitter = 0.1) 7.344
30 images/sec: 169.4 +/- 0.0 (jitter = 0.2) 7.286
40 images/sec: 169.3 +/- 0.0 (jitter = 0.3) 7.285
50 images/sec: 169.2 +/- 0.0 (jitter = 0.3) 7.288
60 images/sec: 169.1 +/- 0.0 (jitter = 0.4) 7.268
70 images/sec: 169.1 +/- 0.0 (jitter = 0.4) 7.263
80 images/sec: 169.0 +/- 0.0 (jitter = 0.4) 7.261
90 images/sec: 169.0 +/- 0.0 (jitter = 0.4) 7.263
100 images/sec: 168.9 +/- 0.0 (jitter = 0.4) 7.305
—————————————————————-
total images/sec: 168.83
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 169.9 +/- 0.0 (jitter = 0.0) 7.306
10 images/sec: 169.8 +/- 0.0 (jitter = 0.1) 7.322
20 images/sec: 169.7 +/- 0.0 (jitter = 0.1) 7.307
30 images/sec: 169.6 +/- 0.0 (jitter = 0.2) 7.296
40 images/sec: 169.5 +/- 0.0 (jitter = 0.2) 7.255
50 images/sec: 169.5 +/- 0.0 (jitter = 0.3) 7.265
60 images/sec: 169.5 +/- 0.0 (jitter = 0.3) 7.257
70 images/sec: 169.4 +/- 0.0 (jitter = 0.3) 7.259
80 images/sec: 169.3 +/- 0.0 (jitter = 0.4) 7.253
90 images/sec: 169.3 +/- 0.0 (jitter = 0.4) 7.272
100 images/sec: 169.3 +/- 0.0 (jitter = 0.4) 7.287
—————————————————————-
total images/sec: 169.17
—————————————————————-

VGG16 2x GPU

python tf_cnn_benchmarks.py --num_gpus=2 --batch_size=32 --model=vgg16 --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 226.0 +/- 0.0 (jitter = 0.0) 7.326
10 images/sec: 231.9 +/- 0.8 (jitter = 2.3) 7.288
20 images/sec: 231.1 +/- 0.6 (jitter = 1.7) 7.281
30 images/sec: 230.7 +/- 0.4 (jitter = 2.3) 7.304
40 images/sec: 230.6 +/- 0.4 (jitter = 2.1) 7.232
50 images/sec: 230.6 +/- 0.3 (jitter = 2.3) 7.283
60 images/sec: 230.2 +/- 0.3 (jitter = 2.5) 7.271
70 images/sec: 230.0 +/- 0.3 (jitter = 2.7) 7.273
80 images/sec: 229.8 +/- 0.3 (jitter = 2.6) 7.286
90 images/sec: 229.5 +/- 0.3 (jitter = 2.7) 7.274
100 images/sec: 229.4 +/- 0.3 (jitter = 2.8) 7.285
—————————————————————-
total images/sec: 229.34
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 230.1 +/- 0.0 (jitter = 0.0) 7.345
10 images/sec: 230.5 +/- 0.9 (jitter = 2.0) 7.297
20 images/sec: 230.0 +/- 0.9 (jitter = 2.9) 7.281
30 images/sec: 229.8 +/- 0.6 (jitter = 3.0) 7.325
40 images/sec: 229.9 +/- 0.5 (jitter = 2.8) 7.248
50 images/sec: 229.7 +/- 0.5 (jitter = 3.1) 7.276
60 images/sec: 229.5 +/- 0.4 (jitter = 3.1) 7.262
70 images/sec: 229.3 +/- 0.4 (jitter = 3.1) 7.258
80 images/sec: 229.0 +/- 0.4 (jitter = 3.3) 7.275
90 images/sec: 228.8 +/- 0.3 (jitter = 3.0) 7.288
100 images/sec: 228.8 +/- 0.3 (jitter = 2.8) 7.292
—————————————————————-
total images/sec: 228.70
—————————————————————-

AlexNet 1x GPU

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet --variable_update=parameter_server

TensorFlow GPU Docker Container

Done warm up

Step Img/sec total_loss
1 images/sec: 3875.3 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 3882.4 +/- 3.4 (jitter = 14.1) nan
20 images/sec: 3885.2 +/- 2.7 (jitter = 13.3) nan
30 images/sec: 3883.0 +/- 2.1 (jitter = 13.0) nan
40 images/sec: 3881.3 +/- 1.7 (jitter = 9.6) nan
50 images/sec: 3879.5 +/- 1.4 (jitter = 7.4) nan
60 images/sec: 3879.8 +/- 1.2 (jitter = 8.5) nan
70 images/sec: 3878.7 +/- 1.2 (jitter = 8.8) nan
80 images/sec: 3876.8 +/- 1.3 (jitter = 8.3) nan
90 images/sec: 3874.8 +/- 1.3 (jitter = 9.7) nan
100 images/sec: 3873.4 +/- 1.3 (jitter = 11.9) nan
—————————————————————-
total images/sec: 3870.39
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 3872.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 3864.3 +/- 2.9 (jitter = 11.7) nan
20 images/sec: 3866.2 +/- 2.3 (jitter = 13.1) nan
30 images/sec: 3866.8 +/- 1.8 (jitter = 10.9) nan
40 images/sec: 3865.8 +/- 1.5 (jitter = 9.3) nan
50 images/sec: 3864.7 +/- 1.3 (jitter = 9.8) nan
60 images/sec: 3863.3 +/- 1.2 (jitter = 9.7) nan
70 images/sec: 3861.7 +/- 1.2 (jitter = 10.8) nan
80 images/sec: 3860.2 +/- 1.3 (jitter = 10.7) nan
90 images/sec: 3860.0 +/- 1.1 (jitter = 9.7) nan
100 images/sec: 3859.2 +/- 1.1 (jitter = 9.5) nan
—————————————————————-
total images/sec: 3856.36
—————————————————————-

AlexNet 2x GPU

python tf_cnn_benchmarks.py --num_gpus=2 --batch_size=512 --model=alexnet --variable_update=parameter_server

TensorFlow GPU Docker Container

Step Img/sec total_loss
1 images/sec: 6339.8 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 6374.6 +/- 11.5 (jitter = 20.9) nan
20 images/sec: 6380.0 +/- 6.7 (jitter = 26.2) nan
30 images/sec: 6377.5 +/- 5.1 (jitter = 27.6) nan
40 images/sec: 6376.2 +/- 4.1 (jitter = 26.4) nan
50 images/sec: 6375.2 +/- 3.5 (jitter = 26.4) nan
60 images/sec: 6384.4 +/- 11.1 (jitter = 19.9) nan
70 images/sec: 6381.3 +/- 9.7 (jitter = 22.6) nan
80 images/sec: 6388.6 +/- 13.3 (jitter = 22.9) nan
90 images/sec: 6385.6 +/- 11.9 (jitter = 23.6) nan
100 images/sec: 6391.2 +/- 14.3 (jitter = 25.3) nan
—————————————————————-
total images/sec: 6387.21
—————————————————————-

Native Install

Step Img/sec total_loss
1 images/sec: 6448.5 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 6442.5 +/- 9.4 (jitter = 7.5) nan
20 images/sec: 6470.2 +/- 44.3 (jitter = 28.0) nan
30 images/sec: 6447.3 +/- 30.3 (jitter = 34.9) nan
40 images/sec: 6459.2 +/- 34.4 (jitter = 29.4) nan
50 images/sec: 6481.7 +/- 38.1 (jitter = 29.4) nan
60 images/sec: 6483.7 +/- 35.9 (jitter = 29.8) nan
70 images/sec: 6470.7 +/- 31.1 (jitter = 31.9) nan
80 images/sec: 6461.5 +/- 27.4 (jitter = 31.3) nan
90 images/sec: 6454.1 +/- 24.5 (jitter = 28.9) nan
100 images/sec: 6458.8 +/- 24.3 (jitter = 28.7) nan
—————————————————————-
total images/sec: 6455.11
—————————————————————-

More TensorFlow GPU Docker Benchmarks