So what's the best GPU for MY deep learning application?
Selecting the right GPU for deep learning is not always a clear-cut task. Depending on the types of networks you’re training, the choice is more nuanced than simply comparing price/performance. Here we aim to provide some insights based on real data in the form of deep learning benchmarks for computer vision (img/sec throughput, batch size) and natural language processing (NLP), where we compare the performance of training transformer models by model size and batch size.
NVIDIA Turing GPUs we will be looking at:
RTX 2080 Ti (Blower Model)
Overall, the RTX 2080 Ti is an excellent value GPU for deep learning experimentation. However, note that it may struggle to train modern NLP models because of its relatively low memory per card (11GB). On the plus side, the blower design allows for dense system configurations.
NVIDIA TITAN RTX
The TITAN RTX is a good all-purpose GPU for just about any deep learning task. When used as a pair with the NVLink bridge, you effectively have 48GB of memory to train large models, including big transformer models for NLP. The twin-fan design may hamper dense system configurations. As a plus, qualifying EDU discounts are available on TITAN RTX.
NVIDIA Quadro RTX 6000
Compared with the TITAN RTX and the 2080 Ti, this card gives you the best of both worlds. Its large memory capacity and blower design allow for densely populated system configurations with ample memory to train large models. Furthermore, the raw image throughput performance (img/sec) is on par with the mighty RTX 8000. Like the TITAN RTX, EDU discounts may be available on Quadro cards, so be sure to check!
Quadro RTX 8000
If you’ve done any significant amount of deep learning on GPUs, you’ll be familiar with the dreaded ‘RuntimeError: CUDA error: out of memory’. Enter the RTX 8000, perhaps one of the best deep learning GPUs ever created. When used in a pair with NVLink, this card gives you 96GB of GPU memory, double that of a pair of RTX 6000 or TITAN RTX cards. Its blower design allows for dense system configurations. It does come at a premium price; however, if you can afford it, go for it.
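To make the memory comparison above easier to scan, here is a minimal Python sketch that tabulates per-card and paired memory. The 11GB figure for the 2080 Ti is stated in the text; the 24GB and 48GB per-card figures are inferred from the paired totals quoted above (48GB and 96GB). The helper names are hypothetical, and the paired figure is simple addition: whether a given framework can actually use two cards as one pool depends on the software and the model.

```python
from typing import List

# Per-card memory in GB. The 2080 Ti value is stated in the text; the others
# are inferred from the paired totals quoted above (48GB and 96GB).
GPU_MEMORY_GB = {
    "RTX 2080 Ti": 11,
    "TITAN RTX": 24,
    "Quadro RTX 6000": 24,
    "Quadro RTX 8000": 48,
}


def paired_memory_gb(card: str) -> int:
    """Combined memory of two identical cards (simple addition)."""
    return 2 * GPU_MEMORY_GB[card]


def cards_that_fit(required_gb: float, paired: bool = False) -> List[str]:
    """Return the cards (or pairs of cards) whose memory meets required_gb."""
    return [
        card
        for card, mem in GPU_MEMORY_GB.items()
        if (2 * mem if paired else mem) >= required_gb
    ]


if __name__ == "__main__":
    for card, mem in GPU_MEMORY_GB.items():
        print(f"{card}: {mem}GB single, {paired_memory_gb(card)}GB paired")
```

For example, `cards_that_fit(30)` returns only the Quadro RTX 8000, while `cards_that_fit(30, paired=True)` also admits pairs of TITAN RTX or Quadro RTX 6000 cards.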
Deep Learning for Computer Vision
The goal of computer vision is to make computers gain high-level “understanding” of images. To evaluate if a model truly “understands” the image, researchers have developed different evaluation methods to measure performance. We examine images/second throughput and batch size by running tf_cnn_benchmarks.py from the official TensorFlow GitHub page.
Clearly, the RTX 8000 and 6000 models perform well in the 4x GPU configuration. If batch size isn’t important, the 2080 Ti system is an excellent choice at a value price.
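For readers who want to see what an images/second figure means in practice, the following is a rough Python sketch of how such a throughput number can be computed. It is not the code of tf_cnn_benchmarks.py: `images_per_second` and `step_fn` are hypothetical names, the training step is a dummy stand-in, and the sketch assumes the batch size is counted per GPU.

```python
import time
from typing import Callable


def images_per_second(
    step_fn: Callable[[], None],
    batch_size: int,
    num_gpus: int = 1,
    warmup_steps: int = 2,
    timed_steps: int = 10,
) -> float:
    """Time `timed_steps` calls of step_fn and convert to images/sec.

    step_fn is a hypothetical stand-in for one training step that
    processes `batch_size` images on each of `num_gpus` GPUs.
    """
    for _ in range(warmup_steps):  # warm-up steps are excluded from timing
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * num_gpus * timed_steps / elapsed


if __name__ == "__main__":
    # Dummy step that only sleeps; replace it with a real training step.
    rate = images_per_second(lambda: time.sleep(0.01), batch_size=64, num_gpus=4)
    print(f"{rate:.0f} images/sec")
```

Excluding a few warm-up steps keeps one-time startup costs out of the measurement, which is why the sketch times only the later steps.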
Deep Learning for Natural Language Processing (NLP)
NLP tasks include speech recognition, translation, speech-to-text, and Q&A systems. Here GPU memory is of great importance, as modern transformer networks such as XLNet and BERT require massive amounts of memory to achieve the highest accuracy. For this section, we compare training the Transformer model (BASE and BIG) from the official TensorFlow GitHub.
Note: For a detailed tutorial on how we trained the transformer models and obtained our metrics, see our blog post Examining the Transformer Architecture – Part 3: Training a Transformer Network from Scratch in Docker.
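To give a feel for why memory matters when training transformers, here is a back-of-the-envelope Python sketch of the memory taken by weights, gradients, and optimizer state alone. The function name and its defaults (32-bit values, an Adam-style optimizer with two extra values per parameter) are assumptions for illustration, as are the parameter counts in the usage example, which are the approximate figures commonly quoted for the base and big Transformer configurations rather than values measured in these benchmarks.

```python
def training_state_gb(
    num_params: float,
    bytes_per_value: int = 4,
    optimizer_slots: int = 2,
) -> float:
    """Rough memory for weights, gradients, and optimizer state, in GB.

    Assumes 32-bit values by default and an Adam-style optimizer that keeps
    `optimizer_slots` extra values per parameter. Activations, which grow
    with batch size and sequence length, are NOT included.
    """
    copies = 1 + 1 + optimizer_slots  # weights + gradients + optimizer slots
    return num_params * bytes_per_value * copies / 1e9


if __name__ == "__main__":
    # Approximate parameter counts (assumed, not taken from these benchmarks).
    for name, params in [("BASE", 65e6), ("BIG", 213e6)]:
        print(f"Transformer {name}: ~{training_state_gb(params):.2f}GB before activations")
```

This estimate leaves out activations, which scale with batch size and sequence length and can take up much of the remaining memory; that is one reason a card with more memory can run larger batches of the same model.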
Additional Deep Learning Benchmarks by Model
Inception V4 GPU Performance Comparison
Inception V3 GPU Deep Learning Benchmarks
Alexnet GPU Deep Learning Benchmarks
Nasnet GPU Deep Learning Benchmarks
ResNet 50 GPU Deep Learning Benchmarks
VGG16 GPU Deep Learning Benchmarks
For more detailed deep learning benchmarks and the methods used to obtain the data, see the GPU-specific pages below.
- RTX 2080 Ti Deep Learning Benchmarks for TensorFlow
- TITAN RTX Deep Learning Benchmarks for TensorFlow
- NVIDIA Quadro RTX 6000 GPU Benchmarks for TensorFlow
- Quadro RTX 8000 Deep Learning Benchmarks for TensorFlow
- NVIDIA Quadro RTX 8000 BERT Large Fine Tuning Benchmarks in TensorFlow