GPUs and Deep Learning

A Graphics Processing Unit, commonly referred to as GPU, is a specialized processor that is used for high speed parallel processing. It sounds technical, but the name is becoming commonplace as their applicability and popularity rises. Originally designed to enhance the visual experience for games and sophisticated graphical user interfaces, the scope has increased dramatically. GPUs are now used in a lot of applications that matter to a lot of people. What we care about here is their ideal suitability for deep learning tasks.

The bulk of computational effort used by deep learning systems is consumed by a relatively small set of mathematical operations. GPUs are designed to perform these tasks very quickly and efficiently. These units do not replace the more general purpose CPU, but rather compliment it by working on very specific and computationally demanding problems.

Are All GPUs Created Equally?

No. In fact, the GPUs currently on the market range in capability that is measured using several key features. These include the processing speed, the number of GPU cores, the amount and speed of memory, and the memory bandwidth. Each of these characteristics are important, and depending on the application, an improvement in one will lead to faster computing overall.

With so many options, it is clear that research should be undertaken in advance of choosing the best GPU for the task. An obvious starting point is the brand, or manufacturer. Does one brand of GPU perform better for deep learning projects? Let’s take a look at some important points.

Market Share

GPUs are available from several manufacturers, most notably NVIDIA and AMD. In fact, according to the Jon Peddie Research firm, these two providers together enjoy almost 100% of the market share. Between these rivals, NVIDIA GPUs are more popular than AMD by a factor of almost 2-to-1, and for good reason. It comes down to a combination of capability and support.


Whether your applications are written in-house or third party, it is important to make the programmer’s job as straightforward as possible. When the software developer receives great support for their project, it turns into a great product. Many programmers produce fantastic, bug-free applications; however, there can be little debate that a poorly-supported component is more difficult to work with. This translates into an increased development time, and a higher likelihood that bugs will arise and hinder progress going forward.

The typical deep learning project is built on top of existing software, not the least of which includes various API-accessible libraries. In many cases, the foundation is the result of extensive in-house development that has evolved over several generations of production, testing, and revision. No matter the point where the operator stands, between designer and end user, they ultimately rely on the developer to bring their product to life. The developer in turn relies on both the strength of the product, and the support required to make it work.

GPU-Accelerated Libraries

From the perspective of a developer and deep learning, what does support look like? One important element is the availability of GPU-accelerated libraries, such as CUBLAS or CURAND. These are part of NVIDIA’s Compute Unified Device Architecture (CUDA) Toolkit, which is a computing platform created by NVIDIA to assist programmers in making use of GPU resources. Another important inclusion for this NVIDIA-exclusive package is CUFFT, which is a drop-in compatible replacement for the CPU-only library FFTW (Fast Fourier Transform in the West). These libraries make up only part of a vast repository of tools with an ever-expanding knowledge base. So, why does all of this matter?

Well, it matters right now because the biggest names in Deep Learning are tied to CUDA. It is indeed significant that when Deep Learning frameworks are compared, one of the key features is whether the package has support for the CUDA architecture.

Deep Learning and Numerical Frameworks

Deep learning frameworks are used by developers to help utilize the power of the technology through a high level programming interface. By using a language such as Python, software developers work more abstractly, and need to worry less about the technical details. Although the math-intensive functions are written in languages like C++, the functionality is accessible through high-level APIs. Programmers using well supported frameworks benefit from previous effort in terms of research, development, and testing. As of this writing, the most popular framework for deep learning applications is TensorFlow.


TensorFlow is lauded by many for simplifying and abstracting deep learning tasks. Developed by the Google Brain team, TensorFlow is an open source library that makes machine learning faster and easier. This popular framework makes extensive use of the CUDA architecture. In fact, without CUDA, the full power of your GPU will not be unleashed in TensorFlow applications.

To learn more about TensorFlow in-depth, read one of our blogs here where we compare two popular methods of deploying the TensorFlow deep learning framework.


PyTorch is a scientific computing package that is used to provide speed and flexibility in Deep Learning projects. During the development stage it can be used as a replacement for the CPU-only library, NumPy (Numerical Python), which is heavily relied upon to perform mathematical operations in Neural Networks. PyTorch also relies on the CUDA library for GPU acceleration, and as of this writing, support for AMD GPUs is not available for this framework. Kickstart your deep learning with these 3 PyTorch projects here!

Microsoft Cognitive Toolkit (Formerly CNTK)

The Microsoft Cognitive Toolkit is a Deep Learning framework that was originally developed for internal use at Microsoft. Once released and available for all to use as an open source package, CNTK became one of the most widely known Deep Learning frameworks. Although more users now reportedly use TensorFlow and PyTorch, this toolkit is still credited with ease of use and compatibility. This package supports both CPU and GPU operation, although using GPU acceleration requires the use of the NVIDIA-exclusive CUDNN library.


When it comes to computing, at the forefront for most people is speed. Regardless of how an application has come to be, a quick training time is essential for Deep Learning models. The faster the system is trained, the sooner you get results. Of course this begs the question: How fast is the GPU?

When it comes to GPUs, performance is really a combination of raw computing power, memory, and bandwidth. However, generally speaking, the higher the number of FLOPS (floating point operations per second), the quicker the processor. Recently, Tom’s Hardware released their speed-ranked list of every GPU that is currently on the market. Notably, NVIDIA GPUs occupy the first six places, making them the clear winner in terms of speed.


No matter the task being performed by either a CPU or GPU, it requires memory (RAM). This is because the processor is responsible for carrying out substantial calculations, and the results for each need to be stored. For demanding tasks such as those in Deep Learning applications, a considerable amount of memory is required. Without sufficient RAM, performance is significantly degraded and the full potential of the raw processing power cannot be realized.

With respect to GPUs and RAM, there are two types: Integrated and Dedicated. An integrated GPU does not have its own memory. Rather, it shares on-board memory that is used by the CPU. A dedicated GPU, on the other hand, performs calculations using its own RAM. Although a dedicated GPU comes at a premium, with the additional memory generally ranging between 2 GB and 12 GB, there are important advantages.

Firstly, a dedicated GPU will not be negatively impacted by a heavily taxed CPU. Without having to share RAM, it truly runs in parallel without having to wait for CPU-related operations to complete. Next, and perhaps more importantly, a dedicated GPU will contain higher performance memory chips. This increase in memory speed is a direct contributor to the bandwidth, which is another critical characteristic used to assess overall performance.


No discussion on random access memory would be complete without a mention of bandwidth, and how it factors into the operation. Bandwidth is a combination of memory speed, memory width, and memory type. Although each is significant, it is best summarized by stating that when the bandwidth is substandard, the GPU will spend a lot of time idle as it waits for the RAM to respond. It should not be surprising that a lower-priced GPU by the same manufacturer may differ only in its bandwidth, and as such, is an important feature to consider.

Which Brand Do I Choose?

In the current state of technology, NVIDIA GPUs are the best choice for those who are interested in Deep Learning. There is no doubt that AMD make great hardware and powerful processors, but in terms of the most important factors – speed and support – they simply do not compete at this stage of the game.

Why? In a nutshell, the development of a Deep Learning application depends on support, and once the system is ready it comes down to speed. The speed of the GPU is determined by several components, with memory and bandwidth being as important as the GPU’s clock speed. Whether you are a developer, a provider, or an end-user – no matter your perspective, having a fast GPU is the key to getting your results.

What are your thoughts? Let us know below in the comments or on social media!

Leave a comment on Facebook.

Talk to us on Twitter.