
Taking A Deeper Look at the AMD Radeon Instinct GPUs for Deep Learning

Introduced earlier this year, the AMD Radeon Instinct line of GPUs was a hot topic at the annual Supercomputing 2017 event. AMD announced the immediate availability of a suite of new high-performance systems powered by AMD EPYC CPUs and AMD Radeon Instinct GPUs to accelerate innovation in supercomputing. AMD plans to pair this hardware portfolio with software, featuring its new ROCm 1.8 open platform with updated development tools and libraries, to enable AMD EPYC-based PetaFLOPS compute systems.

But what do we know about the Radeon Instinct GPUs so far? Put briefly, the Radeon Instinct line is dedicated to large-scale machine intelligence and deep learning data center applications. These new graphics cards feature some of the latest Radeon technology, which boosts performance and delivers much higher compute throughput in deep learning tasks. Despite the advanced design and performance, AMD optimized the Radeon Instinct line to be cost-effective for machine and deep learning inference, where workloads can take advantage of the accelerator’s highly parallel computing capabilities. Government science labs, life sciences, finance, AI, and higher-education institutions are all good candidates for data-centric HPC-class systems built on AMD Instinct products.

 

Three different options to choose from to fit your needs:

Model                  Compute Units   Stream Processors   Peak TFLOPS (FP16 / FP32)   Memory Size   Memory Bandwidth
Radeon Instinct MI25   64 nCU          4096                24.6 / 12.3                 16GB          484 GB/s
Radeon Instinct MI8    64              4096                8.2 / 8.2                   4GB           512 GB/s
Radeon Instinct MI6    36              2304                5.7 / 5.7                   16GB          224 GB/s
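
To make the three cards easier to compare, the figures from the table above can be reduced to a few ratios. The following is a minimal Python sketch, not an AMD tool: the dictionary layout and the `derived_ratios` helper are illustrative names chosen here, and the only inputs are the numbers quoted in the table.

```python
# Figures copied from the comparison table; TFLOPS are peak values.
CARDS = {
    "MI25": {"compute_units": 64, "stream_processors": 4096,
             "fp16_tflops": 24.6, "fp32_tflops": 12.3,
             "memory_gb": 16, "bandwidth_gb_s": 484},
    "MI8": {"compute_units": 64, "stream_processors": 4096,
            "fp16_tflops": 8.2, "fp32_tflops": 8.2,
            "memory_gb": 4, "bandwidth_gb_s": 512},
    "MI6": {"compute_units": 36, "stream_processors": 2304,
            "fp16_tflops": 5.7, "fp32_tflops": 5.7,
            "memory_gb": 16, "bandwidth_gb_s": 224},
}


def derived_ratios(card):
    """Return ratios that make the three cards easier to compare."""
    return {
        # Stream processors packed into each compute unit.
        "sp_per_cu": card["stream_processors"] // card["compute_units"],
        # How much faster half precision is than single precision.
        "fp16_to_fp32": card["fp16_tflops"] / card["fp32_tflops"],
        # Memory bandwidth available per FP32 TFLOP of compute.
        "gb_s_per_fp32_tflop": card["bandwidth_gb_s"] / card["fp32_tflops"],
    }


for name, card in CARDS.items():
    ratios = derived_ratios(card)
    print(f"{name}: {ratios['sp_per_cu']} SPs per CU, "
          f"FP16:FP32 = {ratios['fp16_to_fp32']:.1f}, "
          f"{ratios['gb_s_per_fp32_tflop']:.1f} GB/s per FP32 TFLOP")
```

On these figures, every card works out to 64 stream processors per compute unit, only the MI25 shows a 2:1 FP16-to-FP32 ratio, and the MI8 has the most memory bandwidth per TFLOP of compute.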

 

Radeon Instinct MI25:  World’s fastest training accelerator for machine intelligence and deep learning

The Radeon Instinct MI25 accelerator ushers in a new era of compute for the datacenter. Its Next-Gen “Vega” architecture delivers superior compute performance through a powerful parallel compute engine and a Next-Gen programmable geometry pipeline that improves processing efficiency while delivering 2x the peak throughput-per-clock of previous Radeon architectures. The MI25 increases performance density while reducing energy consumption per operation, making it well suited to today’s demanding datacenter workloads.
 
Highlights:

  • Industry Leading Performance for Deep Learning
  • Next-Gen “Vega” Architecture
  • Advanced Memory Engine
  • Large BAR Support for Multi-GPU Peer to Peer
  • ROCm Open Software Platform for Rack Scale
  • Optimized MIOpen Libraries for Deep Learning
  • MxGPU Hardware Virtualization
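
As a rough sanity check on the MI25 figures quoted in this article (4096 stream processors, 12.3 TFLOPS FP32, 24.6 TFLOPS FP16), the sketch below back-computes the clock speed those peak numbers imply. It rests on an assumption that is not stated in the article: each stream processor retires one fused multiply-add per clock in FP32 (counted as 2 FLOPs) and two packed multiply-adds per clock in FP16 (4 FLOPs). The function name is illustrative.

```python
def implied_clock_ghz(peak_tflops, stream_processors, flops_per_sp_per_clock):
    """Clock in GHz needed to reach peak_tflops.

    Assumes every stream processor retires flops_per_sp_per_clock
    floating-point operations on every cycle (an idealized peak).
    """
    flops_per_clock = stream_processors * flops_per_sp_per_clock
    return peak_tflops * 1e12 / flops_per_clock / 1e9


# Assumption: one FP32 fused multiply-add per clock = 2 FLOPs.
fp32_clock = implied_clock_ghz(12.3, 4096, flops_per_sp_per_clock=2)

# Assumption: two packed FP16 multiply-adds per clock = 4 FLOPs.
fp16_clock = implied_clock_ghz(24.6, 4096, flops_per_sp_per_clock=4)

print(f"Implied clock from FP32 figure: {fp32_clock:.2f} GHz")
print(f"Implied clock from FP16 figure: {fp16_clock:.2f} GHz")
```

Under those assumptions both precisions point to the same clock of roughly 1.5 GHz, which is consistent with the FP16 figure being exactly double the FP32 one.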

 

Radeon Instinct MI8:  Cost-sensitive, scalable accelerator for machine and deep learning inference applications

The Radeon Instinct MI8 accelerator is based on AMD’s 3rd generation “Fiji” architecture, with improved data-parallel processing and ultra-fast HBM1 memory. It delivers 8.2 TFLOPS of peak performance with up to 512 GB/s of memory bandwidth in a single, passively cooled GPU card. Combined with AMD’s ROCm open software platform, the MI8 is AMD’s GPU solution for cost-sensitive system deployments running machine intelligence, deep learning, and HPC workloads, where performance and efficiency are key system requirements.
 
Highlights:

  • 8.2 TFLOPS FP16 or FP32 Performance
  • Up To 47 GFLOPS Per Watt FP16 or FP32 Performance
  • 4GB HBM1 on 512-bit Memory Interface
  • Passively Cooled Server Accelerator
  • Large BAR Support for Multi GPU Peer to Peer
  • ROCm Open Platform for HPC-Class Rack Scale
  • Optimized MIOpen Libraries for Deep Learning
  • MxGPU SR-IOV Hardware Virtualization
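
The peak-performance and performance-per-watt figures in the MI8 highlights can be combined to estimate the power envelope they imply. The sketch below is only that arithmetic: the helper name is made up for illustration, and because "up to 47 GFLOPS per watt" is a rounded marketing figure, the result is an approximation rather than an official board power rating.

```python
def implied_board_power_watts(peak_tflops, gflops_per_watt):
    """Estimate power draw from peak throughput and efficiency.

    1 TFLOPS = 1000 GFLOPS, so power = (TFLOPS * 1000) / (GFLOPS per watt).
    """
    return peak_tflops * 1000 / gflops_per_watt


# MI8 highlights: 8.2 TFLOPS peak, up to 47 GFLOPS per watt.
mi8_power = implied_board_power_watts(8.2, 47)
print(f"MI8 implied board power: about {mi8_power:.0f} W")
```

The same helper can be pointed at any accelerator that quotes both a peak TFLOPS figure and a GFLOPS-per-watt figure.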

 

Radeon Instinct MI6:  Versatile training and inference accelerator for machine intelligence and deep learning

The Radeon Instinct MI6 accelerator is based on AMD’s new 4th generation “Polaris” architecture and built on a 14nm FinFET process. It offers exceptional data-parallel processing capabilities, delivering 5.7 TFLOPS of peak performance with 16GB of ultra-fast GDDR5 memory and up to 224 GB/s of memory bandwidth in a single, passively cooled GPU card. Combined with AMD’s ROCm open software platform, the MI6 is AMD’s answer for efficiency- and cost-sensitive inference and edge-training deployments in machine intelligence and deep learning, as well as HPC workloads where performance, large memory, and efficiency are the main system drivers.
 
Highlights:

  • 5.7 TFLOPS FP16 or FP32 Performance
  • Up To 38 GFLOPS Per Watt Peak FP16 or FP32 Performance
  • 16GB Ultra-Fast GDDR5 Memory on 256-bit Memory Interface
  • Passively Cooled Server Accelerator
  • Large BAR Support for Multi-GPU Peer to Peer
  • ROCm Open Platform for HPC-Class Scale Out
  • Optimized MIOpen Libraries for Deep Learning
  • MxGPU SR-IOV Hardware Virtualization
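
The MI6 memory figures (224 GB/s over a 256-bit interface) are related by simple arithmetic: bandwidth in bytes per second equals the bus width in bits times the per-pin data rate, divided by 8. The sketch below works that relationship in both directions. The function names are illustrative, and the per-pin rate is derived from the article's numbers rather than quoted from an AMD specification.

```python
def per_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


def bandwidth_gb_s(per_pin_gbps, bus_width_bits):
    """Total memory bandwidth (GB/s) for a given per-pin rate and bus width."""
    return per_pin_gbps * bus_width_bits / 8


# MI6 highlights: 224 GB/s on a 256-bit GDDR5 interface.
mi6_rate = per_pin_rate_gbps(224, 256)
print(f"MI6 implied per-pin data rate: {mi6_rate:.1f} Gbit/s")

# Round trip: the derived rate reproduces the quoted bandwidth.
print(f"Bandwidth check: {bandwidth_gb_s(mi6_rate, 256):.0f} GB/s")
```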