At the 2016 International Supercomputing Conference, NVIDIA introduced the NVIDIA® Tesla® P100 GPU accelerator for PCIe servers. The PCIe variant was announced to meet the unprecedented computational demands placed on modern data centers. The Tesla P100 is expected to deliver large gains in performance and value compared with CPU-based systems.

HPC data centers need to support the ever-growing demands of scientists and researchers while staying within a tight budget. The old approach of deploying many commodity compute nodes with vast interconnect overhead has shown that substantial costs do not translate into a comparable increase in data center performance. The NVIDIA Tesla P100 accelerators should help remedy this situation, as they are designed to boost throughput and save money for HPC and hyperscale data centers. Powered by the brand new NVIDIA Pascal™ architecture, Tesla P100 for PCIe-based servers enables a single node to replace up to half a rack of commodity CPU nodes by delivering lightning-fast performance in a broad range of HPC applications. Handling the same workload with far fewer nodes means customers can save up to 70% in overall data center costs.

Earlier this year at the 2016 GPU Technology Conference, NVIDIA showcased the Tesla P100 along with the new DGX-1® supercomputer. It was the first time we saw a Pascal-powered Tesla GPU and NVIDIA’s new mezzanine connector, also known as the SXM2 interface. While previous Tesla series GPUs came in a PCIe form factor, the SXM2 connector was a necessary upgrade to make full use of NVIDIA’s high-speed NVLink bus. However, NVIDIA recognized that not all users will want to build their systems around mezzanine connections, so a PCIe version of the P100 was also created.

NVIDIA Tesla Family Specification Comparison
| Specification | Tesla P100 (SXM2) | Tesla P100 PCIe (16GB) | Tesla P100 PCIe (12GB) | Tesla M40 |
|---|---|---|---|---|
| Stream Processors | 3584 | 3584 | 3584 | 3072 |
| Core Clock | 1328MHz | ? | ? | 948MHz |
| Boost Clock(s) | 1480MHz | 1300MHz | 1300MHz | 1114MHz |
| Memory Clock | 1.4Gbps HBM2 | 1.4Gbps HBM2 | 1.4Gbps HBM2 | 6Gbps GDDR5 |
| Memory Bus Width | 4096-bit | 4096-bit | 3072-bit | 384-bit |
| Memory Bandwidth | 720GB/sec | 720GB/sec | 540GB/sec | 288GB/sec |
| VRAM | 16GB | 16GB | 12GB | 12GB |
| L2 Cache | 4MB | 4MB | 3MB | 3MB |
| Half Precision | 21.2 TFLOPS | 18.7 TFLOPS | 18.7 TFLOPS | 6.8 TFLOPS |
| Single Precision | 10.6 TFLOPS | 9.3 TFLOPS | 9.3 TFLOPS | 6.8 TFLOPS |
| Double Precision | 5.3 TFLOPS | 4.7 TFLOPS | 4.7 TFLOPS | 213 GFLOPS |
| Max Power Consumption | 300W | 250W | 250W | 250W |
| Form Factor | Mezzanine | PCIe | PCIe | PCIe |
| Cooling | N/A | Passive | Passive | Passive |
| Architecture | Pascal | Pascal | Pascal | Maxwell 2 |
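
For readers wondering how the memory bandwidth row relates to the bus width and memory clock rows, here is a minimal sketch. It assumes the usual rule of thumb that peak bandwidth equals bus width times per-pin data rate, divided by 8 bits per byte. The function name and dictionary layout are ours, purely for illustration.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth = bus width (bits) x per-pin data rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, per-pin data rate in Gbps), taken from the comparison table
cards = {
    "Tesla P100 (4096-bit HBM2)": (4096, 1.4),
    "Tesla P100 (3072-bit HBM2)": (3072, 1.4),
    "Tesla M40 (384-bit GDDR5)": (384, 6.0),
}

for name, (width, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gb_per_s(width, rate):.1f} GB/sec")
```

Under that assumption the HBM2 cards work out to roughly 717 and 538 GB/sec, close to the rounded 720 and 540 GB/sec in the table, while the M40 comes out at 288 GB/sec.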

The PCIe-based Tesla P100 will come in two versions: one with a 4096-bit memory bus, 16 GB of VRAM, and a 4MB L2 cache, and one with a 3072-bit memory bus, 12 GB of VRAM, and a 3MB L2 cache. With the mezzanine version touting a boost clock of 1.48 GHz and the PCIe version 1.3 GHz, the latter is essentially a downclocked version of the former. Though the “underdog,” the P100 for PCIe-based servers is definitely not “underpowered,” as it still delivers 18.7 TFLOPS of half-precision performance, more than enough for demanding server deployments.
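
To see how the clock difference feeds into the throughput figures, here is a rough sketch. It assumes each stream processor completes one fused multiply-add (2 floating-point operations) per clock at the boost frequency, and that the P100 runs FP16 at twice and FP64 at half its FP32 rate. The function name is ours, and the clocks are the rounded values from the table.

```python
def peak_tflops(stream_processors: int, boost_clock_mhz: float) -> dict:
    """Estimate peak throughput, assuming 2 FLOPs (one FMA) per core per clock."""
    fp32 = stream_processors * boost_clock_mhz * 1e6 * 2 / 1e12
    # Assumed P100 ratios: FP64 at half the FP32 rate, FP16 at double.
    return {"fp64": fp32 / 2, "fp32": fp32, "fp16": fp32 * 2}


for label, clock_mhz in [("P100 SXM2", 1480), ("P100 PCIe", 1300)]:
    est = peak_tflops(3584, clock_mhz)
    print(f"{label}: FP64 {est['fp64']:.2f}, FP32 {est['fp32']:.2f}, "
          f"FP16 {est['fp16']:.2f} TFLOPS")
```

With these inputs the SXM2 estimates round to 5.3, 10.6, and 21.2 TFLOPS. The PCIe FP16 estimate comes out near 18.64 TFLOPS, slightly under the 18.7 TFLOPS listed in the table, which would be consistent with the 1300MHz boost clock being a rounded figure.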

Tesla P100 is said to be “reimagined from silicon to software,” crafted with innovation at every level. It features four groundbreaking technologies that deliver a dramatic jump in performance:

New Pascal Architecture: Delivers 5.3 TeraFLOPS of double-precision and 10.6 TeraFLOPS of single-precision performance for HPC, and 21.2 TeraFLOPS of FP16 performance for deep learning (figures for the SXM2 version).

NVLink: The world’s first high-speed interconnect for multi-GPU scalability, with a 5x boost in performance (NVLink is not featured on the PCIe version).

CoWoS® with HBM2: Unifies data and compute in a single package for up to 3x the memory bandwidth of prior-generation solutions.

Page Migration Engine: Simplifies parallel programming by letting applications work with datasets larger than the physical GPU memory.

The PCIe-based NVIDIA Tesla P100 GPU accelerator is expected to be available beginning in Q4 2016 from Exxact Corporation. Exxact Tensor Series servers featuring the Tesla P100 will also be available in Q4 2016. Users who do not plan to build out a data center, or who want to test the P100 before the PCIe variant ships, should consider NVIDIA’s P100-powered DGX-1® supercomputer. It features eight Tesla P100 accelerators delivering 170 teraflops of half-precision peak performance, equivalent to 250 CPU-based servers, and it can be ordered through Exxact.
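
As a quick sanity check on the 170 teraflops figure, the sketch below multiplies the per-GPU half-precision peak by the GPU count. It assumes the DGX-1 uses the SXM2 version of the P100 at 21.2 TFLOPS each and that peak throughput simply adds across GPUs. The constant and function names are ours.

```python
GPUS_PER_DGX1 = 8            # from the DGX-1 description
FP16_TFLOPS_PER_GPU = 21.2   # assumed: SXM2 half-precision peak from the table


def aggregate_tflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Sum of per-GPU peak throughput; ignores any scaling losses."""
    return gpu_count * tflops_per_gpu


total = aggregate_tflops(GPUS_PER_DGX1, FP16_TFLOPS_PER_GPU)
print(f"DGX-1 aggregate FP16 peak: {total:.1f} TFLOPS")
```

This gives about 169.6 TFLOPS, which rounds to the quoted 170. It is a peak figure, so real workloads should be expected to land below it.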