
NVIDIA Updates Pascal GPU Board – Four HBM2 Stacks and Massive Die Previewed Ahead of Launch in 2016, 200 GB/s NVLINK Interconnect

Hardware / Industry Report, by Hassan Mujtaba
  
NVIDIA’s next-generation, high-performance graphics core, codenamed Pascal, is planned for launch in 2016. Pascal is going to bring several new technologies to the green side: the latest process node, an efficient and dense design, High Bandwidth Memory, Unified Memory support and the NVLINK interconnect. The Pascal GPU will not only be an update for GeForce users but will also serve as the latest CUDA compute architecture geared towards the HPC market, which includes servers and workstations.


NVIDIA Pascal GPU Spotted at GTC Taiwan 2015 – Fiji-Like Design With 4 HBM2 Stacks, 1 TB/s Bandwidth

Two years ago, NVIDIA announced a GPU roadmap showcasing the Volta GPU as the replacement for the Maxwell GPU. Last year, a surprising update came in the form of Pascal, which took Volta’s place for a 2016 launch while Volta itself was pushed to 2018. Volta was supposed to be the first GPU from the green team to feature stacked DRAM, but that is no longer the case, as Pascal will be the one to shine with the latest memory and architectural features under its hood. This year at GTC 2015, NVIDIA shared quite a lot of details about the Pascal GPU, but we have yet to get a glimpse of the architectural improvements implemented inside it; all the juicy details will have to wait until next year’s GTC in April 2016, which will be around the time Pascal makes its way to market.

Last month, at GTC Taiwan 2015, NVIDIA presented brief technical seminars on their GPUs and the applications built around them. During the main keynote, Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA, talked about several new technologies that NVIDIA will be announcing in the coming years. Of course, Pascal was part of the keynote, and not only did he talk about the Pascal GPU, but one of the slides also showcased the updated Pascal GPU board with the actual chip mounted on the new form factor aimed at HPC servers.










 
  • Pascal microarchitecture.
  • DirectX 12 feature level 12_1 or higher.
  • Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
  • Built on the 16FF+ manufacturing process from TSMC.
  • Allegedly has a total of 17 billion transistors, more than twice that of GM200.
  • Taped out in June 2015.
  • Will feature four 4-Hi HBM2 stacks, for a total of 16GB of VRAM for the consumer variant and 32GB for the professional variant.
  • Features a 4096-bit memory interface.
  • Features NVLink, support for mixed-precision FP16 compute at twice the rate of FP32, and full FP64 support. 2016 release.
When Pascal was initially announced, NVIDIA’s CEO, Jen-Hsun Huang, showcased a prototype board that was meant to visualize the concept of HBM memory featured on an interposer along with the GPU. AMD gave us the first consumer HBM offering, and we saw how the HBM architecture was actually integrated on the main package that houses the GPU and HBM chips. Measuring just 5x7mm, the HBM chips were not only small but also saved a lot of room, making insanely compact cards such as the Radeon R9 Nano and the Radeon R9 Fury X possible. While limited to 4 GB, HBM1 proved that the new architecture saves energy, saves space, runs much faster and can be stacked to higher capacities in future versions with the HBM2 technology that arrives in 2016 with the Pascal and Arctic Islands chips.
 
The latest picture of the Pascal GPU board is slightly different from the prototype board NVIDIA showcased a year back. This time, the board uses the actual Pascal GPU core with four HBM2 stacks, which will provide up to 16 GB of VRAM on consumer products and 32 GB of VRAM on professional HPC solutions. The Pascal package looks very similar to AMD’s Fiji design. The die seems slightly larger than the Fiji GPU and could measure anywhere around 500-600mm². We cannot say for sure whether the Pascal chip shown on the board is the full GP100 solution or a lower-tier chip that will come in as a successor to the GM204, but knowing that NVIDIA has aimed their high-performance chips at the HPC market, such board designs will act as a new form factor for workstations/servers, and it is likely to be featuring the full Pascal GPU. On the sides of the chip, we can see the metallic heatspreader, while the VRMs/MOSFETs sit on both sides of the chip.
 
We now know that NVIDIA has taped out Pascal chips, and we recently spotted a shipment of Pascal GPUs on their way to NVIDIA’s testing facility straight from TSMC’s fabs. This could mean that the chip we are looking at is very much the first look at an actual Pascal GPU with stacked HBM, unlike the prototype board we saw back in 2014. There has been some question as to whether the board showcased back in 2014 represents an actual form factor, and NVIDIA has officially stated that alongside PCI-Express form factors, Pascal GPUs will be available on a Mezzanine board that is smaller than PCI-Express 3.0 PCBs. This specific PCB will come with the Mezzanine connector, which offers speeds from 15 GB/s up to 40 GB/s, and will be available on select HPC servers and workstations that feature NVLINK support. Several of these boards can be stacked on top of each other to conserve space inside servers, while consumer PCs will stick with the PCI-Express form factor and full-length cards, as they are the best solution for high-end gaming rigs and professional usage.

NVIDIA NVLINK and Future of HPC Oriented GPUs

The Pascal GPU will also introduce NVLINK, the next-generation Unified Virtual Memory link with Gen 2.0 cache coherency features and 5 to 12 times the bandwidth of a regular PCIe connection. This will solve many of the bandwidth issues that high-performance GPUs currently face. One of the latest things we learned about NVLINK is that it will allow several GPUs to be connected in parallel on HPC-focused platforms that will feature several nodes fitted with Pascal GPUs for compute-oriented workloads. The NVLINK interconnect will give the multiple processors inside an HPC node a faster path than traditional PCI-e Gen3 lanes, at speeds of up to 200 GB/s. Pascal GPUs will also feature Unified Memory support, allowing the CPU and GPU to share the same memory pool, and finally there is mixed-precision support. While NVLINK isn’t planned for commercial integration right now, it will be featured in PCs using ARM64 chips and in some x86-powered HPC servers that utilize OpenPower, Tyan and Quantum solutions.
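To make the Unified Memory part of that description concrete, here is a minimal sketch using CUDA’s existing managed-memory API (cudaMallocManaged), which is the programming model that NVLINK’s cache coherency is intended to accelerate; the kernel and buffer size are illustrative only, not taken from NVIDIA material:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: the GPU scales a buffer that the CPU also touches directly.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;          // 1M floats, illustrative size
    float *data = nullptr;

    // A single allocation visible to both CPU and GPU. Today the driver migrates
    // pages over PCIe; on an NVLINK-equipped system the same code would use the
    // faster, cache-coherent link instead.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU writes the shared pool

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU works on the same pointer
    cudaDeviceSynchronize();

    printf("data[0] = %.1f\n", data[0]);             // CPU reads the GPU's result
    cudaFree(data);
    return 0;
}
```

The key point is that neither side ever calls cudaMemcpy: the shared pool is what NVLINK’s unified memory support is meant to make fast enough to use freely.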

Outpacing PCI Express
 
 















Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 Giga-transfers per second per lane) and with the widest supported links (16 lanes) the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory. In a multi-GPU system, the problem is compounded if a PCIe switch is used. With a switch, the limited PCIe bandwidth to the CPU memory is shared between the GPUs. The resource contention gets even worse when peer-to-peer GPU traffic is factored in.
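To put rough numbers on that comparison, here is a quick back-of-the-envelope calculation. The PCIe figures come from the paragraph above; the CPU memory bandwidth value is an assumed example for a typical quad-channel DDR4 system, not a number from NVIDIA:

```cuda
// Back-of-the-envelope bandwidth comparison (host-side arithmetic only).
#include <cstdio>

int main() {
    // PCIe 3.0: 8 GT/s per lane, 16 lanes, with 128b/130b encoding overhead.
    double pcie_x16 = 8.0 * 16.0 * (128.0 / 130.0) / 8.0;   // ~15.75 GB/s per direction

    // NVLink range quoted for Pascal-class parts (80-200 GB/s).
    double nvlink_lo = 80.0, nvlink_hi = 200.0;

    // Assumed example: quad-channel DDR4-2133 = 2.133 GT/s * 8 bytes * 4 channels.
    double cpu_mem = 2.133 * 8.0 * 4.0;                      // ~68 GB/s

    printf("PCIe 3.0 x16 : %5.2f GB/s per direction\n", pcie_x16);
    printf("CPU memory   : %5.1f GB/s (assumed DDR4 example)\n", cpu_mem);
    printf("NVLink       : %.0f-%.0f GB/s (%.1fx-%.1fx of PCIe x16)\n",
           nvlink_lo, nvlink_hi, nvlink_lo / pcie_x16, nvlink_hi / pcie_x16);
    return 0;
}
```

The ratios work out to roughly 5x at the low end and over 12x at the high end, which is where the "5 to 12 times" figure comes from, and a single x16 link clearly falls well short of what the CPU’s own memory can deliver.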
 


NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80 and 200 GB/s of bandwidth, allowing the GPU full-bandwidth access to the CPU’s memory system.

A Flexible and Energy-Efficient Interconnect

The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Our Pascal GPUs will support a number of these links, providing configuration flexibility. The links can be ganged together to form a single GPU↔CPU connection or used individually to create a network of GPU↔CPU and GPU↔GPU connections allowing for fast, efficient data sharing between the compute elements.


When connected to a CPU that does not support NVLink, the interconnect can be wholly devoted to peer GPU-to-GPU connections enabling previously unavailable opportunities for GPU clustering.
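Peer GPU-to-GPU data sharing already has a programming model in CUDA, and NVLink would simply give it a much faster physical path. The sketch below is a minimal example using the existing peer-to-peer API (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer); the device IDs and buffer size are illustrative:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) { printf("This sketch needs at least two GPUs.\n"); return 0; }

    // Can GPU 0 directly read/write GPU 1's memory? Over PCIe this depends on the
    // topology; NVLink-connected GPUs would answer yes, at much higher speed.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);

    const size_t bytes = 64u << 20;                  // 64 MB, illustrative
    float *buf0 = nullptr, *buf1 = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0); // map GPU 1's memory into GPU 0

    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // Direct GPU-to-GPU copy: routed over PCIe (or staged through the host) today,
    // over NVLink on systems that wire the GPUs together with it.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```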

 

Moving data takes energy, which is why we are focusing on making NVLink a very energy efficient interconnect. NVLink is more than twice as efficient as a PCIe 3.0 connection, balancing connectivity and energy efficiency.

In recognition of the value of the current ecosystem, in an NVLink-enabled system CPU-initiated transactions such as control and configuration are still directed over a PCIe connection, while any GPU-initiated transactions use NVLink. This allows us to preserve the PCIe programming model while presenting a huge upside in connection bandwidth.


 

The NVIDIA Pascal GPU will be a major update, as it will probably turn out to be the first family of GPUs to utilize HBM2 and the latest 16nm FinFET process. Next year, AMD plans to launch their Arctic Islands family too, with an insane transistor count rumored to be around 17-18 billion, utilizing the same HBM2 memory and new process node. The NVIDIA Pascal GPU will be featured inside top-of-the-line servers and workstations, while Volta, the GPU after it, will be featured inside two next-generation supercomputers codenamed Sierra and Summit, reaching over 300 petaflops of compute performance. If you thought the Radeon R9 Fury X and the GeForce GTX 980 Ti were beastly cards, then you should be prepared for the monstrous amount of performance that next-generation GPUs are going to offer.
 

NVIDIA Pascal GPU Prototype Board:


 
GPU Family | AMD Arctic Islands | NVIDIA Pascal
GPU Name | AMD Greenland | NVIDIA GP100
GPU Process | TSMC 16nm FinFET | TSMC 16nm FinFET
GPU Transistors | 15-18 Billion | ~17 Billion
HBM Memory (Consumers) | 4-16 GB HBM2 (SK Hynix) | 2-16 GB HBM2 (SK Hynix/Samsung)
HBM Memory (Dual-Chip Professional/HPC) | 32 GB HBM2 (SK Hynix) | 32 GB HBM2 (SK Hynix/Samsung)
HBM2 Bandwidth | 1 TB/s (Peak) | 1 TB/s (Peak)
Graphics Architecture | GCN 2.0? (New ISA) | Next-CUDA (Compute Oriented)
Successor of (GPU) | Fiji (Radeon 300/Fury) | GM200 (Maxwell)
 

 

