NVIDIA Updates Pascal GPU Board - Four HBM2 Stacks and Massive Die Previewed
Launch in 2016, 200 GB/s NVLINK Interconnect
Hardware · Industry · Report · 1 month ago by Hassan Mujtaba
NVIDIA’s next-generation, high-performance graphics core, codenamed Pascal, is planned for launch in 2016. Pascal will bring several new technologies to the green team: the latest process node, an efficient and dense design, High Bandwidth Memory, Unified Memory support, and the NVLINK interconnect. The Pascal GPU will not only be an update for GeForce users but also the latest CUDA compute architecture, geared towards the HPC market, which includes servers and workstations.
NVIDIA Pascal GPU Spotted at GTC Taiwan 2015 – Fiji-Like Design With 4 HBM2 Stacks, 1 TB/s Bandwidth
Two years ago, NVIDIA announced a GPU roadmap showcasing the Volta GPU as the replacement for Maxwell. Last year, a surprising update came in the form of Pascal, which took Volta’s 2016 launch slot while Volta itself was pushed to 2018. Volta was supposed to be the first GPU from the green team to feature stacked DRAM, but that is no longer the case; Pascal will instead debut the latest memory and architectural features under its hood. At GTC 2015 this year, NVIDIA shared quite a lot of detail about the Pascal GPU, but we have yet to get a glimpse of the architectural improvements implemented inside it. All the juicy details will have to wait until next year’s GTC in April 2016, which is around the time Pascal makes it into the market.
Last month, at GTC Taiwan 2015, NVIDIA presented brief technical seminars on its GPUs and the applications built around them. During the main keynote, Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA, talked about several new technologies that NVIDIA will be announcing in the coming years. Pascal was, of course, part of the keynote, and not only did he talk about the Pascal GPU, but one of his slides showcased the updated Pascal GPU board with the actual chip mounted on the new form factor aimed at HPC servers.
- When Pascal was initially announced, NVIDIA’s CEO, Jen-Hsun Huang, showcased a prototype board meant to visualize the concept of HBM memory sitting on an interposer alongside the GPU. AMD gave us the first consumer HBM offering, and we saw how the HBM stacks were integrated on the same package as the GPU. Measuring roughly 5x7mm, the HBM chips were not only small but saved enough board space to enable insanely compact cards such as the Radeon R9 Nano and the Radeon R9 Fury X. While limited to 4 GB, HBM1 proved that the new memory architecture saves energy, saves space, and runs much faster, and it can be stacked to higher capacities with the HBM2 technology that arrives in 2016 on the Pascal and Arctic Islands chips.
- The latest picture of the Pascal GPU board is slightly different from the prototype board NVIDIA showcased a year back. This time, the board uses the actual Pascal GPU core with four HBM2 stacks, which will provide up to 16 GB VRAM on consumer cards and 32 GB VRAM on professional HPC solutions. The Pascal package looks very similar to AMD’s Fiji design. The die seems slightly larger than the Fiji GPU and could be anywhere around 500-600mm². We cannot say for sure whether the Pascal chip shown on the board is the full GP100 solution or a lower-tier chip that will arrive as a successor to GM204, but knowing that NVIDIA has aimed its high-performance chips at the HPC market, such board designs will act as a new form factor for workstations and servers, and it is likely to be the full Pascal GPU. Around the chip we can see the metallic heatspreader, while the VRMs/MOSFETs sit on both sides of the chip.
- We know that NVIDIA has taped out Pascal chips, and we recently spotted a shipment of Pascal GPUs on their way to NVIDIA’s testing facility straight from TSMC’s fabs. This could mean that the chip we are looking at is very much a first look at an actual Pascal GPU with stacked HBM, unlike the prototype board we saw back in 2014. There has been some question about whether the board showcased in 2014 represents an actual form factor, and NVIDIA has officially stated that, alongside PCI-Express form factors, Pascal GPUs will be available on a Mezzanine board that is smaller than PCI-Express 3.0 PCBs. This specific PCB will come with a Mezzanine connector rated at 15 GB/s and up to 40 GB/s, and it will be available on select HPC servers and workstations that feature NVLINK support. Several of these boards can be stacked on top of each other to conserve space inside servers, while consumer PCs will stick with the PCI-Express form factor and full-length cards, as they remain the best solution for high-end gaming rigs and professional usage.
NVIDIA NVLINK and Future of HPC Oriented GPUs
The Pascal GPU will also introduce NVLINK, the next-generation Unified Virtual Memory link with Gen 2.0 cache coherency features and 5 to 12 times the bandwidth of a regular PCIe connection. This will solve many of the bandwidth issues that high-performance GPUs currently face. One of the latest things we learned about NVLINK is that it will allow several GPUs to be connected in parallel on HPC-focused platforms featuring several nodes fitted with Pascal GPUs for compute-oriented workloads. The NVLINK interconnect will give the processors inside an HPC block a faster path than traditional PCI-e Gen3 lanes, at speeds of up to 200 GB/s. Pascal GPUs will also feature Unified Memory support, allowing the CPU and GPU to share the same memory pool, and finally there is mixed-precision support. While NVLINK isn’t planned for broad commercial integration right now, it will be featured in systems using ARM64 chips and in some x86-powered HPC servers built on OpenPOWER, Tyan and Quanta solutions.
Outpacing PCI Express
Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 Giga-transfers per second per lane) and with the widest supported links (16 lanes) the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory. In a multi-GPU system, the problem is compounded if a PCIe switch is used. With a switch, the limited PCIe bandwidth to the CPU memory is shared between the GPUs. The resource contention gets even worse when peer-to-peer GPU traffic is factored in.
NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80 and 200 GB/s of bandwidth, allowing the GPU full-bandwidth access to the CPU’s memory system.
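As a sanity check on these figures, the raw link rates can be worked out with simple arithmetic. The sketch below uses the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding); these constants come from the PCIe specification, not from this article:

```python
# Back-of-the-envelope comparison: PCIe 3.0 x16 vs. the NVLink
# range quoted above (5x to 12x PCIe Gen3).

# PCIe 3.0: 8 GT/s per lane with 128b/130b line encoding.
GT_PER_S = 8.0
ENCODING = 128.0 / 130.0
LANES = 16

# Effective bandwidth per direction, in GB/s (8 bits per byte).
pcie_x16_gbs = GT_PER_S * ENCODING * LANES / 8
print(f"PCIe 3.0 x16: ~{pcie_x16_gbs:.2f} GB/s per direction")

# Applying the article's 5x and 12x multipliers over PCIe Gen3:
low, high = 5 * pcie_x16_gbs, 12 * pcie_x16_gbs
print(f"NVLink range: ~{low:.0f} to ~{high:.0f} GB/s")
```

The result, roughly 16 GB/s per direction for PCIe 3.0 x16, multiplies out to about 79-189 GB/s, which lines up with the 80 to 200 GB/s range NVIDIA quotes for NVLink.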
A Flexible and Energy-Efficient Interconnect
The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Our Pascal GPUs will support a number of these links, providing configuration flexibility. The links can be ganged together to form a single GPU↔CPU connection or used individually to create a network of GPU↔CPU and GPU↔GPU connections allowing for fast, efficient data sharing between the compute elements.
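Ganging links amounts to simple bandwidth aggregation. A minimal sketch, assuming roughly 20 GB/s per direction per link (a commonly cited first-generation NVLink figure, not stated in this article):

```python
# Aggregate bandwidth from ganging NVLink links together.
# Assumed: each 8-lane link provides ~20 GB/s per direction
# (a hypothetical per-link figure for illustration).
PER_LINK_GBS = 20

def ganged_bandwidth(num_links: int) -> int:
    """Total one-direction bandwidth when links are ganged."""
    return num_links * PER_LINK_GBS

for links in (1, 2, 4):
    print(f"{links} link(s): {ganged_bandwidth(links)} GB/s")
```

Under that assumption, a GPU exposing four links ganged into a single GPU↔CPU connection would reach 80 GB/s, the low end of the range quoted above, while splitting the links among peers trades per-connection bandwidth for topology flexibility.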
When connected to a CPU that does not support NVLink, the interconnect can be wholly devoted to peer GPU-to-GPU connections enabling previously unavailable opportunities for GPU clustering.
NVIDIA Pascal GPU Prototype Board:
| GPU Family | AMD Arctic Islands | NVIDIA Pascal |
| --- | --- | --- |
| GPU Name | AMD Greenland | NVIDIA GP100 |
| GPU Process | TSMC 16nm FinFET | TSMC 16nm FinFET |
| GPU Transistors | 15-18 Billion | ~17 Billion |
| HBM Memory (Consumer) | 4-16 GB HBM2 (SK Hynix) | 2-16 GB HBM2 (SK Hynix/Samsung) |
| HBM Memory (Dual-Chip Professional/HPC) | 32 GB HBM2 (SK Hynix) | 32 GB HBM2 (SK Hynix/Samsung) |
| HBM2 Bandwidth | 1 TB/s (Peak) | 1 TB/s (Peak) |
| Graphics Architecture | GCN 2.0? (New ISA) | Next-CUDA (Compute Oriented) |
| Successor of (GPU) | Fiji (Radeon 300/Fury) | GM200 (Maxwell) |