NVIDIA Announces New GPU Architecture, A100 GPU and Accelerator


While NVIDIA’s usual in-person presentation plans for the year have been wiped out by the ongoing coronavirus epidemic, the company’s progress toward developing and commercializing new products has continued unabated. To that end, at today’s GTC 2020 digital conference, the company and its CEO Jensen Huang took to the virtual stage to announce NVIDIA’s next-generation GPU architecture, Ampere, and the first products that will use it.

As with Volta’s reveal 3 years ago – and as is now traditional for NVIDIA’s GTC keynotes – the focus this time is on the very high end of the market. In 2017, NVIDIA launched the Volta-based GV100 GPU, and with it the V100 accelerator. The V100 was a huge success for the company, considerably growing its data center business on the back of the Volta architecture’s new tensor cores and the sheer brute force that only an 800mm2+ GPU can deliver. Now in 2020, the company is looking to continue that growth with Volta’s successor, the Ampere architecture.

NVIDIA, a much more secretive company than it once was, has held its future GPU roadmap close to its chest. While the Ampere codename (among others) has been floating around for some time now, it is only this morning that we finally get confirmation that Ampere is real, along with our first details on the architecture. Due to the nature of NVIDIA’s digital presentation – as well as the limited information given in NVIDIA’s press pre-briefings – we do not yet have all the details on Ampere. For this morning at least, however, NVIDIA is touching on the architecture’s highlights for its data center compute and AI customers, and the major innovations Ampere brings to help with their workloads.

The Ampere family kicks off with the A100. Officially, this is the name of both the GPU and the accelerator that incorporates it; and at least for the moment, the two are effectively one and the same, since there is only a single accelerator using the GPU.

NVIDIA Accelerator Specifications Comparison

| | A100 | V100 | P100 |
|---|---|---|---|
| FP32 CUDA Cores | 6912 | 5120 | 3584 |
| Boost Clock | ~1.41 GHz | 1530 MHz | 1480 MHz |
| Memory Clock | 2.4 Gbps HBM2 | 1.75 Gbps HBM2 | 1.4 Gbps HBM2 |
| Memory Bus Width | 5120-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 1.6 TB/sec | 900 GB/sec | 720 GB/sec |
| VRAM | 40 GB | 16 GB / 32 GB | 16 GB |
| Single Precision (FP32) | 19.5 TFLOPS | 15.7 TFLOPS | 10.6 TFLOPS |
| Double Precision (1/2 FP32 rate) | 9.7 TFLOPS | 7.8 TFLOPS | 5.3 TFLOPS |
| INT8 Tensor | 624 TOPS | N/A | N/A |
| FP16 Tensor | 312 TFLOPS | 125 TFLOPS | N/A |
| TF32 Tensor | 156 TFLOPS | N/A | N/A |
| Interconnect | NVLink 3 (12 links, 600 GB/sec) | NVLink 2 (6 links, 300 GB/sec) | NVLink 1 (4 links, 160 GB/sec) |
| GPU | GA100 (826 mm2) | GV100 (815 mm2) | GP100 (610 mm2) |
| Transistor Count | 54.2B | 21.1B | 15.3B |
| TDP | 400W | 300W / 350W | 300W |
| Manufacturing Process | TSMC 7N | TSMC 12nm FFN | TSMC 16nm FinFET |
| Interface | SXM4 | SXM2 / SXM3 | SXM |
| Architecture | Ampere | Volta | Pascal |
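As a quick sanity check, the headline memory bandwidth figures in the table fall straight out of the memory clock and bus width. A minimal illustrative sketch (the function name is ours, not NVIDIA's):

```python
def hbm2_bandwidth_gbps(data_rate_gbit, bus_width_bits):
    """Peak memory bandwidth in GB/sec: per-pin data rate (Gbit/s)
    times bus width (bits), divided by 8 bits per byte."""
    return data_rate_gbit * bus_width_bits / 8

# A100: 2.4 Gbps across a 5120-bit bus -> 1536 GB/s, quoted as ~1.6 TB/sec
print(hbm2_bandwidth_gbps(2.4, 5120))
# V100: 1.75 Gbps across a 4096-bit bus -> 896 GB/s, quoted as 900 GB/sec
print(hbm2_bandwidth_gbps(1.75, 4096))
# P100: 1.4 Gbps across a 4096-bit bus -> ~717 GB/s, quoted as 720 GB/sec
print(hbm2_bandwidth_gbps(1.4, 4096))
```

The quoted figures are simply these raw numbers rounded up for marketing purposes.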

Designed to be the successor to the V100 accelerator, the A100 aims just as high, just as we’d expect from NVIDIA’s new flagship compute accelerator. The leading Ampere part is built on TSMC’s 7nm process and incorporates 54 billion transistors, 2.5 times more than the V100 before it. NVIDIA has taken full advantage of the density improvements offered by the 7nm process – and then some – as the resulting GPU die measures 826mm2, even larger than the GV100. NVIDIA went big on the last generation, and to outdo themselves, they have gone even bigger this generation.

We’ll cover the individual specifications in more detail later, but at a high level it’s clear that NVIDIA has invested more in some areas than others. FP32 performance is only modestly improved on paper compared to the V100. Tensor performance, meanwhile, has improved dramatically – almost 2.5x for FP16 tensors – and NVIDIA has greatly expanded the supported formats, adding INT8/INT4 support as well as a new format for FP32-style math called TF32. Memory bandwidth has also grown significantly, with several stacks of HBM2 memory providing a total of 1.6 TB/second of bandwidth to feed the beast that is Ampere.
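The appeal of TF32 is that it keeps FP32’s 8-bit exponent range while trimming the mantissa down to FP16’s 10 bits, letting the tensor cores process FP32 data far faster at reduced precision. As a rough software illustration of that truncation (our own sketch, not NVIDIA’s hardware implementation; it rounds a float32 bit pattern to 10 mantissa bits):

```python
import struct

def to_tf32(x):
    """Round a float to TF32-like precision: FP32's 8-bit exponent,
    but only 10 explicit mantissa bits (as in FP16).
    Uses round-to-nearest-even on the raw float32 bit pattern."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    drop = 23 - 10                     # FP32 has 23 mantissa bits; keep 10
    round_bit = 1 << (drop - 1)
    lsb = (bits >> drop) & 1
    bits += round_bit - 1 + lsb        # round half to even
    bits &= ~((1 << drop) - 1)         # clear the dropped bits
    return struct.unpack('<f', struct.pack('<I', bits))[0]

print(to_tf32(1.0))        # 1.0 -- exactly representable
print(to_tf32(3.1415926))  # 3.140625 -- nearest value with 10 mantissa bits
```

The gap between 3.1415926 and 3.140625 shows the precision traded away; for deep learning workloads, NVIDIA’s pitch is that this loss is largely inconsequential.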

NVIDIA will be shipping the initial version of this accelerator in their now-customary SXM form factor, a mezzanine-style card well suited to server installation. Power consumption has gone up yet again from generation to generation, which is perhaps fitting for a generation named Ampere. All told, the A100 is rated for 400W, versus 300W and 350W for the various versions of the V100. This makes the SXM form factor all the more important to NVIDIA’s efforts, as PCIe cards would not be suitable for that level of power consumption.

As for the Ampere architecture itself, NVIDIA is publishing only limited details today. Expect to hear more in the coming weeks, but for now NVIDIA has confirmed that it is keeping its different product lines architecturally compatible, albeit in potentially very different configurations. So while the company isn’t talking about Ampere (or derivatives) for video cards today, it is making clear that what it is working on is not a pure compute architecture, and that Ampere’s technologies will show up in graphics parts as well, likely with new features of their own. Ultimately, this is part of NVIDIA’s ongoing strategy of ensuring a unified ecosystem where, to quote Jensen, “Every single workload runs on every GPU.”
