250 Watt Ampere In A Standard Form Factor


With the launch of their Ampere architecture and the new A100 accelerator barely a month behind them, NVIDIA this morning is announcing the PCIe version of their accelerator as part of the start of the now-virtual ISC Digital conference for high performance computing. The more conventional counterpart to NVIDIA's flagship SXM4 version of the A100 accelerator, the PCIe version of the A100 is designed to offer A100 in a more traditional form factor for customers who need something they can plug into standardized servers. Overall, the PCIe A100 offers the same peak performance as the SXM4 A100; however, with a lower 250 Watt TDP, its real-world performance won't be quite as high.

The obligatory counterpart to NVIDIA's SXM form factor accelerators, NVIDIA's PCIe accelerators serve to flesh out the other side of NVIDIA's accelerator lineup. While NVIDIA would gladly sell everyone SXM-based accelerators (which would include the pricey NVIDIA HGX carrier board), there are still many customers who need to be able to use GPU accelerators in standard PCIe-based rackmount servers. And for smaller workloads, customers don't need the kind of 4-way-and-higher scalability offered by SXM form factor accelerators. So with their PCIe cards, NVIDIA can serve the rest of the accelerator market that their SXM products can't reach.

The PCIe A100, in turn, is a full-fledged A100, just in a different form factor and with a more appropriate TDP. In terms of peak performance, the PCIe A100 is just as fast as its SXM4 counterpart; NVIDIA this time isn't shipping it as a cut-down configuration with lower clockspeeds or fewer functional blocks than the flagship SXM4 version. As a result, the PCIe card brings everything A100 offers to the table, with the same heavy focus on tensor operations, including the new higher precision TF32 and FP64 formats, as well as even faster integer inference.

NVIDIA Accelerator Specification Comparison
Specification | A100 (PCIe) | A100 (SXM4) | V100 (PCIe) | P100
FP32 CUDA Cores | 6912 | 6912 | 5120 | 3584
Boost Clock | 1.41 GHz | 1.41 GHz | 1.38 GHz | 1.3 GHz
Memory Clock | 2.4 Gbps HBM2 | 2.4 Gbps HBM2 | 1.75 Gbps HBM2 | 1.4 Gbps HBM2
Memory Bus Width | 5120-bit | 5120-bit | 4096-bit | 4096-bit
Memory Bandwidth | 1.6 TB/sec | 1.6 TB/sec | 900 GB/sec | 720 GB/sec
VRAM | 40 GB | 40 GB | 16 GB / 32 GB | 16 GB
Single Precision | 19.5 TFLOPs | 19.5 TFLOPs | 14.1 TFLOPs | 9.3 TFLOPs
Double Precision | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 7 TFLOPs (1/2 FP32 rate) | 4.7 TFLOPs (1/2 FP32 rate)
INT8 Tensor | 624 TOPs | 624 TOPs | N/A | N/A
FP16 Tensor | 312 TFLOPs | 312 TFLOPs | 112 TFLOPs | N/A
TF32 Tensor | 156 TFLOPs | 156 TFLOPs | N/A | N/A
Relative Performance (vs. SXM4 Version) | 90% | 100% | N/A | N/A
Interconnect | NVLink 3, 6 Links? (300 GB/sec?) | NVLink 3, 12 Links (600 GB/sec) | NVLink 2, 4 Links (200 GB/sec) | NVLink 1, 4 Links (160 GB/sec)
Transistor Count | 54.2B | 54.2B | 21.1B | 15.3B
Manufacturing Process | TSMC 7N | TSMC 7N | TSMC 12nm FFN | TSMC 16nm FinFET
Interface | PCIe 4.0 | SXM4 | PCIe 3.0 | SXM
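Several rows of the table above follow directly from others. As a rough sketch (assuming each CUDA core retires one fused multiply-add, i.e. 2 FLOPs, per clock at the boost clock, and that bandwidth is simply per-pin data rate times bus width), the single precision and memory bandwidth figures can be re-derived. The helper names here are illustrative only.

```python
# Hypothetical helpers for re-deriving table rows; figures copied from the table.
# Tuple fields: FP32 CUDA cores, boost clock (GHz), memory data rate (Gbps/pin), bus width (bits).
SPECS = {
    "A100 (PCIe)": (6912, 1.41, 2.4, 5120),
    "A100 (SXM4)": (6912, 1.41, 2.4, 5120),
    "V100 (PCIe)": (5120, 1.38, 1.75, 4096),
    "P100": (3584, 1.30, 1.4, 4096),
}


def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Assumes one FMA (2 FLOPs) per core per clock; GFLOPs / 1000 = TFLOPs."""
    return cuda_cores * 2 * boost_ghz / 1000.0


def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Per-pin data rate times bus width, divided by 8 to convert bits to bytes."""
    return data_rate_gbps * bus_width_bits / 8.0


if __name__ == "__main__":
    for name, (cores, clock, rate, width) in SPECS.items():
        print(f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPs FP32, "
              f"{memory_bandwidth_gbs(rate, width):.0f} GB/sec")
```

The FP32 results match the table to one decimal place. For A100, the rounded 2.4 Gbps memory clock yields about 1536 GB/sec, slightly under the quoted 1.6 TB/sec, presumably because the table's memory clock figure is itself rounded.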

But because the dual-slot PCIe expansion card form factor is designed for lower TDP products, offering less room for cooling and typically less access to power as well, the PCIe version of the A100 does have to ratchet its TDP down from 400W to 250W. That's a sizable 38% reduction in power consumption, and as a result the PCIe A100 isn't going to be able to match the sustained performance figures of its SXM4 counterpart; that's the advantage of going with a form factor that has larger power and cooling budgets. All told, the PCIe version of the A100 should deliver about 90% of the performance of the SXM4 version on single-GPU workloads, which for such a big drop in TDP is not a bad trade-off.
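The trade-off can be put in numbers with simple arithmetic. This sketch uses nameplate TDP as a stand-in for actual power draw (an assumption; real draw varies by workload) together with NVIDIA's quoted 90% figure, to show the rounded 38% reduction and the implied performance-per-watt relative to the SXM4 card.

```python
# Back-of-envelope only: TDP is treated as a proxy for power draw (an assumption).
SXM4_TDP_W = 400.0
PCIE_TDP_W = 250.0
PCIE_REL_PERF = 0.90  # NVIDIA's quoted single-GPU performance vs. the SXM4 A100


def percent_reduction(old: float, new: float) -> float:
    """Percentage drop going from old to new."""
    return (old - new) / old * 100.0


def relative_perf_per_watt(rel_perf: float, new_tdp: float, old_tdp: float) -> float:
    """Relative performance divided by relative power budget."""
    return rel_perf / (new_tdp / old_tdp)


if __name__ == "__main__":
    print(f"TDP reduction: {percent_reduction(SXM4_TDP_W, PCIE_TDP_W):.1f}%")
    print(f"Perf/W vs. SXM4: "
          f"{relative_perf_per_watt(PCIE_REL_PERF, PCIE_TDP_W, SXM4_TDP_W):.2f}x")
```

The exact reduction is 37.5%, and 90% of the performance on 62.5% of the power budget works out to roughly 1.44x the performance per watt, which is why the lower TDP is a reasonable compromise.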

And on that note, I have to give NVIDIA credit where credit is due: unlike with the PCIe version of the V100 accelerator, NVIDIA is doing a much better job of documenting these performance differences. This time around, NVIDIA is explicitly noting the 90% figure in their specification sheets and related marketing materials. So there should be a lot less confusion about how the PCIe version of the accelerator compares to the SXM version.

Other than the form factor and TDP changes, the only other notable deviation for the PCIe A100 from the SXM version is how many NVLink-connected GPUs are supported. For their PCIe card, NVIDIA is once again using NVLink bridges connected across the top of the A100 cards, allowing two (and only two) cards to be linked together. NVIDIA's datasheet doesn't list the total bandwidth available, but since the PCIe V100 supported up to 100 GB/sec in each direction using two links, the PCIe A100 with its 3 NVLink connectors should be able to do 150 GB/sec, if not more.
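The 150 GB/sec estimate is a linear extrapolation from the PCIe V100. The sketch below makes that reasoning explicit under the assumption that each A100 bridge connector carries at least the same per-link bandwidth as the V100's links, so the result is a floor rather than a measured figure. The function name is hypothetical.

```python
# Linear extrapolation, assuming per-link bandwidth is unchanged from V100 (a lower bound).
V100_PCIE_BW_PER_DIRECTION_GBS = 100.0  # figure cited for the PCIe V100, per direction
V100_PCIE_LINKS = 2
A100_PCIE_CONNECTORS = 3


def estimate_bridge_bandwidth(ref_bw_gbs: float, ref_links: int, new_links: int) -> float:
    """Scale a reference bandwidth by link count, keeping per-link bandwidth constant."""
    per_link = ref_bw_gbs / ref_links
    return per_link * new_links


if __name__ == "__main__":
    est = estimate_bridge_bandwidth(
        V100_PCIE_BW_PER_DIRECTION_GBS, V100_PCIE_LINKS, A100_PCIE_CONNECTORS)
    print(f"Estimated PCIe A100 NVLink bridge bandwidth: >= {est:.0f} GB/sec per direction")
```

If NVLink 3 raises per-link throughput over NVLink 2, the real number would be higher, which is the "if not more" caveat.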

Otherwise, the PCIe A100 comes with the usual trimmings of the form factor. The card is entirely passively cooled, designed to be used in servers with powerful chassis fans. And though they aren't visible in NVIDIA's official photos, there are sockets for PCIe power connectors. Meanwhile, with the reduced use of NVLink in this version of the card, A100's native PCIe 4 support will undoubtedly be of increased importance here, underscoring the advantage that an AMD Epyc + NVIDIA A100 pairing has right now, since AMD is the only x86 server vendor with PCIe 4 support.
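To see why PCIe 4 matters when NVLink is limited to pairs of cards, the host link bandwidth can be estimated from the PCIe signaling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0, both with 128b/130b encoding). This sketch assumes an x16 slot and ignores packet and protocol overhead, so real-world throughput will be somewhat lower.

```python
# Per-direction x16 link bandwidth after 128b/130b line encoding; protocol overhead ignored.
PCIE3_GT_PER_S = 8.0
PCIE4_GT_PER_S = 16.0


def pcie_gbs_per_direction(gt_per_s: float, lanes: int = 16) -> float:
    """Transfer rate x lanes x encoding efficiency, divided by 8 bits per byte."""
    encoding_efficiency = 128 / 130
    return gt_per_s * lanes * encoding_efficiency / 8


if __name__ == "__main__":
    print(f"PCIe 3.0 x16: {pcie_gbs_per_direction(PCIE3_GT_PER_S):.1f} GB/sec per direction")
    print(f"PCIe 4.0 x16: {pcie_gbs_per_direction(PCIE4_GT_PER_S):.1f} GB/sec per direction")
```

Doubling the signaling rate doubles host bandwidth to roughly 31.5 GB/sec per direction, which is what makes PCIe 4 platforms attractive for cards that rely more heavily on PCIe for GPU-to-GPU and host traffic.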

Wrapping things up, while NVIDIA isn't announcing pricing or availability information today, the new PCIe A100 cards are expected to be shipping soon. The wider compatibility of the PCIe card has allowed NVIDIA to line up over 50 server wins at this point, with 30 of those servers set to ship this summer.

