A Success on Arm for HPC: We Found a Fujitsu A64FX Wafer
by Dr. Ian Cutress on December 5, 2019 8:00 AM EST
Posted in: Enterprise, Arm, Trade Shows, Accelerator, Fujitsu, Supercomputing 19, SC19, A64FX
When speaking about Arm in the enterprise space, the main angle for discussion is on the CPU side. Having a high-performance SoC at the heart of the server has been a key goal for many years, and we have seen players such as Amazon, Ampere, Marvell, Qualcomm, Huawei, and others make a play for the server market. The other angle of attack is co-processors and accelerators. Here we have one main participant: Fujitsu. We covered the A64FX when the design was disclosed at Hot Chips last year, with its super high cache bandwidth, and it will be available on a simple PCIe card. The main end-point for a lot of these cards will be the Fugaku / Post-K supercomputer in Japan, where we expect it to hit one of the top spots on the TOP500 supercomputer list next year.
After the design disclosure last year at Hot Chips, at Supercomputing 2018 we saw an individual chip on display. This year at Supercomputing 2019, we found a wafer.
I just wanted to post some photos. Enjoy.
The A64FX is the main recipient of the Arm Scalable Vector Extension (SVE), new to Armv8.2, which in this instance gives 48 computing cores with 512-bit wide SIMD, fed by 32 GiB of HBM2. Inside the chip is a custom network, and externally the chip is connected via a Tofu interconnect (6D/Torus). The chip provides 2.7 TFLOPS of DGEMM performance. It is built on TSMC 7nm and has 8.786 billion transistors, but only 594 pins. Peak memory bandwidth is 1 TB/s.
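As a rough sanity check, the headline figures above can be turned into per-core and per-FLOP numbers. The sketch below is back-of-the-envelope arithmetic only. The first half uses nothing but the values quoted in this article. The second half assumes two 512-bit FMA pipes per core and one fused multiply-add counted as two FLOPs; neither figure is stated here, and the clock speed is left as a parameter because it is not quoted either.

```python
# Figures quoted in the article.
CORES = 48
SIMD_BITS = 512
DGEMM_TFLOPS = 2.7       # measured DGEMM throughput
PEAK_MEM_BW_TBS = 1.0    # peak HBM2 bandwidth, TB/s
HBM2_GIB = 32

# A 512-bit vector holds eight 64-bit (FP64) lanes.
FP64_LANES = SIMD_BITS // 64

# Derived purely from the quoted numbers.
dgemm_gflops_per_core = DGEMM_TFLOPS * 1000 / CORES
bytes_per_flop = PEAK_MEM_BW_TBS / DGEMM_TFLOPS   # TB/s over TFLOP/s
hbm2_mib_per_core = HBM2_GIB * 1024 / CORES

# ASSUMPTIONS (not stated in the article): two 512-bit FMA pipes per core,
# and each fused multiply-add counted as two floating-point operations.
FMA_PIPES_PER_CORE = 2
FLOPS_PER_FMA = 2


def flops_per_cycle() -> int:
    """FP64 operations the whole chip could retire per clock, under the assumptions."""
    return CORES * FMA_PIPES_PER_CORE * FP64_LANES * FLOPS_PER_FMA


def peak_fp64_tflops(clock_ghz: float) -> float:
    """Theoretical FP64 peak for a hypothetical clock speed in GHz."""
    return flops_per_cycle() * clock_ghz / 1000


def min_clock_ghz_for(tflops: float) -> float:
    """Lowest clock at which the theoretical peak reaches the given TFLOPS."""
    return tflops * 1000 / flops_per_cycle()


if __name__ == "__main__":
    print(f"FP64 lanes per vector:   {FP64_LANES}")
    print(f"DGEMM GFLOPS per core:   {dgemm_gflops_per_core:.2f}")
    print(f"Bytes per DGEMM FLOP:    {bytes_per_flop:.2f}")
    print(f"HBM2 MiB per core:       {hbm2_mib_per_core:.0f}")
    print(f"Min clock for 2.7 TFLOPS: {min_clock_ghz_for(DGEMM_TFLOPS):.2f} GHz")
```

On the quoted numbers alone, that works out to roughly 0.37 bytes of memory bandwidth per DGEMM FLOP, which is the kind of balance HBM2 is there to provide. Under the two-pipe assumption, the chip would need to run at about 1.76 GHz or higher for a 2.7 TFLOPS DGEMM result to be possible at all.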
The chip is built for high performance, high throughput, and high performance per watt, supporting data types from FP64 through to INT8. The L1 data cache is designed for sustained throughput, and power management is tightly controlled on chip. Either way you slice it, this chip is mightily impressive. We even saw HPE deploy two of these chips in a single half-width node.
23 Comments
PixyMisa - Thursday, December 5, 2019
594 signal pins. Plus a whole lot of power and ground.
PeachNCream - Thursday, December 5, 2019
So is it ARM or Arm? I see AT using those two forms and I'm uncertain if there is a distinction between the all caps and first letter only capitalized variations.
FreckledTrout - Thursday, December 5, 2019
ARM = Architecture
Arm = Company specifically Arm Holdings
PeachNCream - Thursday, December 5, 2019
Ah thanks muchly!
Ian Cutress - Thursday, December 5, 2019
Arm (the company) also used to be ARM stylistically, because it is/was an acronym. Sometimes we forget, because it was that way for so long (!)
29a - Thursday, December 5, 2019
Acorn RISM Machine. ; )
29a - Thursday, December 5, 2019
RISC
Supercell99 - Friday, December 6, 2019
not confusing at all.
Soulkeeper - Thursday, December 5, 2019
I guess this would be a competitor to the NEC TSUBASA
name99 - Friday, December 6, 2019
Kinda amazing to compare that 8.8B transistors with the 10B in an A12X.
Not in a dick-measuring way -- obviously Fujitsu are doing everything very differently, from using all large HPC transistors to providing a LOT of high drive memory IOs.
But interesting in the sense of how little transistor count tells you anymore.
Fujitsu are going for good CPU cores (with at least non-trivial amounts of various cache), and this gets them 48 such cores.
Apple is going for a whole lot of lightweight throughput computing (GPU, NPU, ISP, media ...) and burns up the equivalent of what, maybe 36 or so Fujitsu cores on that stuff.
People (some anyway) mock the idea of dark silicon, but this comparison is part of dark silicon in action --- Apple (and Qualcomm, and Huawei, just not yet quite at the level of Apple) burning SO many transistors to ship what's the equivalent of a 48 (or 52) core chip with "only" 8 nominal cores...