Intel: Sapphire Rapids With 64 GB of HBM2e, Ponte Vecchio with 408 MB L2 Cache


by Dr. Ian Cutress on November 15, 2021 9:00 AM EST


This week we have the annual Supercomputing event where all the major High Performance Computing players are putting their cards on the table when it comes to hardware, installations, and design wins. As part of the event Intel is having a presentation on its hardware offerings, which discloses additional details about the next generation hardware going into the Aurora Exascale supercomputer.

Aurora is a contract that Intel has had for some time – the original scope was a 10nm Xeon Phi based system, an idea that was mothballed when Xeon Phi was scrapped, and the plan has kept changing along with Intel’s hardware offerings. It was finalized a couple of years ago that the system would use Intel’s Sapphire Rapids processors (the ones that come with High Bandwidth Memory) combined with new Ponte Vecchio Xe-HPC based GPU accelerators, and that it would be boosted from several hundred PetaFLOPs to an ExaFLOP of compute. Most recently, Intel CEO Pat Gelsinger has disclosed that the Ponte Vecchio accelerator is achieving double the performance, above the expectations of the original disclosures, and that Aurora will be a 2+EF Supercomputer when built. Intel is expecting to deliver the first batch of hardware to the Argonne National Laboratory by the end of the year, but this will come with a $300m write-off on Intel’s Q4 financials. Intel is expecting to deliver the rest of the machine through 2022, as well as ramp up production of the hardware for mainstream use through Q1, for a wider launch in the first half of the year.

Today we have additional details about the hardware.

On the processor side, we know that each unit of Aurora will feature two of Intel’s newest Sapphire Rapids CPUs (SPR), featuring four compute tiles, DDR5, PCIe 5.0, CXL 1.1 (not CXL.mem), and will be liberally using EMIB connectivity between the tiles. Aurora will also be using SPR with built-in High Bandwidth Memory (SPR+HBM), and the main disclosure is that SPR+HBM will offer up to 64 GB of HBM2e using 8-Hi stacks.

Based on the representations, Intel intends to use four stacks of 16 GB HBM2e for a total of 64 GB. Intel has a relationship with Micron, and the Micron HBM2e physical dimensions are in line with the representations given in Intel’s materials (compared to, say, Samsung or SK Hynix). Micron currently offers two versions of 16 GB HBM2e with ECC hardware: one at 2.8 Gbps per pin (358 GB/s per stack) and one at 3.2 Gbps per pin (410 GB/s per stack). Overall we’re looking at a peak bandwidth of between 1.432 TB/s and 1.640 TB/s, depending on which version Intel is using. Versions with HBM will use an additional four tiles, to connect each HBM stack to one of SPR’s chiplets.
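The per-stack and total bandwidth figures follow from the per-pin data rate. The sketch below reproduces that arithmetic, assuming the standard 1024-bit interface per HBM2e stack (the article does not state the bus width), and the function names are illustrative only. The unrounded results come out slightly above the quoted totals, which are built from the rounded 358 GB/s and 410 GB/s per-stack figures.

```python
# Peak HBM2e bandwidth derived from the per-pin data rate.
# ASSUMPTION: each HBM2e stack has a 1024-bit data interface
# (the usual HBM2/HBM2e width; not stated in the article).
HBM_BUS_WIDTH_BITS = 1024


def stack_bandwidth_gbs(pin_rate_gbps: float) -> float:
    """Peak bandwidth of a single stack in GB/s (bits -> bytes is /8)."""
    return pin_rate_gbps * HBM_BUS_WIDTH_BITS / 8


def total_bandwidth_tbs(pin_rate_gbps: float, stacks: int = 4) -> float:
    """Peak bandwidth of `stacks` stacks in TB/s (1 TB/s = 1000 GB/s)."""
    return stacks * stack_bandwidth_gbs(pin_rate_gbps) / 1000


if __name__ == "__main__":
    for rate in (2.8, 3.2):  # the two Micron speed grades quoted above
        per_stack = stack_bandwidth_gbs(rate)
        total = total_bandwidth_tbs(rate)
        print(f"{rate} Gbps/pin: {per_stack:.1f} GB/s per stack, "
              f"{total:.3f} TB/s across 4 stacks")
```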

Based on this diagram from Intel, despite Intel stating that SPR+HBM will share a socket with traditional SPR, it’s clear that there will be versions that are not compatible. This may be an instance where the Aurora versions of SPR+HBM are tuned specifically for that machine.

On the Ponte Vecchio (PVC) side of the equation, Intel has already disclosed that a single server inside Aurora will have six PVC accelerators per two SPR processors. Each of the accelerators will be connected in an all-to-all topology to each other using the new Xe-Link protocol built into each PVC – Xe-Link supports 8 in fully connected mode, so Aurora only needing six of those saves more power for the hardware. It’s not been disclosed how they are connected to the SPR processors – Intel has stated that there will be a unified memory architecture between CPU and GPU.
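To get a feel for why a six-way all-to-all group is cheaper than the eight-way maximum, here is a small counting sketch for a generic fully connected topology. It is plain graph arithmetic, not a description of how Intel wires Xe-Link physically, and the function names are hypothetical.

```python
# Link counts for a generic fully connected (all-to-all) topology.
# ASSUMPTION: one point-to-point link per pair of accelerators; the actual
# Xe-Link port and lane configuration has not been disclosed here.


def links_per_node(nodes: int) -> int:
    """Each node connects directly to every other node."""
    return nodes - 1


def total_links(nodes: int) -> int:
    """Number of unique pairs: n * (n - 1) / 2."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (6, 8):  # Aurora's node size vs the fully connected maximum
        print(f"{n} accelerators: {links_per_node(n)} links each, "
              f"{total_links(n)} links in total")
```

Under this assumption, going from eight accelerators to six drops the pairwise link count from 28 to 15, which is consistent with the point about saving power.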

The insight added today by Intel is that each Ponte Vecchio dual-stack implementation (the diagram Intel has shown repeatedly is two stacks side by side) will feature a total of 64 MB of L1 cache and 408 MB of L2 cache, backed by HBM2e.

408 MB of L2 cache across two stacks means 204 MB per stack. If we compare that to other hardware:

NVIDIA A100 has 40 MB of L2 cache
AMD’s Navi 21 has 128 MB of Infinity Cache (an effective L3)
AMD’s CDNA2 MI250X in Frontier has 8 MB of L2 per ‘stack’, or 16 MB total

Whichever way you slice it, Intel is betting hard on having the right hierarchy of cache for PVC. Diagrams of PVC also show 4 HBM2e chips per half, which suggests that each PVC dual-stack design might have 128 GB of HBM2e. It is likely that none of them are ‘spare’ for yield purposes, as a chiplet based design allows Intel to build PVC using known good die from the beginning.
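As a rough illustration of the gap, the sketch below works out the per-stack L2 ratios and the implied HBM2e capacity. The 16 GB per HBM2e stack is an assumption (it is what the 128 GB estimate implies), and the variable names are illustrative. Navi 21 is left out because its Infinity Cache acts as an L3 rather than an L2.

```python
# Per-stack L2 comparison and implied HBM2e capacity for Ponte Vecchio.
pvc_l2_total_mb = 408
pvc_stacks = 2
pvc_l2_per_stack_mb = pvc_l2_total_mb / pvc_stacks  # 204 MB per stack

# L2 per compute die/'stack' for the other parts, from the list above.
other_l2_mb = {"NVIDIA A100": 40, "AMD MI250X": 8}
ratios = {name: pvc_l2_per_stack_mb / mb for name, mb in other_l2_mb.items()}

# ASSUMPTION: 16 GB per HBM2e stack, four stacks per half, two halves.
hbm_stacks = 4 * pvc_stacks
hbm_capacity_gb = hbm_stacks * 16

if __name__ == "__main__":
    print(f"PVC L2 per stack: {pvc_l2_per_stack_mb:.0f} MB")
    for name, ratio in ratios.items():
        print(f"  {ratio:.1f}x the L2 of {name}")
    print(f"Implied HBM2e capacity: {hbm_capacity_gb} GB")
```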

On top of this, we also get an official number as to the scale of how many Ponte Vecchio GPUs and Sapphire Rapids (+HBM) processors we need for Aurora. Back in November 2019, when Aurora was only listed as a 1EF supercomputer, I crunched some rough numbers based on Intel saying Aurora was 200 racks and making educated guesses on the layout – I got to 5000 CPUs and 15000 GPUs, with each PVC needing around 66.6TF of performance. At the time, Intel was already showing off 40 TF of performance per card on early silicon. Intel’s official numbers for the Aurora 2EF machine are:

18000+ CPUs and 54000+ GPUs is a lot of hardware. But dividing 2 Exaflops by 54000 PVC accelerators comes to only 37 TeraFlops per PVC as an upper bound, and that number is assuming zero performance is coming from the CPUs.
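The division is simple enough to check directly. The sketch below computes the required throughput per GPU for both the 2019 estimate and the current official count, assuming (as the text does) that the CPUs contribute nothing. The helper name is illustrative.

```python
# Required FP64 throughput per GPU if the GPUs supply all of the compute.
EXAFLOP = 1e18
TERAFLOP = 1e12


def tflops_per_gpu(system_flops: float, gpu_count: int) -> float:
    """Upper bound on per-GPU throughput in TFLOPS, ignoring the CPUs."""
    return system_flops / gpu_count / TERAFLOP


estimate_2019 = tflops_per_gpu(1 * EXAFLOP, 15_000)  # 1 EF, guessed GPU count
official_2021 = tflops_per_gpu(2 * EXAFLOP, 54_000)  # 2 EF, official minimum

if __name__ == "__main__":
    print(f"2019 estimate: {estimate_2019:.1f} TF per GPU")
    print(f"2021 official: {official_2021:.1f} TF per GPU")
```

Because Intel quotes "54000+", the real per-GPU figure can only be at or below the second number.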

To add into the mix: Intel CEO Pat Gelsinger only a couple of weeks ago said that PVC was coming in at double the performance originally expected, allowing Aurora to be a 2EF machine. Does that mean the original performance target for PVC was ~20 TF of FP64? Apropos of nothing, AMD’s recent MI250X announcement last week showcased a dual-GPU chip with 47.9 TF of FP64 vector performance, moving to 95.7 TF in FP64 matrix performance. The end result here might be that AMD’s MI250X is actually higher raw performance than PVC, however AMD requires 560 W for that card, whereas Intel’s power numbers have not been disclosed. We could do some napkin math here as well.

Frontier uses 560 W MI250X cards, and is rated for 1.5 ExaFlops of FP64 Vector at 30 MW of power. This means Frontier needs roughly 31300 cards (1.5 EF / 47.9 TF) to meet performance targets, and for each 560 W MI250X card, Frontier has allocated 958 Watts of power (30 MW / 31300 cards). This is a 71% overhead for each card (covering cooling, storage systems, other compute/management, etc.).
Aurora uses PVC at an unknown power, and is rated for 2 ExaFlops of FP64 Vector at 60 MW of power. We know that Aurora has 54000+ PVC cards to meet performance targets, which means that the system has allocated about 1111 W (that’s 60 MW / 54000) per card to include the PVC accelerator and other overheads required. If we were to assume (a big assumption I know) that Frontier and Aurora have similar overheads, then we’re looking at roughly 650 W per PVC.
This would end up with PVC at roughly 650 W for 37 TF, against MI250X at 560 W for 47.9 TF.
This raw comparison does not account for the specific features each card has for its use case.
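The napkin math can be recomputed directly from the stated inputs (system FLOPS, system power, card FLOPS, card power, and card count). This is only a sketch of that arithmetic, with the same big assumption that Aurora carries the same per-card overhead ratio as Frontier. All names are illustrative.

```python
# Napkin math: per-card power budgets for Frontier and Aurora.
EXAFLOP = 1e18
TERAFLOP = 1e12
MEGAWATT = 1e6

# Frontier: cards needed to hit 1.5 EF with 47.9 TF MI250X cards.
frontier_cards = 1.5 * EXAFLOP / (47.9 * TERAFLOP)
frontier_budget_w = 30 * MEGAWATT / frontier_cards   # system watts per card
overhead_ratio = frontier_budget_w / 560             # budget vs 560 W card power

# Aurora: 60 MW spread over the official minimum of 54000 PVC cards.
aurora_budget_w = 60 * MEGAWATT / 54_000

# ASSUMPTION: Aurora has the same overhead ratio as Frontier.
pvc_power_estimate_w = aurora_budget_w / overhead_ratio

if __name__ == "__main__":
    print(f"Frontier cards needed:   {frontier_cards:.0f}")
    print(f"Frontier budget per card: {frontier_budget_w:.0f} W "
          f"({(overhead_ratio - 1) * 100:.0f}% overhead)")
    print(f"Aurora budget per card:   {aurora_budget_w:.0f} W")
    print(f"Estimated PVC card power: {pvc_power_estimate_w:.0f} W")
```

Since the card count is a lower bound ("54000+"), both the Aurora per-card budget and the resulting PVC estimate should be read as upper bounds.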

Compute GPU Accelerator Comparison Confirmed Numbers
AnandTech	Intel	AMD	NVIDIA
Product	Ponte Vecchio	MI250X	A100 80GB
Architecture	Xe-HPC	CDNA2	Ampere
Transistors	100 B	58.2 B	54.2 B
Tiles (inc HBM)	47	10	6 + 1 spare
Compute Units	128	2 x 110	108
Matrix Cores	128	2 x 440	432
INT8 Tensor	?	383 TOPS	624 TOPS
FP16 Matrix	?	383 TFLOPS	312 TFLOPS
FP64 Vector	?	47.9 TFLOPS	9.5 TFLOPS
FP64 Matrix	?	95.7 TFLOPS	19.5 TFLOPS
L2 / L3	2 x 204 MB	2 x 8 MB	40 MB
VRAM Capacity	128 GB (?)	128 GB	80 GB
VRAM Type	8 x HBM2e	8 x HBM2e	5 x HBM2e
VRAM Width	?	8192-bit	5120-bit
VRAM Bandwidth	?	3.2 TB/s	2.0 TB/s
Chip-to-Chip Total BW	8	8 x 100 GB/s	12 x 50 GB/s
CPU Coherency	Yes	With IF	With NVLink 3
Manufacturing	Intel 7, TSMC N7, TSMC N5	TSMC N6	TSMC N7
Form Factors	OAM	OAM (560 W)	SXM4 (400 W*), PCIe (300 W)
Release Date	2022	11/2021	11/2020
*Some Custom deployments go up to 600W

Intel also disclosed that it will be partnering with SiPearl to deploy PVC hardware in the European HPC efforts. SiPearl is currently building an Arm-based CPU called Rhea built on TSMC N7.

Moving forward, Intel also released a mini-roadmap. Nothing too surprising here - Intel has plans for designs beyond Ponte Vecchio, and future Xeon Scalable processors will also have options enabled with HBM.


mode_13h - Monday, November 22, 2021 - link
Well said.

I sometimes think ardent atheists, such as Richard Dawkins, go too far and ultimately hurt their own cause. Science can never explain why the universe exists. To pretend otherwise is as fraudulent as any claim made by the religions they're trying to counter.

I take a pragmatic position. For instance, we should teach evolution in schools because it offers practical insight and has predictive power. Whether you believe it was guided by a higher power, or that the world was suddenly created in a way that makes it *look* far older than it really is, aren't questions we ultimately need to answer, in order to resolve that particular question. So, why even go down those unproductive paths?
Oxford Guy - Monday, November 22, 2021 - link
There is nothing ‘too far’ about refusing to believe in things that are produced via imagination.

‘many feel science is trying to dethrone God. Of course, science's job is to dethrone falsehood only, explain Nature, and find truth.’

It’s not ‘trying’. It stands in fundamental opposition. Science and religion are 100% incompatible. Any ostensible blending of the two is the latter not the former.
GeoffreyA - Tuesday, November 23, 2021 - link
Oxford Guy, their going too far may lead them to the opposite extreme, where they end up with something equivalent to a Creator. For example, an infinite number of universes, where one, ours, has just the right values. Or Hawking's fluctuation in the quantum vacuum giving rise to the universe. If these aren't imaginary, beyond the reach of present evidence, and strangely smacking of creation myth, I give up.

All that we know is that the universe began at the big bang, or receded from a previous cycle, and to me, that is quite consistent with a Creator. I don't vouch for religious scripture, tradition, or doctrine, but believe that some Being, or Beings, put together the universe: there's mathematical design in it left, right, and centre. Change one law slightly and the whole edifice collapses.

Science and religion have different aims. Science creates predictive models that fit the evidence of Nature, whereas religion tries to explain it from a higher point of view, why, and from where. Religion shouldn't assert this as infallible truth, but rather belief or faith. When science ventures into the realm of metaphysical speculation, as in multiverse theories that explain our world, it ought to admit it's operating on the same level as speculation about God. The odd thing is, people can be just as dogmatic about such speculation. It appears that when man rejects the "imaginary God," he tends to create God in another form.
mode_13h - Tuesday, November 23, 2021 - link
> The odd thing is, people can be just as dogmatic about such speculation.

Woah! There's dogma in science??
; )

> ... when man rejects the "imaginary God," he tends to create God in another form.

Well said.
GeoffreyA - Wednesday, November 24, 2021 - link
"Woah! There's dogma in science??"

Ay, and excommunication too! ;)
mode_13h - Wednesday, November 24, 2021 - link
> I ... believe that some Being, or Beings, put together the universe

The problem this poses is how *they* came into being. Were they created by other Beings? Is it "Turtles, all the way down"?

I'm being cute, but it's worth pointing out that the Creator conjecture doesn't really resolve the question of the ultimate origin of everything.
GeoffreyA - Thursday, November 25, 2021 - link
The hardest question and something I've often battled with but come to no answer. Referring it to a creator leads to the usual, infinite recursion, turtles all the way down.

In order to proceed, our concept of time must be obliterated. Spatial thinking must go. Then we're left with a non-spatial, non-temporal domain. Somehow, through information stored in that primitive realm, spacetime emerges. All our human thinking deals with "where" and "when," and as a result, "where did that come from?" The only conclusion I'm left with is that in that timeless, spaceless arena, nothing came from anywhere but simply is. Though counter-intuitive, it may be the only way out; the alternative is infinite recursion. I would say that the secret is understanding the nature of non-time, which science seems to hint at over and over again. This Being, if existing, dwells in a mode of existence quite alien to any conception of ours. He/She/It, for want of a better word, is perhaps a form of mind and could be the absolute groundwork of Existence. (Perhaps there is an absolute frame of reference after all, Uncle Albert.) How? I don't know, but it's the only solution I've got at present. Recursion is the other one but solves nothing.

This question walks the very border where existence and non-existence meet. What, exactly, is non-existence? And what is its opposite? Is it possible that out of nothing, something came? Did zero separate into +something and -something? Is it possible that the human mind is colouring the idea that "nothing" is fundamental? What if "something" were first? Or "nothing" wasn't stable? What, exactly, is infinity? Where there is no space, "where" is everything stored? Did reality spring from some mathematical set in some abstract realm? It's been said that, from the point of view of light, there is no length, and that when mass ceases to exist, as in the far, far future, distance loses meaning, as well as time. (Put differently, mass caused the symmetry breaking that introduced length and time scales.) Perhaps a hint of what that pre-spacetime arena is like. The question fills me with wonder and, though the answer is beautiful, sadly, we may never know. I sometimes like to think of it as an endless sea, grey clouds curling backwards mightily, and we ourselves standing on the shores of eternity.
GeoffreyA - Thursday, November 25, 2021 - link
On reading again, this sounds like the ravings of a madman, and assumes what it sets out to prove! Apologies, really.
mode_13h - Friday, November 26, 2021 - link
No apology needed. It's heavy stuff. No one has these answers.

We just need to learn to sit comfortably, in our ignorance of such things. If faith helps you do that, I think it's a legitimate means.
GeoffreyA - Sunday, November 28, 2021 - link
Yes. And food, family, and love: the most important things in life.
