Intel: Sapphire Rapids With 64 GB of HBM2e, Ponte Vecchio with 408 MB L2 Cache

Author: Dr. Ian Cutress

by Dr. Ian Cutress on November 15, 2021 9:00 AM EST

This week we have the annual Supercomputing event where all the major High Performance Computing players are putting their cards on the table when it comes to hardware, installations, and design wins. As part of the event Intel is having a presentation on its hardware offerings, which discloses additional details about the next generation hardware going into the Aurora Exascale supercomputer.

Aurora is a contract that Intel has held for some time. The original scope was a 10nm Xeon Phi based system, an idea that was mothballed when Xeon Phi was scrapped, and the design has been an ever-changing landscape as Intel’s hardware offerings have shifted. It was finalized a couple of years ago that the system would instead use Intel’s Sapphire Rapids processors (the ones that come with High Bandwidth Memory) combined with new Ponte Vecchio Xe-HPC based GPU accelerators, and that it would be boosted from several hundred PetaFLOPs to an ExaFLOP of compute. Most recently, Intel CEO Pat Gelsinger disclosed that the Ponte Vecchio accelerator is achieving double the performance expected in the original disclosures, and that Aurora will be a 2+ EF supercomputer when built. Intel expects to deliver the first batch of hardware to the Argonne National Laboratory by the end of the year, but this will come with a $300m write-off on Intel’s Q4 financials. Intel expects to deliver the rest of the machine through 2022, and to ramp production of the hardware for mainstream use through Q1 ahead of a wider launch in the first half of the year.

Today we have additional details about the hardware.

On the processor side, we know that each unit of Aurora will feature two of Intel’s newest Sapphire Rapids CPUs (SPR), featuring four compute tiles, DDR5, PCIe 5.0, CXL 1.1 (not CXL.mem), and will be liberally using EMIB connectivity between the tiles. Aurora will also be using SPR with built-in High Bandwidth Memory (SPR+HBM), and the main disclosure is that SPR+HBM will offer up to 64 GB of HBM2e using 8-Hi stacks.

Based on the representations, Intel intends to use four stacks of 16 GB HBM2e for a total of 64 GB. Intel has a relationship with Micron, and the Micron HBM2e physical dimensions are in line with the representations given in Intel’s materials (compared to, say, Samsung or SK Hynix). Micron currently offers two versions of 16 GB HBM2e with ECC hardware: one at 2.8 Gbps per pin (358 GB/s per stack) and one at 3.2 Gbps per pin (410 GB/s per stack). Overall we’re looking at a peak bandwidth of between 1.432 TB/s and 1.640 TB/s, depending on which version Intel is using. Versions with HBM will use an additional four tiles to connect each HBM stack to one of SPR’s chiplets.
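The per-stack and total bandwidth figures follow directly from the per-pin data rate. The sketch below reproduces them, assuming the standard 1024-bit HBM2e interface per stack (an assumption not stated in Intel’s materials) and the four stacks described above. The function names are illustrative only.

```python
HBM_BUS_WIDTH_BITS = 1024  # assumption: standard HBM2e interface width per stack
STACKS_PER_PACKAGE = 4     # from the article: four 16 GB stacks per SPR+HBM


def stack_bandwidth_gbs(pin_rate_gbps: float) -> float:
    """Peak bandwidth of one stack in GB/s: pins x rate per pin, bits to bytes."""
    return pin_rate_gbps * HBM_BUS_WIDTH_BITS / 8


def package_bandwidth_tbs(pin_rate_gbps: float) -> float:
    """Peak bandwidth of all stacks in TB/s (1 TB/s = 1000 GB/s)."""
    return stack_bandwidth_gbs(pin_rate_gbps) * STACKS_PER_PACKAGE / 1000


for rate in (2.8, 3.2):
    per_stack = stack_bandwidth_gbs(rate)
    total = package_bandwidth_tbs(rate)
    print(f"{rate} Gbps/pin: {per_stack:.1f} GB/s per stack, {total:.3f} TB/s total")
```

Computed without rounding the per-stack values first, the totals come to roughly 1.43 TB/s and 1.64 TB/s, which matches the quoted range to within rounding.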

Based on this diagram from Intel, despite Intel stating that SPR+HBM will share a socket with traditional SPR, it’s clear that there will be versions that are not compatible. This may be an instance where the Aurora versions of SPR+HBM are tuned specifically for that machine.

On the Ponte Vecchio (PVC) side of the equation, Intel has already disclosed that a single server inside Aurora will have six PVC accelerators per two SPR processors. The accelerators will be connected to each other in an all-to-all topology using the new Xe-Link protocol built into each PVC. Xe-Link supports up to eight accelerators in fully connected mode, so Aurora needing only six of them saves power. How the accelerators connect to the SPR processors has not been disclosed, though Intel has stated that there will be a unified memory architecture between CPU and GPU.
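To get a feel for what all-to-all means at six versus eight accelerators, here is a small sketch that counts peers and point-to-point connections in a fully connected mesh. It assumes one direct connection per GPU pair, which Intel has not confirmed for Xe-Link; the helper name is hypothetical.

```python
def fully_connected(n_gpus: int) -> tuple[int, int]:
    """Return (peers per GPU, total pairwise connections) for an all-to-all mesh.

    Assumption: exactly one direct connection between every pair of GPUs.
    """
    peers = n_gpus - 1
    connections = n_gpus * peers // 2  # each pair counted once
    return peers, connections


for n in (6, 8):
    peers, connections = fully_connected(n)
    print(f"{n} GPUs: {peers} peers each, {connections} connections in total")
```

Under that assumption, each PVC in a six-GPU Aurora node talks to five peers rather than the seven that a full eight-way configuration would need.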

The insight added today by Intel is that each Ponte Vecchio dual-stack implementation (the diagram Intel has shown repeatedly is two stacks side by side) will feature a total of 64 MB of L1 cache and 408 MB of L2 cache, backed by HBM2e.

408 MB of L2 cache across two stacks means 204 MB per stack. If we compare that to other hardware:

NVIDIA A100 has 40 MB of L2 cache
AMD’s Navi 21 has 128 MB of Infinity Cache (an effective L3)
AMD’s CDNA2 MI250X in Frontier has 8 MB of L2 per ‘stack’, or 16 MB total

Whichever way you slice it, Intel is betting hard on having the right hierarchy of cache for PVC. Diagrams of PVC also show 4 HBM2e chips per half, which suggests that each PVC dual-stack design might have 128 GB of HBM2e. It is likely that none of them are ‘spare’ for yield purposes, as a chiplet based design allows Intel to build PVC using known good die from the beginning.
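The cache and memory figures above can be put side by side with a few lines of arithmetic. This is only a sketch: the 16 GB per HBM2e stack is an assumption carried over from the SPR+HBM discussion, since Intel has not confirmed PVC’s stack capacity.

```python
PVC_L2_TOTAL_MB = 408     # from the article, across both stacks
PVC_STACKS = 2
HBM_CHIPS_PER_HALF = 4    # from Intel's diagrams, per the article
HBM_STACK_GB = 16         # assumption: same 16 GB HBM2e stacks as SPR+HBM

# Cache sizes quoted in the comparison list above, in MB
other_caches_mb = {
    "NVIDIA A100 (L2)": 40,
    "AMD Navi 21 (Infinity Cache)": 128,
    "AMD MI250X (L2 per stack)": 8,
}

l2_per_stack_mb = PVC_L2_TOTAL_MB / PVC_STACKS
hbm_total_gb = HBM_CHIPS_PER_HALF * PVC_STACKS * HBM_STACK_GB

print(f"PVC L2 per stack: {l2_per_stack_mb:.0f} MB")
print(f"PVC HBM2e total (assuming {HBM_STACK_GB} GB stacks): {hbm_total_gb} GB")
for name, size_mb in other_caches_mb.items():
    print(f"PVC stack L2 is {l2_per_stack_mb / size_mb:.1f}x {name}")
```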

On top of this, we also get an official number as to the scale of how many Ponte Vecchio GPUs and Sapphire Rapids (+HBM) processors we need for Aurora. Back in November 2019, when Aurora was only listed as a 1EF supercomputer, I crunched some rough numbers based on Intel saying Aurora was 200 racks and making educated guesses on the layout – I got to 5000 CPUs and 15000 GPUs, with each PVC needing around 66.6TF of performance. At the time, Intel was already showing off 40 TF of performance per card on early silicon. Intel’s official numbers for the Aurora 2EF machine are:

18000+ CPUs and 54000+ GPUs is a lot of hardware. But dividing 2 Exaflops by 54000 PVC accelerators comes to only 37 TeraFlops per PVC as an upper bound, and that number is assuming zero performance is coming from the CPUs.
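Both per-GPU figures, the 2019 estimate and the new upper bound, come from the same division. A minimal sketch, treating “54000+” as exactly 54000 and assuming the CPUs contribute nothing:

```python
EXA = 1e18
TERA = 1e12


def tflops_per_gpu(system_exaflops: float, gpu_count: int) -> float:
    """Upper bound on per-GPU TFLOPS if GPUs supply all of the system's FLOPS."""
    return system_exaflops * EXA / gpu_count / TERA


estimate_2019 = tflops_per_gpu(1, 15000)   # 1 EF machine, estimated 15000 GPUs
official_2021 = tflops_per_gpu(2, 54000)   # 2 EF machine, 54000+ GPUs

print(f"2019 estimate: {estimate_2019:.1f} TF per PVC")
print(f"2021 official: {official_2021:.1f} TF per PVC")
```

Because the real GPU count is higher than 54000 and the CPUs do contribute some compute, the true per-PVC requirement would be somewhat below this bound.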

To add into the mix: Intel CEO Pat Gelsinger only a couple of weeks ago said that PVC was coming in at double the performance originally expected, allowing Aurora to be a 2EF machine. Does that mean the original performance target for PVC was ~20 TF of FP64? Apropos of nothing, AMD’s MI250X announcement last week showcased a dual-GPU chip with 47.9 TF of FP64 vector performance, moving to 95.7 TF in FP64 matrix performance. The end result here might be that AMD’s MI250X actually has higher raw performance than PVC; however, AMD requires 560 W for that card, whereas Intel’s power numbers have not been disclosed. We could do some napkin math here as well.

Frontier uses 560 W MI250X cards, and is rated for 1.5 ExaFlops of FP64 Vector at 30 MW of power. This means Frontier needs 31300 cards (1.5 EF / 47.9 TF) to meet performance targets, and for each 560 W MI250X card, Frontier has allocated 958 Watts of power (30 MW / 31300 cards). This is a 71% overhead for each card (covering cooling, storage systems, other compute/management, etc.).
Aurora uses PVC at an unknown power, and is rated for 2 ExaFlops of FP64 Vector at 60 MW of power. We know that Aurora needs 54000+ PVC cards to meet performance targets, which means that the system has allocated 1111 W (that’s 60 MW / 54000) per card to cover the PVC accelerator and other overheads. If we were to assume (a big assumption I know) that Frontier and Aurora have similar overheads, then we’re looking at roughly 650 W per PVC.
This would end up with PVC at roughly 650 W for 37 TF, against MI250X at 560 W for 47.9 TF.
This raw comparison ignores the specific features each card has for its use case.
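The napkin math can be worked through step by step. The sketch below uses only the figures quoted above, treats “54000+” as exactly 54000, and takes on the same big assumption that Aurora has the same per-card overhead ratio as Frontier. Evaluated directly, 60 MW / 54000 comes to about 1111 W per card, which puts the implied PVC figure near 650 W.

```python
# Figures quoted in the article
FRONTIER_FP64_FLOPS = 1.5e18   # 1.5 EF of FP64 vector
FRONTIER_POWER_W = 30e6        # 30 MW
MI250X_FP64_FLOPS = 47.9e12    # 47.9 TF per card
MI250X_POWER_W = 560.0

AURORA_POWER_W = 60e6          # 60 MW
AURORA_GPUS = 54000            # "54000+" treated as exactly 54000


def overhead_ratio(system_power_w: float, cards: float, card_power_w: float) -> float:
    """System power budget per card divided by the card's own rated power."""
    return (system_power_w / cards) / card_power_w


# Frontier: cards needed if GPUs supply all of the FP64 throughput
frontier_cards = FRONTIER_FP64_FLOPS / MI250X_FP64_FLOPS
frontier_ratio = overhead_ratio(FRONTIER_POWER_W, frontier_cards, MI250X_POWER_W)

# Aurora: total budget per card, then strip out a Frontier-like overhead
aurora_watts_per_card = AURORA_POWER_W / AURORA_GPUS
# Assumption: Aurora's overhead ratio equals Frontier's
implied_pvc_watts = aurora_watts_per_card / frontier_ratio

print(f"Frontier cards needed: {frontier_cards:.0f}")
print(f"Frontier overhead: {(frontier_ratio - 1) * 100:.0f}% per card")
print(f"Aurora budget per card: {aurora_watts_per_card:.0f} W")
print(f"Implied PVC power: {implied_pvc_watts:.0f} W")
```

The result is only as good as the shared-overhead assumption. A different cooling or storage design in either system would shift the implied PVC power noticeably.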

Compute GPU Accelerator Comparison Confirmed Numbers
AnandTech	Intel	AMD	NVIDIA
Product	Ponte Vecchio	MI250X	A100 80GB
Architecture	Xe-HPC	CDNA2	Ampere
Transistors	100 B	58.2 B	54.2 B
Tiles (inc HBM)	47	10	6 + 1 spare
Compute Units	128	2 x 110	108
Matrix Cores	128	2 x 440	432
INT8 Tensor	?	383 TOPS	624 TOPS
FP16 Matrix	?	383 TFLOPS	312 TFLOPS
FP64 Vector	?	47.9 TFLOPS	9.7 TFLOPS
FP64 Matrix	?	95.7 TFLOPS	19.5 TFLOPS
L2 / L3	2 x 204 MB	2 x 8 MB	40 MB
VRAM Capacity	128 GB (?)	128 GB	80 GB
VRAM Type	8 x HBM2e	8 x HBM2e	5 x HBM2e
VRAM Width	?	8192-bit	5120-bit
VRAM Bandwidth	?	3.2 TB/s	2.0 TB/s
Chip-to-Chip Total BW	8	8 x 100 GB/s	12 x 50 GB/s
CPU Coherency	Yes	With IF	With NVLink 3
Manufacturing	Intel 7 TSMC N7 TSMC N5	TSMC N6	TSMC N7
Form Factors	OAM	OAM (560 W)	SXM4 (400W*) PCIe (300W)
Release Date	2022	11/2021	11/2020
*Some Custom deployments go up to 600W

Intel also disclosed that it will be partnering with SiPearl to deploy PVC hardware in the European HPC efforts. SiPearl is currently building an Arm-based CPU called Rhea built on TSMC N7.

Moving forward, Intel also released a mini-roadmap. Nothing too surprising here: Intel has plans for designs beyond Ponte Vecchio, and future Xeon Scalable processors will also have HBM-enabled options.

69 Comments

mode_13h - Monday, November 29, 2021 - link
To continue this omphaloskepsis just a bit longer, I'll offer that I spend much less time and energy thinking about the nature of existence than the consequences of human existence.

We're all too familiar with all the horrible things humans have done, are doing, and will do to the planet. However, we should consider what happens to it without any sentient life. There's nothing to defend it from asteroids, for instance. And we know our sun will eventually die, certainly cooking our planet to a cinder long before it cools into a white dwarf. So, it seems to me that it's our responsibility to act not only as stewards of life on Earth, but also as its exporters to distant worlds.

The analogy that comes to mind is the Earth as a petri dish. If our life never escapes the Earth (and ultimately, our solar system), before the dish is incinerated, of what consequence will it have been? I see the ultimate mission of humanity as one of survival and spreading both ours and other Earthly life. In that, I guess it's not dissimilar to the way a parent's life takes on new meaning as the steward of their children. Partly fueled by hopes that their own unfulfilled dreams and aspirations might one day be realized by their offspring or descendants.
GeoffreyA - Wednesday, December 1, 2021 - link
If we're incinerated before leaving the solar system, the naive answer is that our life here would have been of no consequence. All our hopes and dreams, joys and pains, loves and hates would be lost, here on this "bank and shoal of time." The stricter answer is that everything in this universe is subject to entropy and an end. The time will come when no usable energy is left: the end of Earth amplified to universal scale. Yet I would argue, no, human existence was of consequence: our happiness, our sorrow, added weight to our existence, made it worthwhile, even if all records are lost.

Coming back to Earth's end, and our being stewards of life here, we will have to find another home, another star, in order to preserve human, animal, and plant life. There's a whole universe out there; we've just got to go "where no man has gone before" and find potential Earths. Interstellar travel is not possible at present, but I suspect there'll be answers once a quantum theory of gravity succeeds general relativity. As it is, GR opened possibilities: wormholes are one, which I doubt will ever be feasible; and more intriguingly, the Alcubierre drive, which may make Star Trek's warp drive a reality. The answer will come, I know it, and we'll be able to journey to the furthest reaches of space. Just an ant in a desert, but who knows what beautiful things we may find along the way? And what terrible? Some sort of teleportation can't be ruled out either: perhaps science will be able to exploit some loophole---or dare I say hidden API?---in spacetime's implementation.

If interstellar travel never becomes feasible, we may be doomed. The only solution I see is sending off the human code in some compact fashion, dispersing it through space. That way, even if we perish on Earth, the species may survive elsewhere. The irony there is that they may never know who we, their parents, were, unless we were to send some "document" with the code. It'll be a grievous blow to mankind, too, if our great works were lost for ever. Amazon S3 may be able to store a lot of data, but what's going to happen when Earth is baked in the oven? Perhaps if we packaged our masterpieces of art into some format and sent it off into space.
mode_13h - Wednesday, December 1, 2021 - link
I don't put much faith in faster-than-light travel. Not of matter, at least.

I think it's intriguing to consider that the UFO sightings (if legitimate), could actually be some sort of wormhole that's moving around. Perhaps all that's transiting it is information (i.e. in the form of electromagnetic radiation). If it had little or no mass, that could explain the sort of impossible acrobatics that observers of UFOs have reported.

> Amazon S3 may be able to store a lot of data,
> but what's going to happen when Earth is baked in the oven?

Heh, it'd take a lot less than that to usher in a near-complete cultural amnesia. If worldwide supply chains completely broke down to the point that core energy or material demands of storage & computer manufacturers could no longer be met, then datacenters would slowly grind to a halt and most of their information lost forever.

Let's say some supervolcano plunges the earth into a nuclear winter. There are something like a dozen such monstrosities lurking, and preventing such a calamity is not something you can do in a short amount of time (although this is a very interesting topic of its own). If semiconductor factories ground to a halt for long enough, we might lose the critical mass of technology and information needed to restart them. Worse yet, if you're rebooting society, energy is going to be a huge problem, because we've nearly exhausted most of the fossil fuel reserves that are accessible using low-tech means.
GeoffreyA - Thursday, December 2, 2021 - link
Traditionally, I never put much stock in FTL travel, and the laws of physics prohibit it in every way. If anything travelled faster, causality would fall apart and time travel would be possible. The Alcubierre drive seems to allow "effective" FTL travel: from the ship's frame of reference, no laws are broken. It's almost as if the car were stationary but the road got up and moved. Anyhow, the drive requires negative energy, which might be out of the question. As for wormholes, I used to be a big proponent, but considering the difficulty in creating and maintaining one, I've lost hope in that line of thought. But, if one already existed, like that in Interstellar, it would be intriguing indeed.

I never thought much about UFOs. Reasoning against that would be: why don't those wormholes appear in the middle of a city in broad daylight? And if wormholes, why aren't there distortions in their vicinity?

Exactly, a lot less than Earth's demise can cripple mankind. It goes to show how fragile all earthly things are. Interestingly, this question of information loss is a central problem in physics concerning black holes. Is information preserved or lost for ever? Can information be eradicated? Current thinking suggests that information is preserved through entangled correlations in Hawking radiation. And here's a humorous thought: what if the universe were a giant hard drive?
mode_13h - Wednesday, December 1, 2021 - link
> The only solution I see is sending off the human code in some compact fashion,
> dispersing it through space. That way, even if we perish on Earth,
> the species may survive elsewhere.

Space is incredibly noisy, and you'd be fighting the inverse-square law of signal attenuation. So, you'd need a very powerful transmitter, repeating for a very long time, in order for anyone around another star to even have a chance of receiving a complete and correct copy of the data. And then, what would they make of it?

Plus, as you point out, there's so much cultural information. And I don't mean just things like the arts and society, but also language and all the design information that's encoded into everything we touch and inhabit. And that's just humans. What about the billions of other organisms (if we're including micro organisms) that share our planet? Heck, if you managed to synthesize a human on an alien world, could it even survive without a proper gut microbiome? There's an astonishing multitude of different gut bacteria, as well.

To continue on my previous theme of a universe without faster-than-light travel, I'm intrigued by sci fi stories about humans on a sort of interstellar ark who've been journeying for so many generations that they've forgotten they're even on a space ship or a journey to another star.
GeoffreyA - Thursday, December 2, 2021 - link
I was actually thinking of sending the human code in a physical, resilient medium. Somehow, in the hopes that it would generate or synthesise once it landed on some distant world. But your idea of transmitting it as an electromagnetic wave is pretty intriguing.

Yes, there's too much information: language, organisms, etc. And that's another problem. We're intimately tied to the bacteria of Earth, like those in our gut. Perhaps if we made some changes to make those humans platform independent in the initial stages? Indeed, preserving Earth's information as a whole is a problem. A more moderate approach might work. Perhaps if we selectively sent off things of value ("the exports of Earth"). Language could be preserved through film, audio, and writing. (Going deeper, I wouldn't be surprised if all the information in the universe is preserved in some fashion, but we don't have access to the "specification" to enable a "byte-by-byte" copy.)

I suppose when all is said and done, the old-fashioned ark will do the trick. Two of each kind: generation is no problem. Stash select bacteria and micro-organisms on the ship, not to mention books, films, and media, and we'll preserve a good deal. The ark concept is certainly gripping. I think it's the solitude and silence of space. Can't help but picture Ripley frozen in that pod with the cat! Or the generic picture of a ship's computer beeping away and the humans fast asleep for decades.
mode_13h - Friday, December 3, 2021 - link
> Can't help but picture Ripley frozen in that pod with the cat!

I picture something more like a giant, hollow asteroid. You'd need a lot of mass as shielding against radiation and various other bits of material zinging about. Put some spin on it, as artificial gravity.
GeoffreyA - Thursday, December 2, 2021 - link
There's a French film I saw a while ago. "Oxygen," with Melanie Laurent. Worth a watch.
mode_13h - Tuesday, November 23, 2021 - link
> Science and religion are 100% incompatible.

I appeal for pragmatism. Feeling besieged will only galvanize the position of those who would go along with a more middle-ground approach. Having been raised in a religious tradition, I understand the comfort of ritual and the sense of community they feel. Threatening them will succeed only in having them close ranks and close their minds.

You can pursue ideological purity and logical consistency in your own life, but forcing the issue makes you not so different from some of those you oppose. I'm sure the extremists among them would relish your zealousness and weave it into their narrative of tyrannical atheists bent on persecuting the faithful.

I'd have hoped we'd learned a thing or two from the immeasurable death and suffering wrought over religious and political schisms.
GeoffreyA - Tuesday, November 23, 2021 - link
mode_13h, I respect Dawkins and believe he's sincere, but he tends to come off as one who's a bit too concerned about debunking religion and God. And yes, a pragmatic approach goes a long way. For my part, I don't like evolution, *or rather the methods,* but it's quite useful as both a tool and way of thinking about things.
