CES 2023: AMD Instinct MI300 Data Center APU Silicon In Hand - 146B Transistors, Shipping H2’23
by Ryan Smith on January 5, 2023 7:00 AM EST
Alongside AMD’s widely expected client product announcements this evening for desktop CPUs, mobile CPUs, and mobile GPUs, AMD’s CEO Dr. Lisa Su also had a surprise up her sleeve for the large crowd gathered for her prime CES keynote: a sneak peek at MI300, AMD’s next-generation data center APU that is currently under development. With silicon literally in hand, the quick teaser laid out the basic specifications of the part, along with reiterating AMD’s intentions of taking leadership in the HPC market.
First unveiled by AMD during their 2022 Financial Analyst Day back in June of 2022, MI300 is AMD’s first shot at building a true data center/HPC-class APU, combining the best of AMD’s CPU and GPU technologies. As was laid out at the time, MI300 would be a disaggregated design, using multiple chiplets built on TSMC’s 5nm process, and using 3D die stacking to place them over a base die, all of which in turn will be paired with on-package HBM memory to maximize AMD’s available memory bandwidth.
AMD for its part is no stranger to combining the abilities of its CPUs and GPUs – one only needs to look at their laptop CPUs/APUs – but to date they’ve never done so on a large scale. AMD’s current best-in-class HPC solution is to pair the discrete AMD Instinct MI250X (a GPU-only product) with AMD’s EPYC CPUs, which is exactly what’s been done for the Frontier supercomputer and other HPC projects. MI300, in turn, is the next step in the process, bringing the two processor types together onto a single package, and not just wiring them up in an MCM fashion, but going the full chiplet route with TSV-stacked dies to enable extremely high-bandwidth connections between the various parts.
The key point of tonight’s reveal was to show off the MI300 silicon, which has reached initial production and is now in AMD’s labs for bring-up. AMD had previously promised a 2023 launch for the MI300, and having the silicon back from the fabs and assembled is a strong sign that AMD is on track to make that delivery date.
Up Close With MI300 (Image Courtesy Tom's Hardware)
Along with a chance to see the titanic chip in person (or at least, over a video stream), the brief teaser from Dr. Su also offered a few tantalizing new details about the hardware. At 146 billion transistors, MI300 is the biggest and most complex chip AMD has ever built – and easily so. Though we can only compare it to current chip designs, this is significantly more transistors than either Intel’s 100B-transistor Data Center GPU Max (Ponte Vecchio) or NVIDIA’s 80B-transistor GH100 GPU. In fairness to both, though, AMD is putting a CPU as well as a GPU into this part.
The CPU side of the MI300 has been confirmed to use 24 of AMD’s Zen 4 CPU cores, finally giving us a basic idea of what to expect with regards to CPU throughput. Meanwhile the GPU side is (still) using an undisclosed number of CDNA 3 architecture CUs. All of this, in turn, is paired with 128GB of HBM3 memory.
According to AMD, MI300 is composed of nine 5nm chiplets sitting on top of four 6nm chiplets. The 5nm chiplets are undoubtedly the compute logic chiplets – i.e. the CPU and GPU chiplets – though a precise breakdown of what’s what is not available. A reasonable guess at this point would be 3 CPU chiplets (8 Zen 4 cores each) paired with possibly 6 GPU chiplets, though there are still some cache chiplets unaccounted for. Meanwhile, taking AMD’s “on top of” statement literally, the 6nm chiplets would then be the base dies that all of this sits on. Based on AMD’s renders, it looks like there are 8 HBM3 memory stacks in play, which implies around 5TB/second of memory bandwidth, if not more.
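The bandwidth estimate above is simple arithmetic, sketched below in Python. It rests on assumptions rather than AMD specifications: 8 stacks (read off AMD’s renders), the 1024-bit per-stack interface defined by the JEDEC HBM3 standard, and two illustrative per-pin data rates, since AMD has not disclosed its memory clocks. 6.4 Gbps is the top speed in the original HBM3 specification; 5.2 Gbps is simply a hypothetical lower operating point.

```python
# Back-of-the-envelope peak bandwidth for MI300's HBM3 memory.
# Assumptions (not AMD-confirmed): 8 stacks, per AMD's renders; a
# 1024-bit interface per stack (JEDEC HBM3); illustrative pin rates.

HBM3_BUS_WIDTH_BITS = 1024  # data pins per HBM3 stack


def stack_bandwidth_gbs(pin_rate_gbps: float,
                        bus_width_bits: int = HBM3_BUS_WIDTH_BITS) -> float:
    """Peak bandwidth of one stack in GB/s: pins x Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8


def total_bandwidth_tbs(stacks: int, pin_rate_gbps: float) -> float:
    """Aggregate peak bandwidth in TB/s, using decimal units (1 TB = 1000 GB)."""
    return stacks * stack_bandwidth_gbs(pin_rate_gbps) / 1000


if __name__ == "__main__":
    STACKS = 8               # assumed from AMD's renders
    TOTAL_CAPACITY_GB = 128  # AMD-stated HBM3 capacity
    for rate in (5.2, 6.4):  # hypothetical per-pin data rates in Gbps
        print(f"{STACKS} stacks @ {rate} Gbps/pin: "
              f"{total_bandwidth_tbs(STACKS, rate):.2f} TB/s")
    print(f"Capacity per stack: {TOTAL_CAPACITY_GB / STACKS:.0f} GB")
```

Under those assumptions the estimate lands at roughly 5.3 to 6.6 TB/s, in line with “around 5TB/second, if not more”, and the 128GB total works out to 16GB per stack.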
With regards to performance expectations, AMD isn’t saying anything new at this time. Previous claims were for a >5x improvement in AI performance-per-watt versus the MI250X, and an overall >8x improvement in AI training performance, and this is still what AMD is claiming as of CES.
The key advantage of AMD’s design, besides the operational simplicity of putting CPU cores and GPU cores on the same design, is that it will allow both processor types to share a high-speed, low-latency unified memory space. This would make it fast and easy to pass data between the CPU and GPU cores, letting each handle the aspects of computing that they do best. As well, it would significantly simplify HPC programming at a socket level by giving both processor types direct access to the same memory pool – not just a unified virtual memory space with copies to hide the physical differences, but a truly shared and physically unified memory space.
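The difference between shuttling data across separate pools and sharing one physical pool can be illustrated with a toy Python model. This is only a conceptual sketch: the classes and methods below (`SplitMemorySystem`, `UnifiedMemorySystem`, `copy_to_gpu`, and so on) are hypothetical, do not correspond to AMD’s software stack, and ignore latency, coherency protocols, and capacity entirely. They show one thing: with separate pools, a CPU-side update stays invisible to the GPU until another copy is made, while with a single shared pool both sides see the same bytes.

```python
# Toy model of two CPU/GPU memory arrangements. NOT a real GPU API:
# all class and method names here are hypothetical, for illustration only.


class SplitMemorySystem:
    """CPU and GPU each own a separate pool; data must be copied between them."""

    def __init__(self):
        self.cpu_pool = {}
        self.gpu_pool = {}
        self.bytes_copied = 0  # running total of data moved between pools

    def cpu_write(self, key, data: bytearray):
        self.cpu_pool[key] = data

    def copy_to_gpu(self, key):
        # An explicit transfer duplicates the buffer into the GPU's pool.
        self.gpu_pool[key] = bytearray(self.cpu_pool[key])
        self.bytes_copied += len(self.cpu_pool[key])

    def gpu_read(self, key) -> bytearray:
        return self.gpu_pool[key]


class UnifiedMemorySystem:
    """CPU and GPU address one physical pool; no copies are needed."""

    def __init__(self):
        self.pool = {}
        self.bytes_copied = 0  # stays at zero: nothing is ever duplicated

    def cpu_write(self, key, data: bytearray):
        self.pool[key] = data

    def gpu_read(self, key) -> bytearray:
        # The GPU sees the very same buffer object the CPU wrote.
        return self.pool[key]


if __name__ == "__main__":
    split = SplitMemorySystem()
    buf = bytearray(b"abcd")
    split.cpu_write("x", buf)
    split.copy_to_gpu("x")
    buf[0] = ord("z")            # CPU updates its data after the transfer
    print(split.gpu_read("x"))   # GPU still holds the stale copy

    unified = UnifiedMemorySystem()
    buf2 = bytearray(b"abcd")
    unified.cpu_write("x", buf2)
    buf2[0] = ord("z")           # same update, but in the shared pool
    print(unified.gpu_read("x"))  # GPU immediately sees the change
```

In the split model every handoff costs a copy (tracked in `bytes_copied`), which is exactly the kind of overhead a physically unified memory space is meant to eliminate.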
When it launches in the latter half of 2023, AMD’s MI300 is expected to go up against a few competing products, the most notable of which is likely NVIDIA’s Grace Hopper superchip, which combines an NVIDIA Armv9 Grace CPU with a Hopper GPU. NVIDIA has not gone for quite the same level of integration as AMD has, which arguably makes MI300 a more ambitious project, though NVIDIA’s decision to maintain a split memory pool is not without merit (e.g. capacity). Meanwhile, AMD’s schedule would have them coming in well ahead of arch-rival Intel’s Falcon Shores XPU, which isn’t due until 2024.
Expect to hear a great deal more from AMD about Instinct MI300 in the coming months, as the company will be eager to show off their most ambitious processor to date.
Khanan - Thursday, January 5, 2023 - This is the same company that was nearly bankrupt a few years ago, how the turntables have danced.
ballsystemlord - Thursday, January 5, 2023 - @Ryan Even if you cannot review this one, please post in the comment section if you find someone else who has done a review.
PS: Of course I know this is not for gaming.
Bodennew - Saturday, January 7, 2023 - Wow, perhaps this will give legitimacy to APUs in the PC market.
nandnandnand - Saturday, January 7, 2023 - Phoenix is still at 12 CUs. Intel has to play catch-up to whatever Strix is going to be, and doesn't even put its best iGPUs on desktop. Things are not trending towards mega APUs in the PC market, if that's what you mean by legitimacy.
If there's anything that shows the legitimacy of APUs, it's the current generation of consoles. Hacking the Xbox Series X or PS5 into a general purpose x86 PC would instantly legitimize PC APUs.
quaz0r - Saturday, January 7, 2023 - Baking it all onto the same slice of silicon is the future for sure.
nandnandnand - Saturday, January 7, 2023 - And in a 3D structure as tall as they can make it.
Threska - Sunday, January 8, 2023 - It's more about surrounding the gate.