Amazon Announces Graviton2 SoC Along With New AWS Instances: 64-Core Arm With Large Performance Uplifts
by Andrei Frumusanu on December 3, 2019 12:30 PM EST
We only recently reported that Amazon is designing a custom server SoC based on Arm’s Neoverse N1 CPU platform, and Amazon has now officially announced the new Graviton2 processor as well as AWS instances based on the new hardware.
The new Graviton2 SoC is a custom design by Amazon’s own in-house silicon design teams and is a successor to the first-generation Graviton chip. The new chip quadruples the core count from 16 cores to 64 cores and employs Arm’s newest Neoverse N1 cores. Amazon is using the highest performance configuration available, with 1MB L2 caches per core, with all 64 cores connected by a mesh fabric supporting 2TB/s aggregate bandwidth as well as integrating 32MB of L3 cache.
Amazon claims the new Graviton2 chip can deliver up to 7x higher performance than the first-generation A1 instances in total across all cores, up to 2x the performance per core, and up to 5x faster memory access than its predecessor. The chip comes in at a massive 30B transistors on a 7nm manufacturing node - if Amazon is using similar high-density libraries to mobile chips (they have no reason to use HPC libraries), then I estimate the chip to fall around 300-350mm² if I were forced to put out a figure.
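As a rough sanity check on that area estimate, the sketch below works out the transistor density it implies. The 30B transistor count is Amazon's figure; the 300-350mm² range is only the estimate above, and the function name is a hypothetical helper for illustration.

```python
# Implied transistor density for the estimated die-size range.
TRANSISTORS = 30e9  # 30 billion transistors, as stated by Amazon


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return the density in millions of transistors per mm²."""
    return transistors / area_mm2 / 1e6


# 300 and 350 mm² are the bounds of the estimate, not measured values.
for area in (300, 350):
    print(f"{area} mm²: {density_mtr_per_mm2(TRANSISTORS, area):.1f} MTr/mm²")
```

Under these assumptions the estimate corresponds to roughly 86 to 100 million transistors per mm².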
The memory subsystem of the new chip is supported by 8 DDR4-3200 channels with support for hardware AES256 memory encryption. Peripherals of the system are supported by 64 PCIe4 lanes.
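For a sense of scale, the theoretical peak DRAM bandwidth of such a configuration can be sketched as follows. This assumes standard 64-bit (8-byte) DDR4 data channels and ignores ECC bits and real-world efficiency losses; Amazon has not quoted this number, and the function name is hypothetical.

```python
# Theoretical peak bandwidth of an 8-channel DDR4-3200 memory subsystem.
CHANNELS = 8
TRANSFERS_PER_SEC = 3200e6  # DDR4-3200 performs 3200 mega-transfers per second
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data bus per channel, ECC not counted


def peak_bandwidth_gbs(channels: int, transfers_per_sec: float,
                       bytes_per_transfer: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


print(f"{peak_bandwidth_gbs(CHANNELS, TRANSFERS_PER_SEC, BYTES_PER_TRANSFER):.1f} GB/s")
```

That works out to 25.6 GB/s per channel, or about 204.8 GB/s across all eight channels, as an upper bound.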
Powered by the new generation processor, Amazon also detailed its new 6th generation instances M6g, R6g and C6g, offering various configurations up to the full 64 cores of the chip and up to 512GB of RAM for the memory-optimised instance variants. The instances also offer 25Gbps “enhanced networking” connectivity, as well as 18Gbps of bandwidth to EBS (Elastic Block Store).
Amazon is also making some very impressive benchmark comparisons against its fifth-generation instances, which are powered by Intel Xeon Platinum 8175 processors running at up to 2.5GHz:
- All of these performance enhancements come together to give these new instances a significant performance benefit over the 5th generation (M5, C5, R5) of EC2 instances. Our initial benchmarks show the following per-vCPU performance improvements over the M5 instances:
- SPECjvm® 2008: +43% (estimated)
- SPEC CPU® 2017 integer: +44% (estimated)
- SPEC CPU 2017 floating point: +24% (estimated)
- HTTPS load balancing with Nginx: +24%
- Memcached: +43% performance, at lower latency
- X.264 video encoding: +26%
- EDA simulation with Cadence Xcellium: +54%
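Because these figures are per vCPU, they need normalising before they say anything about per-core performance. The sketch below assumes that a vCPU on the x86 M5 instances is one hardware thread (two per core) while a Graviton2 vCPU is a full core, and it assumes a rule-of-thumb 30% throughput uplift from SMT; that uplift is an illustrative assumption, not an Amazon figure, and the function names are hypothetical.

```python
# Converting a per-vCPU gain into per-core comparisons (illustrative only).

def per_core_throughput_ratio(per_vcpu_gain: float,
                              x86_threads_per_core: int = 2) -> float:
    """Graviton2 core throughput relative to an x86 core with all threads busy."""
    return (1 + per_vcpu_gain) / x86_threads_per_core


def single_thread_ratio(per_vcpu_gain: float, smt_uplift: float = 0.30,
                        x86_threads_per_core: int = 2) -> float:
    """Graviton2 core relative to an x86 core running just one thread.

    Each loaded x86 vCPU is assumed to deliver (1 + smt_uplift) / threads
    of what the core achieves with a single thread.
    """
    x86_vcpu_share = (1 + smt_uplift) / x86_threads_per_core
    return (1 + per_vcpu_gain) * x86_vcpu_share


gain = 0.44  # SPEC CPU 2017 integer, per vCPU (estimated)
print(f"per-core throughput ratio: {per_core_throughput_ratio(gain):.2f}")
print(f"single-thread ratio:       {single_thread_ratio(gain):.2f}")
```

With the +44% integer figure, a Graviton2 core would deliver about 0.72x the throughput of a fully loaded x86 core, but roughly 0.94x its single-threaded performance, under the stated assumptions.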
Amazon is making M6g instances with the new Graviton2 processor available for non-production workloads, with a wider rollout expected in 2020.
The announcement is a big win for Amazon and especially for Arm’s endeavours in the server space as they try to surpass the value that the x86 incumbents are able to offer. Amazon describes that the new 6g instances are able to offer 40% higher performance/$ than the existing x86 5th generation platforms, which represents some drastic cost savings for the company and its customers.
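To put the performance/$ claim into cost terms, here is a minimal sketch that converts a performance-per-dollar gain into the relative cost of running the same amount of work. It takes Amazon's 40% figure at face value; the function name is a hypothetical helper.

```python
# Relative cost of the same work given a performance-per-dollar improvement.

def cost_fraction_for_same_work(perf_per_dollar_gain: float) -> float:
    """Return the new cost as a fraction of the old cost for equal work."""
    return 1 / (1 + perf_per_dollar_gain)


fraction = cost_fraction_for_same_work(0.40)  # Amazon's claimed +40% perf/$
print(f"cost for same work: {fraction:.1%} of before, "
      f"a saving of {1 - fraction:.1%}")
```

In other words, 40% higher performance/$ translates to roughly 29% lower cost for the same work, not 40% lower.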
- AWS Designing a 32-Core Arm Neoverse N1 CPU for Cloud Servers
- Arm Announces Neoverse N1 & E1 Platforms & CPUs: Enabling A Huge Jump In Infrastructure Performance
- GIGABYTE's Cavium ThunderX2 Systems: 1U R181-T90 and 2U R281-T91
- Assessing Cavium's ThunderX2: The Arm Server Dream Realized At Last
- GIGABYTE's ThunderXStation with Dual Cavium ThunderX2 Arm SoCs
- Investigating Cavium's ThunderX: The First Arm Server SoC With Ambition
- Marvell Completes Acquisition of Cavium, Gets CPU, Networking & Security Assets
- Amazon AWS Offers Another AMD EPYC-Powered Instance: T3a
- Amazon Offers More EPYC: M5ad & R5ad Instances
mode_13h - Tuesday, December 3, 2019
Sure, Amazon could compare to what they *estimate* a 10 nm Ice Lake server chip could deliver, but that just adds more variables into the mix. There's value in comparing to a known quantity (i.e. a current instance, whether their own or a competitive one).
Anyway, such apples-to-apples comparisons will certainly be made, once both types of instances are actually available.
name99 - Tuesday, December 3, 2019
It's not meant to be "impressive", it's meant to be informative.
Some of us can put the number in context; for everyone else it's irrelevant.
Spunjji - Wednesday, December 4, 2019
Who cares how many transistors it uses if a comparable Intel product *doesn't exist*?
phoenix_rizzen - Thursday, December 12, 2019
Xeon: 1.5 MB of L1, 24 MB of L2, and 33 MB of L3 cache.
Graviton: ? MB of L1, 64 MB of L2, 32 MB of L3 cache.
24 cores vs 64 cores.
6 memory controllers vs 8.
48 PCIe lanes vs 64.
And so on. There are other blocks in the CPU die as well that aren't compared here (media, AVX, etc.). It's not that hard to figure out why one has more transistors than another.
bryanlarsen - Tuesday, December 3, 2019
A vCPU is half a core on Intel but a full core on Neoverse. So 40% faster per vCPU is actually 30% slower per core.
Wilco1 - Tuesday, December 3, 2019
44% faster per vCPU means 95% per core since Hyperthreading gives about 30% on average.
SarahKerrigan - Tuesday, December 3, 2019
In terms of throughput, the correct comparison point for one Neoverse vCPU is two Purley vCPUs, because AFAIK a vCPU is added per hard context on EC2. Based on that, a Purley core is still considerably higher throughput than an N1 core.
I suspect single-thread is close, or at least would be at base clocks; their mature turbo implementation continues to be a strong point for Intel. I also expect the Graviton2 chip's perf/W to be far better than the Xeon it is compared to.
Wilco1 - Tuesday, December 3, 2019
Sure, but the Neoverse cores are much smaller so you get 2.5 times the cores. If you're interested in throughput you need to compare total throughput per chip rather than per core. According to the SPECINT score a 24-core Skylake-SP gets only half the throughput of one Graviton2 chip, so you need 2 of them.
The Platinum 8175 AWS m5 instances have a 3.1GHz all-core turbo (https://en.wikichip.org/wiki/intel/xeon_platinum/8... so getting ~95% of single-threaded performance of the Skylake at its max turbo is pretty impressive!
Antony Newman - Tuesday, December 3, 2019
Ultimate multicore performance for single SoC x86 is being limited by dark silicon on Intel 14nm.
For a 64 Core Intel monster - they need their (Intel) 7nm process - or a multi SoC solution.
When TSMC’s 5nm ovens are ready - Amazon will be able to use Arm’s next cores, which will close the per-core performance gap - but allow considerably more cores before bottlenecking occurs.
A 128 Core Arm Poseidon SoC on TSMC 5nm could very well eclipse a 64 Core Intel CPU baked on Intel 7nm - but cost Amazon a fraction of the cost.
mdriftmeyer - Wednesday, December 4, 2019
When TSMC's 5nm is ready AMD's future Zen cores will curb stomp anything ARM can offer, like they already do.
Language is a funny thing, ``New Generation of ARM-based instances powered by AWS Graviton2 processors offer 40% better price/performance than current x86-based instances.''
A. That's 40% over previous Graviton processor nodes. BFD.
B. Our upcoming x86-based instances drastically knee cap our current x86-based instances in price/performance but we won't say that as we're trying to sell our own schtick here.