If you examine the CPU industry and ask where the big money is, you have to look at the server and datacenter market. Since the end of the Opteron era, AMD's market share there has effectively rounded to zero percent. Its first generation of EPYC processors, built on the new Zen microarchitecture, nudged that number up a small handful of points, but everyone has been waiting with bated breath for AMD's second swing at the ball. AMD's Rome platform addresses the concerns raised by first-generation Naples, and this CPU family is designed to do many things at once: bring a new CPU microarchitecture on 7nm, and offer up to 64 cores, 128 lanes of PCIe 4.0, 8 memory channels, and a unified memory architecture based on chiplets. Today marks the launch of Rome, and we have some of our own data to share on its performance.
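To give the headline platform figures above some scale, here is a back-of-envelope sketch of peak per-socket bandwidth. It assumes PCIe 4.0 signalling at 16 GT/s per lane with 128b/130b encoding and, as an assumption not stated in this article, that all 8 memory channels run DDR4-3200. Protocol overheads are ignored, so these are theoretical ceilings, not measured results; the helper names `pcie_gb_per_s` and `dram_gb_per_s` are hypothetical.

```python
"""Back-of-envelope peak I/O and memory bandwidth for one Rome socket.

Assumptions (not stated in the article): PCIe 4.0 at 16 GT/s per lane with
128b/130b encoding, and DDR4-3200 on each of the 8 memory channels.
Protocol overheads (packet headers, DRAM refresh, etc.) are ignored,
so real-world throughput will be lower.
"""

PCIE4_TRANSFERS_PER_S = 16e9   # 16 GT/s per lane
PCIE_ENCODING = 128 / 130      # 128b/130b line code efficiency
LANES = 128

DDR4_TRANSFERS_PER_S = 3200e6  # assumed DDR4-3200
BYTES_PER_TRANSFER = 8         # 64-bit wide channel
CHANNELS = 8


def pcie_gb_per_s(lanes: int) -> float:
    """Peak PCIe 4.0 payload bandwidth per direction, in GB/s (10^9 bytes)."""
    return lanes * PCIE4_TRANSFERS_PER_S * PCIE_ENCODING / 8 / 1e9


def dram_gb_per_s(channels: int) -> float:
    """Peak DRAM bandwidth summed over all channels, in GB/s."""
    return channels * DDR4_TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    print(f"PCIe 4.0 x{LANES}: {pcie_gb_per_s(LANES):.1f} GB/s per direction")
    print(f"{CHANNELS}-channel DDR4-3200: {dram_gb_per_s(CHANNELS):.1f} GB/s")
```

Under these assumptions a single socket tops out at roughly 252 GB/s of PCIe traffic in each direction and about 205 GB/s of DRAM bandwidth.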

Review edited by Dr. Ian Cutress

First Boot

Sixty-four cores. Each is an improved Zen 2 core, offering ~15% better IPC than the Zen core in Naples (as tested in our consumer CPU review) and double the AVX2/FP throughput. The chip has a total of 256 MB of L3 cache and 128 PCIe 4.0 lanes. AMD's second-generation EPYC, in this case the EPYC 7742, is a behemoth.

Boot to BIOS, check the node information.

[Note: That 1500 mV reading in the screenshot is the same reading we see on consumer Ryzen platforms; it seems to be the non-DVFS voltage as listed in the firmware, but isn't actually observed]

It is clear that the raw specifications of our new Rome CPU are among the most impressive on the market. The question then is whether or not this is the new fastest server chip on the market, a claim that AMD is putting all its weight behind. If it is, the follow-up questions become 'by how much?' and 'how much does it cost?'.

I have been covering server CPUs since the launch of the Opteron in 2003, but this is unlike anything I have seen before: a competitive core, and twice as many of them on a chip as the competition (Intel, Cavium, even IBM) can offer. To quote Forrest Norrod, AMD's SVP of its Enterprise division:

"We designed this part to compete with Ice Lake, expecting to make some headway on single threaded performance. We did not expect to be facing re-warmed Skylake instead. This is going to be one of the highlights of our careers"

Self-confidence is at an all-time high at AMD, and on paper it would appear to be warranted. The new Rome server CPUs improve core IPC, double the core count at the high end, and move to a new manufacturing process (7 nm), all in one swoop. Typically we see a server company do one of those things at a time, not all three. It is indeed a big risk to take, with the potential to be exciting if everything falls into place.

To put this into perspective: promising up to 2x FP performance, 2x the cores, and a new process technology would have sounded outlandish a few years ago. At the tail end of the Opteron days, just 4-5 years ago, Intel's best CPUs were up to three times faster. At the time, there was little to no reason whatsoever to buy a server with AMD Opterons. Two years ago, EPYC got AMD back into the server market, but although its performance per dollar was a lot better than Intel's, it was not a complete victory. Not only was AMD still trailing in database performance and AVX/FP performance, but partners and OEMs were also reluctant to work with the company without a proven product.
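The "2x FP performance, 2x cores" claim above compounds. The sketch below compares peak double-precision FLOPs per clock for a top-bin 32-core Naples (EPYC 7601) and the 64-core EPYC 7742. It assumes, beyond what this article states, that a Zen 1 core issues two 128-bit FMAs per cycle and a Zen 2 core two 256-bit FMAs per cycle, with each FMA counted as two FLOPs. The `Socket` class is a hypothetical illustration, and clock speeds are deliberately left out.

```python
"""Peak double-precision FLOPs per clock: top-bin Naples vs Rome socket.

Assumptions (not from the article): Zen 1 has two 128-bit FMA pipes,
Zen 2 has two 256-bit FMA pipes, and one FMA counts as 2 FLOPs.
Clock speed is ignored, so this is per-cycle peak, not sustained throughput.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    name: str
    cores: int
    fma_pipes: int
    simd_bits: int

    def dp_flops_per_cycle_per_core(self) -> int:
        doubles_per_vector = self.simd_bits // 64      # 64-bit doubles per register
        return self.fma_pipes * doubles_per_vector * 2  # FMA = multiply + add

    def dp_flops_per_cycle(self) -> int:
        return self.cores * self.dp_flops_per_cycle_per_core()


NAPLES = Socket("EPYC 7601 (Zen 1)", cores=32, fma_pipes=2, simd_bits=128)
ROME = Socket("EPYC 7742 (Zen 2)", cores=64, fma_pipes=2, simd_bits=256)

if __name__ == "__main__":
    for s in (NAPLES, ROME):
        print(f"{s.name}: {s.dp_flops_per_cycle_per_core()} FLOPs/cycle/core, "
              f"{s.dp_flops_per_cycle()} FLOPs/cycle/socket")
    ratio = ROME.dp_flops_per_cycle() / NAPLES.dp_flops_per_cycle()
    print(f"Rome / Naples: {ratio:.0f}x")
```

Under those assumptions the per-clock peak rises from 256 to 1024 FLOPs per socket, roughly 4x. Sustained clocks under AVX2 load and memory bandwidth will decide how much of that shows up in real workloads.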

So now that AMD has proven its worth with Naples, and with AMD promising more than double the number of deployed designs for Rome along with a very quick ramp to customers, we have to compare the old to the new. For the launch of the new hardware, AMD provided us with a dual EPYC 7742 system from Quanta, featuring two 64-core CPUs.


184 Comments


  • MarcusTaz - Wednesday, August 7, 2019

    Another site's article that starts with an F stated that Rome runs hot and uses 1.4 volts, above TSMC's recommended 1.3 volts. Did you need to run 1.4 volts for these tests?
  • evernessince - Wednesday, August 7, 2019

    Well 1st, that 1.3v figure is from TSMC's mobile focused 7nm LPP node. Zen 2 is made on the high performance 7nm node, not the mobile focused LPP. Whatever publication you read didn't do their homework. TSMC has not published information on their high performance node and I think it rather arrogant to give AMD an F based on an assumption. As if AMD engineers are stupid enough to put dangerous voltages through their CPUs that would result in a company sinking lawsuit. It makes zero sense.

    FYI all AMD 3000 series processors go up to 1.4v stock. Given that these are server processors, they will run hot. After all, more cores = more heat. It's the exact same situation for Intel server processors. The only difference here is that AMD is providing 50 - 100% more performance in the same or less power consumption at 40% less cost.
  • DigitalFreak - Thursday, August 8, 2019

    You reading Fudzilla?
  • Kevin G - Wednesday, August 7, 2019

    AMD is back. They have the performance crown again and have decided to lap the competition with what can be described as an embarrassing price/performance comparison to Intel. The only thing they need to do is be able to meet demand.

    One thing I wish they would have done is added quad socket support. Due to the topology necessary, intersocket bandwidth would be a concern at higher core counts but if you just need lots of memory, those low end 8 core chips would have been fine (think memcache or bulk NVMe storage).

    With the topology improvements, I also would have liked AMD to try something creative: a quad chip + low clocked/low voltage Vega 20 in the same package all linked together via Infinity Fabric. That would be something stunning for HPC compute. I do see AMD releasing some GPU in a server socket at some point for this market as things have been aligning in this direction for some time.

    Supporting something like CCIX or OpenCAPI also would have been nice. A nod toward my previous point, even enabling Infinity Fabric to Vega 20 compute cards instead of PCIe 4.0 would have been yet another big step for AMD as that'd permit full coherency between the two chips without additional overhead.

    I think it would be foolish to ignore AVX-512 for Zen 3, even if the hardware they run it on continues to use 256-bit-wide SIMD units. ISA parity is important even if it doesn't inherently show much of a performance gain (though considering the clock speed drops seen in Skylake-SP, if AMD could support AVX-512 at the clocks they're able to sustain with AVX2 on Zen 2, they might pull off an overall throughput win).

    With regards to Intel, they have Cooper Lake due later this year. If Intel was wise, they'd use that as a means to realign their pricing structure and ditch the memory capacity premium. Everything else Intel can do in the short term is flex their strong packaging techniques and push integrated accelerators: on-package fabric, FPGA, Optane DIMMs, etc. Intel can occupy several lucrative niches in important, growing fields with what they have in-house right now, but they need to get them to market and at competitive prices. Otherwise it is AMD's game for the next 12 to 15 months until Ice Lake-SP arrives to bring back the competitive landscape. It isn't even certain that Intel can score a clean win either, as Zen 3 based chips may start to arrive in the same time frame.
  • bobdvb - Thursday, August 8, 2019

    I think a four compute node, 2U, dual processor Epyc Rome combined with Mellanox ConnectX-6 VPI should be quite frisky for HPC.
  • JohanAnandtech - Sunday, August 11, 2019

    "One thing I wish they would have done is added quad socket support. "
    Really? That is an extremely small niche market with very demanding customers. Why would you expect AMD to put so much effort into an essentially dead-end market?
  • KingE - Wednesday, August 7, 2019

    > While standalone compression and decompression are not real world benchmarks (at least as far as servers go), servers have to perform these tasks as part of a larger role (e.g. database compression, website optimization).

    Containerized apps are usually delivered via large, compressed filesystem layers. For latency-sensitive applications, e.g. scale-from-zero serverless, single- and lightly-threaded decompression performance is a larger-than-expected consideration.
  • RSAUser - Thursday, August 8, 2019

    Usually the decompression overhead is minimal there.
  • KingE - Thursday, August 8, 2019

    Sure, if you can amortize it over the life of a container, or can benefit from cached pulls. Otherwise, as is fairly common in an event-based 'serverless' architecture, it's a significant contributor to long-tail latency.
  • Thud2 - Wednesday, August 7, 2019

    Will socket-to-socket IF link bandwidth management allow for better dual GPU performance?
