PCIe 4.0

Second-generation EPYC is the first commercial x86 server CPU to support PCIe 4.0, and its I/O capabilities are top of the class. One PCIe 4.0 x16 link offers up to 32 GB/s in each direction, so with a full 128 PCIe 4.0 lanes per CPU, each socket offers up to 256 GB/s in each direction.
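As a rough check on the bandwidth figures above, here is a minimal Python sketch. It assumes the standard PCIe signalling rates (8 GT/s per lane for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b line encoding; the function name `link_bandwidth_gbs` is made up for illustration, and the result ignores packet and protocol overhead.

```python
# Per-lane signalling rates in GT/s (assumed from the PCIe 3.0 / 4.0 specs).
GT_PER_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and later use 128b/130b line encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Raw bandwidth in GB/s for ONE direction of a link (no protocol overhead)."""
    bits_per_second = GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8  # 8 bits per byte


x16 = link_bandwidth_gbs(4, 16)      # one x16 link
socket = link_bandwidth_gbs(4, 128)  # all 128 lanes of one CPU

print(f"PCIe 4.0 x16:  {x16:.1f} GB/s per direction")
print(f"PCIe 4.0 x128: {socket:.1f} GB/s per direction")
```

Under these assumptions the script reports about 31.5 GB/s for one x16 link and about 252 GB/s for 128 lanes, which the text rounds up to 32 GB/s and 256 GB/s.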

Each CPU has eight x16 PCIe 4.0 links available, which can be split up among up to 8 devices per PCIe root, as shown above. There is also full PCIe peer-to-peer support, both within a single socket and across sockets.

With the previous generation, enabling a dual-socket configuration meant using 64 PCIe lanes from each CPU to link the two sockets together. For second-generation EPYC, AMD still allows 64 PCIe lanes per CPU to be used this way, but these are now PCIe 4.0 lanes. AMD also offers another feature here, socket-to-socket IF link bandwidth management, which allows OEM partners to design dual-socket systems with less socket-to-socket bandwidth and more PCIe lanes if needed.
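The lane trade-off in a dual-socket system can be sketched as simple arithmetic. The 128 lanes per CPU and the default of 64 lanes (four x16 links) per CPU for the socket-to-socket connection come from the text; the three-link configuration below is only an assumed example of "less socket-to-socket bandwidth", since the text does not say which configurations AMD permits. The function name `dual_socket_io_lanes` is hypothetical.

```python
LANES_PER_CPU = 128   # from the article
LANES_PER_LINK = 16   # each link is x16 wide


def dual_socket_io_lanes(if_links_per_cpu: int) -> int:
    """PCIe lanes left for devices in a two-socket system.

    if_links_per_cpu: number of x16 links each CPU dedicates to the
    socket-to-socket Infinity Fabric connection.
    """
    max_links = LANES_PER_CPU // LANES_PER_LINK
    if not 0 <= if_links_per_cpu <= max_links:
        raise ValueError(f"if_links_per_cpu must be between 0 and {max_links}")
    lanes_used_per_cpu = if_links_per_cpu * LANES_PER_LINK
    return 2 * (LANES_PER_CPU - lanes_used_per_cpu)


# 4 links per CPU is the default described in the text;
# 3 links per CPU is an assumed reduced-bandwidth configuration.
for links in (4, 3):
    print(f"{links} IF links per CPU -> {dual_socket_io_lanes(links)} PCIe lanes for I/O")
```

With the default four links per CPU, a dual-socket system exposes the same 128 lanes as a single socket; each link given back to I/O adds 32 lanes across the two CPUs.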

We also learned that there are in fact 129 PCIe 4.0 lanes on each CPU: one extra PCIe lane is reserved for the BMC (the chip that controls the server). Considering we are living in the age of AI acceleration, the EPYC 7002 servers will be great as hosts for quite a few GPUs or TPUs. Density has never looked so fun.

180 Comments

  • bobdvb - Thursday, August 8, 2019 - link

    I think a four-compute-node, 2U, dual-processor Epyc Rome system combined with Mellanox ConnectX-6 VPI should be quite frisky for HPC.
  • JohanAnandtech - Sunday, August 11, 2019 - link

    "One thing I wish they would have done is added quad socket support. "
    Really? That is an extremely small niche market with very demanding customers. Why would you expect AMD to put so much effort into an essentially dead-end market?
  • KingE - Wednesday, August 7, 2019 - link

    > While standalone compression and decompression are not real world benchmarks (at least as far as servers go), servers have to perform these tasks as part of a larger role (e.g. database compression, website optimization).

    Containerized apps are usually delivered via large, compressed filesystem layers. For latency-sensitive applications, e.g. scale-from-zero serverless, single- and lightly-threaded decompression performance is a larger-than-expected consideration.
  • RSAUser - Thursday, August 8, 2019 - link

    Usually the decompression overhead is minimal there.
  • KingE - Thursday, August 8, 2019 - link

    Sure, if you can amortize it over the life of a container, or can benefit from cached pulls. Otherwise, as is fairly common in an event-based 'serverless' architecture, it's a significant contributor to long-tail latency.
  • Thud2 - Wednesday, August 7, 2019 - link

    Will socket-to-socket IF link bandwidth management allow for better dual GPU performance?
  • wabash9000 - Thursday, August 8, 2019 - link

    "The city may be built on seven hills, but Rome's 8x8-core chiplet design is a truly cultural phenomenon of the semiconductor industry."
    The city of Rome was actually built on 8 hills; even their celebration of the 7 hills had 8 listed. Something got confused along the way. Search "QI: Series O Overseas" on YouTube.
  • Ian Cutress - Thursday, August 8, 2019 - link

    That episode is consequently where my knowledge about the 7 Hills / 8 Hills comes from.
  • abufrejoval - Sunday, August 11, 2019 - link

    sic transit gloria mundi... cum youtube non scolae discimus...

    I learned that in Latin class, the first of four foreign languages I learned in school (but I know that doesn't impress anyone from Belgium, with its three domestic ones :-)
  • ZolaIII - Thursday, August 8, 2019 - link

    Seems that the EPYC 7702P will be an absolute workstation killer deal. Hopefully AMD won't screw up with motherboards this time around.
