Gaming Performance (Discrete GPU)

For our gaming tests, we are using our AMD Ryzen 9 5950X paired with an NVIDIA RTX 2080 Ti graphics card. Our standard test suite consists of 12 titles, tested at four configurations:

  • Stage 1: Actual Gaming (1080p Maximum Quality, or equivalent)
  • Stage 2: All About Pixels (‘4K Minimum’ Quality)
  • Stage 3: Medium Low (‘1440p Minimum’)
  • Stage 4: Lowest Lows (720p Minimum or lower)

The final three settings form our set of CPU-limited gaming tests, and help find the point where a title moves from being CPU limited to GPU limited. Some users baulk at this testing, finding it irrelevant, but these configurations have been widely requested over the years. The counterpoint to this testing is the first setting, 1080p Maximum: the resolution was requested because 1080p is the most popular gaming resolution, and Maximum Quality because this graphics card should be able to handle almost everything at that resolution at very playable frame rates.

All the details for our gaming tests can be found in our #CPUOverload article.

Stage 1: Actual Gaming
AMD Ryzen 9 5950X, SMT On vs SMT Off

AnandTech         Settings         Average FPS  95th Percentile
Chernobylite      1080p Max        100%         -
Civilization 6    1080p Max        103%         -
Deus Ex: MD       1080p Max        99%          100%
Final Fantasy 14  1080p Max        102%         -
Final Fantasy 15  8K Standard      100%         99%
World of Tanks    1080p Max        100%         102%
World of Tanks    4K Max           103%         102%
Borderlands 3     1080p Max        101%         103%
F1 2019           1080p Ultra      103%         106%
Far Cry 5         1080p Ultra      104%         104%
GTA V             1080p Max        99%          100%
RDR 2             1080p Max        100%         100%
Strange Brigade   1080p Ultra      101%         101%

In real-world gaming situations, there’s very little to pick between having SMT enabled or disabled. Almost universally it is either on par or a smidgen better to have it enabled, with F1 2019, Civilization 6, and Far Cry 5 seemingly the biggest beneficiaries. I’ve also added in the Stage 3 result from World of Tanks, just because that benchmark doesn’t really have a proper settings menu.

Stage 2: All About Pixels
AMD Ryzen 9 5950X, SMT On vs SMT Off

AnandTech         Settings         Average FPS  95th Percentile
Chernobylite      4K Low           99%          -
Civilization 6    4K Min           105%         -
Deus Ex: MD       4K Min           98%          100%
Final Fantasy 14  4K Min           102%         -
Final Fantasy 15  4K Standard      100%         100%
Borderlands 3     4K Very Low      101%         104%
F1 2019           4K Ultra Low     100%         100%
Far Cry 5         4K Low           101%         100%
GTA V             4K Low           100%         101%
RDR 2             8K Min           100%         100%
Strange Brigade   4K Low           100%         100%

With our high-resolution, minimal-quality settings, there is only one outlier: Civilization 6, where the average frame rates seem to be a bit higher when SMT is enabled.

Stage 3: Medium Low
AMD Ryzen 9 5950X, SMT On vs SMT Off

AnandTech         Settings         Average FPS  95th Percentile
Chernobylite      1440p Low        100%         -
Civilization 6    1440p Min        105%         -
Deus Ex: MD       1440p Min        97%          96%
Final Fantasy 14  1440p Min        102%         -
Final Fantasy 15  1080p Standard   101%         105%
World of Tanks    1080p Standard   101%         101%
Borderlands 3     1440p Very Low   103%         105%
F1 2019           1440p Ultra Low  99%          99%
Far Cry 5         1440p Low        99%          99%
GTA V             1440p Low        100%         99%
RDR 2             1440p Low        100%         100%
Strange Brigade   1440p Low        100%         100%

At the more medium settings, we’re starting to see some more variation: Borderlands gets a few percent from SMT, while Deus Ex: MD starts to drop off a bit with SMT enabled.

Stage 4: Lowest Lows
AMD Ryzen 9 5950X, SMT On vs SMT Off

AnandTech         Settings         Average FPS  95th Percentile
Chernobylite      360p Low         106%         -
Civilization 6    480p Min         102%         -
Deus Ex: MD       600p Min         91%          91%
Final Fantasy 14  768p Min         102%         -
Final Fantasy 15  720p Standard    99%          102%
World of Tanks    768p Min         101%         100%
Borderlands 3     360p Very Low    108%         110%
F1 2019           768p Ultra Low   102%         105%
Far Cry 5         720p Low         100%         101%
GTA V             720p Low         99%          98%
RDR 2             384p Low         100%         103%
Strange Brigade   720p Low         95%          95%

This is perhaps our most varied set of results, with Deus Ex: MD showing an almost 10% drop with SMT enabled. DEMD is usually considered a CPU-limited title, but so is Chernobylite, which sees a 6% gain. Borderlands, which is more of a modern game, is +8-10% with SMT enabled. However, I doubt anyone is playing at these resolutions.

Overall Gaming Performance

If we take full averages from all the data points, then we’re seeing a rough +1% gain in performance in the more complex scenarios across the board.

Resolution Average Comparison
AMD Ryzen 9 5950X, SMT On vs SMT Off

AnandTech  Setting      aka               Average FPS  95th Percentile
Stage 1    1080p Max    Actual Gaming     101%         101%
Stage 2    4K+ Min      All About Pixels  101%         101%
Stage 3    1440p Min    Medium Lows       101%         101%
Stage 4    < 768p Min   Lowest Lows       100%         101%
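
As a rough cross-check of how a summary row like this can be formed, the sketch below averages the rounded Stage 4 percentages listed earlier. The use of a plain arithmetic mean is an assumption, since the text does not say how the averages were calculated, and the names `STAGE4_AVERAGE`, `STAGE4_P95` and `stage_mean` are purely illustrative.

```python
from statistics import mean

# SMT On relative to SMT Off, in percent, copied from the Stage 4 table.
# None marks titles that have no 95th percentile figure.
STAGE4_AVERAGE = [106, 102, 91, 102, 99, 101, 108, 102, 100, 99, 100, 95]
STAGE4_P95 = [None, None, 91, None, 102, 100, 110, 105, 101, 98, 103, 95]


def stage_mean(values):
    """Arithmetic mean of the available percentages, skipping missing entries."""
    present = [v for v in values if v is not None]
    return mean(present)


print(f"Stage 4 average FPS:     {stage_mean(STAGE4_AVERAGE):.1f}%")
print(f"Stage 4 95th percentile: {stage_mean(STAGE4_P95):.1f}%")
```

Rounded to whole numbers, these come out at 100% and 101%, in line with the Stage 4 row. Recomputing the other stages from the already-rounded per-title figures can land a point away from the published summary, presumably because the summary was built from unrounded data.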

In reality, any loss or gain is highly dependent on the title in question, and can swing from one side of the line to the other. It’s clear that Deus Ex prefers SMT off, while F1 2019 and Borderlands prefer SMT on, but we are talking fine margins here.

126 Comments

  • GeoffreyA - Tuesday, December 8, 2020 - link

    There's a single set of 4 decoders. In SMT mode, I believe some sharing is going in. This is from the original Zen design:

    https://images.anandtech.com/doci/10591/HC28.AMD.M...
  • GeoffreyA - Tuesday, December 8, 2020 - link

    * going on
  • naive dev - Wednesday, December 9, 2020 - link

    Right, I found that article as well and from that slide it looks like the decoder would be shared. But then that slide was from 2017, so that might have changed.

    It looks though as if the decoder could decode those 4 instructions from a single program counter only, right? It's not like the decoder could decode e.g. 2 instructions from program counter 1 and another 2 instructions from program counter 2?
  • GeoffreyA - Thursday, December 10, 2020 - link

    I'm not too sure how the implementation works, but I expect they're shuffling both threads through the decoder at roughly the same time. The decoder has four units (I think 1 complex and 3 simple). As far as I'm aware, that has stayed the same in both Zen 2 and 3.
  • mapesdhs - Thursday, December 10, 2020 - link

    Ian, a question about Handbrake, though it may not apply to the type of test you used. I've read that Handbrake doing an h264 encode can only use 16 threads max. Does this mean that in theory one could run two separate h264 encodes on a 5950X and thus obtain a good overall throughput speedup? Have you tried such a thing? Or might this only work if it were possible to force one encode to only use the 16 threads of one 8c block (CCX?), and the other encode to use the rest? ie. so that the separate encodes are not fighting over the same cores or indeed the same CCX-shared L3? Is it possible to force this somehow? Also, if the claimed 16 thread limit for h264 is true, is there a performance difference for a single h264 encode between SMT on vs. off just in general? ie. with it on, is the OS smart enough to ensure that the 16 threads are spread across all the cores evenly rather than being scrunched onto fewer cores because reasons? If not, then turning SMT off might speed it up. Note that I'm using Windows for all this.

    I don't know if any of this applies to h265, but atm the encoding I do is still 1080p. I did an analysis of all available Ryzen CPUs based on performance, power consumption and cost (I ruled out Intel partly due to the latter two factors but also because of a poor platform upgrade path) and found that although the 5900X scored well, overall it was beaten by the 2700X, mainly because the latter is so much cheaper. However, the 5950X would look a lot better if one could run two encodes on it at the same time without clashing, but review articles naturally never try this. I wish I could test it, but the only 16c system I have is a dual-socket S2011 setup with two 2560 v2s, so the separate CPUs introduce all sorts of other issues (NUMA and suchlike).

    I found something similar a long time ago when I noticed one could run six separate Maya frame renders on a 24-CPU SGI rack Onyx (essentially one render per CPU board), compared to running a single render on a quad-CPU (single board) deskside Onyx, giving a good overall throughput increase (the renderer being limited to 4 CPUs per job). See:

    http://www.sgidepot.co.uk/perfcomp_RENDER4_maya1.h...

    Funny actually, re what you say about an overly good speedup perhaps implying a less than optimal core design. Something odd about SGIs is how many times on a multi-CPU system one can obtain better results by using more threads than there are CPUs, bearing in mind MIPS CPUs from that era did not have SMT, ie. the CPUs kinda behave as if they do have SMT even though they don't. I found this behaviour occurred most for Blender and C-Ray.

    So anyway, it would be great if it were possible to run two h264 encodes on a 5950X at the same time, but there's probably no point if the OS doesn't spread out the loads in a sensible manner, or if in that circumstance there isn't a way to force each encode to use a separate CCX.

    All very specific to my use case of course, but I have hundreds of hours of material to convert, so the ability to get twice the throughput from a 5950X would make that CPU a lot more interesting; so far reviews I've read show it to be about 2x faster than the 2700X for h264 Handbrake (just one encode of course), but it costs 4.4x more, rather ruining the price/performance angle. And if it does work then I guess one could ask the same question of TR - could one run eight separate h264 encodes on a future Zen3 TR without the thread management being a total mess? :D I'm assuming it probably wouldn't be so good with the older Zen2 design given the split L3.
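
One way to attempt the split asked about in the comment above is with processor affinity masks. The sketch below only computes the masks. It assumes, and this is worth verifying on the actual machine, that with SMT on the 32 logical processors of a 5950X are numbered so that 0-15 sit on one 8-core block and 16-31 on the other. `affinity_mask` and `split_masks` are illustrative names, not part of any tool.

```python
def affinity_mask(first_cpu, count):
    """Bitmask with `count` consecutive bits set, starting at logical CPU `first_cpu`."""
    return ((1 << count) - 1) << first_cpu


def split_masks(logical_cpus, groups):
    """Split logical CPUs 0..logical_cpus-1 into equal contiguous groups, one mask each."""
    if logical_cpus % groups:
        raise ValueError("logical CPUs must divide evenly into groups")
    size = logical_cpus // groups
    return [affinity_mask(i * size, size) for i in range(groups)]


# Assumed layout: 32 logical processors, two groups of 16.
for index, mask in enumerate(split_masks(32, 2)):
    print(f"encode {index}: affinity mask {mask:X}")
```

The hexadecimal values are in the form accepted by the Windows `start /affinity` switch, so each encode could be launched under its own mask. Whether that actually improves overall throughput is not something tested here.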
  • GeoffreyA - Sunday, December 13, 2020 - link

    Interesting question. Would be nice if someone could give this a test on 16-core Ryzen or TR, and see what happens. Yesterday, I was able to take both FFmpeg and Handbrake up to 128 threads, and it does work; but, having only a 4-core, 4-thread CPU, can't comment.*

    As for x264's performance limit, I'm not sure at what number of threads it begins to flag; but, quality wise, using too many (say, over 16 at 1080p) is not advisable. According to the x264 developers, vertical resolution / threads shouldn't fall below 40-50 and certainly not below 30.

    https://forum.doom9.org/showthread.php?p=1213185#p...

    forum.doom9.org/showthread.php?p=1646307#post1646307

    More posts on high core counts:

    forum.doom9.org/showthread.php?t=173277

    forum.doom9.org/showthread.php?t=175766

    * As far as I know, Windows schedules threads all right. From 1903, on Zen 2, one CCX is supposed to be filled up, then another. I imagine 16 threads will be spread across two CCXs in the 5950X. FFmpeg's --threads switch could prove useful too.
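
The rule of thumb quoted in the comment above (vertical resolution divided by threads should not fall below 40-50, and certainly not below 30) can be turned into a quick calculation. This is only a sketch of that arithmetic, not anything taken from x264 itself, and `max_threads` is a hypothetical helper name.

```python
def max_threads(vertical_resolution, min_rows_per_thread):
    """Largest thread count that keeps rows per thread at or above the given floor."""
    return max(1, vertical_resolution // min_rows_per_thread)


# Floors taken from the guideline quoted in the comment: 50, 40 and 30 rows per thread.
for floor in (50, 40, 30):
    print(f"1080p, at least {floor} rows per thread: up to {max_threads(1080, floor)} threads")
```

For 1080p material this gives 21 and 27 threads for the 50 and 40 row floors, and 36 threads for the hard floor of 30.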
  • GeoffreyA - Sunday, December 13, 2020 - link

    -threads, not --threads

    Here are links set out better (thought they'd link in the comment):

    https://forum.doom9.org/showthread.php?p=1213185#p...

    https://forum.doom9.org/showthread.php?p=1646307#p...

    https://forum.doom9.org/showthread.php?t=173277

    https://forum.doom9.org/showthread.php?t=175766
  • deil - Sunday, December 13, 2020 - link

    I wonder when SMT4 will hit the market: a model with 3 copies of most things on the die, in a ring configuration (fp/int/fp/int) with cache inside the ring. A single thread would have a chance to use 2 FP modules for a single int processor part (when others don't use it, ofc).
    This kind of setup would have very interesting performance numbers at least. I am not saying it's a good idea, but interesting one for sure.
  • Machinus - Sunday, December 13, 2020 - link

    This article omits one of the basic considerations in any manually-configured and custom-cooled desktop system: achieving uniform, predictable thermal behavior. Unless you are building servers to perform only one or two specific types of mathematical operations, and can build, configure, and stress test on those instruction types alone, you need high confidence that the chip will never exceed the thermal flux densities of the cooling system you built. Fixed-clock systems with a static number of available cores have much more consistent thermal performance than chips whose clocks, and number of threads, are free-floating. This reduces your peak flops, but it significantly extends system lifetime. HEDT and HPC systems have double or triple-digit core counts per socket in 2020; SMT is not worth paying the price of reduced hardware lifetime unless you are building extremely specialized calculation servers.
