Power Consumption, Temperature

Two other arguments for enabling or disabling SMT come down to power consumption and temperature.

With SMT enabled, core utilization is expected to be higher, with more instructions flowing through the core and being processed per cycle. This naturally increases the power requirements of the core, and may also lower its frequency. The intended trade-off is that the extra work going through the core more than makes up for the extra power used and any lower frequency. A lower frequency should also allow more efficient throughput, provided the voltage is reduced accordingly.
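The voltage caveat follows from the usual first-order model of CMOS switching power, P ≈ αCV²f. If throughput scales with frequency, performance per watt is roughly 1/(αCV²), so lowering the frequency alone gains nothing; the efficiency win comes from the lower voltage that a lower frequency permits. Below is a minimal Python sketch with hypothetical operating points. It ignores leakage power and holds IPC fixed, so it does not model the SMT gain itself.

```python
def dynamic_power(capacitance, voltage, frequency_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency_hz


def perf_per_watt(frequency_hz, voltage, capacitance=1e-9, ipc=1.0):
    """Throughput (instructions/s) divided by dynamic power (W)."""
    throughput = ipc * frequency_hz
    return throughput / dynamic_power(capacitance, voltage, frequency_hz)


# Hypothetical operating points, not measured values for any real CPU.
high = perf_per_watt(4.3e9, 1.30)
low_same_v = perf_per_watt(4.0e9, 1.30)   # lower clock, same voltage
low_lower_v = perf_per_watt(4.0e9, 1.20)  # lower clock, lower voltage

print(f"{low_same_v / high:.3f}")   # 1.000: dropping frequency alone does not help
print(f"{low_lower_v / high:.3f}")  # 1.174: the gain is (1.30 / 1.20) ** 2
```

In this model the frequency term cancels out of performance per watt, which is why the text's "assuming the voltage is adjusted" condition matters.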

This is perhaps where AMD and Intel differ slightly. Intel’s turbo frequencies are hard-bound to specific values based on the number of loaded cores, regardless of how many threads are active in total or per core. Behavior becomes a little more opportunistic once the processor drops to its steady-state power limit, although exactly when that happens depends on the turbo power duration the motherboard has set. AMD’s frequency is opportunistic from the moment load is applied: it scales down as more cores are loaded, but it continually balances up and down based on core load. On the thermal side, results depend on the heat density generated in each core, and temperature also feeds back into the turbo algorithm as long as the power limit has not been reached.
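To make the contrast concrete, here is a toy Python model of the two behaviors. Everything in it is hypothetical: the turbo table values, the function names and the single-step control loop are illustrative assumptions, not Intel's or AMD's actual algorithms, and real boost logic weighs more inputs (current limits, turbo time windows) than shown here.

```python
# Illustrative values only: not the real turbo bins of any Intel or AMD SKU.
TABLE_TURBO_MHZ = {1: 5000, 2: 4900, 4: 4700, 8: 4500}


def table_turbo_mhz(active_cores: int) -> int:
    """Table-driven turbo: the cap depends only on how many cores are loaded,
    not on how many threads run on each core."""
    for max_cores in sorted(TABLE_TURBO_MHZ):
        if active_cores <= max_cores:
            return TABLE_TURBO_MHZ[max_cores]
    # More cores loaded than the table covers: fall back to the lowest bin.
    return min(TABLE_TURBO_MHZ.values())


def opportunistic_step_mhz(freq_mhz, package_w, temp_c,
                           power_limit_w, temp_limit_c,
                           fmin_mhz, fmax_mhz, step_mhz=25):
    """One step of a toy opportunistic boost loop: raise the clock while there
    is power and thermal headroom, lower it otherwise."""
    if package_w < power_limit_w and temp_c < temp_limit_c:
        return min(freq_mhz + step_mhz, fmax_mhz)
    return max(freq_mhz - step_mhz, fmin_mhz)


print(table_turbo_mhz(3))  # 4700, whether 3 or 6 threads run on those cores
print(opportunistic_step_mhz(4000, 100, 70, 140, 90, 3400, 4900))  # 4025
```

The table version answers "how many cores are busy?", while the loop version keeps re-evaluating headroom, which is the distinction the paragraph draws.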

For our analysis here, we’ve picked two benchmarks: Agisoft, a variably threaded test that performs practically the same with SMT On or Off, and 3DPMavx, a purely multithreaded test that gets the biggest gain from SMT.

Agisoft

Photoscan from Agisoft is a 2D-image-to-3D-model creator: it uses dozens of high-quality 2D images to generate related point maps that form a 3D model, and finally textures the model using the images provided. It is used in archiving artefacts, as well as in converting 2D sculpture into 3D scenes. Our test analyses a standardized set of 85 photos of 18 megapixels each, with the result measured as time to complete.

Looking simply at CPU temperatures while running our real-world Agisoft test, our current setup (MSI X570 Godlike with a Noctua NH-U12S) shows that both modes hover around 74°C sustained. The interesting element is perhaps at the beginning of the test, where CPU temperatures are higher in SMT Off mode. The data show that with SMT Off the processor runs at 4300 MHz, compared with 4150 MHz when SMT is enabled, which would account for the difference.

Looking at power, we can see that for the bulk of the test both modes have similar package power consumption, around 130 W. SMT Off draws more power during the first couple of minutes of the test, due to the higher frequency. Clearly, having only one thread per core in this part of the test produces a thermal density that allows a higher turbo.

If we measure the total energy used over the test, the two modes are basically identical in any metric that matters. Nearer the end of the test, where the workload is more variably threaded, SMT Off mode draws slightly less power. Completion time is essentially the same due to the nature of the test, but SMT Off comes in at 2% lower energy overall.
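Total energy is power integrated over time, so it can be estimated from a logged power trace. A minimal Python sketch, assuming power is recorded as timestamped samples; the function names and the sample values are hypothetical, not our logging tools or measured data.

```python
def energy_joules(times_s, power_w):
    """Integrate sampled package power (W) over time (s) with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need matching time and power samples, at least two")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def percent_change(new, baseline):
    """Relative difference of `new` versus `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical one-second samples, not our measured traces.
t = [0, 1, 2, 3, 4]
smt_on = [130.0, 130.0, 130.0, 130.0, 130.0]
smt_off = [140.0, 130.0, 126.0, 120.0, 120.0]

e_on = energy_joules(t, smt_on)
e_off = energy_joules(t, smt_off)
print(e_on, e_off)                           # 520.0 506.0
print(f"{percent_change(e_off, e_on):+.1f}%")  # -2.7%
```

Because both modes finish in about the same time, a higher early spike can be offset by lower draw later, which is how SMT Off ends up slightly lower overall.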

3DPMavx (3D Particle Movement)

Our 3DPM test is an algorithmic sequence of non-interactive random three-dimensional movement, designed to simulate molecular diffusive movement inside a gas or a fluid. The simulation is non-interactive (i.e. no two molecules collide) because the average movement of each particle already takes collisions into account. Our test cycles through six movement algorithms at ten seconds apiece, followed by ten seconds of idle, with the whole loop repeated six times, taking about 20 minutes regardless of how fast or slow the processor is. The reported performance figure is millions of particle movements per second. Each algorithm has been accelerated with AVX2.
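For readers unfamiliar with this kind of workload, here is a minimal Python sketch of one possible non-interactive movement step: each particle takes a fixed-length step in a uniformly random 3D direction, with no collision checks. This is an illustrative assumption, not the benchmark's source code; the real test uses six different algorithms, AVX2 vectorization and every thread, whereas this runs on a single thread, and all function names are hypothetical.

```python
import math
import random
import time


def random_unit_vector(rng):
    """Uniform direction on the unit sphere: z uniform in [-1, 1], azimuth uniform."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def move_particles(positions, steps, step_length, rng):
    """Non-interactive movement: particles move independently, no collision checks."""
    for _ in range(steps):
        for p in positions:
            dx, dy, dz = random_unit_vector(rng)
            p[0] += step_length * dx
            p[1] += step_length * dy
            p[2] += step_length * dz
    return positions


def movements_per_second(n_particles=10_000, steps=10, seed=1):
    """Score in millions of particle movements per second, like the benchmark's metric."""
    rng = random.Random(seed)
    positions = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    start = time.perf_counter()
    move_particles(positions, steps, 1.0, rng)
    elapsed = time.perf_counter() - start
    return n_particles * steps / elapsed / 1e6


print(f"{movements_per_second():.2f} million movements/s")
```

Since every particle is independent, the work splits cleanly across threads, which is one reason a test like this can fill both SMT threads on a core.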

On the temperature side, SMT Off mode again shows the higher thermal profile. Temperatures this time peak at 66°C, but the difference between the two modes is clear.

On the power side, we can see why SMT Off mode is warmer: the cores are drawing more power. The data show SMT Off mode running at ~4350 MHz, compared with closer to 4000 MHz for SMT On.

With the higher frequency in SMT Off mode, the estimated total power consumption is 6.8% higher. This difference appears to be very constant throughout the benchmark, which lasts about 20 minutes in total.

But let us add in the performance numbers. Because 3DPMavx can take advantage of SMT, SMT On mode scores +77.5% by having two threads per core rather than one (a score of 10245 vs 5773). Combined with the lower power draw, this makes SMT On mode +91% better in performance per watt on this benchmark.
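Because the test runs for a fixed duration, performance per watt for SMT On relative to SMT Off is the score ratio multiplied by the power ratio (SMT Off power divided by SMT On power). A minimal Python check using only the rounded figures quoted above; the function name is ours.

```python
def perf_per_watt_gain(score_on, score_off, power_off_over_on):
    """Relative perf/W of SMT On vs SMT Off, as a fraction (0.5 means +50%).

    power_off_over_on: SMT Off power divided by SMT On power (1.068 = 6.8% higher).
    With a fixed-duration test, the energy ratio equals the average power ratio.
    """
    perf_ratio = score_on / score_off
    return perf_ratio * power_off_over_on - 1.0


perf_gain = 10245 / 5773 - 1.0
gain = perf_per_watt_gain(10245, 5773, 1.068)
print(f"performance: {perf_gain:+.1%}")   # +77.5%
print(f"perf per watt: {gain:+.1%}")      # +89.5%
```

With these rounded inputs the result lands at about +89.5%, slightly below the +91% stated; the gap may come from the unrounded power data behind the 6.8% figure.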


126 Comments


  • MrSpadge - Friday, December 4, 2020 - link

    That doesn't mean SMT is mainly responsible for that. The x86 decoders are a lot more complex. And at the top end you get diminishing performance returns for additional die area.
  • Wilco1 - Friday, December 4, 2020 - link

    I didn't say all the difference comes from SMT, but it can't be the x86 decoders either. A Zen 2 without L2 is ~2.9 times the size of a Neoverse N1 core in 7nm. That's a huge factor. So 2 N1 cores are smaller and significantly faster than 1 SMT2 Zen 2 core. Not exactly an advertisement for SMT, is it?
  • Dolda2000 - Friday, December 4, 2020 - link

    >Graviton 2 gives 75-80% of the performance of the fastest Rome at less than a third of the area
    To be honest, it wouldn't surprise me one bit if 90% of the area gives 10% of the performance. Wringing out that extra 1% single-threaded performance here or there is the name of the game nowadays.

    Also there are many other differences that probably cost a fair bit of silicon, like wider vector units (NEON is still 128-bit, and exceedingly few ARM cores implement SVE yet).
  • Wilco1 - Saturday, December 5, 2020 - link

    It's nowhere near as bad with Arm designs showing large gains every year. Next generation Neoverse has 40+% higher IPC.

    Yes, Arm designs opt for smaller SIMD units. It's all about getting the most efficient use out of transistors. Having huge 512-bit SIMD units adds a lot of area, power and complexity with little performance gain in typical code. That's why 512-bit SVE is used in HPC and nowhere else.

    So with Arm you get many different designs that target specific markets. That's more efficient than one big complex design that needs to address every market and isn't optimal in any.
  • whatthe123 - Saturday, December 5, 2020 - link

    The extra 20% of performance is difficult to achieve. You can already see it on Zen CPUs, where 16-core designs running at around 3.4 GHz are dramatically more efficient per core in multithreaded work than 8-core designs running at 4.8 GHz. I've always hated these comparisons with ARM for this reason... you need a part with 1:1 watt parity to make a fair comparison; otherwise 80% of the performance at half the power can also be achieved on x86 just by reducing frequency and upping core count.
  • Wilco1 - Saturday, December 5, 2020 - link

    Graviton clocks low to conserve power, and still gets close to Rome. You can easily clock it higher - Ampere Altra does clock the same N1 core 32% higher. So that 20-25% gap is already gone. We also know about the next generation (Neoverse N2 and V1) which have 40+% higher IPC.

    Yes adding more cores and clocking a bit lower is more efficient. But that's only feasible when your core is small! Altra Max has 128 cores on a single die, and I don't think we'll see AMD getting anywhere near that in the next few years even with chiplets.
  • peevee - Monday, December 7, 2020 - link

    It is obviously a lot LESS than 5%. Nothing that matters in terms of transistors (caches and vector units) increases. Even doubling of registers would add a few hundreds/thousands of transistors on a chip with tens of billions of transistors, less than 0.000001%.

    They can double all scalar units and it still would be below 1% increase.
  • Kangal - Friday, December 4, 2020 - link

    I agree.
    Adding SMT/HT requires something like a 10% increase in the silicon budget and a 5% increase in power draw, but increases performance by 30%, speaking in general. So it's worth the trade-off for daily tasks, and for those on a budget.

    What I was curious to see, is if you disabled SMT on the 5950X, which has lots of cores. Leaving each thread with slightly more resources. And use the extra thermals to overclock the processor. How would that affect games?

    My hunch?
    Thread-happy games like Ashes of the Singularity would perform worse, since they are optimised and can take advantage of SMT. Unoptimised games like Fallout 76 should see an increase in performance, whereas properly optimised games like Metro Exodus should be roughly equal between OC and SMT.
  • Dolda2000 - Friday, December 4, 2020 - link

    >What I was curious to see, is if you disabled SMT on the 5950X, which has lots of cores.
    That is exactly what he did in this article, though.
  • Kangal - Saturday, December 5, 2020 - link

    I guess you didn't understand my point.
    Think of a modern, well-optimised game that is both GPU-intensive and CPU-intensive, such as Far Cry 5 or Metro Exodus. These games scale well anywhere from 4 to 8 physical cores.

    So using the 5950X with its 16-physical-cores, you really don't need extra threads. In fact, it's possible to see a performance uplift without SMT, dropping it from 32-shared-threads down to 16-full-threads, as each core gets better utilisation. Now add to that some overclocking (+0.2GHz ?) due to the extra thermal headroom, and you may legitimately get more performance from these titles. Though I suspect they wouldn't see any substantial increases or decreases in frame rates.

    In horribly optimised games, like Fallout 76, Mafia 3, or even AC Odyssey, anything could happen (though they would probably see some increase). Games that are CPU-intensive rather than GPU-intensive (e.g. practically all RTS games), on the other hand, were designed to scale much better with threads. So even with full cores and an overclock, we know these games would show a decrease in performance from losing the extra SMT threads.
