A Hybrid/Heterogeneous Design

Developing a processor with two different types of core is not a new concept – there are billions of smartphones that have exactly that inside them, running Android or iOS, as well as IoT and embedded systems. We’ve also seen it on Windows, cropping up on Qualcomm’s Windows on Snapdragon mobile notebooks, as well as Intel’s previous Lakefield design. Lakefield was the first x86 hybrid design in that context, and Alder Lake is the more mass-market realization of that plan.

A processor with two different types of core disrupts the typical view of how we might assume a computer works. At a basic level, we are taught that a modern machine is consistent – every core has the same performance, processes the same data at the same rate, has the same latency to memory and to every other core, and everything is equal. This is a straightforward homogeneous design that is very easy to write software for.

Once not every core has the same latency to memory, and different parts of the chip do different things at different speeds and efficiencies, we move into a heterogeneous design scenario. In this instance, it becomes more complex to understand what resources are available and how best to use them, so ideally all of that complexity is kept transparent to the user.

With Intel’s Alder Lake, we have two types of core: high-performance P-cores, built on the Golden Cove microarchitecture, and high-efficiency E-cores, built on the Gracemont microarchitecture. Each type of core is designed for a different optimization point – P-cores have a super-wide performance window and go for peak performance, while E-cores focus on saving power at half the frequency, or lower, where the P-core might be inefficient.
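
The split is visible to software. As an illustration (a minimal sketch, not Intel’s own tooling), the following C++ snippet uses Windows’ GetLogicalProcessorInformationEx call, whose per-core records carry an EfficiencyClass value; on a hybrid chip the P-cores and E-cores report different classes, and only the P-cores carry the SMT flag.

    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main() {
        // Ask Windows for core-level topology; each RelationProcessorCore
        // record includes an EfficiencyClass byte (higher = faster core type).
        DWORD len = 0;
        GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &len);
        std::vector<BYTE> buf(len);
        auto* info = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data());
        if (!GetLogicalProcessorInformationEx(RelationProcessorCore, info, &len)) return 1;

        int core = 0;
        for (DWORD off = 0; off < len;) {
            auto* rec = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data() + off);
            std::printf("Core %2d: EfficiencyClass %u, SMT %s\n", core++,
                        (unsigned)rec->Processor.EfficiencyClass,
                        (rec->Processor.Flags & LTP_PC_SMT) ? "yes" : "no");
            off += rec->Size;
        }
        return 0;
    }

On an Alder Lake system one would expect the eight P-cores to report one class with SMT and the E-cores another without, though the numeric class values themselves are platform-defined.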

This split in design targets means that a background task waiting on data, or anything that isn’t latency-sensitive, can run on the E-cores and save power. When a user needs speed, the system can load up the P-cores with work so it finishes as quickly as possible. Alternatively, if a workload is more throughput-sensitive than latency-sensitive, it can be split across both P-cores and E-cores for peak throughput.
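
Which core a thread actually lands on is decided by the OS scheduler (with Thread Director’s help, covered on the next page), but developers can give it a hint. As a minimal sketch using the public Windows power-throttling (EcoQoS) API, and assuming a recent Windows SDK, a hypothetical helper (PreferEfficiency) can mark a background thread as favoring efficiency over execution speed:

    #include <windows.h>

    // Mark the current thread as preferring power efficiency (EcoQoS).
    // On Windows 11 with a hybrid CPU, such threads become strong
    // candidates for the E-cores; it is a hint, not a hard affinity.
    bool PreferEfficiency(bool enable) {
        THREAD_POWER_THROTTLING_STATE state = {};
        state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
        state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
        state.StateMask   = enable ? THREAD_POWER_THROTTLING_EXECUTION_SPEED : 0;
        return SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                                    &state, sizeof(state)) != 0;
    }

Clearing the state bit while keeping the control bit set explicitly asks for full speed again; either way the final placement decision stays with the scheduler.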

For performance, Intel lists a single P-core as ~19% better than a core in Rocket Lake 11th Gen, while a single E-core can offer better performance than a Comet Lake 10th Gen core. Efficiency is similarly aimed at being competitive, with Intel saying a Core i9-12900K with all 16C/24T running at a fixed 65 W will equal its previous-generation Core i9-11900K 8C/16T flagship at 250 W. A lot of that will come down to the fact that more cores at a lower frequency are more efficient than a few cores at peak frequency (as we see in GPUs); however, an effective 4x performance-per-watt improvement requires deeper investigation in our review.
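
As a quick sanity check on where that headline figure comes from: if the two chips really do finish the same work in the same time, the performance-per-watt ratio is simply the ratio of the power levels, 250 W / 65 W ≈ 3.8x, which rounds to an ‘effective 4x’ – assuming iso-performance, which is exactly the part that needs verifying.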

As a result, the P-cores and E-cores look very different. A deeper explanation can be found in our Alder Lake microarchitecture deep dive, but the E-cores end up being much smaller, such that four of them are roughly in the same area as a single P-core. This creates an interesting dynamic, as Intel highlighted back at its Architecture Day: A single P-core provides the best latency-sensitive performance, but a group of E-cores would beat a P-core in performance per watt, arguably at the same performance level.

However, one big question in all of this is how these workloads end up on the right cores in the first place. Enter Thread Director (more on the next page).

A Word on L1, L2, and L3 Cache

Users with an astute eye will notice that Intel’s diagrams relating to core counts and cache amounts are representations, and on deeper inspection some of the numbers need explanation.

For the cores, the processor design is physically split into 10 segments.

A segment contains either a P-core or a set of four E-cores, due to their relative size and functionality. Each P-core has 1.25 MiB of private L2 cache, while a group of four E-cores shares 2 MiB of L2 cache.
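
Once chips are in hand, these numbers are easy to check, as the operating system reports each cache’s level, size, and which logical CPUs share it. Below is a minimal C++ sketch assuming a Linux box with the usual sysfs layout (attr is just a small file-reading helper); on an Alder Lake part each P-core should report a private 1.25 MiB L2, while the four E-cores in a cluster should all list the same shared 2 MiB L2.

    #include <cctype>
    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <string>

    namespace fs = std::filesystem;

    // Read a single line from a sysfs attribute, or "" if it is missing.
    static std::string attr(const fs::path& p) {
        std::ifstream f(p);
        std::string s;
        std::getline(f, s);
        return s;
    }

    int main() {
        // Walk /sys/devices/system/cpu/cpu*/cache/index*/ and print each
        // cache's level, type, size, and the CPUs that share it.
        for (const auto& cpu : fs::directory_iterator("/sys/devices/system/cpu")) {
            const std::string name = cpu.path().filename().string();
            if (name.size() < 4 || name.rfind("cpu", 0) != 0 ||
                !std::isdigit(static_cast<unsigned char>(name[3])))
                continue;
            const fs::path cache = cpu.path() / "cache";
            if (!fs::exists(cache)) continue;
            for (const auto& idx : fs::directory_iterator(cache)) {
                if (idx.path().filename().string().rfind("index", 0) != 0) continue;
                std::cout << name
                          << "  L" << attr(idx.path() / "level")
                          << " " << attr(idx.path() / "type")
                          << "  size=" << attr(idx.path() / "size")
                          << "  shared_with=" << attr(idx.path() / "shared_cpu_list")
                          << "\n";
            }
        }
        return 0;
    }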

The L2 caches are backed by a large shared L3 cache, totaling 30 MiB. Intel’s diagram shows 10 LLC segments, which should mean 3.0 MiB each, right? However, moving from Core i9 to Core i7, we only lose one segment (one group of four E-cores), yet 5.0 MiB is lost from the total L3. Looking at the processor table below only makes this more confusing.

 

Please note that the following is conjecture; we're awaiting confirmation from Intel that this is indeed the case.

It’s because there are more than 10 LLC slices – there are actually 12 of them, each 2.5 MiB. It’s likely that either each group of E-cores has two slices, or there are extra ring stops for more cache.

Each of the P-cores has a 2.5 MiB slice of L3 cache, with eight cores making 20 MiB of the total. This leaves 10 MiB between two groups of four E-cores, suggesting that either each group has 5.0 MiB of L3 cache split into two 2.5 MiB slices, or there are two extra LLC slices on Intel’s interconnect.

Alder Lake Cache
AnandTech     Cores    L2 Cache           L3 Cache   IGP   Base   Turbo   Price
              P+E/T    (MiB)              (MiB)            W      W       $1ku
i9-12900K     8+8/24   8x1.25 + 2x2.00    30         770   125    241     $589
i9-12900KF    8+8/24   8x1.25 + 2x2.00    30         -     125    241     $564
i7-12700K     8+4/20   8x1.25 + 1x2.00    25         770   125    190     $409
i7-12700KF    8+4/20   8x1.25 + 1x2.00    25         -     125    190     $384
i5-12600K     6+4/16   6x1.25 + 1x2.00    20         770   125    150     $289
i5-12600KF    6+4/16   6x1.25 + 1x2.00    20         -     125    150     $264

This is important because moving from Core i9 to Core i7, we lose four E-cores but also 5.0 MiB of L3 cache, leaving the 25 MiB listed in the table. Then from Core i7 to Core i5, two P-cores are lost, accounting for another 5.0 MiB of L3 cache and taking the total down to 20 MiB. So while Intel’s diagram shows 10 distinct core/LLC segments, there are actually 12. I suspect that if both sets of E-cores were disabled, leaving a processor with eight P-cores, it would show 20 MiB of L3 cache.
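
Under that 12-slice reading, the SKU list adds up neatly: 12 x 2.5 MiB = 30 MiB for the Core i9; fusing off one E-core cluster (and, on this theory, its two slices) leaves 10 x 2.5 MiB = 25 MiB for the Core i7; and removing two P-cores with their private slices leaves 8 x 2.5 MiB = 20 MiB for the Core i5.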

 
Comments

  • Gothmoth - Wednesday, October 27, 2021 - link

    241W TDP in Intel speak means 280W under full load.

    You pay twice the money per year for energy... and how do you cool this thing?
  • Wrs - Wednesday, October 27, 2021 - link

    Most of us don't sustain our processors at peak. Average consumption for fixed work and lightly threaded/idle power are substantial inputs to energy cost, which is probably not the biggest consideration for a desktop. If it were, you'd get a laptop?

    On cooling, if you're getting a 241W turbo processor you are not aiming for a low-profile build. Any $70 tower cooler ought to handle 241W with ease, if the processor interface is strong. Intel's have historically been strong. AMD is usually behind there. For example the 5800X package can only dissipate around 160W on the best liquid/air coolers, as the power density is too high on the CCD and the solder is thick.
  • Spunjji - Thursday, October 28, 2021 - link

    More like $100 to handle 250W "with ease".

    "Intel's have historically been strong"
    Not from Ivy Bridge through to (some) Comet Lake.

    "For example the 5800X package can only dissipate around 160W on the best liquid/air coolers"
    You don't *need* to dissipate more than that, though? You barely get more performance than running it around 120W.

    Really loving this game of talking up Intel's strategy for their CPUs producing absurd amounts of heat. Like great, they deal with heat better, *because they have to*. Inefficient is as inefficient does.
  • Wrs - Thursday, October 28, 2021 - link

    No, $70 for 250W+. There's a somewhat hard limit for heat pipes. A $100 cooler is typically dual tower or equivalent with a limit in the 350-450W range (though admit I never set out to measure those specifically). That doesn't mean there aren't crappy designs, but I'm referring to any reputable maker. That also doesn't mean a cooler will cool to the limit on any processor. There are choke points with thermal interfaces and die size. Back in '08 I had this single tower, two-fan Thermalright 120 extreme that successfully sustained 383W on LGA1366 at under 100C... that's an above average $100 cooler. Might have gone higher with a bigger/thinner die but just illustrates the possibilities.

    The 5800x, on the other hand, cannot practically sustain over ~160W on that caliber of cooler. In fact a high-end heat sink (I've used a D15 and U12S) remains cool to the touch while the CCD is throttling at 90C, with some 135W over 88mm2, simply because the thermal interface down below is the choke. On the double CCD Zen 3's, the amperage at the socket seems to be the limit. Plenty of people know Zen3 doesn't have as much overclocking headroom, simply by the difference between stock 1C turbo and all-core OC frequency, forcing users to pick between snappy ST performance and sustained MT. Notice how both Rocket Lake and ADL don't have that issue, given sufficient cooling.

    Lastly, let's not confuse efficiency with rated power limits. The review sites will have to measure ADL efficiency empirically. From a theoretical view I don't spot a big difference between Intel 7 and TSMC 7nm, so to see the Intel 7 part rated for so much higher power than the N7 part (all the best Zen3's are 142W turbo) tells me that Intel's package/process accommodates much higher heat dissipation and by extension has more room to perform better whether stock vs. stock or OC vs. OC. And it's kind of expected based on the physical characterization of a 208mm2 monolithic die (per der8auer) with a reduced z-height and thin solder, as compared to Zen 3's typically thick package and thin IHS.
  • Oxford Guy - Friday, October 29, 2021 - link

    '$70 for 250W+'

    Noisy.

    Let's look at the cost per watt for a quiet installation.
  • Wrs - Friday, October 29, 2021 - link

    Maximum noise for a cooler is based on the fans and airflow path through the cooler, not the heat. The duty cycle - and thus noise - for typical PWM fans is regulated based on processor temperature, again not actually the heat. So if you want quiet, get a quiet fan, or get a processor with good enough efficiency and thermal dissipation or heat tolerance that it won't need the fans at 100%.

    Hope I didn't overcomplicate the explanation. When I put a $70 stock Noctua U12S on my 5800x and start a game, it gets somewhat noisy. That's because the ~100W being put out by the CPU isn't dissipating well to the heatsink, not because 100W is a challenge for a U12S.
  • Spunjji - Friday, October 29, 2021 - link

    That's more a function of how you have your fan curves configured. If the CPU isn't putting out enough heat to saturate the heatpipes, and the die temp is going to be high no matter how fast you run the fans because of thermal density, then you have room to reduce the fan curve.
  • Oxford Guy - Friday, October 29, 2021 - link

    Coolers that are undersized (and less expensive) make more noise.

    It’s similar to the problem with Radeon VII. The die was designed to be small for profit and the clock had to be too high to try to compensate.

    Quiet cooling costs more money in high-wattage parts. It’s not complicated, beyond the fact that some expensive AIOs are noisier than others.
  • mode_13h - Sunday, October 31, 2021 - link

    > Radeon VII. The die was designed to be small for profit and
    > the clock had to be too high to try to compensate.

    Radeon VII was not designed as a gaming GPU. It was small because it was the first GPU made at 7 nm, by a long shot. At that time, it was probably one of the biggest dies made on that node.
    The fact they could turn a profit by selling it at a mere $700 was a bonus.

    And the MI50/MI60 that were its primary target don't even have a fan. They have a rigid power limit, as well. So, the idea that AMD just said "hey, let's make this small, clock it high, and just run the fans fast" is a bit off the mark.

    https://www.amd.com/en/products/professional-graph...
  • Spunjji - Friday, October 29, 2021 - link

    Maybe we live in different places. Where I am, a decent tower capable of cooling 250W comfortably - not maxed out - is the equivalent of $100 US.

    "The 5800x, on the other hand, cannot practically sustain over ~160W on that caliber of cooler"
    I don't know why anyone would bother, though. The difference between MT performance with PBO and overclocked MT performance is minimal. If you need more MT that badly and TDP isn't a problem, then Threadripper is a better option. If your use case doesn't cover the cost of Threadripper then it's unlikely you'll miss a few percent in performance and you'll probably benefit from not overspending on cooling just to get it. Rocket Lake doesn't compete well with Zen 3 in MT even when overclocked, so it's not a great argument for that chip. We'll have to see how it pans out with ADL, though it does look promising.

    "Lastly, let's not confuse efficiency with rated power limits"
    I'm not!

    "tells me that Intel's package/process accommodates much higher heat dissipation"
    Sure, but...

    "by extension has more room to perform better"
    ...this is absolutely not something you can deduce just from knowing that it can sustain higher power levels 😅
