Testing the Cortex-X2: A New Android Flagship Core

The Cortex-X2 improves on the Cortex-X1 by switching to the Armv9 architecture and increasing core resources, and both Arm and Qualcomm are keen to promote that it offers better performance and responsiveness than previous CPU cores. The small frequency bump from 2.85 GHz to 3.00 GHz will account for some of that performance; however, the question is always whether the new manufacturing process, coupled with the frequency increase, allows for better power efficiency when running these workloads. Our standard analysis tool here is SPEC2017.

Running through some of these numbers, there are healthy gains to the core, and almost everything has a performance lift.

On the integer side (from 500.perlbench to 557.xz), there are good gains for gcc (+17%), mcf (+13%), xalancbmk (+13%), and leela (+14%), leading to an overall +8% improvement. Most of these integer tests involve cache movement and throughput, and gains in sub-tests like gcc usually help a wide range of regular user workloads.

Looking at power and energy for the integer benchmarks, we’re seeing the X2 consume more instantaneous power on almost all the tests, but the efficiency is kicking in. That overall 8% performance gain is taking 5% less total energy, but on average requires 2% more peak power.

If we put this core up against all the other performance cores we test, we see that 8% jump in performance for 5% less energy used, and the X2 stands well above the X1 cores of the previous generation, especially those in non-Snapdragon processors. There is still a fundamental step needed to reach the Apple cores: even the previous-generation A14 performance core scores 34% higher for the same energy consumed (albeit while drawing, on average, another 34% in peak power).

Just on these numbers, Qualcomm’s claimed +20% performance or +30% efficiency doesn’t bear out, but the floating point numbers are significantly different.

Several benchmarks in 2017fp are substantially higher on the X2 this generation. +17% on namd, for example, would point to execution performance increases, but +28% in parest, +41% in lbm and +20% in blender showcase a mix of execution performance and memory performance. Overall we’re seeing +19% performance, which is nearer Qualcomm’s 20% mark. Note that this comes with an almost identical amount of energy consumed relative to the X1 core in the S888, with a difference of just 0.2%.

The major difference, however, is the average power consumed. For example, our biggest single test gain, in 519.lbm, is +41%, but where the S888 averages 4.49 watts, the new X2 core averages 7.62 watts. That’s a 70% increase in instantaneous power consumed, and realistically no single core in a modern smartphone should draw that much power. Power goes this high because lbm leans on the memory subsystem, especially the 6 MiB L3 cache and the 4 MiB system level cache, all of which consume power. Overall in the lbm test, the +41% performance costs +20% energy, so efficiency is still +16% in this test. Some of the other tests, such as parest and blender, also follow this pattern.
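
As a rough cross-check of the lbm figures, the energy change can be estimated from the power and performance numbers alone. The sketch below assumes that energy is average power multiplied by runtime, and that runtime scales inversely with the SPEC score; both are simplifications, and the function name is ours rather than anything from the test tooling.

```python
def energy_ratio(power_ratio: float, perf_ratio: float) -> float:
    """Estimate relative energy, assuming energy = average power x runtime
    and runtime proportional to 1 / performance (a simplification)."""
    return power_ratio / perf_ratio


# Figures quoted in the text for 519.lbm
s888_watts = 4.49   # average power, X1 core in the S888
x2_watts = 7.62     # average power, X2 core in the S8G1
perf_ratio = 1.41   # +41% performance

power_ratio = x2_watts / s888_watts
lbm_energy = energy_ratio(power_ratio, perf_ratio)

print(f"power:  {power_ratio - 1:+.0%}")   # +70%
print(f"energy: {lbm_energy - 1:+.0%}")    # +20%
```

Under those assumptions, the +70% power and +41% performance figures land on roughly +20% energy, in line with the number quoted above.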

Comparing against the competition, the X2 core does make a better generational jump when it comes to floating point performance. It will be interesting to see how other processors enable the X2 core, especially MediaTek’s flagship, which runs at a slightly higher frequency on TSMC N4. If it also has access to a full 14 MiB combination of caches, as we suspect, that could bring the power draw during single core use a lot higher. It will be difficult to tease out exactly who wins what where based on implementation vs. process node, but it will be a fun comparison to make when we look purely at the X2 vs. X2 cores.

Unfortunately, due to how long SPEC takes to run (1h30 on the X2), we were unable to test on the A710/A510. We’ll have to wait until we get a retail unit.


169 Comments


  • nucc1 - Thursday, December 16, 2021 - link

    I have a desktop and laptop, I don't need a phone that can do desktop duties.
  • michael2k - Tuesday, December 14, 2021 - link

    You do realize that's exactly what Apple does with its CPUs, right? Use them for desktop/laptop parts?
  • eastcoast_pete - Thursday, December 16, 2021 - link

    Actually, what Apple is doing is both annoying (for iPhone owners) and logical (from Apple's bottom line POV). This and the prior generation iPhone certainly have the hardware oomph to drive a desktop setup akin to Dex, but that would, of course, mean fewer iPad Pro and iMac mini sales. The ability to run a desktop-type setup on an iPhone used to be minimal due to the lower RAM older generations used to have, but that has changed. Being able to run a desktop environment on a $1,500 iPhone would really add value.
  • Raqia - Tuesday, December 14, 2021 - link

    That said, the lagging performance of Apple's CPU+GPU in AI benchmarks proves most sites overstate the usefulness of CPUs in phones use cases when headlining with CPU specific performance metrics. Yes it's not an apples to oranges comparison, but it's proof that you should care about more than CPU benchmarks (particularly the consumer oriented Geekbench suite) even for Apple products when making comparisons between mobile phones.

    CPU performance for notebook form factors will matter a lot more, but on phones CPU bottlenecked use cases are typically web browser / apps using Javascript and app compilation, and even for most of those cases your bottleneck will be connectivity rather than local processing. Heavy lifting is much more often done by ISP and various DSPs that are harder to benchmark.

    As Andrei stated in his introduction to the S8G1:

    "Qualcomm gave examples such as concurrent processing optimizations that are meant to give large boosts in performance to real-world workloads that might not directly show up in benchmarks."

    This seems to be borne out by a reviewer of an anonymous device here:

    https://youtu.be/IpQRiM5F370?t=1002

    despite some seeming inefficiencies for the other IP blocks when individually pinned by a benchmark. It also seems like SPEC17 is showing better efficiency whilst Geekbench is showing worse which indicates that Geekbench may need to optimize better for this year's ARMv9 implementations. Still a modest improvement for CPU this year though when all's considered.
  • name99 - Tuesday, December 14, 2021 - link

    "That said, the lagging performance of Apple's CPU+GPU in AI benchmarks proves most sites overstate the usefulness of CPUs in phones use cases when headlining with CPU specific performance metrics. "

    Uh, no!
    It proves that a dedicated NPU does better than a CPU for these tasks.
    The point is that the Android tests go through Android APIs; the Apple tests are probably raw C that goes on the CPU (perhaps the GPU, but that's unlikely in the absence of using special APIs).
    Your complaint is as silly as comparing 3D SW running on a GPU vs emulated 3D running on the CPU.

    But if you prefer to compare browser benchmarks, go right ahead:
    https://www.imore.com/iphone-12-takes-1-spot-ipad-...

    A much better complaint is that I'm guessing all these tests were not compiled with SVE2 -- which could have substantial effects.
    But of course that requires the dev tools and OS to catch up, which means we have to wait for the official release.
  • Raqia - Tuesday, December 14, 2021 - link

    And other companies have decided to dedicate more of their die area to NPUs and other processing blocks than the CPU. This is usually neglected in cursory reviews of the SoC and the pinned-to-the-CPU benches are overemphasized by some reviewers with PC gaming hardware review pedigrees fixated on what's easy to benchmark and what they know rather than what's impactful to actual phone use cases.

    All that said, the CPU on the iPhones is a gap step ahead of the competition for now, but Apple has consciously used more die area for this and emphasizes this in their marketing. Note that Apple could market just the performance of phones and devices themselves, but they unusually (for a consumer electronics oriented company) market the SoC separately in a slide in presentations and product specs. They de-emphasize the modem they use, however, in favor of stating what seem like phone level performance metrics. This is notable given that this is the same company with the marketing chops to morph "Made in China" into "Designed in Cupertino. Assembled in China." (Glances at own iPhone. Hmm really...)
  • ChrisGX - Thursday, December 16, 2021 - link

    >> Qualcomm gave examples such as concurrent processing optimizations that are meant to give large boosts in performance to real-world workloads that might not directly show up in benchmarks.

    This seems to be borne out by a reviewer of an anonymous device here:

    https://youtu.be/IpQRiM5F370?t=1002 <<

    That video review can't be used to make a case for hidden virtues of the Snapdragon 8 Gen 1 that for some reason have failed to show up in benchmark results. The reviewer castigated Qualcomm for producing a poor chip with many serious shortcomings. His comments about the X2 core suggest that he saw it as little better than a joke - a power hog that barely improves on earlier generation performance cores.

    The reviewer did acknowledge that the new GPU was fast, but he underscored that the performance gain came at a high cost in terms of power consumption. On those occasions that the SD8 Gen 1 showed any substantial performance advantage over the Apple A15 in tests conducted by the reviewer (some games), the advantage had disappeared after 10 minutes as progressive throttling took its toll.

    The reviewer did look at modem performance (I'm not sure whether I understood the full context of the test) and once again the conclusion is the modem is fast and power hungry.

    I don't think the reviewer conducted any AI tests, which I suspect would have been the place that the SD8 Gen 1 excelled.
  • Meteor2 - Friday, December 17, 2021 - link

    What's mind-boggling is the performance using the ARM ISA that Apple has achieved. It has taken an equally mind-boggling amount of money to do it, more than anyone else can afford.
  • defaultluser - Tuesday, December 14, 2021 - link

    Decent little preview, but if I had to pick a "best core test," given the two hour limit, I would have chosen the little cores!
  • Ian Cutress - Tuesday, December 14, 2021 - link

    I tried running SPEC on the little cores. After 30 mins we were less than 10% complete.
