Concluding Remarks: Arm v9 for Android

When we move through significant revisions of Arm’s architecture, up to v8 and now v9, it’s important to note that the new features defined in the ISA do not always fundamentally improve performance – it’s up to the microarchitecture teams to build the cores to the ISA specifications, and the implementation teams to enable the core in silicon with frequency and power efficiency. Accomplishing that requires a good process node, design technology co-optimization, and then partners that can execute by building the best devices for that processor.

Qualcomm’s target with the Snapdragon 8 Gen 1 is very clearly the 2022 flagship Android smartphones. New cores, new graphics, enhanced machine learning capabilities, a step function in camera processing power, an integrated X65 modem, all built on Samsung’s 4nm process node technology. The flagship Android space is an area in which Qualcomm has been comfortable for a number of years, but the increased thermals of last generation’s Snapdragon 888 gave a number of analysts in the space a bit of a squeaky bum moment.

It’s hard to tell immediately from our small test whether that remains the case. Samsung’s 4nm node has improvements beyond the previous generation 5nm design, but the numbers in Qualcomm’s presentation were above and beyond those that Samsung provided, perhaps indicating that additional improvements in both architecture and implementation have led to those performance numbers.

Our testing shows +19% floating point performance on the X2 core, which is almost the +20% that Qualcomm quotes, but only +8% in integer, which is often the most quoted figure. We’re seeing power efficiency improvements for sure on the X2 core, with an overall efficiency improvement of 17%, but peak power has also increased, in part because some of our tests make use of the additional cache in the system. Our machine learning tests are +75% over the previous generation, although not the 4x that Qualcomm states; we still need to do more work here on power efficiency testing. On the gaming side, our 'first run' numbers showcase some explosive gains in GPU throughput.
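To put the measured figures next to the vendor claims, here is a minimal Python sketch of the arithmetic. It assumes the +20% claim applies to both the integer and floating point results, and that the 4x machine learning claim refers to a workload comparable to our test. Neither assumption is confirmed here, and the function names are illustrative only.

```python
def multiplier_to_gain_pct(multiplier: float) -> float:
    """Convert an 'Nx' claim into a percentage gain, e.g. 4x -> +300%."""
    return (multiplier - 1.0) * 100.0


def share_of_claim(measured_gain_pct: float, claimed_gain_pct: float) -> float:
    """Fraction of the claimed gain that the measurement actually reached."""
    return measured_gain_pct / claimed_gain_pct


# (measured gain %, claimed gain %) using the figures quoted in the text.
# Assumption: each claim and measurement describe comparable workloads.
comparisons = {
    "X2 floating point": (19.0, 20.0),
    "X2 integer": (8.0, 20.0),
    "machine learning": (75.0, multiplier_to_gain_pct(4.0)),
}

for name, (measured, claimed) in comparisons.items():
    share = share_of_claim(measured, claimed)
    print(f"{name}: +{measured:.0f}% measured vs +{claimed:.0f}% claimed "
          f"({share:.0%} of claim)")
```

Read this way, a 4x claim is a +300% gain, so a +75% result reaches about a quarter of it, while the floating point result lands within a point of the quoted figure.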

Although we’ve only done a few tests here, I would be remiss if I didn’t mention the elephant in the room: MediaTek. In the last month MediaTek announced a return to the high-end with a flagship processor of its own, using the same 1+3+4 configuration with slightly higher frequencies, more cache, and built on TSMC’s N4 process. Implementation will be the key metric, I feel, so we will be analyzing how well MediaTek has been able to optimize for TSMC N4 compared with Qualcomm on Samsung 4nm. I should point out here that a processor is more than just the CPU cores: we’ll see Adreno vs Mali on graphics, the different machine learning approaches, and also how the two companies approach 5G and connectivity, which has been one of Qualcomm’s most prominent strengths to date.

We look forward to testing the Snapdragon 8 Gen 1 in more detail in the New Year, as well as seeing how many of the main smartphone OEMs choose Qualcomm for their flagship devices.

169 Comments

  • nucc1 - Thursday, December 16, 2021 - link

    I have a desktop and laptop, I don't need a phone that can do desktop duties.
  • michael2k - Tuesday, December 14, 2021 - link

    You do realize that's exactly what Apple does with its CPUs, right? Use them for desktop/laptop parts?
  • eastcoast_pete - Thursday, December 16, 2021 - link

    Actually, what Apple is doing is both annoying (for iPhone owners) and logical (from Apple's bottom line POV). This and the prior generation iPhone certainly have the hardware oomph to drive a desktop setup akin to Dex, but that would, of course, mean fewer iPad Pro and iMac mini sales. The ability to run a desktop-type setup on an iPhone used to be minimal due to the lower RAM older generations had, but that has changed. Being able to run a desktop environment on a $1,500 iPhone would really add value.
  • Raqia - Tuesday, December 14, 2021 - link

    That said, the lagging performance of Apple's CPU+GPU in AI benchmarks proves most sites overstate the usefulness of CPUs in phones use cases when headlining with CPU specific performance metrics. Yes it's not an apples to oranges comparison, but it's proof that you should care about more than CPU benchmarks (particularly the consumer oriented Geekbench suite) even for Apple products when making comparisons between mobile phones.

    CPU performance for notebook form factors will matter a lot more, but on phones CPU bottlenecked use cases are typically web browser / apps using Javascript and app compilation, and even for most of those cases your bottleneck will be connectivity rather than local processing. Heavy lifting is much more often done by ISP and various DSPs that are harder to benchmark.

    As Andrei stated in his introduction to the S8G1:

    "Qualcomm gave examples such as concurrent processing optimizations that are meant to give large boosts in performance to real-world workloads that might not directly show up in benchmarks."

    This seems to be borne out by a reviewer of an anonymous device here:

    https://youtu.be/IpQRiM5F370?t=1002

    despite some seeming inefficiencies for the other IP blocks when individually pinned by a benchmark. It also seems like SPEC17 is showing better efficiency whilst Geekbench is showing worse which indicates that Geekbench may need to optimize better for this year's ARMv9 implementations. Still a modest improvement for CPU this year though when all's considered.
  • name99 - Tuesday, December 14, 2021 - link

    "That said, the lagging performance of Apple's CPU+GPU in AI benchmarks proves most sites overstate the usefulness of CPUs in phones use cases when headlining with CPU specific performance metrics. "

    Uh, no!
    It proves that a dedicated NPU does better than a CPU for these tasks.
    The point is that the Android tests go through Android APIs; the Apple tests are probably raw C that goes on the CPU (perhaps the GPU, but that's unlikely in the absence of using special APIs).
    Your complaint is as silly as comparing 3D SW running on a GPU vs emulated 3D running on the CPU.

    But if you prefer to compare browser benchmarks, go right ahead:
    https://www.imore.com/iphone-12-takes-1-spot-ipad-...

    A much better complaint is that I'm guessing all these tests were not compiled with SVE2 -- which could have substantial effects.
    But of course that requires the dev tools and OS to catch up, which means we have to wait for the official release.
  • Raqia - Tuesday, December 14, 2021 - link

    And other companies have decided to dedicate more of their die area to NPUs and other processing blocks than the CPU. This is usually neglected in cursory reviews of the SoC and the pinned-to-the-CPU benches are overemphasized by some reviewers with PC gaming hardware review pedigrees fixated on what's easy to benchmark and what they know rather than what's impactful to actual phone use cases.

    All that said, the CPU on the iPhones is a step ahead of the competition for now, but Apple has consciously used more die area for this and emphasizes this in its marketing. Note that Apple could market just the performance of phones and devices themselves, but they unusually (for a consumer electronics oriented company) market the SoC separately in a slide in presentations and product specs. They de-emphasize the modem they use, however, in favor of stating what seem like phone level performance metrics. This is notable given that this is the same company with the marketing chops to morph "Made in China" into "Designed in Cupertino. Assembled in China." (Glances at own iPhone. Hmm really...)
  • ChrisGX - Thursday, December 16, 2021 - link

    >> Qualcomm gave examples such as concurrent processing optimizations that are meant to give large boosts in performance to real-world workloads that might not directly show up in benchmarks.

    This seems to be borne out by a reviewer of an anonymous device here:

    https://youtu.be/IpQRiM5F370?t=1002 <<

    That video review can't be used to make a case for hidden virtues of the Snapdragon 8 Gen 1 that for some reason have failed to show up in benchmark results. The reviewer castigated Qualcomm for producing a poor chip with many serious shortcomings. His comments about the X2 core suggest that he saw it as little better than a joke - a power hog that barely improves on earlier generation performance cores.

    The reviewer did acknowledge that the new GPU was fast, but he underscored that the performance gain came at a high cost in terms of power consumption. On those occasions that the SD8 Gen 1 showed any substantial performance advantage over the Apple A15 in tests conducted by the reviewer (some games), the advantage had disappeared after 10 minutes as progressive throttling took its toll.

    The reviewer did look at modem performance (I'm not sure whether I understood the full context of the test) and once again the conclusion is the modem is fast and power hungry.

    I don't think the reviewer conducted any AI tests, which I suspect would have been the place that the SD8 Gen 1 excelled.
  • Meteor2 - Friday, December 17, 2021 - link

    What's mind-boggling is the performance Apple has achieved using the Arm ISA. It has taken an equally mind-boggling amount of money to do it, more than anyone else can afford.
  • defaultluser - Tuesday, December 14, 2021 - link

    Decent little preview, but if I had to pick a "best core test," given the two hour limit, I would have chosen the little cores!
  • Ian Cutress - Tuesday, December 14, 2021 - link

    I tried running SPEC on the little cores. After 30 mins we were less than 10% complete.
