Conclusion & First Impressions

The Qualcomm Snapdragon 8 Gen 1 is an interesting part, as it represents a fresh start for the series both in a marketing sense and, to a lesser degree, in a technical sense as well. As a successor to the Snapdragon 888, the new chip completely revamps the CPU setup with new Armv9 cores while also bringing a very large GPU improvement, massive new camera features, and a host of other additions.

Qualcomm’s decision to streamline the naming is, in my opinion, not that necessary. But after the transition from the Snapdragon 865 to the 888, things had arguably already kind of jumped the shark last year, so it’s not completely unexpected. What I really don’t like is Qualcomm taking a page out of Apple’s PR playbook and sharply reducing the amount of technical detail disclosed, dropping even things such as the generational numbering of the GPU, NPU/DSP and ISP IP blocks. This kind of opacity works for a lifestyle product company, but it isn’t a great marketing strategy or look for a technology company that is supposed to pride itself on the tech it develops. Whatever the marketing shift from Qualcomm, what matters for most of our readers is the technical side of things.

Technically, the Snapdragon 8 Gen 1 is a large upgrade in a lot of aspects. While Qualcomm isn’t quite as aggressive as what we saw from recent competitor announcements, the chip boasts a very strong CPU configuration, featuring a new Cortex-X2 core at up to 3GHz, new Cortex-A710 middle cores at 2.5GHz, and the new A510 little cores. The performance metrics, at least on the part of the X2, look to be extremely solid, and while power efficiency is still something we’ll have to investigate in more detail in the next few weeks, it also seems to be in line with, or better than, expectations.

The new Adreno GPU really didn’t get the attention it deserved, in my opinion, as things are quite a bit more complex than what the presentations showcased. While we still don’t expect Qualcomm to be able to catch up with Apple or be as efficient as the upcoming MediaTek part, due to lingering concerns on whether the Samsung 4nm process node is able to close the gap with the TSMC competition, the new architecture changes are significant, and we should see major improvements in performance and efficiency compared to the Snapdragon 888.

Finally, the biggest changes this generation were presented on the part of the camera and ISP system. Smartphone cameras over the last few years have seen tremendous progress in terms of capability and image quality, and rather than slowing down (in contrast to other aspects of an SoC), here it seems technology progress is still full steam ahead or even accelerating. The Snapdragon 8 Gen 1 ISP now features fixed-function blocks for a lot of the typical “computational photography” techniques we’ve seen pioneered over the last few years, and I think this will enable far better camera implementations for many more vendors in 2022 flagship devices. So, while the rest of the SoC can be seen as a percentage gain in performance or efficiency, the new camera features are expected to really bring new innovation and experiences.

Overall, the Snapdragon 8 Gen 1 looks to be a very solid successor to the Snapdragon 888. And that’s what’s most important for Qualcomm: executing on developing and delivering a chip that the vast majority of vendors can rely on to implement into their devices. While the competition is diversifying and stepping up its game, it’s also going to be extremely hard to match or even surpass Qualcomm’s execution in the market, and the 8 Gen 1 is unlikely to disappoint.


219 Comments


  • brucethemoose - Wednesday, December 1, 2021 - link

    As far as laptops go, I think Qualcomm is betting the farm on those upcoming Nuvia cores.
  • Unashamed_unoriginal_username_x86 - Tuesday, November 30, 2021 - link

    QC pointing out that the ISP can now do all photo/video AI processing on its own seemed strange in the context of your previous statement about Apple using their GPU effectively in computational photography. I'm guessing it allows for better power gating/efficiency?
  • mode_13h - Wednesday, December 1, 2021 - link

    Or, maybe it's just them trying to spin a weakness into a strength.

    I had a similar thought. If they effectively fused all of their GPU compute + AI + signal processing, they might deliver more performance on each, while lowering SoC prices (or at least improving their margin) due to smaller area.

    In truth, not that much separates DSP and GPU cores. For AI, Nvidia showed how you can bolt on some tensor multipliers and still feed them from the same GPU register file.
  • michael2k - Wednesday, December 1, 2021 - link

    I assume there is better efficiency in having dedicated hardware blocks from the ISP in the pipeline rather than GPGPU blocks in the pipeline.

    There may be ISP dedicated RAM/cache. Apple has a 32MB system cache that I imagine is used by the GPU for image processing. Qualcomm only has a 4MB system cache, so it would make sense if the ISP has dedicated memory.

    If that were the case then it also makes sense that shuttling data from and to the 4MB system cache for the GPU to use back to the ISP cache for the ISP to use would be computationally and power-wise expensive. Apple would avoid that kind of inefficiency because they would allow the ISP and GPU to both access the same 32MB system cache.

    If the ISP already has access to the 4MB system cache then I don't see any reason to avoid using the GPU, unless the Adreno GPU is poorly suited for GPGPU. It might also just be that Qualcomm is licensing hardware blocks that don't integrate as well since they don't design them the way Apple claims to, and Apple can have multiple cooks in the kitchen as it were between the ML blocks, the NE blocks, the ISP block, and the GPU blocks all working on the same set of memory during photo and video workflows.
  • name99 - Thursday, December 2, 2021 - link

    Isn't this a rerun of GPU fixed function blocks (which were killed by programmable shaders)?

    Sure, if you can be ABSOLUTELY CERTAIN that you know everything your camera will want to do, you can freeze that in the ISP. But doing the work on a more programmable block (some combination of GPU and NPU) leaves you able to retrofit a new idea in two years that you haven't even thought of today.

    Ultimately it probably boils down to split responsibility.
    Apple has the camera SW and HW teams working together.
    QC has the problem (for better or worse) that it has no idea what Google will be doing with cameras in two years, and no strong incentive to ensure that the chip they sell today matches Google's requirements in two years.

    A more interesting aspect is that for Apple's scheme (multiple IP blocks all working together) you need a NoC that can tag requests by QoS (for appropriate prioritization) and by stream (to aggregate locality). Apple certainly does this. The academic literature has plenty of discussion as to how this should be done, but I am unaware of the extent to which anyone in industry apart from Apple does this. Is this part of the standard ARM IP model? Is it something each SoC vendor does in their own way but they all do it differently?
  • mode_13h - Friday, December 3, 2021 - link

    > doing the work on a more programmable block (some combination of GPU and NPU)
    > leaves you able to retrofit a new idea in two years that you haven't even thought of today.

    It's (usually) also a boon for code maintenance, if you can implement features in a similar way, across multiple generations of hardware. Programmable engines (call them DSP, ISP, GPU, or what you will) are the way to do this, with the caveats that there are inevitably hardware bugs and other quirks that need to be worked around in a generation-dependent manner, and that this approach poses additional challenges for realtime (i.e. video), due to shifting performance characteristics and amounts of resources.
  • mode_13h - Friday, December 3, 2021 - link

    > you need a NoC that can tag requests by QoS (for appropriate prioritization)
    > and by stream (to aggregate locality).

    Only realtime apps truly need that. And of those, all that come to mind are video processing and AR. I'd say AR might've been the real driver, here. Sure, audio processing is also realtime, but tends to be so much lower-bandwidth that it wouldn't be as dependent on realtime scheduling.
  • name99 - Monday, December 6, 2021 - link

    Apple was tagging content on the SoC by QoS and stream back with the A4. It's not something new.

    And "realtime" is somewhat flexible. Of course video capture is hard realtime, but even screen animation effects are soft realtime. You mock this as unimportant, but iPhones have been distinguished by the smoothness of their animation, and general lack of visual/touch glitching, from day one (remember Project Butter and all that?)
  • mode_13h - Tuesday, December 7, 2021 - link

    > You mock this as unimportant,

    Huh? Where? I respect rock-solid, smooth animations and consistently good responsiveness.

    I'm not convinced that warrants QoS tagging of associated bus transactions, but that's only because you don't want UI animations to be particularly taxing, for the sake of battery longevity. If they're not, then it should be enough for the OS scheduler to prioritize the supporting CPU & GPU threads.
  • name99 - Tuesday, December 7, 2021 - link

    How do you think prioritization is IMPLEMENTED at the point that it hits that hardware?

    Any particular NoC routing point, or the memory controller, or any other shared resource has to decide who gets access in what order.
    OS scheduling doesn't help here! All OS scheduling has done is decide which code is currently running on which CPU (or GPU), and that code won't change for 10ms (or whatever the scheduling granularity is). If you want the hardware to make better choices (do I prioritize the requests coming from a CPU or the requests coming from the ISP?), it needs to know something about the packets flowing through the router and the requests hitting the memory controller -- which are low latency, which can be delayed so much but no more, which are best effort.
    That's what tagging (by QoS and stream) achieves!

    In the absence of such tags, the best you can do is rely on
    - heuristics (which are never perfect, and frequently far from perfect)
    - massive overprovision of HW (which is sub-optimal in a phone; and never works anyway because demands always expand given that the HW can [to 95th percentile, anyway...] kinda sorta support the new demands. )
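
Editor's illustration: the arbitration argument in the comment above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not a description of how Apple, Qualcomm, or any Arm interconnect actually arbitrates. The three QoS classes mirror the comment's "low latency / can be delayed so much but no more / best effort" split; the names `QoS`, `Request`, `pick_next`, `pick_next_untagged`, and the `slack` parameter are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class QoS(IntEnum):
    LOW_LATENCY = 0    # serve as soon as possible
    BOUNDED_DELAY = 1  # may wait, but only until its deadline
    BEST_EFFORT = 2    # served when nothing else is pending


@dataclass
class Request:
    source: str                     # label only, e.g. "CPU" or "ISP"
    qos: QoS                        # tag carried with the request
    arrival: int                    # cycle at which the request arrived
    deadline: Optional[int] = None  # only meaningful for BOUNDED_DELAY


def pick_next(pending: List[Request], now: int, slack: int = 2) -> Request:
    """Tagged arbitration: pick the request a shared resource serves next."""
    # Bounded-delay requests that are about to miss their deadline go first.
    urgent = [r for r in pending
              if r.qos is QoS.BOUNDED_DELAY
              and r.deadline is not None
              and r.deadline - now <= slack]
    if urgent:
        return min(urgent, key=lambda r: r.deadline)
    # Otherwise serve by QoS class, oldest first within a class.
    return min(pending, key=lambda r: (r.qos, r.arrival))


def pick_next_untagged(pending: List[Request]) -> Request:
    """Without tags, the arbiter here falls back to plain arrival order."""
    return min(pending, key=lambda r: r.arrival)


pending = [
    Request("GPU", QoS.BEST_EFFORT, arrival=0),
    Request("ISP", QoS.BOUNDED_DELAY, arrival=1, deadline=10),
    Request("CPU", QoS.LOW_LATENCY, arrival=2),
]
early_choice = pick_next(pending, now=3).source   # deadline still far away
late_choice = pick_next(pending, now=9).source    # deadline is imminent
untagged_choice = pick_next_untagged(pending).source
```

In this toy setup the tagged arbiter serves the low-latency CPU request while the ISP deadline is distant and switches to the ISP request once the deadline is within `slack` cycles, whereas the untagged arbiter simply serves the oldest request. An OS scheduler has no visibility into this per-request decision, which is the point the comment is making.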
