The Qualcomm Snapdragon 855 Pre-Dive: Going Into Detail on 2019's Flagship Android SoC
by Andrei Frumusanu on December 5, 2018 7:00 PM EST
Following today’s earlier coverage of Day 2 of Qualcomm’s Tech Summit event in Maui, Hawaii, we recap the major story of the day: the new Snapdragon 855 platform. The new platform follows this year’s extremely successful Snapdragon 845 SoC, which powered the vast majority of 2018’s flagship devices.
Qualcomm isn’t standing still, and the Snapdragon 855 represents a new generation, bringing a refresh of the SoC IPs as well as a brand new 7nm manufacturing process. Let’s delve into today’s details and analyse how the new SoC platform will raise the bar for 2019.
The Finer Details
|Qualcomm Snapdragon Flagship SoCs 2018-2019|
|SoC||Snapdragon 855||Snapdragon 845|
|CPU||1x Kryo 485 Gold (A76 derivative) @ 2.84GHz, 1x512KB pL2; 3x Kryo 485 Gold (A76 derivative) @ 2.42GHz, 3x256KB pL2; 4x Kryo 485 Silver (A55 derivative) @ 1.80GHz, 4x128KB pL2||4x Kryo 385 Gold (A75 derivative) @ 2.8GHz, 4x256KB pL2; 4x Kryo 385 Silver (A55 derivative) @ 1.80GHz, 4x128KB pL2|
|GPU||Adreno 640 @ ?MHz||Adreno 630 @ 710MHz|
|Memory||4x 16-bit CH @ 2133MHz; 3MB system level cache||4x 16-bit CH @ 1866MHz; 3MB system level cache|
|ISP/Camera||Dual 14-bit Spectra 380 ISP; 1x 48MP or 2x 22MP||Dual 14-bit Spectra 280 ISP; 1x 32MP or 2x 16MP|
|Encode/Decode||2160p60 10-bit H.265; HDR10, HDR10+, HLG||2160p60 10-bit H.265|
|Integrated Modem||Snapdragon X24 LTE; DL = 2000Mbps (7x20MHz CA, 256-QAM, 4x4); UL = 316Mbps (3x20MHz CA, 256-QAM)||Snapdragon X20 LTE; DL = 1200Mbps (5x20MHz CA, 256-QAM, 4x4); UL = 150Mbps (2x20MHz CA, 64-QAM)|
|Mfc. Process||7nm (N7)||10nm LPP|
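As a rough cross-check of the memory row in the table above, peak theoretical bandwidth can be worked out from the channel count, channel width and clock. The sketch below assumes double-data-rate signalling (two transfers per clock, as with LPDDR4X-class memory), which the table itself does not state; the function name is purely illustrative.

```python
def peak_bandwidth_gbs(channels: int, bits_per_channel: int, clock_mhz: float,
                       transfers_per_clock: int = 2) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an ASSUMPTION (double data rate).
    """
    bytes_per_transfer = channels * bits_per_channel / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9

# Channel configuration and clocks taken from the table above
s855 = peak_bandwidth_gbs(4, 16, 2133)
s845 = peak_bandwidth_gbs(4, 16, 1866)

print(f"Snapdragon 855: {s855:.1f} GB/s")
print(f"Snapdragon 845: {s845:.1f} GB/s")
print(f"Uplift: {(s855 / s845 - 1) * 100:.1f}%")
```

These are ceiling figures only; sustained bandwidth and latency depend on the memory controller and the system level cache.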
At the heart of the new Snapdragon 855 lie Arm’s new Cortex-A76 CPUs. We’ve covered the new microarchitecture extensively this year as we dove into the technical tidbits of the CPU in May, and more recently have been able to test in depth the performance and power efficiency of the new IP inside HiSilicon’s new Kirin 980. The combination of the new Cortex-A76 with the new 7nm manufacturing node made for great leaps in performance and power efficiency, something that bodes very well for the new Snapdragon 855.
Qualcomm’s take on implementing the new Cortex-A76 cores is quite a bit different than what we’ve seen from HiSilicon. Overall there are still four Cortex-A76 derived cores (Kryo 485 Gold as Qualcomm markets them), alongside four Cortex-A55 derived CPUs. The differences here lie in the frequencies, the apparent cache configurations, as well as apparent changes in some microarchitectural tuneables.
Interestingly, for the first time since Qualcomm adopted Arm’s “Built on Cortex Technology” license, and with the third iteration of its implementation, Qualcomm has finally released details on the kind of changes to the IP that it has commissioned from Arm. Here Qualcomm reveals that the Cortex-A76 variation in the Snapdragon 855 allows for a bigger out-of-order execution window, most likely referring to an increase in the size of the reorder buffer (ROB). The stock A76 has a 128-entry ROB, whereas Qualcomm's modified A76 increases this to an undisclosed size.
Alongside what seems to be the ROB increase, Qualcomm has also revealed that the data prefetchers have been optimised for better efficiency. It’s not clear if the “efficiency” here refers to power efficiency or to the efficiency with which data is prefetched, nor do the disclosures say what exactly has changed: whether there are more or fewer prefetch streams, or whether there have been changes to the other types of prefetchers.
While HiSilicon opted for a 2+2 design, where one pair of A76s was optimised for high frequencies and the second pair was optimised for higher power efficiency, Qualcomm opted to go with a 1+3 configuration.
The highest performance core, “Kryo 485 Gold Prime” as Qualcomm calls it, is clocked at 2.84GHz, putting it on its own clock domain, and is seemingly configured with a 512KB L2 cache. The other three cores are clocked at 2.42GHz and retain smaller 256KB L2 caches. This configuration is quite odd: you would expect Qualcomm to take advantage of the new DynamIQ cluster design, which is able to support different frequency and voltage planes. However, things get even odder. The prime core doesn’t actually have its own voltage plane, and thus has to share one with the other three big cores.
This revelation of the prime core not having its own power domain is quite shocking and it invalidates a lot of the benefits of actually having a separate clock plane for a core. In effect the real-world benefit here isn’t any different than simply clock-gating the core.
It is true that there is a large number of scenarios where there’s predominantly a single larger thread active; this is particularly true in web browsing workloads. Such a 1+3 configuration would achieve better performance and possibly better efficiency than a 2+2 configuration, but because the cores aren’t running on separate voltage planes, the actual benefits in real-world applications are going to be quite minor. The net result is that the setup is leaving a lot of power efficiency on the table: the voltage supplied to both core groups is always going to be the greater of whatever is being asked for, even if one of the two groups could operate on (much) less voltage.
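To illustrate why a shared voltage plane gives away efficiency, the toy model below uses the standard dynamic power relation (power proportional to C·V²·f) and sets the shared rail to the highest voltage that any core on it requests. The voltages and the capacitance constant are hypothetical placeholders chosen for illustration, not Qualcomm figures, and leakage is ignored entirely.

```python
# Toy model: dynamic power scales with C * V^2 * f.
# All voltages below are HYPOTHETICAL, not measured Snapdragon 855 values.

def dynamic_power(voltage: float, freq_ghz: float, cap: float = 1.0) -> float:
    """Relative dynamic power of one core (arbitrary units)."""
    return cap * voltage ** 2 * freq_ghz

# (requested voltage in V, frequency in GHz) for each active core
prime = (1.00, 2.84)              # hypothetical voltage for the prime core
mid_cores = [(0.80, 2.42)] * 3    # hypothetical voltage for the other big cores

def total_power(cores, shared_rail: bool) -> float:
    """Sum of per-core power, with or without a shared voltage plane."""
    rail = max(v for v, _ in cores)  # shared plane: the highest request wins
    return sum(dynamic_power(rail if shared_rail else v, f) for v, f in cores)

cores = [prime] + mid_cores
shared = total_power(cores, shared_rail=True)
split = total_power(cores, shared_rail=False)

print(f"shared rail: {shared:.2f}, separate rails: {split:.2f}")
print(f"overhead of sharing: {(shared / split - 1) * 100:.0f}%")
```

The size of the overhead depends entirely on the assumed voltages and on how many cores are busy at once, so the output should be read as a direction rather than a prediction.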
Qualcomm’s 2.84GHz clock is 9.2% higher than HiSilicon’s 2.6GHz frequency. A big question here is just how far Qualcomm has driven the core up the power curve. I am expecting it to be less efficient than the Kirin 980 by some margin; how big that margin will be is something we won’t see until we get our hands on commercial devices.
Most interesting about today’s presentation is that Qualcomm didn’t make a single concrete mention of the CPU power efficiency of the Snapdragon 855, and I’m not sure if this means there are no improvements or whether Qualcomm is just downplaying this aspect of the SoC given the other significant changes.
Lastly, I do find it odd that Qualcomm went for smaller L2 caches on the remaining three high performance cores. I still expect these to end up with higher performance than HiSilicon’s 1.92GHz A76 units with 512KB L2s, but it’s nevertheless interesting to see both companies try to achieve the same goal in different ways.
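The clock comparisons against the Kirin 980 are simple ratios. A quick sketch using only the frequencies quoted in this article (the helper name is illustrative):

```python
def uplift_pct(new_ghz: float, old_ghz: float) -> float:
    """Percentage by which new_ghz exceeds old_ghz."""
    return (new_ghz / old_ghz - 1) * 100

# Frequencies as quoted in the article (GHz)
prime_vs_kirin_fast = uplift_pct(2.84, 2.60)   # prime core vs Kirin 980 fast pair
mid_vs_kirin_slow = uplift_pct(2.42, 1.92)     # other big cores vs Kirin 980 slow pair

print(f"Prime core vs Kirin 980 fast pair: +{prime_vs_kirin_fast:.1f}%")
print(f"Other big cores vs Kirin 980 slow pair: +{mid_vs_kirin_slow:.1f}%")
```

Clock speed alone does not settle the performance question; the cache sizes and microarchitectural tweaks discussed above matter as well.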
Moving on, we see the four Cortex-A55 derived efficiency cores, which are running at 1.8GHz and coupled with 128KB L2 caches. In this regard, it seems the Snapdragon 855 doesn’t differ from the Snapdragon 845. Here the company has seemingly put all the process node advantages into improving power efficiency of the little cores.
The DynamIQ Shared Unit’s L3 cache was initially expected to come in at 4MB, which would have been a doubling over the 2MB configuration on the Snapdragon 845, although the cache configurations had not been fully confirmed at the time of writing. Update: We’ve by now confirmed that the L3 cache has remained at 2MB. This is quite conservative on Qualcomm’s part and there will be an IPC impact compared to the Kirin 980’s 4MB implementation.
In terms of performance, all that Qualcomm publishes is a claim of up to a 45% performance increase over the Snapdragon 845. As with last year, it’s a bit of a mystery exactly what this figure represents, but the number pretty much falls in line exactly where the Kirin 980 performs in relation to the Snapdragon 845 in SPEC2006. The big question for the S855 is how the new generation system level cache will behave in terms of memory latency, as this will be among the biggest aspects differentiating Qualcomm’s new SoC from its Kirin competition.
Another interesting performance comparison that was published today is a showcase of app launch times for the Snapdragon 855, Apple A12, and the Kirin 980. Though Qualcomm doesn't directly name its competitors, competitors A and B should be the Apple A12 and the Kirin 980 respectively, assuming Qualcomm’s colour scheme is also consistent across the GPU comparisons. For me it’s not too surprising to see the Snapdragon 855 perform this well. One thing I did note in my Huawei Mate 20 review is that the Pixel 3 and OnePlus 6 still felt faster in terms of application launch times, though this could all just be a side-effect of the scheduler and framework of the Snapdragon chipset rather than the raw CPU performance of the hardware. Of course, software still matters immensely, and over the last two years Qualcomm has demonstrated absolute leadership in extracting responsiveness and reactivity from the hardware through its software designs.
Adreno 640 GPU - Iterative Features and Performance
The Adreno 640 graphics block will be the focus for Qualcomm’s gaming efforts. The company went to great lengths to detail how they felt mobile gaming is on the rise, while other platforms for video games are either stagnating or in decline.
In terms of technical specifications, as is traditional with Qualcomm, we didn’t see much in the way of detailed disclosures on the new GPU. What we did get are more conservative figures, such as a 20% increase in performance. This increase is quite small compared to what we tend to usually see, especially given the fact that the Snapdragon 855 is able to take advantage of a major process node transition.
The Snapdragon 845’s GPU was already the smallest among flagship mobile SoCs at a mere 10.69mm², so unless Qualcomm has significantly increased the number of processing elements inside the GPU cores, this generation should be even smaller. Meanwhile in the event presentation there was one actual titbit about the GPU; Qualcomm is saying that they've increased the number of ALUs for FP32 and FP16 operations by 50%. If my previous estimates about the Adreno 630 were correct, then this would mean the new Adreno 640 sports 384 ALUs per core for a total of 768 ALUs. This ALU increase doesn’t match up with the claimed performance increase, so it’s possible Qualcomm is running the GPU at a lower frequency, or the performance claims were made in regards to possibly less ALU sensitive workloads.
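The ALU arithmetic above, and its mismatch with the claimed 20% gain, can be made explicit. The sketch uses the estimate of 256 ALUs per core across two cores for the Adreno 630 that is implied by the 384/768 figures, and it assumes, purely for illustration, that performance scales linearly with ALU count times clock. Real workloads rarely scale that cleanly, so the resulting clock is an output of this naive model and not a disclosed or measured frequency.

```python
ADRENO_630_ALUS_PER_CORE = 256   # estimate implied by the article, not an official figure
CORES = 2
ALU_INCREASE = 1.5               # Qualcomm's claimed +50% ALUs
CLAIMED_PERF_GAIN = 1.2          # Qualcomm's claimed +20% performance
ADRENO_630_MHZ = 710             # Adreno 630 clock from the table

alus_per_core = int(ADRENO_630_ALUS_PER_CORE * ALU_INCREASE)
total_alus = alus_per_core * CORES

# Under a naive "performance ~ ALUs x clock" model, this is the clock ratio
# that would reconcile +50% ALUs with +20% performance.
implied_clock_ratio = CLAIMED_PERF_GAIN / ALU_INCREASE
implied_mhz = ADRENO_630_MHZ * implied_clock_ratio

print(f"ALUs per core: {alus_per_core}, total: {total_alus}")
print(f"Implied clock: {implied_clock_ratio:.2f}x, about {implied_mhz:.0f} MHz (naive model)")
```

The other explanation raised above, that the 20% claim refers to workloads that are less ALU-sensitive, would break the linear-scaling assumption and make this estimate meaningless.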
Qualcomm showed a side-by-side comparison between the Snapdragon 845 and the new 855 running PUBG on a cycled script at 40fps. The new chipset was able to showcase a 28% reduction in power on this identical workload. It’s to be noted that we don’t really know at what point on the power curve this measurement was taken, so direct power comparisons are always a bit of a mystery when the testing is done at capped performance states.
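Because the frame rate is capped at the same 40fps on both chips, the 28% power reduction translates directly into energy per frame and into performance per watt at that one operating point. A quick sketch of that arithmetic, using only the figures quoted above:

```python
FPS = 40                    # capped frame rate on both chips
POWER_REDUCTION = 0.28      # Qualcomm's claimed reduction for the Snapdragon 855

relative_power = 1.0 - POWER_REDUCTION            # S855 power relative to S845
relative_energy_per_frame = relative_power        # same fps, so the same ratio
perf_per_watt_gain = 1.0 / relative_power - 1.0   # improvement in fps per watt

print(f"Relative power at {FPS}fps: {relative_power:.2f}x")
print(f"Perf/W gain at this operating point: {perf_per_watt_gain * 100:.0f}%")
```

This says nothing about peak performance or efficiency at other points on the power curve, which is exactly the caveat raised above.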
While the performance gains remain a bit vague at the time of writing, Qualcomm did disclose a lot in terms of new graphical features. Here we saw claims that the Adreno 640 graphics in the S855 will enable true HDR gaming, as well as games built around Physically Based Rendering. The graphics pipeline will support 10-bit color depth and the Rec. 2020 gamut to enable HDR, as well as enabling S855 devices to support the HDR10+ and Dolby Vision formats, which QC states is a world first. With the Adreno 640, along with the display IP, devices can support 120fps gaming as well as smooth 8K 360-degree video playback (resolving a major complaint about Snapdragon-power). Just don’t ask how much space those 8K 360-degree videos take up.
Qualcomm's support for Physically Based Rendering in graphics is an interesting topic, one we’ll go into in detail in a different article, but the concept is not new. In fact we're a bit surprised to see it mentioned in the same breath as actual hardware changes, since conceptually it shouldn't require any new hardware; PBR is just a shader program that all of the Adreno 600 family should be able to run.
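To make the "just a shader program" point concrete, here is a minimal sketch of the specular term from one widely used PBR formulation: Cook-Torrance with a GGX distribution, Schlick's Fresnel approximation and a Smith-Schlick geometry term. It is written in Python rather than a shading language, and it illustrates the kind of arithmetic involved rather than the shader that any particular engine or Qualcomm demo uses. The function names and sample inputs are made up for the example.

```python
import math

def ggx_distribution(n_dot_h: float, roughness: float) -> float:
    """GGX/Trowbridge-Reitz normal distribution function."""
    a2 = (roughness * roughness) ** 2          # alpha = roughness^2, a2 = alpha^2
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)

def schlick_fresnel(v_dot_h: float, f0: float) -> float:
    """Schlick's approximation of Fresnel reflectance."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def smith_schlick_geometry(n_dot_v: float, n_dot_l: float, roughness: float) -> float:
    """Smith geometry term with the Schlick-GGX approximation (direct-light k)."""
    k = (roughness + 1.0) ** 2 / 8.0
    g_v = n_dot_v / (n_dot_v * (1.0 - k) + k)
    g_l = n_dot_l / (n_dot_l * (1.0 - k) + k)
    return g_v * g_l

def cook_torrance_specular(n_dot_l, n_dot_v, n_dot_h, v_dot_h, roughness, f0):
    """Specular BRDF value: D * F * G / (4 * (n.l) * (n.v))."""
    d = ggx_distribution(n_dot_h, roughness)
    f = schlick_fresnel(v_dot_h, f0)
    g = smith_schlick_geometry(n_dot_v, n_dot_l, roughness)
    return d * f * g / max(4.0 * n_dot_l * n_dot_v, 1e-6)

# Hypothetical inputs: a dielectric-like surface (f0 = 0.04) at two roughness values
for rough in (0.2, 0.8):
    value = cook_torrance_specular(0.8, 0.8, 0.95, 0.9, rough, 0.04)
    print(f"roughness={rough}: specular={value:.4f}")
```

Everything here is multiplies, divides and a power function, which is ordinary ALU work for any programmable GPU.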
In any case, the short version is that with this enabled, it will help add realism to gaming and augmented reality through more accurate lighting physics and material interactions. Qualcomm stated that through the Unity and Unreal 4 engines, developers will be able to use real world materials designed from scientific values created by companies like Quixel and Allegorithmic that will make their environments more lifelike, such as the correct surface roughness / audio reflections or material-on-material interactions. This will also help with lighting and depth perception. More details to come.
B3an - Thursday, December 6, 2018
The stuff about PBR makes no sense. Many mobile games already use PBR and run perfectly fine on the Adreno 630 for example.
You say "more details to come", any estimates on that?
Ryan Smith - Thursday, December 6, 2018
We're still trying to find out more. Qualcomm is specifically calling it out as a feature, but we're not currently aware of any reason that PBR wouldn't work on any Adreno 600 GPU. It's basically just a combination of shader programs and design principles.
ZolaIII - Thursday, December 6, 2018
Maybe the highest clocked big core is actually designed to be core one of four in order to balance true utilisation to capacity as you know how first core is always most (highest) utilised one & how scheduler is never able to completely balance that.
Maybe we all have a wrong picture of this in traditional cluster design. Anyway at least like that it would have some sense delivering a bit more actual performance to theoretical one simply by better utilisation. We will see how it works.
tuxRoller - Thursday, December 6, 2018
Adding a third (2.5?) tier that consists of a single fast core should make the scheduling problem a bit easier.
matten - Thursday, December 6, 2018
Is it clear to anyone whether the X50 modem will support the EU mmWave band (26GHz), or will it still be only the 28GHz band?
mayankleoboy1 - Thursday, December 6, 2018
Qualcomm not talking about CPU performance AND efficiency in loud capital letters is worrisome
I feel a rerun of SD810..
@Andrei Frumusanu : WDYT?
Wilco1 - Friday, December 7, 2018
https://www.anandtech.com/show/13686/snapdragon-85...
mayankleoboy1 - Thursday, December 6, 2018
slightly OT:
What stops desktop GPU vendors (all 3) from including 4K/360 HEVC/MP4/VP9 encode/decode in complete hardware when the mobile vendors can do?
GreenReaper - Thursday, December 6, 2018
Well, for a start, most desktops don't have a need for 360-degree capture...
As discrete video cards, perhaps the argument is that as long as they can do it at sufficient performance in shaders, pure hardware is unnecessary? After all it is extra space. Power and temperature constraints are not as great on desktop which is the number-one driver of this feature.
Intel's Kaby Lake can do all three at least for 4K (except not VP9 10-bit encode):
AMD's Vega core can't do VP9 (but if you just want to record arguably HEVC is better):
AntonErtl - Thursday, December 6, 2018
If there is only one (CPU-demanding) thread active, the common power domain will not hurt much: The three other A76 will be idle and consume little, despite high voltage.
If there are many CPU-demanding threads active, the common power domain will probably not hurt, either: Then you cannot afford high voltage for any of the fast cores, so the fastest core will run below its maximum voltage and clock.
Where the design may hurt is if there are two CPU-demanding threads active, but one would need to see exact data to know how much. It may not be a big deal.