Qualcomm's New Snapdragon S4: MSM8960 & Krait Architecture Explored
by Brian Klug & Anand Lal Shimpi on October 7, 2011 12:35 PM EST
Posted in: Smartphones, Snapdragon, Arm, Qualcomm, Krait, MDP, Mobile, SoCs
Cache & Memory Hierarchy
Qualcomm has a three level exclusive cache hierarchy in Krait. The lower two levels are private per core, while the third level is shared among all cores. Qualcomm calls these caches L0, L1 and L2.
Each Krait core has an 8KB L0 cache (4KB instruction + 4KB data cache). The L0 cache is direct mapped and accessible in a single cycle. Qualcomm claims an 85% hit rate in this level 0 cache, which helps save power by not firing up the larger L1 cache. The hierarchy is exclusive so L0 data isn't necessarily duplicated in L1.
Each core also has a 32KB L1 cache (16KB instruction + 16KB data). The L1 is 4-way set associative and can also be accessed in a single cycle. There's no way prediction at work here. With 1 cycle latency to both L0 and L1, the primary advantage of the L0 is power.
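To put the 85% figure in perspective, here is a rough sketch of how an L0 hit rate translates into L1 activity. It assumes every access probes L0 first and only touches L1 on an L0 miss, which is our reading of Qualcomm's description rather than a confirmed detail. The per-access energy values are hypothetical placeholders in arbitrary units; Qualcomm hasn't disclosed real numbers.

```python
def l1_access_fraction(l0_hit_rate):
    """Fraction of accesses that miss L0 and therefore have to probe L1."""
    return 1.0 - l0_hit_rate


def relative_lookup_energy(l0_hit_rate, e_l0, e_l1):
    """Average lookup energy with an L0 in front of L1, relative to
    probing L1 on every access.

    e_l0 and e_l1 are hypothetical per-access energies (arbitrary units).
    Assumes L0 is probed on every access and L1 only on an L0 miss.
    """
    with_l0 = e_l0 + (1.0 - l0_hit_rate) * e_l1
    return with_l0 / e_l1


if __name__ == "__main__":
    hit_rate = 0.85  # Qualcomm's claimed L0 hit rate
    print(f"Accesses reaching L1: {l1_access_fraction(hit_rate):.0%}")
    # Hypothetical ratio: an L0 probe costs a quarter of an L1 probe.
    print(f"Relative lookup energy: {relative_lookup_energy(hit_rate, 1.0, 4.0):.2f}")
```

The takeaway is simply that at an 85% hit rate only about 15% of accesses ever wake the L1. How much energy that saves depends entirely on the real cost ratio between the two arrays.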
Krait Cache Architecture
Level | Size | Architecture | Frequency
L0 | 4KB + 4KB | Direct mapped | Core
L1 | 16KB + 16KB | 4-way set associative | Core
L2 | 1MB (dual core) or 2MB (quad core) | 8-way set associative | 1.3GHz max
The L2 cache is shared among all cores. In dual-core designs the L2 cache is sized at 1MB (up from 512KB in Scorpion), while quad-core Krait SoCs will have a 2MB L2. Krait's L2 cache is 8-way set associative.
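For readers who want to see what these sizes and associativities imply, the sketch below derives the number of sets and the address split for each level. The 64-byte line size is an assumption on our part, since Qualcomm hasn't given us line sizes, so treat `line_bytes` as a placeholder. The function names are ours as well.

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return set count and address field widths for a cache.

    line_bytes=64 is an assumed value, not a disclosed Krait parameter.
    All inputs are expected to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "index_bits": sets.bit_length() - 1,
        "offset_bits": line_bytes.bit_length() - 1,
    }


def split_address(addr, size_bytes, ways, line_bytes=64):
    """Split an address into (tag, set index, byte offset) for the given cache."""
    g = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> g["offset_bits"]) & (g["sets"] - 1)
    tag = addr >> (g["offset_bits"] + g["index_bits"])
    return tag, index, offset


if __name__ == "__main__":
    KB, MB = 1024, 1024 * 1024
    levels = {
        "L0 (per side)": (4 * KB, 1),    # direct mapped = 1 way
        "L1 (per side)": (16 * KB, 4),
        "L2 dual core": (1 * MB, 8),
        "L2 quad core": (2 * MB, 8),
    }
    for name, (size, ways) in levels.items():
        print(name, cache_geometry(size, ways))
```

Under that line-size assumption the L0 and L1 end up with the same number of sets. The L1 just holds four lines per set where the direct mapped L0 holds one.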
While the L0 and L1 caches operate at core frequency and are on the same voltage plane as their associated core, the L2 cache is separate. To save power the L2 cache runs at its own frequency (up to 1.3GHz depending on the currently requested performance level). The L2 cache is on its own power plane and can be power gated if necessary.
Although Scorpion featured a dual-channel LPDDR2 memory controller, in a PoP configuration only one channel was available to any stacked DRAM. To get access to both 32-bit memory channels, the OEM had to implement a DRAM on-package as well as an external DRAM on the PCB. Memory requests could be interleaved between the two DRAMs; however, Qualcomm seemed to prefer load balancing between the two, with CPU/GPU accesses being directed to the lower latency PoP DRAM. Very few OEMs seemed to populate both channels, and thus Scorpion based designs were effectively single-channel offerings.
Krait removes this limitation and now OEMs can utilize both memory channels in a PoP configuration (simply put two 32-bit DRAM die on the PoP stack) or in an external configuration. The split PoP/external DRAM organization is no longer supported. This change will hopefully mean we'll see more dual-channel Krait designs than we saw with Scorpion, which will in turn improve performance.
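As a back-of-the-envelope illustration of why the second channel matters, the sketch below computes theoretical peak bandwidth for one and two 32-bit channels and shows a simple address interleave. The transfer rate and the 64-byte interleave granularity are assumed values for illustration only; neither comes from Qualcomm, and real controllers may map addresses differently.

```python
def peak_bandwidth_gb_s(channels, transfers_per_sec, channel_width_bits=32):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = channel_width_bits // 8
    return channels * bytes_per_transfer * transfers_per_sec / 1e9


def channel_for_address(addr, channels=2, interleave_bytes=64):
    """Pick a channel by interleaving fixed-size blocks across channels.

    interleave_bytes is a hypothetical granularity, not a disclosed value.
    """
    return (addr // interleave_bytes) % channels


if __name__ == "__main__":
    rate = 1_000_000_000  # assumed transfers per second, for illustration
    print("single channel:", peak_bandwidth_gb_s(1, rate), "GB/s")
    print("dual channel:  ", peak_bandwidth_gb_s(2, rate), "GB/s")
    for addr in (0, 64, 128, 192):
        print(hex(addr), "-> channel", channel_for_address(addr))
```

Whatever the actual data rate turns out to be, populating both channels doubles the theoretical peak. That is the gain most Scorpion designs left on the table.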
Process Technology and Clock Speeds
Krait will be the world's first smartphone CPU built on a 28nm process. Qualcomm is working with both TSMC and GlobalFoundries, although TSMC will produce the first chips. Krait will be built, at first, on TSMC's standard 28nm LP process. According to Qualcomm there's less risk associated with TSMC's non-HKMG process. Qualcomm was quick to point out that the entire MSM8960 SoC is built on a 28nm LP process compared to NVIDIA's 40nm LPG design in Kal-El. From Qualcomm's perspective, 40nm G transistors are only useful for reducing leakage at high temperatures, but for the majority of the time a homogeneous LP design makes more sense.
Just like Scorpion, Krait places each core on its own voltage plane driven at its own clock frequency. Cores can be clocked independently of one another, which Qualcomm insists gives it a power advantage in many workloads.
The first implementation of Krait will be in a dual-core 1.5GHz MSM8960; however, a second revision of the silicon will be introduced next year that increases clock speed to 1.7 - 2.0GHz. Qualcomm claims that at the same 1.05V core voltage, Krait can run at 1.7GHz vs. 1.55GHz for Scorpion. At these two clock speeds and at the same voltage, Qualcomm tells us that Krait consumes 265mW of power vs. 432mW for Scorpion running an undisclosed workload. Although it should be possible for Krait to draw more power than Scorpion under load, Krait should hopefully be able to improve overall power efficiency by completing tasks more quickly and thus dropping down to idle faster than its predecessor. As a result, smartphone and tablet battery life should remain the same at worst and improve at best.
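The arithmetic behind Qualcomm's comparison is easy to reproduce. The sketch below uses only the figures quoted above (both cores at 1.05V, undisclosed workload); the dictionary and function names are ours. Keep in mind that mW per GHz is not performance per watt, since the two cores do different amounts of work per clock.

```python
# Figures quoted by Qualcomm, both at 1.05V on an undisclosed workload.
KRAIT = {"freq_ghz": 1.7, "power_mw": 265}
SCORPION = {"freq_ghz": 1.55, "power_mw": 432}


def mw_per_ghz(core):
    """Power normalized by clock speed (not a performance-per-watt metric)."""
    return core["power_mw"] / core["freq_ghz"]


def power_reduction(new, old):
    """Fractional drop in power going from the old core to the new one."""
    return 1.0 - new["power_mw"] / old["power_mw"]


if __name__ == "__main__":
    print(f"Krait:    {mw_per_ghz(KRAIT):.1f} mW/GHz")
    print(f"Scorpion: {mw_per_ghz(SCORPION):.1f} mW/GHz")
    print(f"Power reduction: {power_reduction(KRAIT, SCORPION):.1%}")
    print(f"Clock increase:  {KRAIT['freq_ghz'] / SCORPION['freq_ghz'] - 1:.1%}")
```

By these numbers Krait draws roughly 39% less power while running about 10% faster in clock speed, on whatever workload Qualcomm chose to measure.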
108 Comments
introiboad - Friday, October 7, 2011 - link
Really? I wasn't aware of anyone else in the industry not using ARM's RTL and designing their cores from scratch.
z0mb13n3d - Friday, October 7, 2011 - link
Well then, perhaps you haven't heard of Marvell and their Armada line of SoC's?
introiboad - Friday, October 7, 2011 - link
Yes, I have heard of Marvell and Armada, isn't that what's left of XScale? Honestly I thought they had given up on what was XScale and licensed the RTL like everyone else instead, but it looks like I was wrong.
metafor - Friday, October 7, 2011 - link
Which is probably why Anand specified tablet/smartphones. Marvell is, for all practical purposes, not a major or even relevant player in tablet/smartphones.
It is worthy to note that both nVidia and (thus believed) Apple are utilizing their architectural licenses and are cooking up their own cores currently. But none will likely launch in 2012.
Anand Lal Shimpi - Friday, October 7, 2011 - link
The qualification there was "in the smartphone/tablet space". Marvell hasn't had any significant design wins in the high end Android, iOS, Windows Phone or QNX OS space that we cover.
Is there another company you are referring to?
Take care,
Anand
Mike1111 - Friday, October 7, 2011 - link
What about the ST-Ericsson Nova A9600?
http://www.stericsson.com/press_releases/NovaThor....
It's a 28nm dual-core Cortex-A15 (up to 2.5 GHz) with an Imagination Rogue GPU (Series 6, 210 GFLOPS). Taped out and set to ship in 2012:
http://www.eetimes.com/electronics-news/4226942/ST...
And I'm sure we will see an Apple A6 in the next 12 months (IMHO could be quite similar to the Nova A9600 in terms of CPU and GPU).
Anand Lal Shimpi - Friday, October 7, 2011 - link
Neither of those options are custom designs using the ARM ISA, they are full IP licenses.
You are correct on ST-E's announcement though, I simply haven't been factoring them into discussions lately as they have been pretty much not present in the high-end smartphone space as of late.
Take care,
Anand
partylikeits1999 - Saturday, October 8, 2011 - link
I'm hearing that this one has slipped, and now the ST-Ericsson chipset with Rogue won't sample to OEMs until the first half of next year and therefore won't be in commercial products until the very end of 2012 if at all, whereas the MSM8960 is already sampling to OEMs according to Qualcomm. In other words, schedule-wise, you're probably comparing apples to oranges. I do agree with you though that we'll likely see A6 from Apple, in some form, by this time next year, but I think it'll be higher spec'd and will blow the doors off anything from STE. The more interesting question is whether the quad core 8064 that Qualcomm has mentioned for next year can keep up with A6, both from a CPU and a GPU standpoint.
ArunDemeure - Friday, October 7, 2011 - link
Marvell indeed hasn't had much luck in the high-end so far, but the same latest PJ4 core has been fairly successful both as the HSPA Pantheon 920 at BlackBerry (including some new BlackBerry 7 devices) and the TD-SCDMA Pantheon 910 for various China Mobile-specific phones. So I'm not sure it's a good idea to exclude them completely although they're certainly not in the same league as Qualcomm so I really don't blame you.
macs - Friday, October 7, 2011 - link
Are you really sure that the upcoming Nexus Prime will use OMAP 4?
Seems unlikely to me... A SGX 540 to power a 720p display when Samsung has their own better SOC with Mali 400?? And the rival iPhone 4S uses a GPU that is 7/8 times faster than SGX540.
Sounds really stupid... I can't believe that OMAP 4 is the reference SOC for Android ICS