Analyzing Falkor’s Microarchitecture: A Deep Dive into Qualcomm’s Centriq 2400 for Windows Server and Linux
by Ian Cutress on August 20, 2017 11:00 AM EST
Developing a custom microarchitecture is difficult. Even with all the standards in place and a licensed instruction set such as ARM, the actual development takes time and the right people, followed by the infrastructure to deploy at scale.
In the mobile space, we’ve seen custom cores – most notably from Apple – deviating from the regular ARM design, though Samsung and Qualcomm are also playing in that space. Qualcomm, however, is going one step further by developing a custom core for the server and enterprise market, focusing purely on typical enterprise workloads. The current commercial ARM success in the data center comes from companies such as Cavium, who use ARM architecture licenses in a custom SoC. By developing its own high-performance core, Qualcomm is hoping to offer something different in the data center, and they’ve lifted the lid on a good chunk of the core.
The Qualcomm Centriq 2400 SoC Family, with the Falkor CPU
Back in December 2016, Qualcomm announced that it had developed its own SoC for the data center, while also revealing details such as the fact that it uses a custom core and that Qualcomm will be involved in the Open Compute Project (with a design based on the latest version of Microsoft’s Project Olympus). We knew that Qualcomm had been aiming for a 48-core design using ARM’s instruction set, targeting the data center and enterprise markets. The goal is to carry forward knowledge of the ARM instruction set and custom core design into markets that could potentially leverage it – it also helps that the data center market has a very interesting TAM (total addressable market, in USD) of which even a small slice could reap rewards. Back in December, they were beginning to sample to cloud partners and potential future customers.
The first set of products to come out will be the Qualcomm Centriq 2400 family of SoCs. The top parts will feature 48 cores, and while today Qualcomm is mainly talking about that 48-core model, they have stated that the 2400-series will be a range of parts segregated by core count, performance, and power. The CPU cores, code named Falkor, will be ARMv8.0 compliant with some ARMv8.1 features, potentially allowing software to transition seamlessly from other ARM environments (or, at worst, with a recompile). The Centriq 2400 family is set to be AArch64 only, without support for AArch32: Qualcomm states that this saves some power and die area, but that they primarily chose this route because the ecosystems they are targeting have already migrated to 64-bit. Qualcomm’s Chris Bergen, Senior Director of Product Management for the Centriq 2400, stated that the majority of new and upcoming companies have started off with 64-bit as their base in the data center and are not even considering 32-bit, which is a reason for the AArch64-only choice here.
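For readers wondering what AArch64-only means in practice for existing software, the sketch below shows one way to tell whether a Linux binary targets AArch64 or the older 32-bit AArch32 by reading its ELF header. It is only an illustration and not anything Qualcomm has described: the function names (`classify_elf`, `runs_on_aarch64_only`, `make_header`) are hypothetical, and the only facts relied upon are the standard ELF header layout and the machine codes 40 (ARM) and 183 (AArch64).

```python
import struct

EM_ARM = 40        # ELF machine code for 32-bit ARM (AArch32)
EM_AARCH64 = 183   # ELF machine code for 64-bit ARM (AArch64)


def classify_elf(header: bytes) -> str:
    """Classify a binary from the first 20 bytes of its ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not-elf"
    ei_data = header[5]  # 1 = little-endian, 2 = big-endian
    if ei_data not in (1, 2):
        return "unknown"
    endian = "<" if ei_data == 1 else ">"
    # e_machine is a 16-bit field at byte offset 18 in both ELF32 and ELF64.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine == EM_AARCH64:
        return "aarch64"
    if e_machine == EM_ARM:
        return "aarch32"
    return "other"


def runs_on_aarch64_only(header: bytes) -> bool:
    """Assumption: an AArch64-only core can execute only AArch64 binaries."""
    return classify_elf(header) == "aarch64"


def make_header(ei_class: int, e_machine: int) -> bytes:
    """Build a synthetic little-endian ELF header prefix (hypothetical helper)."""
    e_ident = b"\x7fELF" + bytes([ei_class, 1, 1]) + bytes(9)  # 16 bytes
    return e_ident + struct.pack("<HH", 2, e_machine)          # e_type, e_machine


if __name__ == "__main__":
    for name, hdr in [("64-bit ARM", make_header(2, EM_AARCH64)),
                      ("32-bit ARM", make_header(1, EM_ARM))]:
        print(name, classify_elf(hdr), runs_on_aarch64_only(hdr))
```

On a real system the same check could be applied to the first 20 bytes of a file opened in binary mode; a binary classified as `aarch32` would need to be rebuilt for 64-bit before it could run on a part without AArch32 support.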
The design team behind the Centriq, as explained to us, was partly formed from the custom core team from the mobile side. On the mobile side we have seen Qualcomm custom cores based on ARM’s instruction set in the form of Krait and Kryo, although this new Falkor design is not derived from either. Qualcomm states that Falkor is their 5th generation of custom CPU core design, and has been a complete ground-up design specifically for the data center. The focus, we were told, was on high overall performance and high performance per watt, but also the ability to run at low power. To do this, the Centriq 2400 is set to be the first major data center design built on a 10nm process.
With the 10nm process confirmed, various media and analysts have postulated which foundry will be playing that role. Qualcomm currently has 10nm volume with Samsung through the Snapdragon 835, which is shipping in the millions. Samsung’s 10nm processes are more mature than the competition at this point; however, Samsung does not have much experience with large silicon dies, tending to favor smaller SoCs due to their naturally higher yields, which help keep fab production at a high level. The other alternative is TSMC, whose CLN10FF process was technically available for select customer orders later than Samsung's, but is currently being used by Apple's A10X in the iPad Pro 2. TSMC also has experience with larger silicon, which would be of considerable benefit. Unfortunately, Qualcomm is not announcing its foundry partner at this time, although the choice would likely depend on relations, volume, pricing and performance.