NVIDIA Tegra K1 Preview & Architecture Analysis
by Brian Klug & Anand Lal Shimpi on January 6, 2014 6:31 AM EST
The GPU
Despite the Denver surprise, the big story behind Tegra K1 is its GPU. Prior to K1, every Tegra design implemented some derivative of what became known as the GeForce ULP core. This was a non-unified architecture that, at times, looked a lot like NV40. The design was never all that impressive from a performance or power efficiency standpoint. It was cost-effective and often constrained by a narrow memory interface.
Going into Project Logan, which became Tegra K1, NVIDIA made the decision (around 3 years ago) to abandon the GeForce ULP roadmap and instead combine its mobile and PC GPU roadmaps. Tegra K1 would be the first design to leverage a PC GPU, in this case Kepler. The bigger implication is that all future Tegra SoCs will integrate PC GPUs. The even crazier part of all of this is that all future NVIDIA GPUs will start out as mobile-first designs (including Maxwell). Productization and market availability may happen in a different order, but all architectures will start as mobile designs and then be adapted to fit other, higher-power segments. This is very much like Intel’s mobile-first realization of the mid-2000s with regard to notebook processors, but with NVIDIA and smartphone/tablet GPUs.
Kepler makes the move into mobile largely unchanged. This is a full Kepler implementation with the same size register file and shared L1, and it is 100% ISA compatible with its big brother. It turns out that Kepler, as it was originally designed, was pretty good for mobile. If you take a GeForce 740M (2 SMX/384 CUDA core design), you’re looking at roughly a 19W GPU. Of that 19W, around 3W goes to memory I/O, PCIe and other non-GPU things. You can subtract another 6W for leakage, bringing you down to 10W. Now that’s a 2 SMX design, so divide it in half and you’re down to 5W. Drop the clock from 1GHz to 900MHz, lower the voltage as well, and now we’re talking around 2 - 3W for the GPU core, without any re-architecting. Granted, you can’t just subtract out things like leakage like that, but you get the point: Kepler wasn’t a bad starting point for a good mobile GPU design.
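The back-of-the-envelope estimate above can be followed step by step. This is a minimal sketch, assuming dynamic power scales roughly with frequency times voltage squared; the 0.8 voltage ratio is a hypothetical value chosen for illustration (no voltages are given here), and `estimate_core_power` is an illustrative name, not an NVIDIA tool.

```python
# Rough per-SMX power estimate following the 740M-based reasoning.
# ASSUMPTION: dynamic power ~ f * V^2; voltage_ratio is hypothetical.

def estimate_core_power(total_w=19.0, io_w=3.0, leakage_w=6.0,
                        smx_count=2, f_old_mhz=1000.0, f_new_mhz=900.0,
                        voltage_ratio=0.8):
    """Return (per_smx_watts, scaled_watts) for a single-SMX GPU core."""
    core_w = total_w - io_w - leakage_w          # 19 - 3 - 6 = 10 W
    per_smx_w = core_w / smx_count               # 10 / 2 = 5 W
    freq_ratio = f_new_mhz / f_old_mhz           # 900 / 1000 = 0.9
    # Scale dynamic power by the frequency ratio and the squared voltage ratio.
    scaled_w = per_smx_w * freq_ratio * voltage_ratio ** 2
    return per_smx_w, scaled_w

per_smx, scaled = estimate_core_power()
print(f"per SMX: {per_smx:.1f} W, after DVFS: {scaled:.2f} W")
# per SMX: 5.0 W, after DVFS: 2.88 W
```

With any voltage ratio between roughly 0.75 and 0.8 the result lands in the 2 - 3W range quoted above; like the prose estimate, it treats leakage as a fixed, separable cost, which real silicon does not.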
Tegra K1 features a single SMX (in a single GPC), which amounts to 192 CUDA cores. NVIDIA made the rookie mistake of calling Tegra K1 a 192-core processor, which made for some great headlines but largely does the industry a disservice.
Tessellation and geometry engines aren’t crippled compared to desktop Kepler. FP64 support is also present, at 1/24 the FP32 rate. There are 4 ROPs and 8 texture units, the latter down from 16 per SMX in the PC version of Kepler. The big changes, however, are in the interconnects between all of the parts of the GPU.
The bigger implementations of Kepler have to be able to efficiently move data between multiple SMXes, ROPs and memory controllers. The interconnect fabric needed to do that doesn’t scale down well for mobile, where in many cases we’re dealing with one or two of those things instead of a dozen. By removing the complexity that exists in the bigger Kepler’s fabric you limit the ability for mobile Kepler to scale, but then again mobile Kepler is never going to scale to the sizes of big desktop GPUs so it’s not an issue. There are other changes beyond the interconnect as well, including improved clock gating and other work focused on power efficiency.
NVIDIA updated the texture units to support ASTC, something that isn’t present in the desktop Kepler variants at this point. NVIDIA also hopes to use the GPU’s color compression features to reduce memory bandwidth requirements in UI rendering and not just 3D games.
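ASTC saves memory and bandwidth because it is a fixed-rate block format: every block occupies 128 bits, and the block footprint chosen by the developer sets the bit rate. The sketch below (helper names are hypothetical) compares one 1024x1024 texture level against uncompressed RGBA8; it ignores mipmaps and does not model the GPU’s framebuffer color compression, which is a separate mechanism.

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint

def astc_bits_per_pixel(block_w, block_h):
    """Bit rate implied by an ASTC block footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)

def astc_texture_bytes(width, height, block_w, block_h):
    """Size of one texture level; partial edge blocks still cost a full block."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8

def rgba8_texture_bytes(width, height):
    """Uncompressed 8-bit-per-channel RGBA: 4 bytes per pixel."""
    return width * height * 4

for bw, bh in [(4, 4), (6, 6), (8, 8)]:
    size = astc_texture_bytes(1024, 1024, bw, bh)
    ratio = rgba8_texture_bytes(1024, 1024) / size
    print(f"ASTC {bw}x{bh}: {astc_bits_per_pixel(bw, bh):.2f} bpp, "
          f"{size} bytes, {ratio:.1f}x smaller than RGBA8")
# ASTC 4x4: 8.00 bpp, 1048576 bytes, 4.0x smaller than RGBA8
# ASTC 6x6: 3.56 bpp, 467856 bytes, 9.0x smaller than RGBA8
# ASTC 8x8: 2.00 bpp, 262144 bytes, 16.0x smaller than RGBA8
```

Larger footprints trade image quality for bandwidth, which is the knob that makes ASTC attractive on a memory-constrained SoC.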
With the changes NVIDIA made to the design, Kepler ends up as a sub-2W GPU that is well suited to mobile. NVIDIA provided us with some data showing SoC + DRAM power while running GFXBench 3.0 (Manhattan), an OpenGL ES 3.0 test:
The data is presented in NVIDIA’s usual way where we’re not looking at peak performance but rather how Tegra K1 behaves when normalized to the performance of Apple’s A7 or Qualcomm’s Snapdragon 800. In both cases NVIDIA is claiming the ability to deliver equal performance at substantially better power efficiency.
NVIDIA shared some live demos that echoed the data above. Peak performance was capped to that of the A7 or Snapdragon 800, but SoC level power was always lower. It remains to be seen what power consumption looks like in a shipping configuration (which is almost always optimized for peak performance, not equal performance at lower power), but it’s safe to say that concerns about Kepler being too power hungry for mobile are overrated.
The most compelling argument in favor of putting Kepler in a mobile SoC actually has to do with its API support. In one swift move NVIDIA goes from being disappointing in API support to industry leading. Since this is a full Kepler implementation (just a lower-power, lower-performance version), Tegra K1 maintains full API compatibility with NVIDIA’s flagship GeForce products. OpenGL ES 3.0 is supported, but so are full OpenGL 4.4, DX11 and CUDA 6.0.
NVIDIA made it a point to say that high-end games developed for the PC or even current generation consoles could be ported over to Tegra K1 without issue. It’s perhaps overreaching a bit to claim the latter given the delta in performance (which NVIDIA hopes to make up in 4 generations!), but you can definitely argue that titles built for the previous generation of consoles (Xbox 360/PS3) could easily be ported to Tegra K1.
At its CES press conference NVIDIA teased the idea that Tegra K1 is actually more powerful than the last generation of consoles. The slide below attempts to drive that point home:
With a GPU clock of 950MHz (admittedly, a bit on the high end), NVIDIA can deliver substantially more raw horsepower than either previous generation console (192 CUDA cores * 2 FLOPS per core * 950MHz ≈ 365 GFLOPS). Peak texture filtering performance and, more importantly, memory bandwidth are lower than what was possible on these consoles, but the gaps aren’t large enough to prevent porting from happening. There may be some optimization needed, but it definitely looks like Tegra K1 is the first mobile platform that can more or less run Xbox 360/PS3 titles, at least from a performance standpoint.
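The peak-rate arithmetic can be spelled out in a few lines. This is a sketch under stated assumptions: each CUDA core retires one fused multiply-add (2 FLOPs) per clock, FP64 runs at the 1/24 rate mentioned earlier, and each of the 8 texture units filters one bilinear texel per clock. `peak_rates` is a hypothetical helper, not part of any NVIDIA SDK.

```python
# Peak-rate arithmetic for Tegra K1's single SMX at 950 MHz.
# ASSUMPTIONS: 1 FMA = 2 FLOPs per core per clock, FP64 at 1/24 of FP32,
# 1 bilinear texel per texture unit per clock.

def peak_rates(cuda_cores=192, texture_units=8, clock_mhz=950.0,
               fp64_ratio=1 / 24):
    """Return (FP32 GFLOPS, FP64 GFLOPS, GTexels/s)."""
    clock_ghz = clock_mhz / 1000.0
    fp32_gflops = cuda_cores * 2 * clock_ghz   # 192 * 2 * 0.95
    fp64_gflops = fp32_gflops * fp64_ratio     # 1/24 of the FP32 rate
    gtexels_per_s = texture_units * clock_ghz  # 8 * 0.95
    return fp32_gflops, fp64_gflops, gtexels_per_s

fp32, fp64, gtex = peak_rates()
print(f"FP32: {fp32:.1f} GFLOPS, FP64: {fp64:.1f} GFLOPS, "
      f"texture: {gtex:.1f} GTexels/s")
# FP32: 364.8 GFLOPS, FP64: 15.2 GFLOPS, texture: 7.6 GTexels/s
```

These are theoretical ceilings; sustained throughput in games also depends on memory bandwidth, which is where the consoles keep an edge.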
In pursuit of making porting and game development as simple as possible, NVIDIA demonstrated its NSight Tegra plugins for Visual Studio. Without changing the IDE that developers are used to, NSight Tegra allows developers to use the NDK toolchain all within Visual Studio. I’m not enough of a developer to know whether or not NVIDIA’s efforts in this space truly make life easy enough to port Xbox 360/PS3 games over to Android, but its VS integration demos looked convincing at least.
NVIDIA had a port of Serious Sam 3 running just fine on Tegra K1 demo hardware. Games that are already prepped for Steam OS are said to be very easy to port over to Android: once you make the move to OpenGL, the rest is allegedly fairly simple. The Serious Sam 3 port apparently took only a couple of weeks, with the bulk of the effort going into mapping controls to an Android environment.
88 Comments
Nenad - Monday, January 13, 2014
That is not a real picture of the GPUs/CPUs; it is Photoshopped, so we do not know the relative size of the A15 and Denver cores.
chizow - Monday, January 6, 2014
Of course they would; designating it as simply dual-core would intimate it's a downgrade when it clearly is not.
MrdnknnN - Monday, January 6, 2014
"As if that wasn’t enough, starting now, all future NVIDIA GeForce designs will begin first and foremost as mobile designs."
I guess I am a dinosaur because this makes me want to cry.
nathanddrews - Monday, January 6, 2014
Why? It was the best thing that ever happened to Intel (Core). Desktop graphics are in a rut: too expensive, and not powerful enough for the coming storm of high-frame-rate 4K and 8K software and hardware.
HammerStrike - Monday, January 6, 2014
From a gaming perspective, Intel's focus on mobile has led to 10%-15% performance increases in their desktop line whenever they release a new chip series. That's pretty disappointing from a gaming performance perspective, even though I understand why they are focusing there.
Also, I disagree with you on desktop graphics - this is a golden time for them. The competition in the $200-$300 card range is fierce, and there is a ton of great value there. Not sure why you think there is a "storm" of 4K and 8K content coming any time soon, as there isn't, but even 2x R9 290 ($800 at MSRP; I know the mining craze has distorted that, but it will correct) can drive 4K today. Seeing as most decent 4K monitors are still $3000+, I'd argue it is the cost of the displays, and not the GPUs, that is holding back wider adoption.
As long as nVidia keeps releasing competitive parts I really don't care what their design methodology is. That being said, power efficiency is the #1 priority in mobile, so if they are going to be devoting mindshare to that, my concern is that top-line performance will suffer in desktop apps, where power is much less of an issue.
OreoCookie - Monday, January 6, 2014
Since Intel includes relatively powerful GPUs in their CPUs, discrete GPUs are needed only for special purposes (gaming, GPU compute and various special applications). And the desktop market has been contracting for years in favor of mobile computers and devices. In the notebook space, thanks to Intel finally including decent GPUs in their CPUs, only high-end notebooks come with discrete GPUs. Hence, the market for discrete GPUs is shrinking (which is one of the reasons why nVidia and AMD are both in the CPU game as well as the GPU game).
MrSpadge - Monday, January 6, 2014
> From a gaming perspective, Intel's focus on mobile has led to 10%-15% performance increases in their desktop line whenever they release a new chip series. That's pretty disappointing
That's not because of their power efficiency oriented design, it's because their CPU designs are already pretty good (difficult to improve upon) and there's no market pressure to push harder. And as socket 2011 shows us: pushing 6 of these fat cores flat out still requires 130+ W, making these PCs the dinosaurs of old again (-> not mass-market compatible).
Sabresiberian - Monday, January 6, 2014
I think you are misunderstanding the situation here. What will go in a mobile chip will be the equivalent of one SMX core, while what will go in the desktop version will be as many as they can cool properly. With K1 and Kepler we have the same architecture, but there is one SMX in the coming mobile solution, and 15 SMXs in a GeForce 780Ti. So, 15x the performance in the 780Ti (roughly) using the same design.
Maxwell could end up being made up of something like 20 SMXs designed with mobile efficiency in mind; that's a good thing for those of us playing at the high end of video quality. :)
MrSpadge - Monday, January 6, 2014
This just means they'll be optimized for power efficiency first, which makes a lot of sense. Look at Hawaii: it cannot even reach "normal" clock speeds with the stock cooler because it eats so much power. Improving power efficiency automatically results in higher performance becoming achievable via bigger dies. What they decide to offer us is a different story altogether.
kpb321 - Monday, January 6, 2014
My initial reaction was a little like MrdnknnN's, but when I thought about it I realized that may not be a bad thing. Video cards at this point are primarily constrained on the high end by power and cooling limitations more than anything else. The R9 is a great example of this. Optimizing for mobile should result in a more efficient design which can scale up to good desktop and high-end performance by adding on the appropriate memory interfaces and putting down enough "blocks" (SMXs in nvidia's case). They already do this to cover the range from cards barely better than integrated video to top-end $500+ cards. I don't think the mobile focus is too far below the low-end cards of today to cause major problems here.