Today AMD is taking the wraps off their upcoming mobile APUs, joining the already discussed desktop Kaveri. While Kaveri will also be coming to laptops at some point in the first half of 2014, the focus during the mobile APU briefing was squarely on the replacements for the current Temash and Kabini APUs, codenamed Mullins and Beema.

We looked at Kabini earlier this year, but while sales of laptops and tablets with the Kabini/Temash APUs have reportedly been quite good, we haven’t had the chance to test any retail laptops. With Intel’s Bay Trail set to give Atom a much-needed kick in the pants as far as performance is concerned, AMD hasn’t been standing still and their next generation of “small core” APUs looks ready to give Silvermont some stiff competition. Here’s what we know right now.

First and foremost, these are actually new cores as opposed to mere tweaks of existing designs. Temash and Kabini used “Jaguar” cores built on a 28nm process node; Mullins and Beema stay on 28nm but move to “Puma” cores. Along with design improvements to reduce power use, AMD is also incorporating an ARM Cortex-A5 core with TrustZone technology to help with security. Here’s a quick overview of the current lineup and the roadmap:

AMD hasn’t disclosed how much the underlying architecture has changed, and I would guess the Puma cores are actually quite similar to Jaguar cores, but the net result is a 2X improvement in performance per Watt according to AMD. They arrive at that number by dividing the performance in a few common benchmarks by the rated TDP of the APUs. Now that’s a bit contrived, as a 25W TDP APU may not actually be drawing 25W during the tests, but we’ll just ignore the marketing for now and focus on the important metrics. Update: It sounds like most of the performance gains come from frequency increases, while power improvements happen at the SoC level.

First, we have Beema replacing Kabini, and with the change we get the AMD Security Processor (ARM Cortex-A5) and a reduction in TDP on some parts, with 10W being the minimum. Mullins does the same for Temash, only AMD uses SDP (Scenario Design Power) rather than TDP (Thermal Design Power), and the new APUs are ~2W compared to 3-4W for Temash. Apparently the TDP for Temash is 8W and the TDP for Mullins is 4.5W, and that’s what AMD uses for their performance per watt calculations.

While a “2X increase” sounds good, there are many ways to get there. Simply dropping the power use by half but maintaining performance would be one way, or doubling performance at the same power use would yield the same 2X increase. Thankfully, AMD provided details of their performance testing for the old and new APUs as well, which I’ve summarized in the table below, and we can see that performance has increased quite a bit along with the drop in TDP.

AMD APU Performance Results
               Temash A6-1450 (8W)  Mullins (4.5W)  % Increase  Kabini A6-5200 (25W)  Beema (15W)  % Increase
PCMark 8 Home  1343                 1809            35%         1861                  2312         24%
3DMark 11      468                  570             22%         685                   823          20%

Even if we completely ignore the TDP aspect, the performance improvements coming with the new Mullins and Beema APUs look to be quite good. The iGPU performance is up around 20% for both the low-power Beema and the ultra-low-power Mullins APUs, while the CPU/overall performance is a more substantial 35% increase with Mullins and 24% with Beema.
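As a rough check on the 2X claim, the table figures can be run through the same score-divided-by-TDP arithmetic AMD describes. The Python sketch below is only an illustration: the scores and TDP values are copied from the table, the function and variable names are hypothetical, and a rated TDP is not the same thing as measured power draw.

```python
# Benchmark scores and rated TDPs copied from the table above.
# Function and variable names are illustrative, not anything AMD published.
SCORES = {
    "PCMark 8 Home": {"Temash": 1343, "Mullins": 1809, "Kabini": 1861, "Beema": 2312},
    "3DMark 11": {"Temash": 468, "Mullins": 570, "Kabini": 685, "Beema": 823},
}
TDP_WATTS = {"Temash": 8.0, "Mullins": 4.5, "Kabini": 25.0, "Beema": 15.0}
GENERATIONS = [("Temash", "Mullins"), ("Kabini", "Beema")]


def percent_increase(old_score, new_score):
    """Raw performance gain in percent, ignoring power entirely."""
    return (new_score / old_score - 1.0) * 100.0


def perf_per_watt_gain(old_score, old_tdp, new_score, new_tdp):
    """Ratio of (score / rated TDP) between the new part and the old part."""
    return (new_score / new_tdp) / (old_score / old_tdp)


for benchmark, scores in SCORES.items():
    for old, new in GENERATIONS:
        gain = percent_increase(scores[old], scores[new])
        ratio = perf_per_watt_gain(
            scores[old], TDP_WATTS[old], scores[new], TDP_WATTS[new]
        )
        print(f"{benchmark}: {old} -> {new}: +{gain:.0f}% score, "
              f"{ratio:.2f}x score per rated watt")
```

With these inputs all four ratios work out to between roughly 2.0x and 2.4x, which is consistent with AMD's figure, with the caveat that the divisor is a rated TDP rather than power actually drawn during the tests.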

AMD hasn’t disclosed clock speeds or anything else for the upcoming APUs, but given A6-1450 is clocked at 1000-1400MHz with the GPU core running at 300-400MHz, it is possible AMD was able to arrive at the above performance increases simply with higher clock speeds. Also possible is that similar to the Bobcat to Jaguar transition, AMD tweaked other elements of the Puma core (e.g. the scheduler could have more entries).

Core counts on the CPU side have remained the same: 2-4 cores. With the lower SDP of Mullins, AMD also notes that fanless quad-core tablets and laptops will now be possible, which definitely opens some additional doors. When we looked at Kabini performance (granted, at a 15W TDP), we found the CPU performance was typically well ahead of Atom at the time, and even Silvermont/Bay Trail are only moderately ahead (and in some cases still slower). How things shake out with Mullins in the 2W market will be something to watch.

While we don’t know if the iGPU has gained any additional cores, it remains GCN based and very likely uses the same cores as before, only with higher clocks. Consider that AMD’s GCN architecture breaks things down into Compute Units (CUs) with 64 cores per CU. The existing Kabini/Temash APUs have two CUs and 128 cores, while Hawaii as an example includes a staggering 44 active CUs (2816 cores) in the R9 290X; Kaveri goes for the middle ground with up to 8 CUs (512 cores). In order to increase the number of cores in Beema/Mullins, AMD would have to make the jump from 2 to 3 CUs, a 50% increase; given the ~20% performance increases above, it’s far more likely these come from the same number of cores/CUs running at higher clocks than more cores running at lower clocks.
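The CU reasoning above can be put into numbers with a short Python sketch. It assumes GPU throughput scales linearly with cores times clock, which is a simplification (memory bandwidth and other limits are ignored), and the helper names are hypothetical.

```python
CORES_PER_CU = 64  # GCN groups 64 shader cores into each Compute Unit


def shader_cores(compute_units):
    """Total shader cores for a given number of GCN Compute Units."""
    return compute_units * CORES_PER_CU


def relative_clock_needed(target_speedup, old_cus, new_cus):
    """Clock multiplier needed to reach target_speedup.

    Assumes throughput is proportional to cores x clock, which is a
    simplification and not something AMD has stated.
    """
    return target_speedup * old_cus / new_cus


for name, cus in [("Kabini/Temash", 2), ("Kaveri (max)", 8), ("R9 290X", 44)]:
    print(f"{name}: {cus} CUs = {shader_cores(cus)} cores")

# A ~20% iGPU gain with the same 2 CUs versus a hypothetical 3 CU design.
for new_cus in (2, 3):
    clock = relative_clock_needed(1.20, old_cus=2, new_cus=new_cus)
    print(f"{new_cus} CUs would need about {clock:.2f}x the old GPU clock")
```

Under that assumption, a 2 CU part needs about 1.2x the old clock, while a 3 CU part would have to run at about 0.8x the old clock to land on the same 20% gain, which is why the higher-clock explanation looks more plausible.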

Wrapping things up, there are a few other items we wanted to quickly touch on. First is Kaveri for notebooks, which as noted above will be shipping in H1’14. Kaveri is a GCN 1.1 part, similar to Hawaii only with fewer cores, and it also supports HSA features and AMD’s new TrueAudio. Again, notice that neither of those elements is listed for Mullins/Beema, indicating they’re using the same basic GCN 1.0 GPU design as Temash/Kabini. Kaveri will also be making the transition to 28nm from Trinity/Richland’s 32nm, and we could see a fairly decent bump in performance – but AMD isn’t saying much on the subject of mobile Kaveri performance just yet.

The other items we wanted to quickly discuss (and you can see these and a few other pieces of information in the slide gallery below) are some of the other additions AMD is making with Mullins/Beema. There are three points to discuss: AMD DockPort, Microsoft InstantGo, and the Platform Security Processor.

While DockPort sounds interesting (a non-Intel alternative to Thunderbolt that basically combines DisplayPort 1.2 with USB 3 into a single cable), AMD said precious little about DockPort in their presentation. Someone asked about it, and AMD said it was “up to laptop manufacturers” and that was about it. There’s the above slide as well, showing how a single cable could drive three external displays along with a variety of peripheral devices, but we’ll have to wait and see how many companies are willing to jump on the DockPort bandwagon.

Microsoft InstantGo is another feature that AMD supports. Formerly called Connected Standby, InstantGo allows your laptop to wake up from sleep mode periodically to pull down network updates – email, live tiles, etc. It also allows devices to go from deep sleep to “on” in under 500 milliseconds, basically matching what we get with tablets and smartphones. Much of the implementation of InstantGo will again be left to the device manufacturers (i.e. the “up to 14 days in standby mode” will depend on the battery capacity and other power optimizations made by the OEMs).
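To see why the standby figure depends so heavily on the battery, the arithmetic can be sketched in Python. The battery capacities below are hypothetical examples, not figures from AMD or any OEM, and the calculation only gives the average draw a device could sustain over the whole standby period, including its periodic wake-ups.

```python
def average_standby_draw_mw(battery_wh, standby_days):
    """Average power (in milliwatts) that drains battery_wh over standby_days."""
    hours = standby_days * 24
    return battery_wh / hours * 1000.0


# Hypothetical battery capacities in watt-hours, chosen only for illustration.
for capacity_wh in (25.0, 40.0):
    budget = average_standby_draw_mw(capacity_wh, standby_days=14)
    print(f"{capacity_wh:.0f} Wh battery: about {budget:.0f} mW average "
          f"draw allowed for 14 days of standby")
```

The smaller the battery, the tighter the average power budget, so hitting 14 days depends as much on OEM design choices as on the APU itself.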

Last up is the Security Processor, which consists of an ARM Cortex-A5 core with support for the ARM TrustZone. We discussed this in more detail previously, but the short summary is that the technology is designed to provide a Trusted Execution Environment to help protect against malware and viruses, as well as providing new ways to deal with user authentication, payment processing, etc. How much use the Security Processor will see in the short term is difficult to say, but if ARM can get some traction with it in the smartphone/tablet space, its inclusion in AMD’s Mullins/Beema APUs could prove beneficial.

Wrapping things up, Mullins and Beema will be coming out in 2014, but AMD hasn’t given a precise time frame. We have a date for desktop Kaveri (January 14, 2014), but everything else is “first half of 2014”. Given the added pressure AMD is facing from Intel’s Bay Trail, hopefully the Mullins/Beema APUs will arrive sooner rather than later, but that may simply be wishful thinking on my part. As usual, the real challenge is in getting the APUs into a compelling product – one that offers the right features at the right price point. With tablets and Chromebooks taking over the sub-$300 market, creating something that clearly stands out from the crowd is becoming difficult.

Source: AMD Announcement

47 Comments

  • monstercameron - Wednesday, November 13, 2013 - link

    unfortunately, no killer systems/value have used amd temash but here is one
    http://www.bestbuy.com/site/toshiba-satellite-clic...
  • blanarahul - Wednesday, November 13, 2013 - link

    We are still on 28 nm?! I heard the industry was slowing down. But this is unexpected!!
  • SaberKOG91 - Wednesday, November 13, 2013 - link

    I think you're missing an important point here: AMD was able to lower power usage and increase performance on the same process node (28 nm) as many of the last gen chips. This means that there is still a lot of potential for efficiency increases on the existing node and that AMD is probably waiting for newer process nodes to mature. This means that when they do move to a smaller node, the process will be much more refined. For AMD this translates to lower leakage currents and fewer faulty chips. For the consumer, this translates to lower power usage and lower processor costs.

    After all, no sense moving to a newer process node when Intel is having trouble getting Broadwell out the door. Not every company can absorb the financial loss of having to delay an entire product line because they jumped at a new process before it was really ready.
  • Nagorak - Thursday, November 14, 2013 - link

    It means the process isn't ready for them yet. They are still at a disadvantage compared to Intel in terms of performance and power use. Have no doubt if they could shrink to a smaller process they would do so.
  • SaberKOG91 - Thursday, November 14, 2013 - link

    "Could" is kind of a loaded word. Yes they could sink revenue into moving their manufacturing to a lower node. But that would mean extra costs at a stage when AMD is just starting to prove to investors that they can be profitable again. You also need to take into account that TSMC's 20nm has not reached good yields yet and that GloFo is still at 28nm. Given that these are AMD's major fabs, it makes sense that they aren't pushing lower.

    We are seeing a lot of interesting changes in the way that AMD does business and it really should be no surprise that they stayed at the current node. AMD switched from using their CPU layout tools for CPU design to their GPU tools and were able to decrease overall layout size by > 20%. AMD's newest chips will be competitive in the mobile market, especially having much faster graphics performance than anything Intel can offer at the same power envelope and cost.

    If you really want to be concerned about a threat to AMD (or x86 in general), the focus should be on what ARM-based designs are going to be able to do once 64-bit chips hit the microserver and PC arena. Again, AMD has had the foresight to start design and testing of 64-bit ARM designs that are likely to be paired with the graphics superior GCN cores. This should help them get the jump on Intel in quite a few developing markets, and should make them competitive with ARM in other arenas.
  • Penti - Thursday, November 14, 2013 - link

    Is it? Nobody is at a lower node on TSMC. We have been using 28nm TSMC for GPUs since what, 1H of 2012? The industry isn't slowing down, but it won't really follow Moore's law after 16/14 nm as shrinking becomes harder. 40 nm was basically around for 3 generations of GPUs btw, or 2009-2012.
  • yankeeDDL - Thursday, November 14, 2013 - link

    As processes become exponentially more complex, they also take much longer to mature. Obviously, Intel has a key advantage here: at 14nm they have 4X area savings over AMD's chips. However, the 28nm process in 2014 is quite a different thing than the 28nm process in 2012. Not to mention that designing properly/efficiently in such a small node is not straightforward. So, lacking Intel's fab power, I think that exploiting 28nm is a good bet. The best bet, in fact, in my opinion. Let's hope it's enough.
  • bleh0 - Wednesday, November 13, 2013 - link

    I'm just waiting mainly on notebook Kaveri and whatever they can put Mullins into.
  • tipoo - Wednesday, November 13, 2013 - link

    I wonder if AMD could ever go the eDRAM route? Bandwidth seems to be the limitation on APUs. Lack of fab space for it maybe?
  • SaberKOG91 - Thursday, November 14, 2013 - link

    This has more to do with three other problems: caching latency, RAM bandwidth, and non-uniform memory addressing for CPU and GPU.

    AMD has had significantly higher memory latencies on all levels than Intel for a little while now, and we are hoping to see this improve in newer designs. This has the greatest impact on performance for CPU and GPU as it causes the processor to become "data starved" and have to wait for data to be retrieved.

    RAM bandwidth is another factor, as highly parallel workloads end up consuming and generating significant amounts of data. While some things can be done by developers to alleviate these issues, the root problem is the limited throughput of DDR3 compared to GDDR5 for these kinds of workloads.

    Having a uniform memory space is important because it eliminates the need for copying between different memory pools, as well as pointer conversions. This alone can have a pronounced effect on high utilization scenarios.

    The problem with eDRAM is the die space and power usage. Intel got away with eDRAM in their processors because they are running at much higher power and thermal envelopes than mobile parts. They also dodged the die space issue by having the eDRAM as a separate die in the same package. AMD is trying to unify their product lines so that they can focus more on developing new architectures. Adding something special to a few different models won't help them in the long run. This was really more of a marketing gimmick for Intel as they try to become recognized as a workstation level graphics solution.
