Snapdragon 800 (MSM8974) Performance Preview: Qualcomm Mobile Development Tablet Tested

Author: Brian Klug

by Brian Klug on June 18, 2013 8:00 PM EST

CPU Performance

The state of CPU performance testing under Android is unfortunately still quite broken. We're using a mix of browser-based tests along with Java and native apps (AndEBench).

The key comparisons to look for are the Snapdragon 800 MDP/T vs. the Exynos 5 Octa (4 x ARM Cortex A15s) based Galaxy S 4 (SHVE300S), the Exynos 5 Dual (2 x ARM Cortex A15s) based Nexus 10 tablet and any of the Snapdragon 600 based smartphones (HTC One/T-Mobile Galaxy S 4) running two Krait 300s at 1.7/1.9GHz.

Browsermark 2.0

Google Octane Benchmark v1

Mozilla Kraken Benchmark - 1.1

SunSpider Javascript Benchmark 1.0 - Stock Browser

Krait 400 seems to do very well against ARM's Cortex A15, trading positions depending on the test. Because these are browser-based benchmarks, there's a big software component to the variability that prevents drawing firm conclusions, but it's clear that Snapdragon 800 is in a similar performance class to current Cortex A15 based designs.

Vellamo Benchmark - 2.0

AndEBench

AndEBench - Java

AndEBench - Native

The Java and Native client AndEBench tests echo what we've seen elsewhere: Snapdragon 800 can definitely be quicker than ARM's Cortex A15, and is at least in a similar class.

115 Comments

shodanshok - Thursday, June 20, 2013 - link
I forgot to specify the benchmark used. It is Coremark: http://www.coremark.org/

It is an industry-standard benchmark with freely available sources.
Wilco1 - Friday, June 21, 2013 - link
Really? Looking at the published results it shows Exynos 4 does 5560 Coremarks/core at 1.4GHz.

The fastest per-core Atom result is 2.3 CM/MHz for 1 thread, and 3.3 with Hyperthreading.

Cortex-A9 does 4.0 for 1 thread - so it is 74% faster single threaded, and 21% faster core for core.
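As a quick check of the arithmetic above, the percentages follow directly from the quoted CoreMark/MHz figures. This is a minimal Python sketch; the function and variable names are illustrative, and the only inputs are the numbers quoted in this comment.

```python
def percent_faster(a, b):
    """Return how much higher score a is than score b, in percent."""
    return (a / b - 1.0) * 100.0

# CoreMark/MHz figures as quoted in the comment above.
a9_cm_per_mhz = 4.0        # Cortex-A9, 1 thread
atom_1t_cm_per_mhz = 2.3   # Atom, 1 thread
atom_ht_cm_per_mhz = 3.3   # Atom, Hyperthreading enabled

# The quoted Exynos 4 per-core score, normalized by its 1.4GHz clock.
exynos4_cm_per_mhz = 5560 / 1400

single_thread_lead = percent_faster(a9_cm_per_mhz, atom_1t_cm_per_mhz)
core_for_core_lead = percent_faster(a9_cm_per_mhz, atom_ht_cm_per_mhz)

print(f"Exynos 4: {exynos4_cm_per_mhz:.2f} CM/MHz per core")
print(f"A9 vs Atom, 1 thread: {single_thread_lead:.0f}% faster")
print(f"A9 vs Atom with HT:   {core_for_core_lead:.0f}% faster")
```

The 5560 CoreMarks/core at 1.4GHz works out to roughly 4.0 CM/MHz, which is where the A9 figure in the comparison comes from.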

So the A9 destroys Atom on CoreMark as well. I am surprised several of you are trying to argue that in-order cores beat out-of-order cores despite the facts.
shodanshok - Friday, June 21, 2013 - link
No, it is incredible how you pretend to extrapolate _precise_ performance numbers from vague arch details.

Return to the CoreMark site, because you misunderstand the benchmark results. The CM/MHz score represents the score of the entire SoC, so it doesn't rule out core count differences. Look at the CM/core score instead and you will find that Atom is in the same range as the A9 scores, sometimes much better.

Some examples: Atom z520 vs Tegra2 and Atom n2800 vs exynos4 quad.

Please also note that:
- Coremark does not stress L2/memory in any way. This is the only reason why the A9's slow memory interface does not interfere here;
- the compiler has enormous importance in its score.

The real Atom problem was the terrible GPU and companion chipset.

Regards.
Wilco1 - Friday, June 21, 2013 - link
I listed the per-core results: as I said, A9 is 74% faster single threaded and 21% faster with Hyperthreading enabled. These are results from the EEMBC website, no complex extrapolation involved.

Coremark runs mostly in L1, however it does stress the branch predictor seriously. All benchmarks have a major compiler component. Coremark is horrible like pretty much any EEMBC stuff so I don't think it will become popular.
shodanshok - Saturday, June 22, 2013 - link
I cannot agree. From the CoreMark site:

### Comparison 1:
Tegra2 @ 1.00 GHz (2 A9 cores):
Coremark: 5866.39
Coremark/Core: 2933.20

Atom Z520 @ 1.33 GHz (1 Atom Core):
Coremark: 3192.17
Coremark/Core: 3192.17

Atom advantage: 9%

### Comparison 2:
Exynos4 Quad @ 1.4 GHz (4x A9 cores)
Coremark: 22243.00
Coremark/core: 5560.75

Atom N2800 @ 1.86 GHz (2 Atom cores)
Coremark: 12286.90
Coremark/Core: 6143.45

Atom advantage: 10%
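The per-core figures and percentages in the two comparisons can be reproduced from the quoted totals. A minimal sketch, assuming the scores and core counts exactly as listed above; the dictionary layout and helper names are hypothetical.

```python
# (total CoreMark score, physical core count) as quoted in the comparisons above.
results = {
    "Tegra2": (5866.39, 2),
    "Atom Z520": (3192.17, 1),
    "Exynos4 Quad": (22243.00, 4),
    "Atom N2800": (12286.90, 2),
}

def per_core(name):
    """CoreMark score divided by the number of physical cores."""
    total, cores = results[name]
    return total / cores

def advantage_pct(a, b):
    """Per-core lead of SoC a over SoC b, in percent."""
    return (per_core(a) / per_core(b) - 1.0) * 100.0

comparison_1 = advantage_pct("Atom Z520", "Tegra2")
comparison_2 = advantage_pct("Atom N2800", "Exynos4 Quad")

for name in results:
    print(f"{name}: {per_core(name):.2f} CoreMark/core")
print(f"Comparison 1, Atom advantage: {comparison_1:.0f}%")
print(f"Comparison 2, Atom advantage: {comparison_2:.0f}%")
```

Note that this divides by core count only; clock speed is not normalized, which is the point disputed in the replies.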

### Note:
Why are the two A9 scores, and the two Atom scores, so different from each other (see Tegra2 vs Exynos and Atom Z520 vs N2800)? The reason lies in the compiler: recent GCC versions have greatly improved their efficiency with in-order uarchs. Moreover, please also note that the high A9 score (Exynos) was obtained with the ARM-specific compiler. I am sure that, if benchmarked using the Intel C Compiler, the Atom score would be higher.

### Summary:
The Atom core is more than capable of competing against A9. You can argue that Atom has a higher clock, but in a phone/tablet environment clocks don't mean anything. What is important is performance/watt.
This brings us to Atom's two real problems:
1) a very low efficiency chipset and low integration. Moorestown (Intel's first attempt at mobile with Atom) was doomed from the start because it required 4/5 chips to enable a full-featured phone;

2) a very slow GPU (with very bad performance/watt).

Moreover, it is widely understood that the A9 OoO engine is only a mild implementation. A15 is much stronger in this regard, sometimes (not too often, anyway) even approaching AMD Bobcat single-thread performance.

Regards.
Wilco1 - Saturday, June 22, 2013 - link
No - the performance comparisons that are useful are:

1. Max score for a SoC - despite running at a far lower clock, in both comparisons A9-based SoCs win by more than 80% in overall performance.
2. Efficiency of a core at the same frequency (IPC) - Without Hyperthreading A9 is 74% faster, with Hyperthreading A9 wins by more than 20%.

Note that your comparison doesn't work. You can't come to a conclusion about A9 vs Atom performance when you compare with wildly different frequencies. Also it means giving Atom the advantage of having 2 threads vs 1 on A9. So to make the comparison fair you need to compare with an equal number of threads or at the same clock.
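For illustration, the two normalizations described here (whole-SoC score, and per-core score at equal clock) can be applied to the CoreMark totals quoted earlier in the thread. This is a rough sketch only: the scores, clocks and core counts are those given in the thread, the helper names are made up, and it does not account for how many threads each submission actually ran.

```python
# (total CoreMark score, physical cores, clock in MHz) as quoted in the thread.
socs = {
    "Tegra2": (5866.39, 2, 1000),
    "Atom Z520": (3192.17, 1, 1330),
    "Exynos4 Quad": (22243.00, 4, 1400),
    "Atom N2800": (12286.90, 2, 1860),
}

def soc_score(name):
    """Metric 1: total score of the whole SoC."""
    return socs[name][0]

def cm_per_mhz_per_core(name):
    """Metric 2: score normalized on both core count and frequency."""
    total, cores, mhz = socs[name]
    return total / cores / mhz

def lead_pct(metric, a, b):
    """Lead of SoC a over SoC b under the given metric, in percent."""
    return (metric(a) / metric(b) - 1.0) * 100.0

pairs = [("Tegra2", "Atom Z520"), ("Exynos4 Quad", "Atom N2800")]
for a9, atom in pairs:
    total_lead = lead_pct(soc_score, a9, atom)
    iso_clock_lead = lead_pct(cm_per_mhz_per_core, a9, atom)
    print(f"{a9} vs {atom}: SoC total +{total_lead:.0f}%, "
          f"per core per MHz +{iso_clock_lead:.0f}%")
```

Under these assumptions both A9 SoCs lead by more than 80% on total score, and by roughly 20% per core once clock speed is divided out, while dividing by core count alone gives the opposite ordering. The disagreement in the thread is therefore about which normalization is the fair one, not about the underlying numbers.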

Yes GCC has improved a lot in recent years, on ARM it has become a reasonable compiler and competitive with ARM's armcc compiler. I don't know how much better ICC would be on Atom, but I suspect the gap is far smaller as well.

A9 is not hugely OoO indeed, just like Silvermont. A15 is aggressive OoO and beats Jaguar.
shodanshok - Saturday, June 22, 2013 - link
No, again I don't agree.
You explicitly talked about the Cortex-A9 and Atom uarch, _not_ their SoC implementation.

You cannot use the total SoC score as a uarch benchmark, simply because it doesn't rule out differences in core count. To measure uarch performance you need to do a core-by-core comparison. Let me give an example: using total SoC score, a 4xA9 SoC is faster than a 2xA15 one. However, the latter uarch is considerably more advanced.

A very similar argument can be made for frequency: Atom was _from the start_ designed to hit a relatively high-clock, yet low-power target. This was deliberately done to exploit Intel's 45/32nm HKMG process, which doesn't scale power down much for lower frequency targets. It is simply a question of design targets: for low power chips, you can get (relatively) high frequency _or_ (relatively) high IPC, not both (at present).

So, you must decide: are you comparing the uarch or the final SoC implementation? Because, from a uarch standpoint, Atom wins. On a performance/watt metric, their bare cores tend to be on par. As a final product, A9 is way better because there are many highly integrated, low power, low cost SoCs from a multitude of vendors. In contrast, Atom-based SoCs are offered only by Intel and with a much lower integration factor (and higher cost), at least until now, when their latest platform is beginning to be very competitive against older A9 SoCs.

The "little problem" is that ARM is shipping with 2x and 4x A15 cores, and against them Atom is at a disadvantage.

Regards.
Wilco1 - Saturday, June 22, 2013 - link
While Atom was indeed designed for high frequency, A9 reaches higher frequencies: Atom maxes out at 2GHz on 32nm, while A9 does 1.7GHz on 40nm and 2.3GHz on 28nm. So you can't claim a "microarchitecture" win for Atom when you compare against a low clocked A9.

Secondly, since you argue that frequency is an important aspect of the microarchitecture, I would argue that core count matters equally. A9 was designed to be simple and small, so it is typically used as a quad-core. On the other hand Atom is a large and complex core which uses Hyperthreading rather than multiple cores. So if you want to do a fair comparison with Hyperthreading enabled then you have to use 2 A9 cores for every Atom core. That's how they have been designed to be used.

What is the difference between a module, a HT enabled core and a dual core? These are just different ways of improving multithreaded performance with different hardware tradeoffs - but to software they all appear identical.

In conclusion: you cannot just pick whatever comparison you want. Either you compare the whole SoC, including its frequency as well as core count, or you compare microarchitectures normalized on core count and frequency. You can't include one but not the other as frequency, core count and TDP are related.
shodanshok - Sunday, June 23, 2013 - link
So, you started with in-order vs OoO and now you are speaking of die size and perf/mm2?

1) While Cortex-A9 was rated for 2 GHz operation, a single A9 core would dissipate more than 2 watts at this frequency. Atom is not so different in this regard. Moreover, can you point me to a phone that uses a 2 GHz A9 implementation? I bet not.

2) Atom is also MP from the start: it has the same bus unit and MP capability as the Netburst uarch. By which metrics are these inferior to the ARM MP implementation?

3) By die size comparison, A9 is clearly better than Atom. However, its performance is lower.

4) HT is simply a smart sharing of some key structures in order to interleave two threads on the same core. You cannot count HT as another core. For example, barrel microprocessors can interleave many threads on a single core: Sun T1 can interleave 4 threads per core, T2 8 per core. Do you count T1 as having 32 cores? If so, you are wrong.

Both I and other users pointed you to many reviews and benchmarks where Atom is clearly identified as faster than A9. However, you continue to change metrics.

The only benchmark that paints a different picture is Geekbench, which shows A9 in the same league as Sandy Bridge. Do you _really_ think this is true? In SPEC benchmarks, SB is quite close to the big, power-hungry but powerful POWER7. Do you really think that A9 is remotely comparable to this core? Really?

I already stated this: if you compare SoCs, well, A9 wins, because there are many well done SoCs based around it. However, from the uarch/performance side, Atom wins.

The funny thing is that this is now totally irrelevant: A9 is superseded by A15, and Atom is very near its EOL. Moreover, Jaguar seems to be a very competent tablet chip.

Regards.
MrPhilo - Sunday, June 23, 2013 - link
Unfair to compare the A9s to Atom. The Tegra 2 was an old revision of A9 lacking NEON etc. The newer A9s are fairer to compare. Also a single A9 at 2GHz won't draw 2 watts at all; the 2.3GHz Tegra 4i would be worse than the A15 if it did. Remember the process is 28nm, not the old 40nm.
