Final Thoughts

In the last few years, AMD hasn't really been able to fight against Intel in the high-end CPU market. Pretty much since the release of the Nehalem microarchitecture in late 2008, Intel has held the crown of fastest CPUs and AMD has only been the best option for budget builds. Bulldozer has suffered from delays, and AMD recently delayed it again because the performance didn't meet their expectations. However, Bulldozer could have the potential to shake Intel's position outside the budget CPU market as well.

According to leaked product positioning slides, Zambezi is aimed at Intel's Core i5 and i7 lineups. Zambezi will feature up to eight cores, which is twice as many as the i7-2600(K)'s four cores. AMD said that they won't join the Hyper-Threading club and that they will deliver as many physical cores as Intel delivers physical and virtual cores combined. It looks like AMD is keeping their word, though they're only delivering half as many "FP/SSE cores". Intel will probably still provide the best single-threaded performance, but AMD's aggressive approach with many physical cores may bring them the trophy of best multi-threaded performance. We shall hopefully see this very soon.

In the server market, AMD's role is a lot more complex. For some HPC applications, AMD offers the best performance at a much lower price. In the midrange, AMD based servers offer more cores (quad-socket) and (in most cases) higher performance for a relatively small price premium over the typical dual-socket Xeon servers. At the same time, if your applications cannot make good use of all those cores, dual-socket Xeon servers can offer a better performance/watt ratio and lower response times. In the high end, Intel Xeon E7 completely dominates, and AMD has left this market for now. In the low power market, Intel's low power Xeons offer a better performance/watt and AMD can only compete when every dollar counts. In most cases, the price of the server CPU is less important in the grand TCO scheme.

In other words, AMD really needs a server CPU with a much higher performance per core and a better performance/watt ratio. TDP Power Cap or configurable TDP helps AMD's server CPUs keep the electricity bill down by avoiding "bursty" power usage. At the same time, with their implementation, TDP Power Cap should have little effect on the real world (not pure throughput benchmarking) performance if you do not lower the TDP too much. We won't be sure until we have measured it, but it looks like a big step in the right direction: lower TCO and more predictable power usage without a (large) performance penalty.

AMD's Future Plans

Second Generation AMD Fusion lineup
Codename     | Krishna and Wichita | Trinity      | Komodo       | Sepang       | Terramar
Architecture | Enhanced Bobcat     | NG Bulldozer | NG Bulldozer | NG Bulldozer | NG Bulldozer
SOI          | 28nm                | 32nm         | 32nm         | 32nm         | 32nm
Core count   | 1-4                 | 2-4          | 6-10         | Up to 10     | Up to 20
DX11 IGP     | Yes                 | Yes          | No           | No           | No
Socket       | N/A                 | N/A          | N/A          | C2012        | G2012

Bulldozer will make its way to mainstream CPUs in 2012. Llano's successor, Trinity, will feature up to four next-generation Bulldozer cores. Next-generation (NG) in this context appears to mean that AMD will tweak the architecture, because the CPUs will still be manufactured using 32nm SOI. Zambezi's successor, Komodo, will again increase the core count, this time to as many as 10.

As for the server market, AMD's approach will be a bit more aggressive. AMD will again increase the number of cores, to up to 20 NG Bulldozer cores. Valencia's successor will be the 10-core Sepang and Interlagos' successor will be the 20-core Terramar. The server CPUs will also feature PCIe 3.0 support.

Krishna and Wichita will also replace AMD's current Ontario and Zacate APUs. There will be a die shrink from 40nm to 28nm, so at this point Krishna and Wichita look the most interesting of the 2nd gen Fusion lineup. Doubling the cores should yield a nice performance boost in heavily threaded scenarios, though single-threaded performance is still a sore spot for Bobcat compared to other architectures.

59 Comments

  • duploxxx - Friday, July 15, 2011 - link

    according to many, anything which is branded "PENTIUM" is the uber CPU doesn't matter what is behind....
  • Broheim - Friday, July 15, 2011 - link

    >according to many

    source?

    don't have one? then gtfo.
  • formulav8 - Friday, July 15, 2011 - link

    Grow up. He was just messing around
  • Broheim - Friday, July 15, 2011 - link

    no, he's a raging AMD fanboy. I have yet to see a single post from him that doesn't bash intel or praise AMD in some form or another.
  • AnandThenMan - Friday, July 15, 2011 - link

    So he's the exact opposite of you.
  • Broheim - Saturday, July 16, 2011 - link

    erm, I have nothing against AMD, this rig has an unlocked HD6950...

    are you just butthurt because I called you out on your bitching about Anand's benchmarking?
  • just4U - Saturday, July 16, 2011 - link

    Currently I am on a Sandy Bridge 2500k and in the last year I've been on a i7 920 a 1055T, and a few $60 amd cheapies. As far as I am concerned they are all good. I didn't notice night and day improvements like I did when I moved to the A64 and Core2. So I think we are sort of at a ceiling limit right now (excepting specific tasks) where just about any new cpu is good enough.
  • JohanAnandtech - Friday, July 15, 2011 - link

    it is possible that your tests are using the x87 FPU. The Phenom can process up to 3 instructions per cycle out of order, while the P4 can hardly sustain one FP per cycle.

    Parallel, multithreaded software is of course much faster on a 6-core than a single P4 core :-).

    And it would be very hard to find a benchmark where P4 at 4 GHz is faster than a Phenom II 2.8 GHz. I cannot imagine that anyone has published one. The P4 has a much slower memory interface (very high latency vs Phenom IMC), much smaller caches (16 KB vs 64 KB L1) and is outmatched in every aspect of FP processing power (64 vs 128 SIMD, triple fast x87 FPU vs single slow one) ...
  • SanX - Friday, July 15, 2011 - link

    What was amazing is that the performance increase by a factor of two was per core, of course. The whole 6-core, not overclocked AMD CPU was 2.42/0.50 or almost 5 times faster than the 2-core Intel E8400 overclocked to 3.8GHz!

    Here are the numbers for the parallel algebra (you can download the test code from equation dot com or I have it too for different compilers) for Intel and AMD, in seconds, when I switch ON different numbers of cores

    Intel E8400 (2 cores, overclocked to 3.8GHz):
    1 4.64 seconds
    2 2.42

    AMD 6-core (not overclocked):
    1 2.46
    2 1.22
    3 0.83
    4 0.67
    5 0.58
    6 0.50

    I invite anyone to do the test on their CPUs.
  • JarredWalton - Friday, July 15, 2011 - link

    Using 64-bit "bench1_gfortran_64.exe":

    Core 2 QX6700 @ 3.2GHz:
    1 CPU = 4.55s
    2 CPU = 2.33s
    3 CPU = 1.62s
    4 CPU = 1.34s

    Core i7-965 @ 3.6GHz:
    1 CPU = 3.93s
    2 CPU = 1.97s
    3 CPU = 1.33s
    4 CPU = 1.01s
    5 CPU = 0.87s
    6 CPU = 0.80s
    7 CPU = 0.72s
    8 CPU = 0.69s

    Of course, none of that really tells us much, because we don't know how the application was compiled or what optimizations are in place. There's only one 64-bit compiled version but there are four 32-bit compiled versions. Let's just see what happens with the 32-bit versions on the QX6700 for a second:

    Core 2 QX6700 @ 3.2GHz Absoft:
    1 CPU = 7.01s
    2 CPU = 3.54s
    3 CPU = 2.40s
    4 CPU = 1.90s

    Core 2 QX6700 @ 3.2GHz gfortran:
    1 CPU = 10.73s
    2 CPU = 5.40s
    3 CPU = 3.67s
    4 CPU = 2.87s

    Core 2 QX6700 @ 3.2GHz Intel Fortran:
    1 CPU = 4.70s
    2 CPU = 2.40s
    3 CPU = 1.76s
    4 CPU = 1.47s

    Core 2 QX6700 @ 3.2GHz Lahey/Fujitsu:
    1 CPU = 5.38s
    2 CPU = 2.73s
    3 CPU = 1.95s
    4 CPU = 1.56s

    What does that tell us? As expected, the Intel compiler version is the fastest in 32-bit mode. What's more, the gfortran 32-bit version is the slowest on Intel. Since the only 64-bit version is from gfortran, it would appear that a 64-bit Intel version would come in around twice as fast. That's only speculation based on the 32-bit compiled executables, but given your above numbers it looks like you're probably using the 64-bit version. (If not, why does my 3.2GHz quad-core outperform your 3.8GHz dual-core when looking at the 32-bit Intel speeds?)

    Anyway, there are certain types of code that AMD does quite well at running, but overall I'd say it's clear that Intel's Nehalem/Lynnfield/Sandy Bridge CPUs are significantly faster than the Phenom II X6 offerings.
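
The scaling in the timings quoted in this thread is easier to compare as speedup (1-core time divided by n-core time) and parallel efficiency (speedup divided by n). The following is only a rough sketch in Python: the dictionary labels and the `scaling` helper are made up for illustration, the numbers are copied from the comments as posted, and it assumes the first of SanX's two unlabeled lists is the Intel E8400 and the second is the 6-core AMD CPU.

```python
# Wall-clock times in seconds, copied from the comments above.
# Keys are the number of active cores (or hardware threads).
# Assumption: SanX's first unlabeled list is the Intel E8400 and the
# second is the 6-core AMD CPU, as the 2.42 and 0.50 figures suggest.
TIMINGS = {
    "Intel E8400 @ 3.8GHz (SanX)": {1: 4.64, 2: 2.42},
    "AMD 6-core (SanX)": {1: 2.46, 2: 1.22, 3: 0.83, 4: 0.67, 5: 0.58, 6: 0.50},
    "Core 2 QX6700 @ 3.2GHz, 64-bit gfortran": {1: 4.55, 2: 2.33, 3: 1.62, 4: 1.34},
    "Core i7-965 @ 3.6GHz, 64-bit gfortran": {
        1: 3.93, 2: 1.97, 3: 1.33, 4: 1.01, 5: 0.87, 6: 0.80, 7: 0.72, 8: 0.69,
    },
}


def scaling(runs):
    """Return {n: (speedup, efficiency)} relative to the 1-core time."""
    base = runs[1]
    return {n: (base / t, base / (t * n)) for n, t in sorted(runs.items())}


for label, runs in TIMINGS.items():
    print(label)
    for n, (speedup, efficiency) in scaling(runs).items():
        print(f"  {n} core(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

Read this way, the per-CPU scaling and the cross-CPU comparison are separate questions: the "almost 5 times faster" figure is the ratio of the two best times (2.42/0.50), not a scaling result. The Core i7-965 has four physical cores, so its 5 to 8 "CPU" results are Hyper-Threading threads, which is likely why efficiency drops off past four.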