Benchmark Configuration

Since AMD sent us a 1U Supermicro server, we had to resort to testing with our 1U servers again. That is why we went back to the ASUS RS700 for the Xeon server.

Supermicro A+ server 1022G-URG (1U Chassis)

CPU 2x AMD Opteron Interlagos 6276 (2.3GHz, 8 cores per CPU, 16 integer clusters)
2x AMD Opteron Interlagos 6220 (3.0GHz, 4 cores per CPU, 8 integer clusters)
2x AMD Opteron Magny-Cours 6174 (2.2GHz, 12 cores per CPU)
RAM 64GB (8x8GB) DDR3-1600 Samsung M393B1K70DH0-CK0
Motherboard SuperMicro H8DGU-F
Chipset AMD Chipset SR5670 + SP5100
BIOS version v2.81 (10/28/2011)
PSU SuperMicro PWS-704P-1R 750Watt

The AMD CPUS have four memory channels per CPU. The new Interlagos Bulldozer CPU supports DDR3-1600 and thus our dual-CPU configuration uses eight DIMMs for maximum bandwidth and performance. We ran with one DIMM per channel.

Asus RS700-E6/RS4 1U Server

CPU 2x Intel Xeon X5650 (2.66GHz, 6 cores/12 threads)
RAM 48GB (12x4GB) Kingston DDR3-1333 FB372D3D4P13C9ED1
Motherboard Asus Z8PS-D12-1U
Chipset Intel 5520
BIOS version 1102 (08/25/2011)
PSU 770W Delta Electronics DPS-770AB

To speed up testing, we ran the Intel Xeon and AMD Opteron system in parallel. As we didn't have more than eight 8GB DIMMs, we used our 4GB DDR3-1333 DIMMs for the Xeon server. The Xeon system only ends up with 48GB, but this is no disadvantage as our benchmark with the highest memory footprint (Nieuws.be/SQL Server 5 tiles) uses no more than 30GB of RAM.

We measured the difference between 12x4GB and 8x8GB of RAM and recalculated the power consumption for our power measurements (note that the differences were very small). There is no practical alternative as our Xeon has three memory channels and cannot be optimally configured with the same amount of RAM as our Opteron system (which has four channels).

We chose the Xeons based on AMD's positioning. The Xeon X5649 is priced at the same level as the Opteron 6276 but we didn't have the X5649 in the labs. As we suggested in our previous article, the Opteron 6276 should reach the performance of the X5650 to be attractive, so we tested with the X5650.

Common Storage System

Both servers used intel 710 SSDs for storing the database.

Software configuration

All Windows testing was done on Windows 2008 R2 SP1. The Linux tests are done on Ubuntu 11.10 Linux kernel 3.0.0-14 SMP x86_64.

Other

Both servers were fed by a standard European 230V (16 Amps max.) powerline. The room temperature was monitored and kept at 23°C by our Airwell CRACs. We used the Racktivity ES1008 Energy Switch PDU to measure power. Using a PDU for accurate power measurements might seem pretty insane, but this is not your average PDU. Measurement circuits of most PDUs assume that the incoming AC is a perfect sine wave but it never is. However, the Rackitivity PDU measures true RMS current and voltage at a very high sample rate: up to 20,000 measurements per second for the complete PDU.

Introduction SQL Server 2008 R2 "OLAP" Workload
Comments Locked

46 Comments

View All Comments

  • sonofgodfrey - Thursday, February 9, 2012 - link

    Have you explicitly tested one socket vs. two sockets? We've found an immense increase in contention once a cache-line has to be shared between sockets on some systems.
  • JohanAnandtech - Friday, February 10, 2012 - link

    That is one suggestion I will try out next week. Thanks!
  • Klimax - Thursday, February 9, 2012 - link

    Hello.

    Nice tests.

    However I would like to see MySQL tested on Windows Server 2008 R2
    Would be interesting comparsion.

    (Especially due to http://channel9.msdn.com/shows/Going+Deep/Arun-Kis... )
  • Klimax - Thursday, February 9, 2012 - link

    Title of post is wrong... (I have deleted second thing and forgot to fix title)
  • Scali - Thursday, February 9, 2012 - link

    Unless I'm mistaken, the Xeon 5650 is a 1.17B transistor chip, where the Interlagos 6276 is a 2.4B transistor chip.
    In that light, doesn't that make Intel's SMT implementation a lot better than CMT?
    I mean, yes CMT may give more of a performance boost when you increase the threadcount. But considering the fact that AMD spends more than twice the number of transistors on the chip... well, that's pretty obvious.
    AMD might as well just have used conventional cores.
    The true strength of SMT is not so much that it improves performance in multithreaded scenarios, but that it does so at virtually no extra cost in terms of transistors (and with little or no impact on the single-threaded performance either).
  • JohanAnandtech - Friday, February 10, 2012 - link

    Interlagos is 1.2 billion chip (maybe 1.3 but anyway). Most of those transistors are spend on the L3 cache: about 0.5 billion. Only 213 million transistors are in a module and each module contains a 2 MB L2-cache, probably good for 120 million transistors. That leaves 90 million transistors to the core, and it has been stated that the second cluster added 12%. So that second cluster costs about 12 million transistors, or 48 million on the total 4 module die. That is less than 5% of the total transistor count but you get a 30-90% performance boost!

    So for AMD, this was clearly a great choice.

    SMT is perfect for Intel, as the Intel architecture puts all instructions in one big ROB.

    For very low IPC serverworkloads, I think the CMT approach gives better results. Unfortunately AMD lowered some of the CMT benefits by keeping the datacache so small and the low associativity of the Icache.
  • Scali - Friday, February 10, 2012 - link

    Uhhh, I think you're wrong here... the 4-module Bulldozer is a 1.2B chip (Zambezi). But you tested the 8-module Interlagos (16 threads), which is TWO Zambezi dies in one package.
    Hence 2*1.2 = 2.4B transistors.
  • JohanAnandtech - Friday, February 10, 2012 - link

    Ok, it is two chips of 1.2 billion. That doesn't change anything about our analyses of CMT.
  • Scali - Friday, February 10, 2012 - link

    Not in the article, because you did not factor in transistor count (which is the flaw I tried to point out in the first place... comparing two chips, where once is twice the transistor count of the other, is quite the apples-to-oranges comparison. One would expect a chip with twice the transistorcount to be considerably better in multithreading scenarios, not 'catching up' to the smaller chip).

    But in your above post, I think it changes everything about your analysis. All your figures have to be done times two.
    Which makes it a very poor comparison, not only to Intel, but also to AMD's own previous line of CPUs.
    The 6174 Magny Cours is actually beating Interlagos, with 'only' 12 threads, no kind of CMT/SMT, and 'only' 1.8B transistors.

    How does that make CMT look like a great choice for AMD?
  • slycer.tech - Friday, February 10, 2012 - link

    What i read on benchmark configuration page, Anand used 2x Intel Xeon X5650. So 2x 1.17B = 2.34B. I think it is comparable to AMD CPU used in this test. Am I right?

Log in

Don't have an account? Sign up now