Quick update: more Interlagos testing
by Johan De Gelas on December 8, 2011 5:11 AM EST
As promised in our last Opteron "Interlagos" review, we have been taking the time to deepen our understanding of AMD's newest Interlagos server platform and the "Bulldozer" architecture. Server reviewing remains a complex undertaking: some of the benchmarks take hours to set up and run, and power management policies, I/O subsystems and configuration settings can completely alter the outcome of a benchmark. That sounds obvious, right? In practice, it is not.
Let me give you an example of how subtle server benchmarking can be. One of the benchmarks missing from the original review was the MS SQL Server benchmark, and for a reason. We did some extensive scaling benchmarks, and our gut feeling told us that some of the results were a bit off the mark. So we kept the benchmark out of the original review until we had pinpointed the problem.
Just a few days ago, we found out that a small percentage of time-outs (1%, caused mostly by a data provider time-out setting) can erroneously boost the results by about 20%, because the actual workload is decreased. So our MS SQL Server benchmark was not as accurate as we thought it was. Luckily, we have solved all the problems, and the benchmark is now more accurate than ever. You can expect to see the MS SQL Server benchmarks on different server platforms, along with an in-depth analysis, in a forthcoming article.
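To make the arithmetic concrete, here is a minimal Python sketch of one way a 1% time-out rate could inflate a score by roughly 20%. It is not a reconstruction of what happened inside vApus. It assumes that the requests that time out are the expensive ones and that the score still counts them; the cost values and the `reported_throughput` helper are hypothetical.

```python
def reported_throughput(costs, timed_out):
    """Requests per unit of work, assuming (hypothetically) that a timed-out
    request consumes no server work but is still counted in the score."""
    work_done = sum(cost for i, cost in enumerate(costs) if i not in timed_out)
    return len(costs) / work_done

# Hypothetical mix: 99 cheap requests and 1 expensive one (1% of all requests).
costs = [1.0] * 99 + [19.8]

honest = reported_throughput(costs, timed_out=set())   # every request completes
inflated = reported_throughput(costs, timed_out={99})  # the expensive one times out
boost = inflated / honest - 1                          # roughly 0.20 in this toy mix
```

In this made-up mix, dropping a single heavy request removes about a sixth of the total work, which is enough to raise the reported score by about a fifth.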
While solving the MS SQL Server benchmark issues required a lot of testing, analysis and debate with Dieter, the lead developer of our stress testing tool vApus, we missed a more obvious tweak that could have improved our Blender benchmarking. Luckily, we still have a community that is willing to give us valuable feedback. Greg Wereszko pointed out that our Blender benchmark cuts the render job up into only 64 tiles (X=8, Y=8). The result is that near the end of the test several cores are inactive, especially on the Interlagos Opteron (32 cores/threads).
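The idle-core effect can be illustrated with a toy scheduling model in Python. This is only a sketch under stated assumptions, not Blender's actual scheduler: tiles are handed out in row-major order to whichever thread frees up first, and the scene is a hypothetical 16x16 grid in which the top-left quadrant is five times as expensive to render as the rest. The names `tile_costs`, `makespan`, `utilization` and `SCENE` are made up for this example.

```python
import heapq

def tile_costs(cell_cost, tiles_x, tiles_y):
    """Sum per-cell render costs into a row-major list of per-tile costs."""
    rows, cols = len(cell_cost), len(cell_cost[0])
    tile_h, tile_w = rows // tiles_y, cols // tiles_x
    costs = []
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            costs.append(sum(cell_cost[y][x]
                             for y in range(ty * tile_h, (ty + 1) * tile_h)
                             for x in range(tx * tile_w, (tx + 1) * tile_w)))
    return costs

def makespan(costs, workers):
    """Greedy model: each tile goes to the thread that becomes free first."""
    free_at = [0] * workers          # time at which each thread is next idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)              # the render ends when the last thread ends

def utilization(costs, workers):
    """Fraction of total thread-time spent rendering rather than idling."""
    return sum(costs) / (workers * makespan(costs, workers))

# Hypothetical scene: the top-left quadrant is 5x as costly as the rest.
SCENE = [[5 if (x < 8 and y < 8) else 1 for x in range(16)] for y in range(16)]

coarse = tile_costs(SCENE, 8, 8)     # 64 tiles, as in the original benchmark
fine = tile_costs(SCENE, 16, 16)     # 256 tiles

coarse_time = makespan(coarse, 32)
fine_time = makespan(fine, 32)
```

With 32 threads, the 64-tile run in this toy scene ends at 20 time units, because half the threads sit idle while the other half finish the heavy tiles. The 256-tile run ends at 16 time units with every thread busy until the end. These figures are properties of the made-up scene, not measurements of any CPU.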
So we increased the number of tiles beyond 8x8 to check whether this improves performance on our 32 and 24 thread machines, and it did. (Quick note: the Blender benchmark on Windows is one of the worst benchmarks for the Opteron Interlagos, so consider this a "worst case" performance point.)
Instead of trailing behind the Opteron 6174, the Opteron "Interlagos" 6276 manages to perform a tiny bit better than its older sibling when we use 256 (16x16) tiles. The Opteron 6276 improves its performance by 24%, the Xeon X5650 and Opteron 6174 by 19%.
With more tiles, all CPUs are able to show their top performance. The result also reveals the rather "fragile performance profile" of the new Opteron. Many users are going to use standard settings and will never bother with this kind of tuning. As a result, they are not going to use the full potential of the new Opteron. The Xeon's higher single-threaded performance makes it less vulnerable to suboptimal software settings.
On the other side of the coin, once well tuned, the Opteron 62xx offers an interesting performance per dollar ratio, and this "fragile performance profile" may become very robust in FP intensive applications once the use of AVX becomes widespread. We are taking quite a bit of time to make sure that the next server article can give more detailed information, but rest assured that we did not give up: we will update our server benchmarking...when it is finished.