No More Apple Mysteries, Part Twoby Johan De Gelas on September 1, 2005 12:05 AM EST
- Posted in
IntroductionA little bit more than a month ago, AnandTech published "No more mysteries: Apple's G5 versus x86, Mac OS X versus Linux" with the ambitious goal of finding out how the Apple platform compares, performance-wise, to the x86 PC platform. The objective was to find out how much faster or slower the Apple machines were compared to their PC alternatives in a variety of server and workstation applications.
Some of the results were very surprising and caught the attention of millions of AnandTech readers. We found out that the Apple platform was a winner when it came to workstation applications, but there were serious performance problems when you run server applications such as MySQL (Relational Database) or Apache (Webserver). The MySQL database running on Mac OS X and the Dual G5 was up to 10 times slower than on the Dual Opteron running Linux.
We suspected that Mac OS X was to blame as low level OS benchmarks (Lmbench 3.0) indicated low OS performance. The whole article was a first attempt to understand better how the Apple platform - Mac OS X + G5 - performs, and as always, first attempts are never completely successful or accurate. As we found more and more indications that the OS, not the CPU, was the one to blame, it became obvious that we should give more proof for our "Mac OS X has a weak spot" theory by testing both the Apple and x86 machines with Linux. My email was simply flooded with hundreds of requests for some Linux Mac testing...even a month after publication!
That is what we'll be doing in this article: we will shed more light on the whole Apple versus x86 PC, IBM G5 versus Intel CPU discussion by showing you what the G5 is capable of when running Linux. This gives us insight on the strength and weakness of Mac OS X, as we compare Linux and Mac OS X on the same machine.
The article won't answer all the questions that the first one had unintentionally created. As we told you in the previous article, Apple pointed out that Oracle and Sybase should fare better than MySQL on the Xserve platform. We will postpone the more in-depth database testing (including Oracle) to a later point in time, when we can test the new Apple Intel platform.
Why Bother?Why do we bother, now that Apple has announced clearly that the next generation of the Apple machines will be based on Intel? Well, this makes our research even more interesting. As you will see further in the article, the G5 is not the reason why we saw terrible, slow performance. In fact, we found that the IBM PowerPC 970FX, a.k.a. "G5", has a few compelling advantages.
As Apple moves to Intel, the only thing that makes Apple unique, and not yet another x86 PC OEM, is Mac OS X. That is why Apple will attempt to prevent you from running an x86 version of Mac OS X on anything else but their own hardware (using various protection schemes), as Anand reported in "Apple's Move to x86: More Questions Answered". Mac OS X will be the main reason why a consumer will choose an Apple machine instead of a Dell one. So, as we get to know the strengths and weaknesses about this complex but unique OS, we'll get insight into the kind of consumers who would own an Intel based machine with Mac OS X - besides the people who are in love with Apple's gorgeous cases of course....
We also gain insight into the real reasons behind the move to Intel, and what impact it will have for the IT professional. Positive but very vague statements about the move to the Intel architecture have already been preached to the Apple community. For example, it was reported that the "Speed of Apple Intel dev systems impress developers". Proudly, it was announced that the current Apple Intel dev systems - based on a 3.6 GHz Intel Pentium 4 with 2 MB L2 Cache - were faster than a dual G5 2 GHz Mac. That is very ironic for three reasons.
Firstly, Apple's own website contradicts this in every tone. Secondly, we found a 2.5 GHz G5 to perform more or less like a Pentium 4 3 - 3.2 GHz in integer tasks. So, a 2 GHz G5 is probably around the speed of a 2.6 GHz Pentium 4. It is only natural that a much faster single CPU with a better disk and memory system outpaces a slower dual CPU in single threaded booting and development tasks. Thirdly, the whole CPU industry is focused now on convincing the consumers of how much better multi-core CPUs are compared to their "old" single core brethren.
Post Your CommentPlease log in or sign up to comment.
View All Comments
tthiel - Wednesday, May 24, 2006 - linkYou need to redo this entire test. So much has come out about how poorly this was done its hard to believe it came from Anandtech.
iggie - Friday, January 13, 2006 - linkI'm surprised you didn't post the raw VM latency results from lmbench. I found http://www-128.ibm.com/developerworks/library/l-yd...">another article that did a similar performance comparison (Darwin vs. Linux on G5).
mmap latency is 3x greater, but most tellingly, page fault latency is > 900 x greater!
Did you observe similar results in your tests?
I would imagine that page faults would play a greater and greater role as more and more independent clients connect to a server. I have experienced a huge disparity in http://www.openmicroscopy.org/api/omeis/">our own server software implementation for scientific imaging. In our case, all disk access is done via mmap and page faults (its a shared-VM-based image server system meant to serve many terabytes of image data)
asifyoucare - Sunday, September 4, 2005 - linkInteresting article.
If you suspect that thread performance is the bottleneck, why not write a short program to measure how many threads can be created and destroyed per second?
DoctorBooze - Saturday, September 3, 2005 - link
I'm no guru but I don't think that's true now with Native Posix Threads, which you get in 2.6 kernels with a suitable libc (and some distros with 2.4 kernels). Check what your program's linked with: on my Fedora Core 3 system `ldd /usr/libexec/mysqld` shows me MySQL is linked with /lib/tls/libc.so.6 and running that shows it has NPTL. The API may be similar but what happens in the kernel isn't and it makes a big, big difference to MySQL. Still, Linux now has fast native POSIX threads and it looks like OS X doesn't.
ikruusa - Saturday, September 3, 2005 - linkIndeed, as mentioned previously there was some mistakes in gcc options. And SIMD optimization is really basic in 4.0.x - only certain loops can be vectorized automatically. But loops around arrays are most significant part in signal processing and that is where SIMD really matters :)
As we know for NetBurst arch it is recommended to use XMM registers (that is registers for SSE/SSE2) for FP calculations. And that is what gcc 3.x does (4.x too): -mfpmath=sse triggers all x87 stuff to run as scalar math using SSE command-set. As I know AltiVec is SIMD unit which is smoothly added to PowerPC pipeline. How useful there is scalar math instead of usual FP - I have no idea.
What I want to say - my opinion is that if MySQL team has something to say about compiler options then they have documents about it. Using SIMD style processing in DB engine is very challenging exercise for coders. Dont expect magic from compiler here. Hint: maybe Intel's own icc compiler provide some magic but you have to prove it ;) I still believe that the most useful options can be -O[2,3] -funroll-loops and -ffast-math (as you mentioned) with -arch=[processor]. The last one should provide basic branching elimination (e.g. using cmov for x86) and correct instr. ordering.
About testing Linux. I have some skills in Apache testing with JMeter. I have been quite stuck but kernel developers were kind enough to help: http://marc.theaimsgroup.com/?l=linux-kernel&m...">http://marc.theaimsgroup.com/?l=linux-kernel&m...
Then I discovered all OS tuning possibilities in /proc Well, most are still unknown for me but I just want to get your attention here. Oracle talks about shared memory and number of semaphores and some particular Linux /proc parameters. Of course there should be all written in MySQL manual too if any parameter needs tuning. But is it enough to read MySQL manual and create profile for OS'es IPC and process management if we need to stress test MySQL on e.g 8-way SMP?
But still - good start of interesting investigation, anandtech.com!! Thank you and keep going!
kvs - Saturday, September 3, 2005 - linkIf thread-creation is extremely slow in Darwin, maybe MySQL-performance could be helped by enabled the thread cache? A look at 'mysqladmin extended-status' would show how many threads had been created and cached, and should reveal if thread_cache would be needed.
tester2 - Friday, September 2, 2005 - linkWell if ab on Mac OS X was the problem you could have easily tested this from a Linux box over the network.
Because you probably did this as well, and found out that performance tuning done by Apple outperformed the Linux/PPC and Linux/Opteron system by a substantial amount you keept this out of the story ...
So I did some testing, and yes when using ab from a Mac OS X I find the exact figures you report. Using a Linux Pentium 4 based system over Gb network gave me 6150 req/sec substantially faster then anything out there.
Look here for numbers from another source; http://www.pcmag.com/article2/0,1895,1637655,00.as...">http://www.pcmag.com/article2/0,1895,1637655,00.as...
The webserver runs around 60 threads ... go figure.
Yes there is a problem with the Mac OS X - Mysql combo if you are looking for performance, but jugging this as Mac OS X for server applications is a nono is drawing the wrong conclusion. I hope someone with good development skills will look at the mysql code and tune it to work well with Mac OS X.
benh - Friday, September 2, 2005 - linkInteresting article ! One thing that is worth looking into however is wether the YDL kernel is actually a 32 or a 64 bits kernel. This would probably have an impact on some of the numbers. I would expect the ppc64 kernel to perform faster overall on a 64 bits CPU with a small overhead on syscalls from 32 bits applications due to the argument size translation.
Also, the problem with the 2.7Ghz on linux is indeed a slight change in the firmware. It in fact looks like a bug in Apple Open Firmware device tree on those machine where they left out the properties providing the interrupt routing of the i2c controller in the north bridge used to drive the fan controller among others. The OS X driver silently falls back to a polled mecanism, while the linux driver doesn't and (shame on me!) used to have a small bug that would cause it crash when unable to locate those properties.
I posted a patch a while ago fixing that up, I would expect YDL to have an updated kernel/installer available by now.
Finally, you are right about the U3 northbridge having a quite high memory latency, that is definitely not helping the G5. There have been rumours floating around that Apple now has a new bridge that improves that significantly, though it's pretty much impossible to tell if/when they will release a machine using it. IBM also had multicore G5s available for some time now, though Apple is still not releasing any machine using them.
JohanAnandtech - Friday, September 2, 2005 - linkThanks for the very helpful feedback.
Do you have any idea why the U3 came with such high latency. Lack of development time? Lack of expertise? A inherent problem with the FSB of the G5? Rather old technology? You see I am very curious, and couldn't find much info on it.
benh - Friday, September 2, 2005 - linkI don't know for sure. I wouldn't blame the FSB though. I remember reading somewhere that the memory controller in U3 was similar if not identical to the old one they used in U2 on G4 machines and was to blame but I can't guarantee the reliability of that information.