The Quest for More Processing Power, Part Three: "Multi core of Intel and AMD compared"by Johan De Gelas on May 18, 2005 3:15 PM EST
- Posted in
Dual core Opteron versus Pentium-D and DempseyThere is no doubt that the Dual core Opteron architecture is more advanced and elegant than the Pentium-D and even future Netburst based dual cores such as Dempsey (Xeon). The Pentium-D Dual core is more a way of packaging than an actual architecture: two cores cut out the wafer together, and communicate via an external FSB.
With two different L1 and L2-caches, and two CPUs working on the same variables, you risk that one of the CPUs is working on an outdated cached value. You need to make sure that if variable A is cached on both CPUs, and CPU 1 changes the value of variable A, then CPU 2 knows about it. This happens with a protocol called MESI on the Intel CPUs and MOESI on the AMD CPUs. The discussion of these “cache coherency protocols” is outside the scope of this article, but you understand that the more variables that are shared between the two CPUs, the more communication that will happen between the caches of the different CPUs.
In the case of the Pentium-D, the caches talk to each other (to keep cache consistency) via a shared 800 MHz bus, just like two single core SMP Xeons. Not only is 800 MHz relatively slow compared to the CPU (3200 MHz), but exchanging information via a bus also increases latency and lowers bandwidth. Latency is increased as the bus may not always be free - one of the CPUs might be using it to transfer data to or from the memory. This half duplex bus can only transmit signals of one device (CPU 1, CPU 2, chipset) at a given moment. Bandwidth is decreased as the cache coherency exchanges need a small amount of time on the bus too.
Enter the elegant dual core Opteron architecture. Each core in the Dual Opteron/Athlon 64 X2 puts its request on the System Request Queue (SRQ).
Each CPU has its own dedicated port to the on die SRQ (as you can see in the picture below), so cache-to-cache messaging happens at core speed, with minimal latencies.
The big question is now: can we quantify the more sophisticated nature of the Opteron’s dual core architecture? Yes, we can.