AMD Unveils Bulldozer & Bobcat: 2011 Microarchitectures
by Anand Lal Shimpi on November 11, 2009 5:00 PM EST
Posted in CPUs
I spoke too soon. Earlier today I outlined AMD’s roadmap for 2010 - 2011. In 2011 AMD will introduce two next-generation microarchitectures: Bulldozer for the high end desktop and server space and Bobcat for the price/power efficient ultra mobile market. I originally said that AMD wasn’t revealing any more about its next-gen architectures, but AMD just proved me wrong as they unveiled the first block diagrams of both cores.
First up, Bulldozer. I hinted at the architecture in this afternoon’s article:
“A major focus is going to be improving on one of AMD’s biggest weaknesses today: heavily threaded performance. Intel addresses it with Hyper Threading, AMD is throwing a bit more hardware at the problem. The dual integer clusters you may have heard of are the route AMD is taking...”
And here’s the block diagram:
Bulldozer: AMD's Latest Leap Forward, will it be another K8 to Intel's Sandy Bridge?
This is a single Bulldozer core, but notice that it has two independent integer clusters, each with its own L1 data cache. The single FP cluster shares the L1 cache of the two integer clusters.
Within each integer “core” are four pipelines, presumably half for ALUs and half for memory ops. That’s a narrower width than a single Phenom II core, but there are two integer clusters on a single Bulldozer core.
Bulldozer will also support AVX, hinted at by the two 128-bit FMAC units behind the FP scheduler. AMD is keeping the three level cache hierarchy of the current Phenom II architecture.
A single Bulldozer core will appear to the OS as two cores, just like a Hyper Threaded Core i7. The difference is that AMD is duplicating more hardware to enable per-core multithreading. The integer resources are all doubled, including the schedulers and d-caches. It’s only the FP resources that are shared between the threads. The benefit is much better multithreaded integer performance; the downside is a larger core.
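To make the comparison concrete, here is a minimal sketch in Python of how the two approaches divide resources between threads. The class and field names are hypothetical, and the model is deliberately coarse: it only counts clusters per thread as described above and says nothing about real performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreDesign:
    """Coarse, illustrative model of one multithreaded core (names are assumptions)."""
    name: str
    threads_per_core: int
    integer_clusters: int  # independent integer schedulers, each with its own L1 d-cache
    fp_clusters: int       # floating-point clusters in the core

    def integer_clusters_per_thread(self) -> float:
        return self.integer_clusters / self.threads_per_core

    def fp_clusters_per_thread(self) -> float:
        return self.fp_clusters / self.threads_per_core

    def logical_cores(self, physical_cores: int) -> int:
        """Number of cores the OS would see for a chip with this many cores."""
        return physical_cores * self.threads_per_core


# Bulldozer as described in the block diagram: two integer clusters, one shared FP cluster.
BULLDOZER = CoreDesign("Bulldozer core", threads_per_core=2, integer_clusters=2, fp_clusters=1)
# A Hyper Threaded core, modeled here as two threads sharing one set of everything.
SMT_CORE = CoreDesign("Hyper Threaded core", threads_per_core=2, integer_clusters=1, fp_clusters=1)

for design in (BULLDOZER, SMT_CORE):
    print(design.name,
          design.integer_clusters_per_thread(),
          design.fp_clusters_per_thread(),
          design.logical_cores(4))
```

On this accounting, each Bulldozer thread gets a full integer cluster to itself while both designs share FP hardware between two threads, and a four-core part of either kind shows up as eight cores to the OS.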
Doubling the integer resources but not the FP resources works even better when you look at AMD’s whole motivation behind Fusion. Much of the heavy FP work is expected to move to the GPU anyway, so there’s little sense in duplicating FP hardware on the Bulldozer core when it will eventually have a fully capable GPU sitting on the same piece of silicon. While the first incarnation of Bulldozer, the Zambezi CPU, won't have an on-die GPU, presumably future APUs will use the new core. In those designs the Bulldozer cores and the GPU will most likely even share the L3 cache. It’s really a very elegant design and the basis for what AMD, Intel and NVIDIA have been talking about for years now. The CPU will do what it does best while the GPU does what it is good at.
Fascinating.
AMD’s Next-Generation Ultramobile Core: Bobcat
Next up is Bobcat:
AMD says that a single Bobcat is capable of scaling down to less than one watt of power. Typically a single microarchitecture is capable of efficiently scaling across an order of magnitude of TDP. If Bobcat can go as low as 0.5W, the high end would be around 5W. If it’s closer to 1W at the low end, then 10W would be the upper limit. Either way, it’s too low to compete in current mainstream notebooks, meaning that Bobcat is strictly a netbook/ultraportable core as AMD indicated in its slides. Eventually Bulldozer will probably scale down to take care of the mainstream mobile market.
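The back-of-the-envelope math above can be written out as a short Python sketch. The function name is made up for illustration, and the 10x factor is the rule of thumb stated here, not an AMD specification.

```python
from typing import Tuple


def tdp_range(min_watts: float, scale: float = 10.0) -> Tuple[float, float]:
    """Return (low, high) TDP in watts, assuming one order of magnitude of efficient scaling."""
    return (min_watts, min_watts * scale)


# The two low-end scenarios discussed above.
for low_end in (0.5, 1.0):
    low, high = tdp_range(low_end)
    print(f"{low}W low end -> roughly {high}W high end")
```

Both scenarios top out at 10W or less, which is the basis for treating Bobcat as a netbook/ultraportable part.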
AMD provided very little detail here other than it delivers 90% of today’s mainstream performance in less than half of the silicon area. If AMD views mainstream as an Athlon II X2, then Bobcat would deliver 90% of that performance in a die area of less than 60mm^2.
This is presumably bigger than Atom, but that’s just a guess. Either way, the performance targets sound impressive. SSE1-3 are supported as well as hardware virtualization.
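AMD's two figures (90% of the performance, less than half the area) imply a performance-per-area gain that is easy to work out. The sketch below, in Python with hypothetical function names, uses only those two numbers and the 60mm^2 estimate. Because AMD said "less than half," the results are bounds derived from the claim, not measured die sizes.

```python
def perf_per_area_gain(relative_perf: float, relative_area: float) -> float:
    """Performance per unit of die area, relative to the baseline chip."""
    return relative_perf / relative_area


def implied_baseline_area(new_area_mm2: float, relative_area: float) -> float:
    """Baseline die area implied by a new die area and an area ratio."""
    return new_area_mm2 / relative_area


# 90% of mainstream performance in (at most) half the area.
gain = perf_per_area_gain(relative_perf=0.9, relative_area=0.5)
# A 60mm^2 ceiling at half the area implies a baseline of about 120mm^2.
baseline = implied_baseline_area(new_area_mm2=60.0, relative_area=0.5)

print(f"at least {gain:.1f}x performance per mm^2")
print(f"implied baseline area: about {baseline:.0f}mm^2")
```

If the claim holds, Bobcat would deliver at least 1.8x the performance per unit of silicon of whatever AMD is counting as mainstream.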
AMD wouldn’t tell me what process it would be made on but they did hint that Bobcat would be easily synthesizable. I take that to mean it will be built on a bulk 28nm process at Globalfoundries and not 32nm SOI.
Both of these cores will be out in 2011. We just need to make it through 2010 first.
68 Comments
JumpingJack - Thursday, November 12, 2009
Actually, I am not so sure AMD would need a 'license'. SMT (simultaneous multithreading) has been around for years: researched, published, and implemented in many processor designs besides x86. I am not so sure Intel would hold a patent on something as generic as SMT (i.e. Hyper-Threading). There could be underlying details of the exact implementation that are patented, but at that level I would suspect there is 'more than one way to skin a cat', meaning that so long as the details of any implementation are unique, there is no harm, no foul.
However, it is clear AMD is taking a much more radical approach to multithreading in a single execution core; it is almost a melded dual-core chip. This will certainly yield bigger gains in a multithreaded environment than the simpler SMT approach. How much more remains to be seen.
Alberto - Friday, November 13, 2009
I don't know if this new AMD solution is good. It is less elegant than SMT and very "bulldozer"-like. The new CPU will likely be clocked lower than its Intel counterpart in order to stay within TDP limits. So the +90%, versus Intel's +40% from the new, well-done SMT, will be undermined by power consumption problems.
Moreover, this CPU seems to be "not for the consumer space" but for servers only. Two full integer units with L1 cache mean more die space, lower yields, and more leakage, and they are useless in consumer software.
This is the reason why AMD does not use the new core in Fusion.
Another K8? Yes, it seems... AMD has made the same old mistake.
They do not have the technical resources to make a new CPU that is good for all uses.
JumpingJack - Saturday, November 14, 2009
I don't think it is fair to call it less elegant; it is similar in some ways but radically different in others.
With Intel's SMT implementation, they simply needed logic to track and manage two distinct thread contexts; the cache, pipeline, dispatch, and execution resources are all shared, either by fixed partitioning or on demand. Translation: it is possible that, from the perspective of one thread or the other (depending on the workload/environment), each application could perform worse than if run alone, even if the total number of instructions retired is higher.
AMD is doing much more here, as it appears they are not sharing nearly as many resources but actually duplicating them. There are trade-offs, of course, in die size and efficiency with respect to performance. AMD seems to be designing around the idea that the software infrastructure will be much more advanced in multithreaded capabilities by the time they release their product. It is a common quandary for any CPU designer: what will the market/software environment look like years from now, such that 'my product' fits best within that framework?
Calin - Thursday, November 12, 2009
Performance could be improved by up to some 90% in integer-heavy workloads, but it might come with a small decrease in performance in floating-point-heavy benchmarks. Remember that the first Hyper-Threading version from Intel had a performance change between +40% and -20% (and in some very particular cases, even worse than that).
blyndy - Thursday, November 12, 2009
CPU design is a patent minefield; all the other/better ways of skinning the cat could well be locked up, so I think they would need at least a few patent licenses from someone.
To be honest, I think it's amazing that AMD can even compete with Intel, considering that Intel is 10x bigger and has 10x more R&D and patents.
Samus - Wednesday, November 11, 2009
Why license Jackson Technology from Intel when they have a perfectly competitive alternative architecture design?
hamiltonham - Friday, May 28, 2010
In short, each module consists of two integer (whole number) cores and one floating-point (number with decimals) core.
It's an architecture designed for integer calculations,
whereas the majority of Intel's processor architectures are designed for FP calculations.
The computer sees the two Int cores as separate processors,
even though all of the FP calculations go to the same core,
so AMD can market it as an 8- or 16-core processor.
Simple enough?
tomsworkshop - Friday, December 24, 2010
Let's say the upcoming AMD dual-core CPU is named Phenom III X2: one Phenom III X2 has 2 Bulldozer cores inside, each Bulldozer core has 2 tiny cores inside, and each tiny core has 2 pipelines inside?