GPU Cheatsheet - A History of Modern Consumer Graphics Processors
by Jarred Walton on September 6, 2004 12:00 AM EST
Posted in: GPUs
DirectX 9 Performance
Below you can see our chart of the DirectX 9 components.
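Card | Core Speed (MHz) | RAM Speed (MHz)* | Pixel Pipelines | Textures/Pipeline** | Vertex Pipelines*** | Memory Bus Width (bits) | Fill Rate (MTexels/s)+ | Vertex Rate (MVertices/s)++ | Bandwidth (MB/s)+++ | Fill Rate % | Bandwidth % | Vertex Rate % | Overall %++++ |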
DirectX 9 | |||||||||||||
GF 6800UE | 450 | 1200 | 16 | 1 | 6 | 256 | 7200 | 675 | 36621 | 450.0% | 400.0% | 337.5% | 475.0% |
X800 XT PE | 520 | 1120 | 16 | 1 | 6 | 256 | 8320 | 780 | 34180 | 520.0% | 373.3% | 390.0% | 470.6% |
X800 XT | 500 | 1000 | 16 | 1 | 6 | 256 | 8000 | 750 | 30518 | 500.0% | 333.3% | 375.0% | 443.1% |
GF 6800U | 400 | 1100 | 16 | 1 | 6 | 256 | 6400 | 600 | 33569 | 400.0% | 366.7% | 300.0% | 426.7% |
X800 GT? | 425 | 900 | 16 | 1 | 6 | 256 | 6800 | 638 | 27466 | 425.0% | 300.0% | 318.8% | 382.7% |
GF 6800GT | 350 | 1000 | 16 | 1 | 6 | 256 | 5600 | 525 | 30518 | 350.0% | 333.3% | 262.5% | 378.3% |
X800 Pro | 475 | 900 | 12 | 1 | 6 | 256 | 5700 | 713 | 27466 | 356.3% | 300.0% | 356.3% | 371.3% |
X800 SE? | 425 | 800 | 8 | 1 | 6 | 256 | 3400 | 638 | 24414 | 212.5% | 266.7% | 318.8% | 292.6% |
X700 XT? | 500 | 1000 | 8 | 1 | 6 | 128 | 4000 | 750 | 15259 | 250.0% | 166.7% | 375.0% | 290.3% |
GF 6800 | 325 | 700 | 12 | 1 | 5 | 256 | 3900 | 406 | 21362 | 243.8% | 233.3% | 203.1% | 272.1% |
GF 6600GT | 500 | 1000 | 8 | 1 | 3 | 128 | 4000 | 375 | 15259 | 250.0% | 166.7% | 187.5% | 241.7% |
GF 6800LE | 320 | 700 | 8 | 1 | 5 | 256 | 2560 | 400 | 21362 | 160.0% | 233.3% | 200.0% | 237.3% |
9800 XT | 412 | 730 | 8 | 1 | 4 | 256 | 3296 | 412 | 22278 | 206.0% | 243.3% | 206.0% | 218.4% |
GFFX 5950U | 475 | 950 | 4 | 2 | 3 | 256 | 3800 | 356 | 28992 | 237.5% | 316.7% | 178.1% | 207.5% |
9800 Pro 256 | 380 | 700 | 8 | 1 | 4 | 256 | 3040 | 380 | 21362 | 190.0% | 233.3% | 190.0% | 204.4% |
9800 Pro 128 | 380 | 680 | 8 | 1 | 4 | 256 | 3040 | 380 | 20752 | 190.0% | 226.7% | 190.0% | 202.2% |
GFFX 5900U | 450 | 850 | 4 | 2 | 3 | 256 | 3600 | 338 | 25940 | 225.0% | 283.3% | 168.8% | 191.8% |
GFFX 5900 | 400 | 850 | 4 | 2 | 3 | 256 | 3200 | 300 | 25940 | 200.0% | 283.3% | 150.0% | 179.4% |
9700 Pro | 325 | 620 | 8 | 1 | 4 | 256 | 2600 | 325 | 18921 | 162.5% | 206.7% | 162.5% | 177.2% |
9800 | 325 | 600 | 8 | 1 | 4 | 256 | 2600 | 325 | 18311 | 162.5% | 200.0% | 162.5% | 175.0% |
9800 SE 256 | 380 | 680 | 4 | 1 | 4 | 256 | 1520 | 380 | 20752 | 95.0% | 226.7% | 190.0% | 170.6% |
GFFX 5900XT/SE | 400 | 700 | 4 | 2 | 3 | 256 | 3200 | 300 | 21362 | 200.0% | 233.3% | 150.0% | 165.3% |
9800 "Pro" | 380 | 680 | 8 | 1 | 4 | 128 | 3040 | 380 | 10376 | 190.0% | 113.3% | 190.0% | 164.4% |
GFFX 5800U | 500 | 1000 | 4 | 2 | 2 | 128 | 4000 | 250 | 15259 | 250.0% | 166.7% | 125.0% | 153.5% |
9700 | 275 | 540 | 8 | 1 | 4 | 256 | 2200 | 275 | 16479 | 137.5% | 180.0% | 137.5% | 151.7% |
GF 6600 | 300 | 550 | 8 | 1 | 3 | 128 | 2400 | 225 | 8392 | 150.0% | 91.7% | 112.5% | 141.7% |
9800 SE 128 | 325 | 580 | 8 | 1 | 4 | 128 | 2600 | 325 | 8850 | 162.5% | 96.7% | 162.5% | 140.6% |
GFFX 5700U GDDR3 | 475 | 950 | 4 | 1 | 3 | 128 | 1900 | 356 | 14496 | 118.8% | 158.3% | 178.1% | 129.0% |
GFFX 5700U | 475 | 900 | 4 | 1 | 3 | 128 | 1900 | 356 | 13733 | 118.8% | 150.0% | 178.1% | 126.6% |
X600 XT | 500 | 740 | 4 | 1 | 2 | 128 | 2000 | 250 | 11292 | 125.0% | 123.3% | 125.0% | 124.4% |
GFFX 5800 | 400 | 800 | 4 | 2 | 2 | 128 | 3200 | 200 | 12207 | 200.0% | 133.3% | 100.0% | 122.8% |
9500 Pro | 275 | 540 | 8 | 1 | 4 | 128 | 2200 | 275 | 8240 | 137.5% | 90.0% | 137.5% | 121.7% |
9600 XT | 500 | 600 | 4 | 1 | 2 | 128 | 2000 | 250 | 9155 | 125.0% | 100.0% | 125.0% | 116.7% |
9600 Pro | 400 | 600 | 4 | 1 | 2 | 128 | 1600 | 200 | 9155 | 100.0% | 100.0% | 100.0% | 100.0% |
X600 Pro | 400 | 600 | 4 | 1 | 2 | 128 | 1600 | 200 | 9155 | 100.0% | 100.0% | 100.0% | 100.0% |
GFFX 5700 | 425 | 500 | 4 | 1 | 3 | 128 | 1700 | 319 | 7629 | 106.3% | 83.3% | 159.4% | 98.9% |
9500 | 275 | 540 | 4 | 1 | 4 | 128 | 1100 | 275 | 8240 | 68.8% | 90.0% | 137.5% | 98.8% |
GFFX 5600U FC | 400 | 800 | 4 | 1 | 1 | 128 | 1600 | 100 | 12207 | 100.0% | 133.3% | 50.0% | 80.3% |
9600 | 325 | 400 | 4 | 1 | 2 | 128 | 1300 | 163 | 6104 | 81.3% | 66.7% | 81.3% | 76.4% |
X300 | 325 | 400 | 4 | 1 | 2 | 128 | 1300 | 163 | 6104 | 81.3% | 66.7% | 81.3% | 76.4% |
GFFX 5600U | 350 | 700 | 4 | 1 | 1 | 128 | 1400 | 88 | 10681 | 87.5% | 116.7% | 43.8% | 70.2% |
9600 SE | 325 | 400 | 4 | 1 | 2 | 64 | 1300 | 163 | 3052 | 81.3% | 33.3% | 81.3% | 65.3% |
X300 SE | 325 | 400 | 4 | 1 | 2 | 64 | 1300 | 163 | 3052 | 81.3% | 33.3% | 81.3% | 65.3% |
GFFX 5200U | 325 | 650 | 4 | 1 | 1 | 128 | 1300 | 81 | 9918 | 81.3% | 108.3% | 40.6% | 65.2% |
9550 | 250 | 400 | 4 | 1 | 2 | 128 | 1000 | 125 | 6104 | 62.5% | 66.7% | 62.5% | 63.9% |
GFFX 5700LE | 250 | 400 | 4 | 1 | 3 | 128 | 1000 | 188 | 6104 | 62.5% | 66.7% | 93.8% | 63.2% |
GFFX 5600 | 325 | 500 | 4 | 1 | 1 | 128 | 1300 | 81 | 7629 | 81.3% | 83.3% | 40.6% | 58.1% |
9550 SE | 250 | 400 | 4 | 1 | 2 | 64 | 1000 | 125 | 3052 | 62.5% | 33.3% | 62.5% | 52.8% |
GFFX 5500 | 270 | 400 | 4 | 1 | 1 | 128 | 1080 | 68 | 6104 | 67.5% | 66.7% | 33.8% | 47.6% |
GFFX 5200 | 250 | 400 | 4 | 1 | 1 | 128 | 1000 | 63 | 6104 | 62.5% | 66.7% | 31.3% | 45.5% |
GFFX 5600XT | 235 | 400 | 4 | 1 | 1 | 128 | 940 | 59 | 6104 | 58.8% | 66.7% | 29.4% | 43.9% |
GFFX 5200LE | 250 | 400 | 4 | 1 | 1 | 64 | 1000 | 63 | 3052 | 62.5% | 33.3% | 31.3% | 36.0% |
* RAM clock is the effective clock speed, so 250 MHz DDR is listed as 500 MHz.
** Textures/Pipeline is the maximum number of texture lookups per pipeline.
*** NVIDIA says their GFFX cards have a "vertex array", but in practice it generally functions as indicated.
**** Single-texturing fill rate = core speed * pixel pipelines
+ Multi-texturing fill rate = core speed * maximum textures per pipe * pixel pipelines
++ Vertex rates can vary by implementation. The listed values reflect the manufacturers' advertised rates.
+++ Bandwidth is expressed in actual MB/s, where 1 MB = 1024 KB = 1048576 Bytes.
++++ Relative performance is normalized to the Radeon 9600 Pro, but these values are at best a rough estimate.
Several of the footnotes are worth pointing out, just in case some people missed them. For starters, the memory bandwidth figures may not be what many people expect. Graphics card makers normally quote MB/s and GB/s with 1 MB as one million bytes and 1 GB as one billion bytes. That's incorrect, but since everyone does it, it has stopped mattering much. In this chart, however, real MB/s values are listed (1 MB = 1048576 bytes), so they are all roughly 4.6% lower than what the graphics card makers advertise.
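To make the difference concrete, here is a minimal Python sketch of the bandwidth calculation. It assumes peak bandwidth is simply the effective memory clock multiplied by the bus width in bytes, which matches the chart values we checked; the function name and the `binary` flag are ours, purely for illustration.

```python
def bandwidth_mb_per_s(effective_clock_mhz, bus_width_bits, binary=True):
    """Theoretical peak memory bandwidth in MB/s.

    effective_clock_mhz: effective (DDR) memory clock, as listed in the chart.
    bus_width_bits: memory bus width in bits.
    binary: True uses 1 MB = 1048576 bytes (the chart's convention),
            False uses 1 MB = 1000000 bytes (the usual marketing convention).
    """
    bytes_per_second = effective_clock_mhz * 1_000_000 * bus_width_bits / 8
    bytes_per_mb = 1_048_576 if binary else 1_000_000
    return bytes_per_second / bytes_per_mb


# Radeon 9600 Pro: 600 MHz effective memory clock on a 128-bit bus
print(round(bandwidth_mb_per_s(600, 128)))                # 9155 (chart value)
print(round(bandwidth_mb_per_s(600, 128, binary=False)))  # 9600 (advertised style)
```

The same function reproduces the 36621 MB/s listed for the GF 6800UE (1200 MHz effective, 256-bit bus).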
Fill rate can also be calculated in various ways. ATI's older Radeon cards (the DX7 models), for example, could apply three textures per pipeline per pass, or so ATI claimed. Two of the texture lookups, however, had to use the same texture, which made the feature a little less useful. In any case, these are all purely theoretical numbers, and it is almost impossible to say how accurate they are in the real world without some specialized tools. To date, no one has created "real world" tools that measure these values, and they probably never will, so we are stuck with synthetic benchmarks at best. Basically, don't take the fill rate scores too seriously.
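For readers who want to reproduce the theoretical numbers, the fill rate footnotes translate into a short Python sketch. The two fill rate functions follow the footnote formulas directly. The vertex rate function is an assumption on our part: the divide-by-four is inferred from the chart's figures (for example, 450 MHz with 6 vertex pipelines is listed as 675) and is not stated in the footnotes, which only say the values are the manufacturers' advertised rates. All function names are hypothetical.

```python
def single_texture_fill_rate(core_mhz, pixel_pipelines):
    """Single-texturing fill rate (millions of pixels/s) = core speed * pixel pipelines."""
    return core_mhz * pixel_pipelines


def multi_texture_fill_rate(core_mhz, pixel_pipelines, textures_per_pipe):
    """Multi-texturing fill rate (millions of texels/s)
    = core speed * maximum textures per pipe * pixel pipelines."""
    return core_mhz * textures_per_pipe * pixel_pipelines


def vertex_rate(core_mhz, vertex_pipelines):
    """Vertex rate in millions of vertices/s.

    ASSUMPTION: the /4 factor is inferred from the chart values,
    not taken from any documented formula.
    """
    return core_mhz * vertex_pipelines / 4


# GFFX 5950U: 475 MHz core, 4 pixel pipelines, 2 textures per pipeline
print(single_texture_fill_rate(475, 4))    # 1900
print(multi_texture_fill_rate(475, 4, 2))  # 3800 (the value listed in the chart)

# GF 6800UE: 450 MHz core, 6 vertex pipelines
print(vertex_rate(450, 6))                 # 675.0
```

The fill rate column in the chart matches the multi-texturing formula, which is why the GeForce FX cards with two textures per pipeline look stronger on paper than their four pixel pipelines would suggest.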
You can read the remaining footnotes above, and they should be self-explanatory. We just wanted to clarify those two points up front, and they apply to all of the performance charts. Now, on to the comments specifically related to DirectX 9.
The most important thing to point out first is that this chart has an additional weighting, due to the discrepancies in features and performance that exist among the various models of DirectX 9 hardware. The biggest concern is the theoretical performance of the GeForce FX cards. Most people should know this by now, but simply put, the FX cards do not live up to expectations at all when running DirectX 9 code. In DirectX 8.1 and earlier, their theoretical performance is a relatively accurate reflection of the real world, but overall the cards are far from perfect. We felt that the initial sorting was so unrealistic that a further weighting of the scores was in order; however, you can view the unweighted chart if you wish. Newer features also help improve performance at the same clock speed. For example, the optimizations to the memory controller in the GF6 line make the vanilla 6800 a faster card in almost all cases than the FX5950U and 9800 Pro cards. In fact, the GF6 cards are really only beaten by the X800 cards, and even that is not always the case.
The weighting used was relatively simple (and arbitrary). After averaging the fill rate, bandwidth and vertex rate scores (each relative to the Radeon 9600 Pro), we multiply the result by a weighting factor for the chip family:
NV3x Series: 0.85
R3xx Series: 1.00
R4xx Series: 1.10
NV4x Series: 1.20
This gives a rough approximation of how the features and architectural differences play out. Also note that certain chips lack some of the more specialized hardware optimizations, so while the theoretical performance of the 5200U appears better than that of the 5600 and 5700LE, in most situations it ends up slower. Similarly, the X600 Pro and X300 chips should beat the 9600 Pro and 9600 chips in real performance, as the RV380 and RV370 probably contain a few optimizations and enhancements. They are also PCI Express parts, but that is not something to really worry about. PCI Express, at least for the time being, seems to have little impact on actual performance - sometimes it's a little faster, sometimes it's a little slower. If you're looking at buying a PCIe-based system for the other parts, that's fine, but we recommend that you don't waste your money on such an expensive system solely for PCIe - by the time PCIe really has a performance lead, today's systems will need upgrading anyway.
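The weighting described above (average the three relative scores, then multiply by the family factor) can be sketched in Python as follows. This is our reading of the method rather than the exact spreadsheet used for the chart; the dictionary and function names are hypothetical, and the baseline values are simply the Radeon 9600 Pro row from the chart.

```python
# Weighting factors per chip family, as listed in the article.
FAMILY_WEIGHT = {"NV3x": 0.85, "R3xx": 1.00, "R4xx": 1.10, "NV4x": 1.20}

# Radeon 9600 Pro row from the chart, used as the 100% baseline.
BASE_FILL = 1600       # fill rate
BASE_BANDWIDTH = 9155  # MB/s
BASE_VERTEX = 200      # vertex rate


def weighted_score(fill, bandwidth, vertex, family):
    """Relative performance in percent, normalized to the Radeon 9600 Pro."""
    ratios = (fill / BASE_FILL, bandwidth / BASE_BANDWIDTH, vertex / BASE_VERTEX)
    average = sum(ratios) / len(ratios)
    return average * FAMILY_WEIGHT[family] * 100


# GF 6800UE (NV4x): fill 7200, bandwidth 36621, vertex 675
print(round(weighted_score(7200, 36621, 675, "NV4x"), 1))  # 475.0

# X800 XT PE (R4xx): fill 8320, bandwidth 34180, vertex 780
print(round(weighted_score(8320, 34180, 780, "R4xx"), 1))  # 470.6
```

Setting every weight to 1.00 gives the unweighted ordering, which is a quick way to see how much the family factor moves the GeForce FX cards down the list.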
If you refer back to the earlier charts, you will notice that the X600 and X300 do not include any of the SM2.0b features. This is not a mistake - only the forthcoming X700 cards will bring the new features to ATI's mid-range cards. This is in contrast to the 6600 cards, which are functionally identical to the 6800 cards, only with fewer pipelines. The X700 is likely to have a performance advantage over the 6600 in many situations, as it will have a full six vertex pipelines compared to the three on the 6600. Should the 6800LE become widely available, however, it could end up the champion of the $200-and-under segment, as the 256-bit memory bus may be more important than clock speeds. Having more than 25 GB/s of memory bandwidth does not always help performance without an extremely fast graphics core, but having less than 16 GB/s can slow things down. We'll find out how things play out in a few months.
43 Comments
JarredWalton - Thursday, October 28, 2004
#43 - It should be an option somewhere in the ATI Catalyst Control Center. I don't have an X800 of my own to verify this on, not to mention a lack of applications which use this feature. My comment was more tailored towards people that don't read hardware sites. Typical users really don't know much about their hardware or how to adjust advanced settings, so the default options are what they use.
Thera - Tuesday, October 19, 2004
You say SM2.0b is disabled and consumers don't know how to turn it on. Can you tell us how to enable SM2.0b?
Thank you.
(cross posted from video forum)
endrebjorsvik - Wednesday, September 15, 2004
WOW!! Very nice article!!
Does anyone have all this data collected in an Excel file or something??
JarredWalton - Sunday, September 12, 2004
Correction to my last post. KiB and MiB and such are meant to be used for size calculations, and then KB and MB can be used for bandwidth calculations. Now the first paragraph (and my gripe) should be a little more clear if you didn't understand it already. Basically, the *bandwidth* companies (hard drives, and to a lesser extent RAM companies advertising bandwidth) proposed that their incorrect calculations stand and that those who wanted to use the old computer calculations should change.
There are problems, however. HDD and RAM both continue to use both calculations. RAM uses the simplified KB and MB for bandwidth, but the accepted KB and MB (KiB and MiB now) for size. HDD uses the simplified KB and MB for size, but then they use the other KB and MB for sustained transfer rates. So, the proposed change not only failed to address the problem, but the proposers basically continue in the same way as before.
JarredWalton - Saturday, September 11, 2004
#38 - There are quite a few cards/chips that were only available in very limited quantities.
#39 - Actually, that is only partially true. KibiBytes and MebiBytes are a *proposed* change as far as I am aware, and they basically allow the HDD and RAM people to continue with their simplified calculations. I believe that KiB and MiB are meant for bandwidths, however, and not memory sizes. The problem is that MB and KB were in existence long before KiB and MiB were proposed. Early computers with 8 KB of RAM (over 40 years ago) had 8192 bytes of RAM, not 8000 bytes. When you buy a 512 MB DIMM, it is 512 * 1048576 bytes, not 512 * 1000000 bytes.
If a new standard is to be adopted for abbreviations, it is my personal opinion that the parties who did not conform to the old standard are the ones that should change. Since I often look at the low level details of processors and GPUs and such, I do not want to have two different meanings of the same thing, which is what we currently have. Heck, there was even a class action lawsuit against hard drive manufacturers a while back about this "lie". That was the solution: the HDD people basically said, "We're right and in the future 2^10 = KiB, 2^20 = MiB, 2^30 = GiB, etc." Talk about not taking responsibility for your actions....
It *IS* a minor point for most people, and relative performance is still the same. Basically, this is one of my pet peeves. It would be like saying, "You know what, 5280 feet per mile is inconvenient. Even though it has been this way for ages, let's just call it 5000 feet per mile." I have yet to see any hardware manufacturers actually use KiB or MiB as an abbreviation, and software that has been around for decades still thinks that a KB is 1024 bytes and a MB is 1048576 bytes.
Bonta - Saturday, September 11, 2004
Jarred, you were wrong about the abbreviation MB.
1 MB is 1 mega Byte is (1000*1000) Bytes is 1000000 Bytes is 1 million Bytes.
1 MiB is (1024*1024) Bytes is 1048576 Bytes.
So the vid card makers (and the hard drive makers) actually have it right, and can keep smiling. It is the people that think 1MB is 1048576 Bytes that have it wrong. I can't pronounce or spell 1 MiB correctly, but it is something like 1 mibiBytes.
viggen - Friday, September 10, 2004
Nice article but what's up with the 9200 Pro running at 300mhz for core & memory? I dun remember ATI having such a card.
JarredWalton - Wednesday, September 8, 2004
Oops... I forgot the link from Quon. Here it is:
http://www.appliedmaterials.com/HTMAC/index.html
It's somewhat basic, but at the same time, it covers several things my article left out.
JarredWalton - Wednesday, September 8, 2004
I received a link from Matthew Quon containing a recent presentation on the whole chip fabrication process. It includes details that I omitted, but in general it supports my abbreviated description of the process.
#34: Yes, there are errors that are bound to slip through. This is especially true on older parts. However, as you point out, several of the older chips were offered in various speed grades, which only makes it more difficult. Several of the as-yet unreleased parts may vary, but on the X700 and 6800LE, that's the best info we have right now. The vertex pipelines are *not* tied directly to the pixel quads, so disabling 1/4 or 1/2 of the pixel pipelines does not mean they *have* to disable 1/4 or 1/2 of the vertex pipelines. According to T8000, though, the 6800LE is a 4 vertex pipeline card.
Last, you might want to take note of the fact that I have written precisely 3 articles for Anandtech. I live in Washington, while many of the other AT people are back east. So, don't count on everything being reviewed by every single AT editor - we're only human. :)
(I'm working on some updates and corrections, which will hopefully be posted in the next 24 hours.)
T8000 - Wednesday, September 8, 2004
I think it is very good to put the facts together in such a review.
I did notice three things, however:
1: I have a GF6800LE and it has 4 enabled vertex pipes instead of 5 and comes with a 300/700 gpu/mem clock.
2: Since gpu clock speeds did not increase much, they had to add more features (like pipelines) to increase performance.
3: Gpu defects are less of an issue than cpu defects, since a lot of large gpu's offered the luxury of disabling parts, so that most defective gpu's can still be sold. As far as I know, this feature has never made it into the cpu market.