GPU Cheatsheet - A History of Modern Consumer Graphics Processors
by Jarred Walton on September 6, 2004 12:00 AM EST - Posted in GPUs
Let's Talk Performance
This section is likely to generate a lot of flames if left unchecked. First, though, we want to make it abundantly clear that the raw, theoretical performance numbers listed here rarely match real-world performance figures. There are numerous reasons for this discrepancy; for example, the game or application in use may stress different parts of the architecture. A game that pushes a lot of polygons with low resolution textures is going to stress the geometry engine, while a game that uses high resolution textures with lower polygon counts is more likely to stress memory bandwidth. Pixel and vertex shaders are even more difficult to judge, as both ATI and NVIDIA are relatively tight-lipped about the internal layout of their pipelines. These units are the most CPU-like parts of a graphics chip, but they are also highly proprietary, and the companies feel a need to protect their technology (probably with good cause). So while we know that AMD Athlon 64 chips have a 12-stage Integer/ALU pipeline and a 17-stage FPU/SSE pipeline, we really have no idea how many stages are in the pixel and vertex pipelines of ATI and NVIDIA cards. In fact, we really don't have much more than a simplistic functional overview.
So why even bother talking about performance without benchmarks? In part, by looking at the theoretical performance and comparing it to real-world performance (you'll have to find such real-world figures in another article), we can get a better idea of what went wrong and what worked well. More importantly, though, most people referring to a GPU guide are going to expect some sort of comparison and ranking of the parts. The ranking is by no means definitive, and for some people choosing a graphics card is akin to joining a religion. So, take these numbers with a grain of salt and know that they are not intentionally meant to make one card look better than another. Where performance seriously fails to match expectations, it will be noted.
There are numerous factors other than the application itself that can affect performance. Drivers are a major one, and it is not unheard of for the performance of a particular card to increase by as much as 50% over its lifetime due to driver enhancements. Given such examples (both Radeon and GeForce cards saw their Quake 3 performance increase dramatically over time), it is somewhat difficult to argue that theoretical performance numbers are much less reliable than constantly changing real-world numbers. With proper optimization, real-world numbers can usually approach theoretical numbers, but this really only occurs for the most popular applications. Features also play a part, all other things being equal: if two cards have the same theoretical performance but one card is DX9 based and the other is DX8 based, the DX9 card should be faster.
Speaking of drivers, we would be remiss if we didn't at least mention OpenGL support. Brought into the consumer segment with GLQuake back in 1997, OpenGL is a separate API and requires its own drivers. NVIDIA and ATI both have full OpenGL drivers, but all evidence indicates that NVIDIA's drivers are simply better at this point in time; Doom 3 is the latest example of this. OpenGL is also used in the professional world, and there again NVIDIA tends to lead in performance, even with inferior hardware. Part of the problem is that very few games other than id Software titles and their licensees use OpenGL, so it often takes a back seat to DirectX. However, ATI has vowed to improve its OpenGL performance since the release of Doom 3, and hopefully it can close the gap between its DirectX and OpenGL drivers.
So, how is overall performance determined - in other words, how will the tables be sorted? The three main factors are fill rate, memory bandwidth, and processing power. Fill rate and bandwidth have been used for a long time, and they are well understood. Processing power, on the other hand, is somewhat more difficult to determine, especially with DX8 and later Pixel and Vertex Shaders. We will use the vertices/second rating as an estimate of processing power. For the charts, each section will be normalized relative to the theoretically fastest member of the group, and equal weight will be given to the fill rate, bandwidth, and vertex rate. That's not the best way of measuring performance, of course, but it's a start, and everything is theoretical at this point anyway. If you really want a suggestion on a specific card, the forums and past articles are a better place to search. Another option is to decide which games (or applications) you are most concerned about, and then go find an article that has benchmarks with that particular title.
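As a rough illustration of the scoring method described above, here is a minimal sketch in Python. The card names, clock speeds, pipeline counts, and vertex rates are hypothetical placeholders rather than figures from the actual charts, and this is not the spreadsheet used to generate them; it simply shows the normalize-and-average calculation.

```python
# Sketch of the chart scoring described above: compute theoretical fill rate,
# memory bandwidth, and vertex rate, normalize each to the fastest card in the
# group, and average the three with equal weight. All specs are hypothetical.

def fill_rate_mpixels(core_mhz, pixel_pipelines):
    """Theoretical fill rate in megapixels/sec: core clock x pixel pipelines."""
    return core_mhz * pixel_pipelines

def bandwidth_gb(effective_mem_mhz, bus_width_bits):
    """Theoretical memory bandwidth in GB/s: effective memory clock x bus width."""
    return effective_mem_mhz * 1e6 * (bus_width_bits / 8) / 1e9

# (fill rate in Mpix/s, bandwidth in GB/s, vertices/s in millions) -- placeholder specs
cards = {
    "Card A": (fill_rate_mpixels(400, 8), bandwidth_gb(700, 256), 350),
    "Card B": (fill_rate_mpixels(325, 4), bandwidth_gb(600, 128), 160),
}

# Normalize each metric to the best value in the group, then average.
maxima = [max(specs[i] for specs in cards.values()) for i in range(3)]
for name, specs in cards.items():
    score = sum(value / peak for value, peak in zip(specs, maxima)) / 3
    print(f"{name}: relative score {score:.2f}")
```

Running the sketch prints a relative score between 0 and 1 for each card, with the theoretically fastest card in the group scoring 1.00 on any metric it leads.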
To reiterate, this is more of a historical perspective on graphics chips and not a comparison of real world performance. And with that disclaimer, let's get on to the performance charts.
43 Comments
MODEL 3 - Wednesday, September 8, 2004 - link
A lot of mistakes for a professional hardware review site the size of Anandtech. I will only mention the clear-cut mistakes, since I have doubts about more, and I am actually surprised by the number of mistakes in this article. I mean, since I live in Greece (not the center of the world in 3D technology or the hardware market), I always thought that the editors at the best hardware review sites in the world (like Anandtech) have at least the basic knowledge related to the technology, and that they do research and double-check that their articles are correct. I mean, they get paid, right? If I can find their mistakes so easily (I have no technology-related degree, although I was a purchase and product manager at the best Greek IT companies), they must be doing something very, very wrong indeed. Now onto the mistakes:
ATI:
X700 6 vertex pipelines: Actually, this may be no mistake, since I have no information about this new part, but it seems strange if the X700 will have the same number of vertex pipelines (6) as the X800XT. I would guess half as many (3) would be more logical (like the 6800 Ultra vs. 6600GT), or double as many as the X600 (4). We will see.
Radeon VE 183/183: The actual speed was 166/166 SDR with a 128-bit bus for ATI parts, and as low as 143/143 for 3rd-party bulk parts.
Radeon 7000 PCI 166/333: The actual speed was 166/166 SDR with a 128-bit bus for ATI parts, and as low as 143/143 for 3rd-party bulk parts (note that Anandtech suggests 166 DDR, and the correct value is 166 SDR).
Radeon 7000 AGP 183/366 32/64(MB): The actual speed was 166/166 SDR for ATI parts, and as low as 143/143 for 3rd-party bulk parts (note that Anandtech suggests 166 DDR, and the correct value is 166 SDR); also, no 64MB part existed at launch, or for a whole year afterwards (if ever).
Radeon 7200 64-bit RAM bus: The 7200 was exactly the same as the Radeon DDR, so the RAM bus width was 128-bit.
ATI has unofficial DX9 with SM2.0b support: Actually, ATI has official DX 9.0b support, and Microsoft certified this "in between" version of DX9. When they enable their 2.0b features, they don't fail WHQL compliance, since 2.0b is an official Microsoft version (get it?). Features like 3Dc normal map compression are activated only in OpenGL mode, but 3Dc compression is not part of DX9.0b.
NVIDIA:
GF 6800LE with 8 pixel pipelines has, according to Anandtech, 5 vertex pipelines: Actually, this may be no mistake, since I have no information about this part, but since the 6800GT/Ultra is built with four (4) quads of 4 pixel pipelines each, isn't it more logical for the 6800LE, with half the quads, to have half the pixel pipelines (8) AND half the vertex pipelines (3)?
GFFX 5700 3 vertex pipelines: The GFFX 5700 has half the number of pixel AND vertex pipelines of the 5900, so if you convert the vertex array of the 5900 into 3 vertex pipes (which is correct), then the 5700 would have 1.5.
GF4 4600 300/600: The actual speed is 300/325 DDR with a 128-bit bus.
GF2MX 175/333: The actual speed is 175/166 SDR with a 128-bit bus.
GF4MX series 0.5 vertex shader: Actually, the GF4MX series had twice the vertex shader capability of the GF2, so the correct number of vertex shaders is 1.
According to Anandtech, the GF3 cards only show a slight performance increase over the GF2 Ultra, and that only in more recent games: Actually, the GF3 (Q1 '01) was based on 0.18 micron technology and yields were extremely low. In reality, GF3 parts arrived in acceptable quantity in Q3 '01 with the GF3 Ti series on 0.15 micron technology. If you check the performance in OpenGL games at and after Q3 '01, and in DX8 games at and after Q3 '02, you will clearly see the GF3 has double the performance of the GF2 clock for clock (GF3 Ti500 vs. GF2 Ultra).
Now, the rest of the article is not bad and I also appreciate the effort.
JarredWalton - Wednesday, September 8, 2004 - link
Sorry, ViRGE - I actually took your suggestion to heart and updated page 3 initially, since you are right about it being more common. However, I forgot to modify the DX7 performance charts. There are probably quite a few other corrections that should be made as well....

ViRGE - Tuesday, September 7, 2004 - link
Jarred, like I said, you're technically right about how the GF2 MX could be outfitted with either 128-bit SDR or 64-bit SDR/DDR, but you said it yourself that the cards were mostly 128-bit SDR. Obviously any change won't have an impact, but in my humble opinion, it would be best to change the GF2 MX entry to better represent what historically happened, so that if someone uses this chart as a reference for a GF2 MX, they're more likely to be getting the "right" data.

BigLan - Tuesday, September 7, 2004 - link
Good job with the article. Love the office reference...
"Can I put it in my mouth?"
darth_beavis - Tuesday, September 7, 2004 - link
Sorry, now it's suddenly working. I don't know what my problem is (but I'm sure it's hard to pronounce).

darth_beavis - Tuesday, September 7, 2004 - link
Actually it looks like none of them have labels. Is Anandtech not Mozilla compatible or something? Just use jpgs pleaz.

darth_beavis - Tuesday, September 7, 2004 - link
Why are there no descriptions for the columns of the graph on pg 2? Are we just supposed to guess what the numbers mean?

JarredWalton - Tuesday, September 7, 2004 - link
Yes, Questar, laden with errors. All over the place. Thanks for pointing them out so that they could be corrected. I'm sure that took you quite some time. Seriously, though, point them out (other than omissions, as making a complete list of every single variation of every single card would be difficult at best) and we will be happy to correct them, provided that they actually are incorrect. And if you really want a card included, send the details of the card, and we can add that as well.
Regarding the ATI AIW (All In Wonder, for those that don't know) cards, they often varied from the clock and RAM speeds of the standard chips. Later models may have faster RAM or core speeds, while earlier models often had slower RAM and core speeds.
blckgrffn - Tuesday, September 7, 2004 - link
Questar - if you don't like it, leave. The article clearly stated its bounds and did a great job. My $.02 - the 7500 AIW is 64 meg DDR only, unsure of the speed however. Do you want me to check that out?

mikecel79 - Tuesday, September 7, 2004 - link
#22 The GeForce256 was released in October of 1999, so this is roughly the last 5 years of chips from ATI and NVIDIA. If it were to include all other manufacturers, it would be quite a bit longer. How about some examples of this article being "laden with errors" instead of just stating it?