Budget Battle: HyperMemory vs. TurboCache
by Derek Wilson on May 12, 2005 9:00 AM EST - Posted in GPUs
Round 1: Architecture
The technology in these products is about making games think they have more graphics memory than the cards physically have on board. ATI and NVIDIA have taken different approaches to solving the problem.
NVIDIA's solution goes deep into the inner workings of the GPU. NVIDIA hasn't released specifics on what has changed in its TurboCache parts, but the company states that everything it has done is aimed at hiding the latency of system memory accesses in the pixel and ROP pipelines. This likely includes larger local caches and other changes that increase the number of pixels that can be in flight at any given time. A very important aspect of NVIDIA's architecture is that it is designed to operate on system memory as if it were local; the only thing NVIDIA doesn't allow to operate directly out of system RAM is the front buffer.
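NVIDIA hasn't said exactly how TurboCache hides latency, but the general reasoning behind keeping more pixels in flight follows Little's law: the amount of outstanding work a pipeline needs equals its throughput multiplied by the latency it must cover. The following Python sketch applies that law. The function name `pixels_in_flight` and all of the clock, pixels-per-clock, and latency figures are hypothetical placeholders, not NVIDIA specifications.

```python
def pixels_in_flight(latency_ns: float, core_mhz: float, pixels_per_clock: float) -> float:
    """Little's law: outstanding work = throughput x latency.

    core_mhz * pixels_per_clock is pixels per microsecond; dividing by 1000
    converts it to pixels per nanosecond.
    """
    return latency_ns * core_mhz * pixels_per_clock / 1000.0


# Hypothetical values, for illustration only (not vendor specifications).
CORE_MHZ = 350.0
PIXELS_PER_CLOCK = 4.0
LOCAL_LATENCY_NS = 100.0   # assumed latency of on-board memory
SYSTEM_LATENCY_NS = 500.0  # assumed round trip over PCI Express to system RAM

if __name__ == "__main__":
    for name, latency in [("local", LOCAL_LATENCY_NS), ("system", SYSTEM_LATENCY_NS)]:
        n = pixels_in_flight(latency, CORE_MHZ, PIXELS_PER_CLOCK)
        print(f"{name}: ~{n:.0f} pixels must be in flight")
```

With these assumed numbers, a fivefold increase in latency requires five times as many pixels in flight (140 versus 700), which is why larger on-chip buffers and caches are the natural place to look for NVIDIA's changes.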
ATI's approach is distinctly more software-based, though ATI does state that the memory controller on its GPU is what makes HyperMemory possible. The extent of the hardware changes is significantly smaller than in NVIDIA's solution. HyperMemory creates more of a virtualized memory system for the graphics card, allowing the driver to allocate system memory as needed and page data in and out of graphics RAM at will. Because this system memory is Windows-managed, it can in turn be paged out to the hard disk if necessary (which could really kill performance). Of course, if so much RAM is in use that graphics data is being paged to disk, there are likely other issues at hand that are also causing performance problems.
We haven't talked about GART memory very much since the decline of AGP, but the brief explanation is that GART memory is linearly addressable, non-paged memory allocated to the graphics subsystem for external storage. On PCI Express-based systems, it seems that the graphics driver manages GART memory completely rather than letting the system BIOS set a default size. We haven't been able to get solid details from either ATI or NVIDIA on how this memory is managed.
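The "linearly addressable" part is the key idea behind GART: a Graphics Address Remapping Table lets the GPU see one contiguous aperture even though the pages backing it are scattered through physical memory. The following toy Python model assumes 4 KiB pages and a flat table; the class name `Gart`, its `translate` method, and the addresses used are hypothetical, and real chipsets and drivers differ in the details.

```python
from typing import List

PAGE_SIZE = 4096  # assumed 4 KiB pages


class Gart:
    """Toy remapping table: a linear aperture backed by scattered physical pages.

    Hypothetical model for illustration; real GART layouts are chipset- and
    driver-specific.
    """

    def __init__(self, aperture_base: int, physical_pages: List[int]) -> None:
        self.aperture_base = aperture_base
        # Entry i holds the physical page frame number backing aperture page i.
        self.table = list(physical_pages)

    def translate(self, gpu_addr: int) -> int:
        """Map a linear aperture address to the physical address behind it."""
        offset = gpu_addr - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError(f"address {gpu_addr:#x} is outside the aperture")
        page, within = divmod(offset, PAGE_SIZE)
        return self.table[page] * PAGE_SIZE + within


if __name__ == "__main__":
    gart = Gart(0xE0000000, [0x1234, 0x42, 0x9000])
    # Second aperture page, 16 bytes in: backed by physical page 0x42.
    print(hex(gart.translate(0xE0000000 + PAGE_SIZE + 16)))
```

Consecutive aperture addresses can land in physical pages that are nowhere near each other, which is exactly what lets the graphics subsystem treat a fragmented allocation as one linear block.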
Taking a further step back, ATI organizes graphics memory in stages. First, the driver determines which surfaces have the highest priority and loads them into local memory. When something new comes along after local memory is crowded, ATI demotes lower-priority surfaces to GART memory. When GART memory gets too full, surfaces can be demoted further to pageable, Windows-managed system memory. The driver requests this system memory as needed and frees it when memory pressure decreases again.
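The staged demotion ATI describes can be modeled as a small priority-driven allocator. This is a hypothetical Python sketch, not ATI's driver logic: the `TieredAllocator` and `Surface` names, the megabyte capacities, and the "demote the lowest-priority surface first" rule are assumptions chosen to mirror the order of local memory, then GART, then system memory.

```python
from dataclasses import dataclass
from typing import Optional

TIERS = ["local", "gart", "system"]


@dataclass
class Surface:
    name: str
    size_mb: int
    priority: int  # higher means more important (assumed scale)


class TieredAllocator:
    """Toy priority-based demotion: local memory -> GART -> system memory."""

    def __init__(self, local_mb: int, gart_mb: int) -> None:
        # System memory is treated as unbounded here; Windows pages it as needed.
        self.capacity = {"local": local_mb, "gart": gart_mb, "system": float("inf")}
        self.tiers = {tier: [] for tier in TIERS}

    def used(self, tier: str) -> int:
        return sum(s.size_mb for s in self.tiers[tier])

    def place(self, surface: Surface, level: int = 0) -> None:
        tier = TIERS[level]
        self.tiers[tier].append(surface)
        # While this tier is over capacity, demote its lowest-priority surface one level.
        while self.used(tier) > self.capacity[tier]:
            victim = min(self.tiers[tier], key=lambda s: s.priority)
            self.tiers[tier].remove(victim)
            self.place(victim, level + 1)

    def where(self, name: str) -> Optional[str]:
        for tier, surfaces in self.tiers.items():
            if any(s.name == name for s in surfaces):
                return tier
        return None


if __name__ == "__main__":
    alloc = TieredAllocator(local_mb=64, gart_mb=32)
    for s in [Surface("render_target", 32, 10), Surface("textures_a", 24, 5),
              Surface("textures_b", 24, 7), Surface("textures_c", 16, 6)]:
        alloc.place(s)
    for name in ["render_target", "textures_a", "textures_b", "textures_c"]:
        print(name, "->", alloc.where(name))
```

In this example the two most important surfaces stay local, the next-lowest-priority texture set lands in GART, and the lowest-priority set cascades all the way to system memory. Freeing system memory when pressure drops, as the driver reportedly does, would be a matching promotion step that this sketch leaves out.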
Microsoft's next Windows OS will require graphics drivers to support fully virtualized, Windows-managed graphics memory. Along with VPU Recover (ATI's graphics hardware reset feature), HyperMemory may be a product of ATI's preliminary Longhorn work. To be sure, incorporating Windows-managed memory alongside driver-managed local RAM is a drop in the bucket compared to handing all local and system graphics memory management over to the OS.
The inclusion of virtualized graphics memory is actually something that workstation users have been requesting for quite some time. It's very interesting to see the technology end up in a value product first. Hopefully, ATI will follow 3Dlabs' lead and bring its virtualization technology to the workstation space as well.
The major difference between TurboCache and HyperMemory is that HyperMemory must first load a required surface into local memory before operating on it, possibly requiring the driver to kick something else out of local memory into system RAM. PCI Express's separate upstream and downstream bandwidth makes this relatively painless. TurboCache, on the other hand, sees all graphics memory as local and does not need to load a surface or texture into local RAM before operating on it. Shaders are able to read and write directly over the PCI Express bus into system RAM. Under the NVIDIA solution, the driver carries the burden of keeping the most used and most important data in local memory.
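The difference in access models can be made concrete with a toy traffic count. In the following hypothetical Python sketch, `hypermemory_traffic` must bring a whole surface into local memory before using it (evicting least-recently-used surfaces, an assumed policy), while `turbocache_traffic` reads only the data it touches directly from system RAM. The function names, sizes, and access trace are made up for illustration, and the model counts data moved while ignoring latency, which is precisely the cost TurboCache's rearchitected pipelines are meant to hide.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Set, Tuple

Access = Tuple[str, int]  # (surface name, MB actually touched by the shader)


def hypermemory_traffic(accesses: Iterable[Access], sizes: Dict[str, int],
                        local_capacity: int) -> Tuple[int, int]:
    """MB uploaded to and evicted from local memory when every surface must be
    resident before use. Toy model: LRU eviction is an assumption."""
    resident: "OrderedDict[str, int]" = OrderedDict()  # least recently used first
    to_local = evicted = 0
    for surface, _touched in accesses:
        if surface in resident:
            resident.move_to_end(surface)  # already local: no bus traffic
            continue
        # Kick least recently used surfaces out until the new one fits.
        while resident and sum(resident.values()) + sizes[surface] > local_capacity:
            _, size = resident.popitem(last=False)
            evicted += size
        resident[surface] = sizes[surface]
        to_local += sizes[surface]  # the whole surface crosses the bus
    return to_local, evicted


def turbocache_traffic(accesses: Iterable[Access], local_surfaces: Set[str]) -> int:
    """MB read over the bus when non-local surfaces are accessed in place."""
    return sum(touched for surface, touched in accesses if surface not in local_surfaces)


if __name__ == "__main__":
    sizes = {"a": 16, "b": 16, "c": 16}
    trace = [("a", 2), ("b", 2), ("c", 2), ("a", 2)]
    print("HyperMemory-style (uploaded, evicted):", hypermemory_traffic(trace, sizes, 32))
    print("TurboCache-style (read in place):", turbocache_traffic(trace, {"a", "b"}))
```

With three 16MB surfaces, 32MB of local memory, and a trace that touches 2MB of each, the HyperMemory-style model uploads 64MB and evicts 32MB, while the TurboCache-style model (with surfaces a and b kept local by the driver) reads just 2MB over the bus. Neither number says anything about frame rate on its own, since each in-place TurboCache access pays system-bus latency.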
The underlying architectures of these cards dictate the comparison points we will choose. The ATI card needs more local RAM than the NVIDIA card because it hasn't been rearchitected to operate on the majority of its data across the high latency of a system bus. More fast local RAM is good, but more RAM means more cost. The balance will come down to who can afford to charge the least: ATI with a largely stock R42x and more RAM, or NVIDIA with less RAM and a rearchitected GPU. Price is a huge factor in determining the better solution here, and performance often comes as an afterthought.
Happily, we embrace the new move to eliminate graphics API features as a distinguishing factor in graphics hardware decisions.
33 Comments
A554SS1N - Tuesday, May 17, 2005
Hello Derek, are you planning to test the AGP 6200, which uses the NV44A core? It's virtually the same core as the PCI-E TurboCache cards, except that it is a native AGP solution. As far as NVIDIA has told me, they are all 64-bit, and although board makers only provide passive cooling, that's all it needs. So far I've seen only one brief review on another website (of the Inno3D card), and it didn't really have enough information. As reviews on AnandTech are very thorough, it would be good to test and compare this card. The core is clocked at 350MHz like the TurboCache cards, and the card has 128MB of 500MHz effective memory.
Zoomer - Monday, May 16, 2005
http://www.anandtech.com/video/showdoc.aspx?i=2413
Typo?
"...the bear minimum in graphics cards supports DX9 level graphics..."
Shouldn't "bear" be "bare"?
pxc - Friday, May 13, 2005
The X300 HM results wouldn't be so disappointing if ATI hadn't been saying that HM cards would be faster than TC cards.
Derek: 128MB of HM doesn't make much of a difference. I had an XPRESS 200M laptop with 128MB of dedicated memory plus HyperMemory, and performance was horrible, much, much slower than an X300 SE. I wouldn't be surprised if turning off HM made the card faster.
PrinceGaz - Friday, May 13, 2005
#23 Jarred: "Despite the fact that the system was high-end, the performance of the cards is the limiting factor. A 3.0 GHz or 3000+ CPU with the same amount of RAM would likely post similar scores. However, 512MB of RAM would have something of an impact on performance. Anyway, $100 gets you 1GB of RAM these days, so we're more or less through with testing anything with 512MB of RAM or less."
Whilst I agree that 1GB is not unreasonable for even budget systems to have, $100 certainly won't buy you 1GB of OCZ PC3200 with 2-2-2 timings. Given that the performance of these cards is very dependent on system memory, and a budget system is likely to use budget memory, it is important to have benchmarks done with 2.5-3-3 or even 3-3-3 timings.
I support AT's use of a very fast CPU, etc., in all their usual graphics card reviews, and a very fast GPU in all their CPU reviews, so that performance depends on the component being tested rather than the rest of the system. However, with the TurboCache and HyperMemory cards, system memory effectively becomes an important part of the card, so it is important to test with the sort of budget memory these cards would be used with.
By using premium memory, AT has probably reduced the importance of the quantity of onboard memory and skewed the relative results of the tested cards. The 64MB TC may have performed significantly better in comparison with the others, had the test system been outfitted with 1GB of budget memory at 3-3-3 timings.
kmmatney - Thursday, May 12, 2005
#28, you can get decent integrated graphics with the ATI XPRESS 200 chipset; however, the only motherboard at NewEgg is $102, which is a bit high. I once saw them for around $80.
http://www.newegg.com/Product/Product.asp?Item=N82...
bupkus - Thursday, May 12, 2005
"...perhaps upcoming integrated graphics solutions from ATI and NVIDIA will be as compelling as these parts show value products can be."
I need cheap but functional graphics-capable PCs for smaller children, so I'm still waiting for a replacement for the NVIDIA nForce2 IGP. What's the holdup?
RadeonGuy - Thursday, May 12, 2005
Moral of the story is they both suck.
DerekWilson - Thursday, May 12, 2005
Our comparison tests will continue to be done on higher-end hardware. Our environments are designed to test the performance of graphics hardware, not to determine the actual performance an end user can expect.
We run these games on a fresh install of Windows XP SP2. Audio hardware is not installed, and the audio service is disabled in Windows. Nothing that isn't necessary to get graphics to the screen is enabled.
Most people have lots of background programs, services, and audio running. With the way Intel and AMD are approaching dual core, multitasking is poised to become even more pervasive. This will inevitably widen the gap between our graphics card tests and real-world performance. Our CPU performance tests will continue to grow and include more and more multitasking as we are able to come up with the tests.
Of course, #25 has a very valid point: it may well be worth running one test on a price-targeted system in a more "typical use" environment for reference purposes. Budget systems cluttered with loads of apps running in the background are no place to build a hardware comparison, but that doesn't mean they are entirely useless.
jediknight - Thursday, May 12, 2005
#17: It would be useful, however, to include benchmarks with all-budget hardware, to see whether you really can get acceptable (30fps+) gaming performance with these cards at realistic settings.
Wellsoul2 - Thursday, May 12, 2005
Thanks Derek.
Seems crazy that there's no DVI, since CRTs are going away fast. Even my old budget 9200SE card had DVI.
Just wondering... I would think, with 17-inch LCDs around $175 and 19-inch around $300 (with DVI), that this is the future.
What do you all think?