The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture earlier this year, AMD laid out a basic overview of which components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs; rather, AMD has combined targeted performance upgrades with a chip-wide push for energy efficiency. As a result Polaris is a mix of old and new, and a good deal more efficient for it.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to GCN 1.2’s. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to pre-fetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.
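To make the effect of these two changes concrete, here is a toy stall model in C. It is purely illustrative: the fetch latency, the refill behavior, and everything other than the 12 and 16 DWORD figures above are our assumptions, not AMD's actual hardware parameters. The point is simply that a deeper buffer plus an early (prefetched) refill can hide fetch latency that would otherwise stall the wave.

```c
#include <stdio.h>

/* Toy model of a per-wave instruction buffer (not AMD's hardware:
 * the fetch latency and refill behavior are assumptions). Each
 * cycle the wave consumes one instruction from its buffer; a fetch
 * refills the buffer FETCH_LATENCY cycles after it is issued. With
 * prefetching, the refill starts before the buffer runs dry, hiding
 * the latency instead of stalling the wave. */

#define FETCH_LATENCY 16   /* cycles for a fetch to return (assumed) */

static int run(int buf_size, int prefetch, int insts)
{
    int buffered = buf_size;  /* instructions ready to issue */
    int inflight = 0;         /* cycles until a pending fetch lands */
    int stalls = 0;

    while (insts > 0) {
        /* Start a refill early (prefetch) or only once empty. */
        int threshold = prefetch ? FETCH_LATENCY : 0;
        if (!inflight && buffered <= threshold)
            inflight = FETCH_LATENCY;

        if (inflight && --inflight == 0)
            buffered = buf_size;  /* fetch returns, buffer refilled */

        if (buffered > 0) {
            buffered--;
            insts--;
        } else {
            stalls++;             /* wave idles waiting on the fetch */
        }
    }
    return stalls;
}

int main(void)
{
    printf("GCN 1.x-like (12-entry, no prefetch): %d stall cycles\n",
           run(12, 0, 1000));
    printf("GCN 4-like   (16-entry, prefetch):    %d stall cycles\n",
           run(16, 1, 1000));
    return 0;
}
```

With these assumed numbers the no-prefetch configuration stalls on every refill, while the prefetched 16-entry buffer never runs dry, which is the behavior AMD is describing.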

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be rendered, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as AMD notes, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
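As a rough sketch of the two tests involved, consider the following C code. This is a conceptual illustration, not AMD's actual hardware logic: the sample-spacing test stands in for checking real MSAA sample positions, and the function names are ours.

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { float x, y; } Vec2;

/* Twice the signed area of a screen-space triangle. */
static float signed_area2(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* Is there a sample point (spaced 'spacing' apart, offset by half a
 * step, like pixel centers) inside the interval [lo, hi]? */
static bool interval_has_sample(float lo, float hi, float spacing)
{
    float first = ceilf((lo - 0.5f * spacing) / spacing) * spacing
                  + 0.5f * spacing;
    return first <= hi;
}

/* Conceptual early-discard test (illustrative, not AMD's hardware
 * logic). Degenerate triangles are always culled; small triangles
 * are culled only when their bounding box provably contains no
 * sample point, so nothing visible is ever lost. With MSAA the
 * hardware checks more sample positions per pixel, and each tiny
 * triangle that survives costs more raster/shading work, which is
 * why AMD says the payoff grows with MSAA level. */
static bool should_discard(Vec2 a, Vec2 b, Vec2 c, float spacing)
{
    if (signed_area2(a, b, c) == 0.0f)
        return true;  /* degenerate: collinear or repeated vertices */

    float x0 = fminf(fminf(a.x, b.x), c.x);
    float x1 = fmaxf(fmaxf(a.x, b.x), c.x);
    float y0 = fminf(fminf(a.y, b.y), c.y);
    float y1 = fmaxf(fmaxf(a.y, b.y), c.y);

    /* No sample row or column crosses the bounding box, so the
     * triangle can never produce coverage. */
    return !interval_has_sample(x0, x1, spacing) ||
           !interval_has_sample(y0, y1, spacing);
}

int main(void)
{
    Vec2 a = {10.0f, 10.0f}, b = {10.2f, 10.1f}, c = {10.1f, 10.3f};
    /* At 1-sample-per-pixel spacing, this sub-pixel triangle falls
     * between pixel centers and is discarded. */
    printf("discard tiny triangle: %d\n", should_discard(a, b, c, 1.0f));
    return 0;
}
```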

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.
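A toy cache simulation illustrates why this matters for instancing. The cache geometry below (2KB, 64-byte lines, direct-mapped) is an assumption for illustration only; AMD has not disclosed the actual organization of the index cache.

```c
#include <stdio.h>

/* Toy model (illustrative numbers, not AMD's design): a tiny
 * direct-mapped cache of 64-byte lines sitting in front of index
 * fetches. With instancing, every instance re-walks the same small
 * index buffer, so after the first instance nearly every fetch hits
 * on-chip instead of going out to other caches or VRAM. */

#define LINE_BYTES 64
#define NUM_LINES  32   /* 2KB cache, assumed size */

int main(void)
{
    long tags[NUM_LINES];
    for (int i = 0; i < NUM_LINES; i++) tags[i] = -1;

    const int index_bytes = 900;   /* ~150 triangles, 16-bit indices */
    const int instances   = 1000;
    long hits = 0, misses = 0;

    for (int inst = 0; inst < instances; inst++) {
        for (int addr = 0; addr < index_bytes; addr += 2) {
            long line = addr / LINE_BYTES;
            int  slot = (int)(line % NUM_LINES);
            if (tags[slot] == line) hits++;
            else { tags[slot] = line; misses++; }
        }
    }
    printf("index fetches: %ld hits, %ld misses (%.2f%% hit rate)\n",
           hits, misses, 100.0 * hits / (hits + misses));
    return 0;
}
```

Because the instanced mesh fits entirely in the (assumed) 2KB cache, only the first instance misses; every later instance is served on-chip, which is the bandwidth and power saving AMD is describing.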

Finally, at the back-end of the GPU, the ROP/L2/memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size, and with it the memory bandwidth needs, of frame buffers and render targets. Color compression results in a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2’s color compression by around 17%.
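Here is a minimal sketch of the underlying idea, with illustrative block and delta sizes (AMD's actual pattern library is not public): a block compresses when every pixel's delta from an anchor pixel fits in a small number of bits, so smooth regions compress and noisy ones fall back to raw storage.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define BLOCK_PIXELS 64   /* e.g. an 8x8 tile of one color channel */

/* Can every pixel's delta from the anchor (pixel 0) be stored in
 * delta_bits signed bits? If so, the block shrinks to one raw
 * anchor byte plus 63 small deltas; if not, it stays uncompressed. */
static bool block_compresses(const uint8_t px[BLOCK_PIXELS], int delta_bits)
{
    int lo = -(1 << (delta_bits - 1));       /* -8 for 4-bit deltas */
    int hi =  (1 << (delta_bits - 1)) - 1;   /* +7 for 4-bit deltas */
    for (int i = 1; i < BLOCK_PIXELS; i++) {
        int delta = (int)px[i] - (int)px[0];
        if (delta < lo || delta > hi)
            return false;                    /* store the block raw */
    }
    return true;
}

int main(void)
{
    uint8_t smooth[BLOCK_PIXELS], noisy[BLOCK_PIXELS];
    for (int i = 0; i < BLOCK_PIXELS; i++) {
        smooth[i] = (uint8_t)(100 + i % 4);     /* gentle gradient */
        noisy[i]  = (uint8_t)((i * 97) % 256);  /* no coherence */
    }
    printf("smooth block compresses: %d\n", block_compresses(smooth, 4));
    printf("noisy  block compresses: %d\n", block_compresses(noisy, 4));
    return 0;
}
```

A larger pattern library, in these terms, means more block shapes and delta encodings to try before giving up and writing the block uncompressed, which is how more buffers end up compressed more often.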

Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially run at 8Gbps, and even a bit higher than that when overclocking.
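For reference, the arithmetic on what 8Gbps buys: across the RX 480's 256-bit memory bus, that works out to 256GB/sec of peak bandwidth.

```c
#include <stdio.h>

/* Peak memory bandwidth = per-pin data rate x bus width / 8 bits
 * per byte. The 256-bit figure is the RX 480's memory bus width;
 * the formula itself is generic. */
int main(void)
{
    double gbps_per_pin = 8.0;   /* 8Gbps GDDR5 */
    int    bus_bits     = 256;   /* RX 480 memory bus width */
    printf("%.0f GB/s peak\n", gbps_per_pin * bus_bits / 8.0);  /* 256 */
    return 0;
}
```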

Comments

  • Yojimbo - Wednesday, June 29, 2016 - link

Why? Who is going to switch from a GTX 980 to an RX 480 for performance in two DX12 games? I don't think those two DX12 titles can accurately be thought of as being fully representative of DX12 performance, especially since they seem to favor AMD cards to begin with, so they don't support the sort of categorical comparison you are implying.
  • smackosaurus - Wednesday, June 29, 2016 - link

Well if there are more like me out there who have several family members each with their own PC... cards have to last a couple of years AND be affordable for the average joe. Since DX12 is replacing DX11, and every major title announced lately and probably every major title in the future will be DX12, the value of the 480 sticks out. Especially since even the flagship 1080 has to use software emulation for some DX12 features, because Nvidia decided to just plain leave it out of the hardware. Preemption =/= async
  • Yojimbo - Thursday, June 30, 2016 - link

I don't think it's certain that DX12 will completely replace DX11, even in major titles. But even if so, that does nothing to change the fact that the number of DX12 titles available to benchmark right now is quite small, resulting in a small sample size. With a small sample size one is not able to make the broad inferences you would like to make. The 980 is likely to be discontinued shortly after the 1060 comes out anyway, completely destroying any reason at all for having it in the charts.

Maxwell does have hardware support for asynchronous compute. Pascal has an enhanced version of it. async != ACE. ACEs are task schedulers which are used in AMD's method of supporting asynchronous compute. I have a feeling that the idea that NVIDIA does asynchronous compute "in software" comes from the fact that NVIDIA was working on driver optimizations to try to make the asynchronous compute implementation in Ashes of the Singularity show a benefit on NVIDIA's hardware. I'm not sure if NVIDIA ever achieved that or if they've given up or what, but to my understanding NVIDIA was turning the feature off in their driver profile for Ashes of the Singularity because, with Oxide's implementation and NVIDIA's method, asynchronous compute actually caused the game to run more slowly than it did without it. Again, one game and one implementation is a small sample size. It doesn't tell you much on its own. Independent testing showed there were situations, even on Maxwell hardware, where NVIDIA's method produced a larger speed boost from asynchronous compute than AMD's method. Another thing to consider is that the speed boost AMD gets from asynchronous compute in DX12 has something to do with the fact that, without asynchronous compute, AMD tends to make less efficient use of its compute throughput than NVIDIA does. Finally, considering that AotS was designed from the beginning as a Mantle game, it isn't just that it's one data point; it's also perhaps not a very reliable one.

    For more information on asynchronous compute in Pascal, perhaps this video will be informative:
    https://www.youtube.com/watch?v=Bh7ECiXfMWQ
  • cocochanel - Thursday, June 30, 2016 - link

DX12 is the latest and best graphics API Microsoft has ever made. All future games, be they for the PC or Xbox One, will use it. How can anyone sell a card that performs poorly on it? And sell it for 800+ dollars?
  • Yojimbo - Thursday, June 30, 2016 - link

    The GTX 1080 does not perform poorly on DX12 games. What benchmarks have you been looking at?!
  • cocochanel - Saturday, July 2, 2016 - link

    http://www.pcworld.com/article/3071037/hardware/nv...
  • crimson117 - Wednesday, June 29, 2016 - link

Typo: For the 480 4GB model, VRAM is listed as 8GB in the table on page 1.
  • webdoctors - Wednesday, June 29, 2016 - link

I called it a month ago when the new cards started coming out.

The $200 market is really competitive, and you're competing against not just the current cards but previous generations one level up. You're already seeing GTX970 cards in the USA at the same price point, after rebate, as this "new" RX 480 card.

Will have to wait for the 1060 review, but it'll likely wipe this card out of the landscape, since this one only matches Maxwell Perf/W and performance is only at GTX970 levels... a win for consumers but not so much for AMD.
  • maccorf - Wednesday, June 29, 2016 - link

Love comments like this... so a $330+ card with no currently known benchmarks will "likely wipe" a $240, well-performing card "out of the landscape"? That is an entire jump up in price bracket, and you don't even know what the 1060 will do. "I called it..." LOL, called what? This card being the new standard for performance and price? The only reason those GTX970 cards are coming down is because of the RX 480. People are so absurd with this; is it even possible for you trolls to ever admit you're completely and utterly biased?
  • mdriftmeyer - Wednesday, June 29, 2016 - link

    It's not possible for these grown children to act like adults.
