HSA 1.1 Specification Launched: Extending HSA to More Vendors & Processor Typesby Ryan Smith on May 31, 2016 10:00 AM EST
For the last few years now we have been keeping tabs on the development of the Heterogeneous System Architecture, a set of technical standards and an associated instruction language designed to allow efficient heterogeneous compute. Originally envisioned by AMD, HSA has been the cornerstone of their efforts to develop a fully functional ecosystem for heterogeneous hardware and software. Define a standard, make it easy(ier) for developers to create software around it, and, if all goes well, AMD’s big bet on GPU technology made almost 10 years ago will pay off.
However while AMD was the birthplace of what would become HSA, the standard as a whole has been about more than just one company. Which is why HSA has been under development of the HSA Foundation since 2012. The foundation is composed of many members, including a number of CPU/GPU design heavyweights such as ARM, Qualcomm, Imagination, and Samsung, all of whom are contributing to the development of HSA, and in turn we expect are at least giving themselves the option to leverage it in the future.
With all of that said, while HSA itself is a group project, in practice the first iteration of the standard was AMD-focused. This owed to not only AMD’s early involvement, but also due to the fact that the standard needed hardware to be developed against – to have a proof of concept, which was AMD’s Carrizo APU. As a result the HSA 1.0 specification did not offer as much flexibility with other architectures as the rest of the foundation would like. This is something that they have been working behind the scenes to address, and today the Foundation will finally be taking their next step with the publication of the HSA 1.1 standard.
The big (though not sole) focus of the 1.1 standard then is extending it to better work with non-AMD hardware. This includes not only SoCs and other integrated devices from other vendors (e.g. ARM), but additional classes of accelerators such as DSPs. The HSA foundation wants to make the HSA standard truly heterogeneous for more than just AMD APUs, and for more than just CPU + GPU combinations. The 1.1 standard, in turn, is their effort to form a more perfect standard for heterogeneous compute.
What’ on the table for HSA 1.1 then is not a radical departure from HSA 1.0, but it is an extension and a further tightening of the standard to meet the goals of the Foundation. On the hardware side this is fully backwards compatible – meaning it will work on 1.0 hardware such as Carrizo – while setting up the updated standard to work on additional hardware. This is especially the case with additional classes of accelerators, as 1.0 was primarily focused on abstracting away the GPU (per AMD’s APU needs).
As the first version of the true multi-vendor specification then, 1.1 is designed to help vendors be able to mix and match HSA-capable blocks in an effective manner. The Foundation itself has a heavy mobile SoC presence (be it integrators or IP providers), and it’s easy to imagine how someone like MediaTek would want to be able to ensure that they can easily make HSA-capable SoCs using both Imagination and ARM GPUs, or how Qualcomm may want to use HSA in the future in their Hexagon DSPs. To get there, the 1.1 specification makes transparent a number of system level issues. Information about shared page tables, signals, queues, and more are now better exposed through the updated standard, which plays a big part in bringing about multi-vendor support.
The 1.1 specification also addresses some other issues that were felt at one point or another with HSA. The memory model now has a formal definition (using cat/HERD, for those few of you who know those tools). Along those lines there is additional memory functionality such as support for non-temporal memory accesses, which specifically comes into play when you need to tell the cache to flush an item after it’s used (good for one-time-use items). And on the signal side of matters, HAS code can now understand how to wait on multiple signals, an improvement over 1.0 where code could only wait on a single device.
Finally, 1.1 also includes updates to help optimize performance and make the API play nicer with other code. A new profiling API has been introduced, which exposes hardware-level counters and other information to allow better profiling of the performance of HSA program execution, which can then be fed back into profile guided optimizations. Meanwhile the HSA finalizer has been reworked so that its internal representation isn’t quite so obscure, with the new, more standard-looking representation making the finalizer more suitable for linking to other tools.
Yet despite all of these updates, the change should be completely transparent to application developers. Because HSA is ultimately a means of abstracting away the work and the quirks necessary to make heterogeneous compute work well, application developers won’t see these changes. Who does see these changes are the hardware developers, who write the associated runtimes that actually compile HSAIL intermediate code down to device code. And even then, we’re told that updating a 1.0 implementation to 1.1 is not especially painful, particularly when compared to writing an implementation to begin with. AMD for their part already has a 1.1 implementation up and running, which, logically enough, is being used as the basis of their Radeon Open Compute Platform (ROCm), where ROCm adds the additional discrete GPU functionality AMD specifically needs.
Though with that said, the existence of ROCm and other platforms does bring up the struggle the HSA Foundation is facing on adoption. While ROCm is HSA-based, at the same time AMD is doing an end-run around the HSAIL, preferring to compile direct-to-ISA. This still utilizes the HSA Runtime, and as a result benefits from and validates the basic HSA strategy, but it’s an example of how the HSAIL aspect of the standard has struggled.
Similarly, earlier this week we saw ARM pass on embracing HSAIL as well for the heterogonous aspects of their Mali-G71 GPU, following the same train of thought as AMD. The G71 GPU is for all intents and purposes an HSA 1.1 GPU - following the HSA hardware specification to implement heterogonous computing in a common and sane way - but the company is not utilizing HSAIL.
ARM for their part is following an OpenCL-centric software strategy for exposing the heterogeneous aspects of their hardware to developers. The interesting thing about the ARM implementation is that they have gone above and beyond the basic aspects of OpenCL 2.0, offering finer-grained sync that OpenCL requires at a minimum. Finer-grained sync that would otherwise be something better suited for HSA. As a result part of the HSA Foundation’s efforts are focused on showcasing the additional benefits of HSA over OpenCL 2.0, to entice hardware manufacturers and developers into using HSA.
Ultimately, in our discussion with Foundation president John Glossner (who replaced Phil Rogers), despite these setbacks he’s still bullish on HSAIL. It’s his hope that as finalizers continue to improve, there will be less reason for companies like AMD to bypass HSAIL as they do now. And in the meantime, further success with HSA and HSAIL in general should help to encourage more hardware vendors to adopt HSAIL.
Finally, to pull one more slide out of the HSA deck as a “this is cool” subject, Continuum Analytics’ Numba Python compiler has recently added direct HSA support. This makes it a lot easier for developers writing code in Python to easily add HSA-compliant vectorization to their programs. Python is a widely used language, and while even “automatic” parallelization isn’t the true holy grail of no-effort program parallelization, it does get HSA one step closer, at least in this case.
Wrapping things up, today’s launch of the HSA 1.1 specification, despite the minor version number increment, should in the long run prove to be a significant event for the HSA Foundation. By finally getting the specification to the point where it can more readily support multi-vendor hardware, the ecosystem as a whole will have the opportunity to evolve into a true multi-vendor ecosystem, the ultimate purpose of AMD spinning their work off into the HSA Foundation to begin with. The hard part as an outside observer will be waiting; while the specification was ratified back in April, there is still going to be some lag between the ratification and when additional hardware will be ready to support it.
Post Your CommentPlease log in or sign up to comment.
View All Comments
pogostick - Tuesday, May 31, 2016 - linkSo what's the deal with ARM being part of the HSA group, yet being like: "ARM did not have too much to say on the matter, but has stated that they never “totally bought into” HSAIL." I'd like to hear more about that attitude and what it means for the future of HSA.
Ryan Smith - Wednesday, June 1, 2016 - linkSure. The following was just added to the G71 article.
With yesterday's announcement of the HSA 1.1 specification, I went back to ARM to ask them whether the new specification impacts the company’s heterogenous compute plans at all, especially given that their architecture doesn’t support the 1.0 standard. As it turns out, ARM is going a route very similar to AMD’s ROCm platform: while the company isn’t utilizing the HSAIL – and thus in the strictest sense isn’t a complete HSA platform – they are using the HSA standard in the development of their hardware.
At a hardware level, the HSA specification standardizes a number of aspects of the hardware for common interoperability and easier programming purposes, including signals, queues, floating point number handling, and other, low-level minutiae about how heterogeneous execution should work. This is separate from the HSAIL, which is more concerned with the software aspects of heterogeneous programming, and though helpful, is not necessary for heterogeneous compute. As a result while Mali-G71 is technically not an HSA platform, in practice it is HSA hardware, using the HSA specification as a means to offer a common and well understood execution model for heterogeneous compute. So ARM is very much on-board with HSA – and is essentially supplying one of the first non-AMD HSA 1.1 hardware designs – even if they’re not using HSAIL itself.
R3MF - Tuesday, May 31, 2016 - link"being used as the basis of their Radeon Open Compute Platform (ROCm), where ROCm adds the additional discrete GPU functionality AMD specifically needs."
Please tell me more about HSA on discrete gpu's?
Ryan Smith - Tuesday, May 31, 2016 - linkThis is the project formally known as Boltzmann.
Underneath they essentially use HSA, with some additional functionality for Non-Uniform Memory Access (NUMA) - since dGPUs don't have physical locality - and better peer-to-peer communication.
R3MF - Wednesday, June 1, 2016 - linkthanks Ryan.
does this have read-across into their consumer products?
MrSpadge - Tuesday, May 31, 2016 - linkI really wish Intel would embrace and push HSA. And put the "huge" percentage of their dies dedicated to GPUs to better use. Trying to use mine for OpenCL number crunching with limited success due to driver bugs.