AMD Reassembles the Radeon Technologies Group: New Leadership Hired & Semi-Custom Folded Into RTG
by Ryan Smith on January 23, 2018 5:00 PM EST - Posted in
- GPUs
- AMD
- Radeon
- Radeon Technologies Group
Ever since Raja Koduri vacated his leadership post at AMD’s Radeon Technologies Group in favor of a similar post at Intel, the big question on everyone’s mind has been what would become of the RTG and who would lead it. While AMD – and ATI before it – is far from a stranger to graphics, the formation of the RTG in 2015 marked an important point in the development of AMD’s graphics business, as it was finally re-made whole under the AMD umbrella. As a business-within-a-business, RTG was largely seen as Raja’s baby, so his departure was destined to have a large impact.
Now, just a bit over two months later, we finally know the fate of RTG. Today AMD is announcing the new leadership for RTG, bringing on two new Senior Vice Presidents to run the division. Meanwhile the company is also making some organizational changes to RTG which, although they will keep RTG intact as a singular entity, will see it internally organized into dedicated engineering and business teams, each in turn reporting to their respective SVP heads. This change means that RTG doesn’t have a true, singular leader – not that Raja had full oversight of RTG’s business operations either, I’m told – but rather both of the new SVPs report to AMD’s CEO, Dr. Lisa Su, and together represent the business unit.
Finally, while in the process of reorganizing and rebuilding RTG’s leadership, AMD is announcing that they have also reassigned the semi-custom business unit. Semi-custom is now being folded into the business side of RTG, and while AMD’s announcement doesn’t indicate any changes in how the semi-custom unit will operate, it will now fall under the purview of the head of RTG’s business group.
RTG Engineering Group Leadership: David Wang, Senior Vice President of Engineering
First off then, let’s talk about the engineering side. The new head of RTG’s engineering efforts (and arguably the de facto successor to Raja) will be David Wang. Wang is a name that some long-time followers may be familiar with, as until earlier this decade he was a long-time AMD employee who rose through the ranks to become a Corporate Vice President of AMD’s GPU and SoC efforts. And as you might expect for someone hired for AMD’s top GPU engineering post, Wang has a long history in the graphics industry, working for SGI before joining ArtX and going through the ArtX-to-ATI-to-AMD acquisition chain. Specific to his experience at AMD, Wang has worked on every ATI/AMD GPU from the R300 through the Southern Islands (GCN 1.0) family, so he’s seen the full spectrum over at AMD.
More recently, he’s been serving as a Senior VP at Synaptics, where he was one of several former ATI/AMD employees who jumped ship around the turn of the decade. So for David Wang, in a sense this is coming back home to AMD, which off the top of my head makes him the third such high-profile engineer to jump out and back in over the last decade, after Raja Koduri and CPU guru Jim Keller.
Wang re-joins AMD at a critical time for its engineering group. With the Vega launch behind it, RTG’s engineering staff is in the middle of developing the Navi GPU architecture and beyond, all the while putting the finishing touches on Vega Mobile for this year and squeezing in a 7nm Vega design for servers in 2019. Vega’s launch has been a contentious one – for engineering reasons as much as, or more than, business reasons – so Wang may very well be a much-needed breath of fresh air for RTG’s engineering teams.
Officially, AMD is stating that in his position, Wang will be responsible for “graphics engineering, including technical strategy, architecture, hardware and software for AMD graphics products,” the last item in particular being notable, as this confirms that software is staying under the control of the engineering group rather than being more distributed through AMD as it once was.
RTG Business Group Leadership: Mike Rayfield, Senior Vice President and General Manager
David Wang’s counterpart on the business side of matters will be Mike Rayfield, who is being tapped to serve as the Senior VP and the General Manager of the business group. Rayfield is another industry veteran, albeit one without the same immense GPU history as Wang. Instead, Rayfield’s history includes posts such as VP and General Manager of NVIDIA’s mobile business unit (Tegra) and director of business development at Cisco. Most recently (and for the past 5 years), Rayfield has been at Micron, serving as the memory manufacturer’s VP and GM for their mobile business unit, which houses Micron’s mobile-focused memory and storage product groups.
AMD notes that Rayfield has “30 years of technology industry experience, including a deep understanding of how to grow a business and drive results,” which says a lot about AMD’s direction in very few words. Instead of hiring another GPU insider for the post, AMD is bringing in someone from outside the industry altogether, someone who has plenty of experience with managing a growing business. RTG’s business struggles are of course well-known at this point, so this offers AMD the chance to reset matters and try to fully right the embattled business unit.
Rayfield is definitely joining RTG and AMD at an interesting time. On the business side of matters RTG is contending with the fact that Vega (and practically every other GPU AMD makes) is proving incredibly popular with cryptocurrency mining groups, to the point that, above the entry-level market, the North American consumer market has been entirely depleted of AMD-based video cards. In the short term this means that AMD is selling everything they can make, but Rayfield will have to help AMD grapple with the long-term effects of this shift, and how to keep the newly minted mining customers happy without further losing disillusioned gaming customers.
AMD Semi-Custom Folded Into RTG
Meanwhile, along with overseeing the traditional business aspects of the GPU group, Rayfield is also inheriting a second job: overseeing AMD’s semi-custom business unit. Previously part of the Enterprise & Embedded unit, semi-custom is being moved under the business side of RTG, putting it under Rayfield’s control. This reorganization, in turn, will see the Enterprise & Embedded unit, now separated from semi-custom, become the new Datacenter and Embedded Solutions Business Group, which will continue to be operated by SVP and GM Forrest Norrod.
Truth be told, I’m still not 100% sure what to make of this change. AMD’s semi-custom focus was a big deal a few years back, and while the business has done well for itself, the focus at the company seems to have shifted back to selling processors directly, built on the back of the technological and sales success of the Zen architecture and its derivative products. Meanwhile, AMD is citing the fact that graphics is a core part of semi-custom designs as the rationale for putting it under the RTG. AMD still believes that the semi-custom business is more than just providing processors to the game console industry, but I do think that after a few years of pursuing this route, there’s a realization that game consoles are going to remain the predominant semi-custom customers. So in this respect it’s a bit of a return to the status quo: game consoles are now once again part of the graphics business. And in the meantime semi-custom has certainly become a lower priority for AMD, as it’s a business that’s very regular in volume but relatively low in gross margin compared to their increasing CPU fortunes.
Finally, speaking of finances, it’s worth noting that AMD is calling this an increase in the company’s investment in RTG. The company isn’t throwing around any hard numbers right now, but it sounds like overall the RTG budget has been increased. AMD has of course struggled mightily in this area as a consequence of their lean years – in particular, it seems like AMD can’t complete as many GPU die designs per year as they once did – so combined with the shift in leadership, I’m optimistic that this means that we’re going to see AMD shift to an even more competitive position in the GPU marketplace.
Source: AMD
50 Comments
mode_13h - Wednesday, January 24, 2018 - link
Not really. Basically, nobody serious is using GPUs for AI without architecture-optimized libraries in the stack. Most major frameworks have support for multiple different backends, or there are at least forks available with vendor-specific optimizations. And that goes especially for the tensor cores, which must be programmed via special intrinsics (or inline asm). Even packed 8-bit integer arithmetic is probably coded explicitly in inferencing paths, since it's a technique one uses very deliberately.
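For what it's worth, a minimal sketch of what that explicit programming looks like through CUDA's WMMA intrinsics (the standard 16x16x16 FP16-input / FP32-accumulate configuration; the kernel is a hypothetical illustration rather than code from any real framework, and packed 8-bit paths are similarly explicit via intrinsics such as __dp4a):

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes a single 16x16 output tile: D = A * B + C, with FP16
// inputs and FP32 accumulation -- the mode Volta's tensor cores expose
// through the WMMA API (requires sm_70).
__global__ void wmma_tile(const half *a, const half *b, float *d)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fragA;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fragB;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fragAcc;

    wmma::fill_fragment(fragAcc, 0.0f);              // zero the accumulator
    wmma::load_matrix_sync(fragA, a, 16);            // load a 16x16 FP16 tile of A
    wmma::load_matrix_sync(fragB, b, 16);            // load a 16x16 FP16 tile of B
    wmma::mma_sync(fragAcc, fragA, fragB, fragAcc);  // one tensor-core multiply-accumulate
    wmma::store_matrix_sync(d, fragAcc, 16, wmma::mem_row_major);  // write the FP32 result
}
```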
But leaving aside special features, simply sizing work units to optimally exploit on-chip memory and tuning kernels to use the right amount of unrolling and pipelining is probably more than enough to justify the use of vendor-optimized libraries.
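As a rough sketch of the kind of hand-tuning being described (the shared-memory tile size and unroll depth are exactly the per-architecture knobs a vendor library picks for you), here is a bare-bones tiled matrix multiply; it assumes square matrices whose dimension is a multiple of the tile size, with bounds checks omitted:

```cuda
#include <cuda_runtime.h>

#define TILE 32  // tile edge; tuned per architecture in real libraries

// Each 32x32 thread block computes one TILE x TILE tile of C = A * B,
// staging tiles of A and B through shared memory so global loads are
// coalesced and each loaded value is reused TILE times.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n; t += TILE) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t + threadIdx.x];    // stage A tile
        Bs[threadIdx.y][threadIdx.x] = B[(t + threadIdx.y) * n + col];  // stage B tile
        __syncthreads();

        #pragma unroll  // unroll depth is another per-architecture tuning decision
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}
```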
That said, it's nice that, to write a custom layer-type, you needn't be concerned with the particulars of the specific hardware generation just to have it run reasonably fast and be reasonably portable.
Kevin G - Wednesday, January 24, 2018 - link
Intel has at least two different AI chips. Knights Mill is out now, and then there is the Nervana technology, which will be appearing on-package with some Xeons later this year. This excludes FPGAs, which also have a niche in this market.
mode_13h - Wednesday, January 24, 2018 - link
Knights Mill was too little, too late. It was already outmatched by P100, and now left in the dust by V100. It's just big egos at Intel trying to succeed by doubling down on a bad idea. Intel is big enough to have succeeded in brute-forcing x86 into the server & cloud markets, but embedded & AI are too sensitive to its weaknesses.
BenSkywalker - Wednesday, January 24, 2018 - link
On so many different levels this is highly improbable. This is an area a lot of companies are going to scramble like mad to catch up in, only to face significantly greater challenges then were faced going up against that third-rate cheap computing solution - x86.
First major obstacle: CUDA is already x86 in AI. People can throw fits, turn blue in the face, be upset about it however they so choose, but it is already a done deal. While the idealism behind open standards is appreciable, having a company with a vested and exclusive reason to invest enormous resources in improving one platform simply has enormous real-world benefits. Now we are seeing companies waking up to the market that, quite frankly, nVidia built from the ground up.
Second segment of the first problem: development in this field is taught on nV hardware using CUDA. Idealism and practicality don't match up all that often, and this is another instance of that happening. Obviously they aren't the only option, but they are far more dominant now then x86 was in, say, 1988. What would it have taken to stop that train?
Second major problem: this type of AI is simply fundamentally different in how it is being built. The development end is simply figuring out how to give the processor enough information to learn on its own. This is not to be underestimated.
If Volta started out with a five-year advantage over the other players (which may be a bit conservative), that rift grows with every passing day as Volta parts are 'learning' how to run faster on Volta chips running code built for Volta. With Intel devoting billions to the R&D side, it may be optimistic thinking they will get a Volta-class part out within the next three years, at which point nVidia will have tens of millions of hours worth of machine learning optimized for their significantly faster then Volta parts.
Don't underestimate how huge of a factor this is going to be. Remember when multi-core CPUs first started coming out, and the great lengths taken to document how fundamentally different it was to try and get your code base to multi-thread properly? nVidia has pushed out hundreds of thousands of GPUs that are crunching code to figure out how to span that code base to run on thousands of cores.
'Bombe' - Turing's Enigma-cracking machine - could handle one trillion operations in five years. This machine was built because it was significantly faster then the smartest people in the world working in tandem. One trillion operations in five years. We are dealing with one hundred trillion operations per second *per GPU* now. When you consider the data analysis structure for how to schedule thousands of concurrent threads and make it work - it would be a problem not unlike Enigma on a fundamental level. A machine is going to beat a human at this; this truly is thematically much like the first Turing machine.
Now, if this market weren't going to be mature for another twenty years, then maybe it would be a more level playing field. Reality is we are going to see a logarithmic increase in throughput of operations as GPUs get better at 'programming' themselves, and in five years this market will be far larger then data centers are today.
This is fundamentally different then what we had even in science fiction as a notion as to what AI was going to entail. Creation wasn't ever something that was contemplated for a machine, but we are already at that point, in a rudimentary fashion at least.
Market-wise, this becomes a much bigger issue, as even if Intel shows off some amazing new technology that can directly go toe to toe with nV's biggest and baddest three years from now, the actual workload performance rift will be staggering - thousands of EFLOPS worth of combined computing power for years is simply going to be an enormous obstacle to overcome.
Now, I'm not saying it is impossible - but we would have to assume Intel would execute perfectly: use their superior fabrication capabilities, hire all of the best AI low-level guys they can, and get that code base up and running *NOW* to get some level of optimization hammered out years before their product actually hits (the logarithmic element here is their only possible saving grace).
Unfortunately, AMD is pretty much DoA here. Intel is throwing more money then AMD's net worth at trying to compete with nVidia in this segment already - and they aren't making much progress so far (yes, they have all this R&D coming - so does nV). They may be able to do OK if they can come up with AI chips for handling your toaster, or maybe the thermostat in your house, but they simply don't have the money to even attempt to play in this field. The best they can hope for is that nVidia gets distracted enough for them to stumble in the consumer GPU space (possible - their top talent is going to be AI/datacenter focused now; we know this because they aren't idiots).
Google already came up with a dedicated ASIC - Volta beats it at the singular task it can do while being a complete compute solution. Yes, for particular tasks, specifically simple ones, ASICs are going to be completely viable and very efficient to boot. For the big-dollar, big-iron tasks? nVidia is an absurdly prohibitive favorite right now. (Check the markets - there is a legit discussion on whether Apple or nVidia will be the first trillion-dollar company.)
mode_13h - Wednesday, January 24, 2018 - link
Damn, dude. You sure like pontificating, for someone who's obviously not a practitioner. Several errors in your statements/thinking, but you'd probably just waste my time arguing with me if I did you the favor of pointing them out. Protip: https://en.wiktionary.org/wiki/then https://en.wiktionary.org/wiki/than
(sorry I can't post a better link, but it gets flagged as spam)
BenSkywalker - Thursday, January 25, 2018 - link
Why not take the time to point out the 'errors'? Should be interesting. How much positive revenue has your input on this market generated over the last few years? You are saying my market analysis makes it clear I'm not a practitioner; I just want to set some clear guidelines for what matters in this conversation.
mode_13h - Friday, January 26, 2018 - link
> CUDA is already x86 in AI
HPC? Yes. AI is different in that people use deep learning frameworks. While Nvidia succeeded in getting popular frameworks optimized for their GPUs first, they do not dominate with their proprietary frameworks, and most vendors have optimized backends for popular frameworks like Caffe, TensorFlow, etc., or at least support their models.
> Volta parts are 'learning' how to run faster on Volta chips running code built for Volta.
It's not the parts that are learning - it's the developers. But I assume you're just employing a rhetorical flourish here. Not advisable, as the less knowledgeable will take it literally.
> get some level of optimization hammered out years before their product actually hits
You overestimate how much code there is that needs to be optimized. Just take a look at the cuDNN API - it's not huge. The latest version has just a single header file < 2k lines.
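To give a feel for how compact that core surface is, here's a rough, hedged sketch of a single cuDNN call in the descriptor style the library uses throughout (error-status checks omitted, and the wrapper function is made up for the example):

```cuda
#include <cudnn.h>

// Minimal cuDNN usage: describe a 4D tensor, then run one in-place ReLU on it.
// Nearly the whole library follows this descriptor-plus-one-call pattern.
void relu_inplace(cudnnHandle_t handle, float *d_data, int n, int c, int h, int w)
{
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w);

    cudnnActivationDescriptor_t act;
    cudnnCreateActivationDescriptor(&act);
    cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU, CUDNN_NOT_PROPAGATE_NAN, 0.0);

    const float alpha = 1.0f, beta = 0.0f;  // y = alpha * op(x) + beta * y
    cudnnActivationForward(handle, act, &alpha, desc, d_data, &beta, desc, d_data);

    cudnnDestroyActivationDescriptor(act);
    cudnnDestroyTensorDescriptor(desc);
}
```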
The reason so many players think they can compete in this space is that the core of deep learning tech is really rather simple. Of course, there's an ever-increasing number of details between that and a usable and competitive product, but optimistic entrepreneurs and investors are clearly drawn to the simplicity at the heart of it.
> AMD is pretty much DoA here.
Vega is competitive against the P100, for training, and has even more strength in inferencing. The problem for them is the V100, but it could be answerable if they build something analogous to tensor cores. That they're planning a 7 nm machine learning processor would give them a chance to deliver a chip even better focused on AI than V100 (which also targets HPC and graphics applications). The big question is what competition it'll have, by the time it launches.
BenSkywalker - Friday, January 26, 2018 - link
You seem to be focusing almost entirely on the semantics side, not the actual implications the statements were making.
CUDA is already x86 - going back to 1988, you wouldn't have the situation with CUDA comparable to where x86 was? HPC is looking more like 1999 - that fight is over. I cited a year I thought the overall markets were comparable.
Saying the parts are 'learning' was shorthand for the real-world implications.
AMD is DoA in this market - that's not semantics (this is the one point I would very strongly argue that you took issue with), that's every logical conclusion.
https://instinct.radeon.com/en/product/mi/radeon-i...
Seven months now - where is it? If you were a project manager and *this* was the company you were building our mission-critical platform around, I would fire you in an instant. I can assure you I'm not alone. They are simply too inept to trust with something that important, as they have repeatedly and loudly demonstrated.
Vega's hype train was 'Poor Volta'. The MI25, if it had actually shipped, was supposed to be one quarter the speed of Volta with a higher power draw - and they couldn't even manage to ship that. Not only are they, on a technology basis, significantly behind - even in a hypothetical situation where they did manage to release a competitive part, who is going to risk their jobs buying into what has been a long-running history of overhyped BS?
With Intel and IBM, if you bet 'wrong', you can make a strong case for yourself; obviously nV is the prohibitive favorite, so that is a safe bet. If you are throwing down millions on a platform that is much slower, more power hungry, and - what is likely their biggest issue by far - nine months late coming online? I think only a fool of a PM would even consider AMD, and that's if they were actually remotely competitive (which we know they aren't).
mode_13h - Saturday, January 27, 2018 - link
If you're talking about where AMD is today, then I agree that they're not a significant player. My original comment stated that they're well-positioned to deliver deep learning solutions vs. the ASIC players, due to the inherent flexibility and scalability of GPUs.
In point of fact, their software situation has been a major hurdle for them. ROCm was the right thing to do, but a couple years too late. Had Vega arrived when it was originally rumored, this would've been disastrous. As it is, it kept us from considering Polaris for a more cost-sensitive application.
I think AMD's CPU division has demonstrated an ability to execute, and I believe what we're seeing are the winds of change now sweeping through their GPU division. I think their execution on Vega shows the lingering effects of years worth of budget and staff cuts. We'll see if they can right that ship, but I believe in Lisa Su.
edzieba - Thursday, January 25, 2018 - link
Why hello there Fermat.