An Interview with Lisa Spelman, VP of Intel’s DCG: Discussing Cooper Lake and Smeltdown
by Dr. Ian Cutress on August 15, 2018 8:00 AM EST
The main highlight of Intel’s Data-Centric Innovation Summit last week was the official disclosure of the company’s roadmap for the enterprise Xeon processor family. While various codenames and features had been suggested in the press before, at the event Intel officially put up the signs declaring Cascade Lake for 2018, Cooper Lake for 2019, and Ice Lake for 2020. Alongside those announcements were a few new features in the mix, although not all have been disclosed so far. Cooper Lake was also mentioned as being on 14nm, with Ice Lake on Intel’s 10nm process.
As part of the summit, we were given an opportunity to sit down with Lisa Spelman, VP of Intel’s Data Center Group and General Manager of Xeon Products, about the new processors. Over the course of 22 minutes, we touched on the new platforms, some of the new features, the progression of the different product families such as 3D XPoint, how Intel is implementing hardware fixes for side channel attacks such as Spectre and Meltdown, as well as some discussion about Intel’s new generation: Raja Koduri and Jim Keller.
Lisa Spelman is categorically what I call an ‘Intel Lifer’, having spent almost 20 years in the company. Her role has evolved very much into one of the faces of Intel’s Data Center business and the Xeon product portfolio, discussing technical aspects of the product lines but also business and marketing strategy in detail. Her previous roles have included being an internal analyst, technical advisor to the CIO, the Director of Client Services, and Director of Datacenter Marketing.
Ian Cutress: Traditionally Intel’s enterprise Xeon portfolio goes through refreshes every 12 to 18 months, normally on the latter end of that spectrum. The announcement today would seem to suggest that the next two generations, at maximum, would only be 12 months apart. With a lot of your customers and partners being used to the longer cadence, will Intel be marketing to upgrade early based on generation-specific features? What are the customers saying to this new cadence?
Lisa Spelman: I think that one of the things that we face as Intel, even though we talk about it a lot, is that the perception of who our customers are is usually quite small – our customer base is very large. You have your cloud service providers, enterprise, government, comms service providers and such, and they have very different things that they need. We have certain customers that have a higher affinity towards staying on top and are always the fastest to transition. Their goal is to stay on top of everything. Compare that to a carrier, who might wait for products with specific features. A lot of people still think of us as a server CPU company without looking at this breadth of what we’re solving for. That doesn’t mean that every single company is going to do Cooper and Ice.
What we will see is a fast ramp onto Cascade Lake, with the Optane Memory not just as a differentiator for Intel but having actual workload value. We’re going to see a lot of people who are interested to move there. The AI extensions for inference, hitting that workload, and there will be a subset of the market interested and attracted to that. Then we have the security mitigations, the hardware based security mitigations, and so we think that will lead to a fast transition over Skylake. Then with Cooper and Ice Lake we expect that customers, particularly end customers, will look at both and figure out which of the feature sets and timing and all of that best fits in with what they need to achieve.
IC: So do you expect both of those platforms, Cooper Lake and Ice Lake, to have significant overlap in their availability?
LS: We see those platforms as being concurrently available. It will be a change for some of our ecosystem partners and customers, but at the same time some of that has already been built in. For example, when we launched Xeon Scalable, we unified the stack – we had 2/4/8 socket systems ready at the same time whereas previously we had always had multiple month gaps between E5 and E7 launches. So we have worked with our ecosystem through transitions like this before and we will work with them on it again. We will see customers make choices. So when we say Cooper will start shipping in 2019 and then Ice Lake as a ‘fast follow-on’, rather than think of Ice Lake as a 12-18 month cadence you should think of it as much shorter than that. We’re talking the middle of 2020.
IC: So it comes back to the worry if customers should adopt a platform like Cooper Lake if Ice Lake is only a short time away.
LS: So there will be choices that people make within that timeframe. The Cooper Lake platform might have the right timing, or it might have the feature set they are most interested in, or the Ice Lake new features might not be the ones they are betting their business on. There are a variety of things that will happen and we’ll talk about what the feature roll-out is going to be later down the road.
But we are talking to customers about it now, we’re trying to clarify what those choices and options will be, as well as taking our customers’ feedback about how we enable those parts to co-exist in the market together.
IC: There was a graph today in the presentation showing Skylake adoption of unit shipments, and there are still some Broadwell shipments and even Haswell. When Intel states that Cascade will be a fast ramp, do you expect that customers will look at the product stack and move from older platforms ignoring Skylake?
LS: I don’t think that anything we’ve done with Cascade Lake is something that would lead to demand stall, because Cascade fits so comfortably with the Purley platform. So any of the work that our customers do now to qualify and validate is getting that customer ready for that. We see this with some of the big guys, not just the cloud service providers but enterprise as well that are interested in Optane. They’re doing the work on Skylake in preparation for Cascade. So there tends to be a faster transition within the same platform, as long as there is no memory transition from say DDR3 to DDR4. The addition of the security mitigations will motivate a lot of people to do it faster.
IC: From the announcement of 3D XPoint through to storage and now persistent memory, when you say that Intel is hoping for a quick ramp and adoption of this new technology within the markets that can use it, do you feel confident that production of Optane memory will satisfy demand? It sounds like you are expecting a lot of demand.
LS: We see no capacity issue on that side. Rob Crooke’s team has invested on behalf of the company in fab capacity in order to be able to satisfy that demand. The very best way to run a memory or storage business is to have a fully loaded fab and so we’re going to help fill that fab with the demand on memory. We think that this is one that we will satisfy volume out of the gate. We think because it is new, you get a lot of interest and the way that the market is shaped enables you to liken it to some of the detailed work that happens on Silicon Photonics. What I mean is that you’ve got these definitional customers that you are working with while you’re deeply characterising the workload and market to make sure that you hit those needs, and so you have a pretty good idea of sizing. What is going to happen in the first year of its release, which a lot of the engineering teams are super excited about, is to see what didn’t we think of, what use cases are out there that once it gets in the hands of developers we didn’t think of. You’re looking at large VM instances, SAP HANA, and other database applications, things like that.
IC: Given the recent announcement about the upcoming split with Micron beyond second generation 3D XPoint, you are now suddenly in a position to deliver the first generation of a product that has just had its roadmap blurred. What are you telling customers?
LS: You know honestly, Navin (Shenoy) was not joking when he said that we typically don’t talk to customers about process nodes, it’s always about products. Since I joined DCG, almost five years ago, I’ve never talked to a single customer about their expectations of our Micron agreement. They’re like ‘whatever, manage it’, and they look at us and the product and the delivery of it, and I think that part of the reason that that is not a particular concern from our customer perspective is because they’ve seen and watched the deep integration we’ve done of making Optane memory with Xeon CPUs. So they know that all the work that has been done in the memory controller on the CPU has been done 100% by Intel, and the media (the 3D XPoint) has progressed along to the point where there are SSDs out on it. So I don’t perceive that we have a confidence problem there – we’re always interested in feedback – but as we talk about all the things we’re going to do with Optane, and what the changes will be to the applications that look at DRAM in a typical way and look at SSDs in a typical storage way, the issue of our venture with Micron is in no way coming to the top of those discussions with customers.
IC: On the roadmap slide we did see mention of Spectre and Meltdown mitigations. (LS: It said hardware security mitigations!) Are we expecting a hardened design?
LS: Yes that is definitely the intent. So you see everything that we learned even ahead of the release of those and starting to work back in changes you would make inside the silicon. So we did software mitigations, then working back to hardware as soon as we could, and we were able to intercept timing for Cascade Lake in order to get it in. That will continue through.
IC: So you are already at a point in the design of Cascade Lake where you could re-architect parts of the core to have the relevant mitigations?
LS: Yes. Ronak (Singhal), one of our lead CPU architects, led that effort. We had to make choices in order to do that. To keep everything on the best case schedule we had to make a choice to intercept and stop tape-ins to make sure that we put this in. So we had to do the engineering work and then get it in. It was our belief that that is what we had to do, and the right decision for our customers in the ecosystem.
IC: As more and more of these side-channel exploits are popping up and people are realising that this new type of attack vector exists, can Intel take measures against similar vulnerabilities that might occur in the future?
LS: We did that first round of hardware mitigations into Cascade Lake. My perception from inside the company was that, at the time that we were putting those in, we knew that Spectre and Meltdown would not be the only ones. That doesn’t mean that you know what all the other ones will be, and you know it is a new family, and we said that a bunch of PhDs all changed their theses for it! That’s not to make light of it because it has become this new area to look at so you will end up with some really smart people across the planet looking into the box to see what they can do. We have similar folks inside the company doing the same thing. We’ve seen already that there have been follow-ons to Spectre and Meltdown from fast research that has been published, and you should have hopefully seen a consistent response from Intel. With that consistency we expect that the hardware mitigations that we have included into Cascade Lake will also mitigate against these new attacks.
IC: So are there more dials and options inside the hardware now to adjust should new issues come along?
LS: So we always go for our fastest fix where possible, which is usually software/firmware, and silicon fixes require a longer lead time. So we will still work hard on microcode updates and we have certainly focused our research more intently at this problem as well. What we would never say is that the situation is ‘complete’ – it would be foolish to do so. Researchers are still looking at it, our own internal teams are looking at it, and we will continue to evolve and iterate on them. We will continue to make good decisions on this – it was the right decision to stop a product, get the mitigation in, take the couple of weeks hit, and move on. We will continue to do that too.
IC: Are you saying that it took two weeks to fix and design in hardware?
LS: It certainly took longer than that, but if you look at the intercept of our product flow, we held back the tape-in until we had the IP in place. And now we’re making it up in the back end of silicon development to keep the product on track.
IC: When the software mitigations came through, there was talk about a performance deficit, roughly of 3-10% on the latest platforms depending on the workload and system. Do the hardware changes put performance back on track – were you able to implement the changes without the same performance deficit caused by the software mitigations?
LS: Our expectation is that the hardware fixes put the performance back on track. We will be measuring this as we get through final testing of the silicon but at the same time we are also getting silicon enhancements, so the comparison is sort of apples to oranges – either way performance is set to be increased. To be honest, I don’t know if the question of whether the mitigations still cause a performance deficit is relevant, given that the overall platform performance with each new generation with the new features is set to be higher than the pre-patched systems regardless. We are always moving forward to increase performance per core. We always design the core with security in place – I’m not saying that those security measures will always have zero performance impact, but what I would say is that Intel is both a performance and security driven company and so we will continue working on those. But the expectation, and this is not unique to security, is that by moving something from software to hardware you generally accelerate performance.
IC: Ice Lake on the chart was shown for 2020, still two years away, but on 10nm. Can you say if that is 10nm or 10+?
LS: I think that’s just getting into a mess of naming things that doesn’t serve anyone’s interests.
IC: I certainly understand that it might not matter from a product perspective, when you tell the story of the performance of a product and the solution, however from the engineering perspective and understanding Intel as a company, this is important.
LS: I’ll tell you more about Ice Lake at a later date. How about that?
IC: One of Intel’s big technology enhancements has been the embedded multi-die interconnect bridge (EMIB). We have seen it in some of the FPGA products – when can we expect to see it in the Xeon Scalable product line?
LS: I will tell you more about that at a later date! But I can say we are invested in our packaging technologies and EMIB has a lot of promise. It was kind of fun with Jim (Keller) here today on stage talking about all the stuff that Intel does and puts together into products, so when an outside team comes in they end up being a little surprised about all the things that we can do. The way that outside people can come in and think about all the ways in which we can use our advancements is really cool, but I don’t have a product intercept announcement to share with you.
IC: So how much have you worked with Jim Keller and Raja Koduri since they joined Intel?
LS: I’ve worked with them a lot more recently having worked with them for this event! For today we’ve been working with general roadmap stuff and they’re getting aligned with the folks on their team in the data center and it has been good to get them in there. Having Jim provide his outside-in product development experience has been great – I mean we have a lot of highly capable and smart engineers at Intel for all the products from design through to production that have been Intel only for over 20 years, so when you get someone who has done a variety of products at multiple companies with different mechanisms and foundries, he is able to provide a lot of good insight to us on things that the design team can do and the manufacturing teams can do to improve yield.
But, Jim is very funny! When Navin introduced him onto the stage and mentioned that people call him a rockstar and that he might be embarrassed, he said he was absolutely not embarrassed. I love it. When I was talking to him about getting ready for the event, about planning and everything, he jokingly said to me ‘do you plan on having me on stage and look grumpy because I can definitely do that’, and I said ‘sure, you can do that! I’m not trying to script you’. I think he has a good energy and he’s driving some of the enthusiasm in our engineering teams as well.
IC: Intel has discussed discrete graphics plans for the future – how do you envisage that intersecting with the Data Center Group especially given there are already so many products in the portfolio to choose from that can kind of cover those areas?
LS: We have already shared that we will be doing a discrete graphics product in client, and we will do one in data center too. But it is an established category.
IC: It is, but it is a new category for Intel.
LS: My point was more that we have other silicon opportunities that we are working on to address some of these workloads – it is something that I think our customers will be able to easily recognize and figure out where they are going to use it. Through all of the work with customers on the graphics front, the media front, the AI front, and talking with them about this and our hardware portfolio, I think that there is a lot of alignment around this ‘one size does not fit all’, but also the need to think through your software strategy in support of that. So it’s our job to make our customers use the right Intel silicon for their workloads. So we are working on that deeply and concurrently with product development. Which is good for us!
Many thanks to Lisa for her time! Up next on the interview list, hopefully, is Raja Koduri. We spoke for a few seconds before the start of the event, and he said he will be ready to speak when he's got something to say! He also sneakily took this photograph while I was concentrating on the presentations: