Hot Chips 2020 Live Blog: Intel's Raja Koduri Keynote (2:00pm PT) by Dr. Ian Cutress on August 17, 2020 4:50 PM EST
04:54PM EDT - Hot Chips has gone virtual this year! Lots of talks on lots of products, including Tiger Lake, Xe, POWER10, Xbox Series X, TPUv3, and a special Raja Koduri Keynote. Stay tuned at AnandTech for our live blogs as we commentate on each talk. Intel recently had its own Architecture Day 2020, with Raja Koduri and other Intel specialists disclosing details about process and products. It will be interesting to see if Raja discusses anything akin to roadmaps in this keynote.
04:58PM EDT - Raja M. Koduri, Senior Vice President, Chief Architect, and General Manager of Architecture, Graphics, and Software, Intel
05:01PM EDT - The title of the talk is 'No Transistor Left Behind'. Raja has had it on a t-shirt at a number of events
05:04PM EDT - 'Raja has spent his career enhancing accelerated compute in the technology industry, across graphics, vector compute, consoles, and semi-custom designs'
05:05PM EDT - First, paying tribute to Frances Allen, who recently passed away
05:08PM EDT - The balance of software abstraction and performance hardware execution is the boundary that Frances worked on and we still work on today
05:09PM EDT - A little over 20 years ago, Intel senior architects (hardware and software) got together to discuss heterogeneity in Intel's hardware and software roadmaps
05:09PM EDT - They all knew each other, but many of them were meeting each other for the first time
05:10PM EDT - That discussion is where the phrase 'No Transistor Left Behind' comes from
05:10PM EDT - David Blythe is Xe senior architect
05:11PM EDT - The role of hardware/software in our lives
05:11PM EDT - COVID has shown how vital the progress of the decades of tech improvements has become
05:11PM EDT - Technology has led disruptions
05:12PM EDT - Predicting the future is tough, but we expect to see 100 billion devices - intelligent computing
05:12PM EDT - Accessing data and compute from anywhere - exascale for everyone
05:12PM EDT - like electricity
05:12PM EDT - 10x growth opportunity for the industry
05:14PM EDT - A balance of performance vs general purpose
05:14PM EDT - Leveraging data to build intelligence - data that isn't analyzed isn't useful
05:14PM EDT - We need more capacity and more bandwidth at every level
05:14PM EDT - We need bandwidth to achieve exponential growth
05:15PM EDT - Gaps between what we have for memory today for AI vs what we need
05:15PM EDT - We need superhuman-style computing
05:15PM EDT - Now Moore's Law
05:16PM EDT - People have predicted the end of Moore's Law for decades
05:16PM EDT - Moore's Law is how we've built the last two eras of computing
05:16PM EDT - It has been harder and harder to deliver the required metrics
05:16PM EDT - But it's definitely not over yet
05:17PM EDT - There is plenty of room at the top
05:17PM EDT - Software helps us to get there as much as hardware does
05:17PM EDT - Python vs AVX512
05:17PM EDT - Over 100x perf on the same CPU with software updates
05:17PM EDT - New AI workloads allow vector optimization opportunities that weren't there before
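The 100x claim above is about software alone, on unchanged silicon. As a hedged illustration (this is our own sketch, not Intel's benchmark; NumPy stands in here for any library that dispatches to SIMD-optimized kernels such as AVX-512 builds of MKL), the same arithmetic can be timed as an interpreted Python loop and as a single vectorized call:

```python
# Illustration of the "Python vs AVX-512" point from the slide: identical
# math on the same CPU, first interpreted element-by-element, then handed
# to a vectorized (SIMD-backed) kernel in one call. Speedups of 100x+ are
# typical for the loop-vs-kernel gap alone.
import time
import numpy as np

N = 2_000_000
data = np.random.rand(N)

def python_sum_of_squares(values):
    """Pure-Python loop: one bytecode dispatch per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total

t0 = time.perf_counter()
slow = python_sum_of_squares(data)
t_slow = time.perf_counter() - t0

t0 = time.perf_counter()
fast = float(np.dot(data, data))  # one call into a vectorized kernel
t_fast = time.perf_counter() - t0

print(f"loop: {t_slow:.3f}s, vectorized: {t_fast:.5f}s, "
      f"speedup: {t_slow / t_fast:.0f}x")
```

The exact speedup depends on the CPU and the library build, which is why the keynote frames it as a software-update opportunity rather than a fixed number.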
05:18PM EDT - Transistor scaling though isn't helping as much as it used to
05:18PM EDT - Whatever we call Moore's Law in the modern age, we believe transistor density can easily scale another 50x
05:18PM EDT - 3x in FinFET itself
05:19PM EDT - x2 in Nanowire
05:19PM EDT - Stacked nanowires for another 3x
05:19PM EDT - This is where regular pitch scaling might stop
05:19PM EDT - then wafer-to-wafer stacking for 2x
05:19PM EDT - Then die on wafer stacking for 2x
05:19PM EDT - All of this is happening in labs across the world
05:19PM EDT - The vision will play out over the next decade or more
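Multiplying out the individual factors listed above (the arithmetic is ours, not a figure from the slide) shows how they comfortably cover the "50x easily" density claim:

```python
# The per-technique density multipliers Raja listed, compounded together.
factors = {
    "FinFET itself": 3,
    "Nanowire": 2,
    "Stacked nanowires": 3,
    "Wafer-to-wafer stacking": 2,
    "Die-on-wafer stacking": 2,
}

total = 1
for name, factor in factors.items():
    total *= factor

print(total)  # 72, comfortably above the claimed 50x
```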
05:20PM EDT - Heat dissipation is a challenge too
05:20PM EDT - Room in voltage scaling, capacity scaling, new packaging, frequency scaling, new architectures
05:21PM EDT - Also packaging - the future of Foveros is hybrid bonding
05:21PM EDT - Simpler interconnects with lower capacitance and lower power
05:21PM EDT - Stacked SRAM test chip recently taped out
05:22PM EDT - Significant investment allows Intel to drastically adjust its view on next gen packaging for end-user product
05:22PM EDT - Now memory hierarchy
05:23PM EDT - (the dreaded pyramid of optane)
05:23PM EDT - And the inverse next gen pyramid
05:23PM EDT - Need 10x improvement across the board
05:24PM EDT - Brainstorm next gen requirements with Tim Sweeney about next gen MMO
05:24PM EDT - Support 1000s of users or more at once with Hardware and Software
05:24PM EDT - But also make general purpose and accessible to everyone
05:25PM EDT - First, this is how hardware companies think:
05:25PM EDT - This is the concept we were thinking
05:25PM EDT - 25 cores per CPU - with density, go up 100x - 4x boards, then racks for 1 million cores
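One hedged reading of that back-of-envelope math (the exact grouping of CPUs per board and boards per rack is our assumption; the slide only gave the 25-core start, the 100x density gain, and the 1-million-core target):

```python
# Back-of-envelope: how 25 cores per CPU can reach 1 million cores.
cores_per_cpu_today = 25
density_gain = 100        # "with density, go up 100x"
cpus_per_board = 4        # "4x boards" (our reading: 4 CPUs per board)

cores_per_board = cores_per_cpu_today * density_gain * cpus_per_board
boards_needed = 1_000_000 // cores_per_board

print(cores_per_board, boards_needed)  # 10000 cores/board, 100 boards
```

Under this reading, racks holding on the order of 100 such boards reach the million-core figure, which is why the very next point is that it's all about the interconnect.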
05:26PM EDT - It's all about the interconnect!
05:26PM EDT - Now software
05:26PM EDT - The grumpy person reminds Raja of Jim Keller
05:27PM EDT - This contract between hardware/software is what matters
05:27PM EDT - All about ISA + OS developers
05:27PM EDT - It's all about performance and generality
05:28PM EDT - Rich software stack on x86 today
05:28PM EDT - The more abstraction, the more developers
05:28PM EDT - Abstractions are very leaky
05:29PM EDT - It's a Sisyphean effort
05:29PM EDT - What are the hardware/software contracts of the future?
05:29PM EDT - x86, Arm, RISC-V, AI, GPU, Memory, Network
05:30PM EDT - Intel is adding heterogeneity in the CPU socket
05:30PM EDT - Beware of beyond Cooper Lake
05:31PM EDT - 3-5 years to see adoption of new hetero ISA extensions
05:31PM EDT - That's a broad software ecosystem statement
05:31PM EDT - The key to this is to give developers performance at every level
05:32PM EDT - Ninja developers at the low level can offer non-linear improvements higher up the stack
05:32PM EDT - Any abstraction needs to be scalable - open and accessible to all. Have to retain productivity at all levels while also maintaining perf
05:32PM EDT - Misconception that python isn't used for performance
05:33PM EDT - Ninja programmers are rare, but very important for performance
05:33PM EDT - Important to support ninjas
05:33PM EDT - Scaling across every product
05:33PM EDT - Level sub-zero
05:34PM EDT - OneAPI
05:34PM EDT - Still early days
05:34PM EDT - OneAPI beta available on Intel Dev Cloud
05:36PM EDT - Scale from sensors to edge to cloud
05:36PM EDT - Where we will be in 2021
05:36PM EDT - milliwatts to Megawatts
05:37PM EDT - XeHP GPU!
05:37PM EDT - 1000x in compute by 2025
05:38PM EDT - Exascale for everyone
05:38PM EDT - Now time for Q&A
05:39PM EDT - Actually a few more comments first
05:45PM EDT - More complex hardware in the future
05:45PM EDT - Now Q&A
05:48PM EDT - Q: Integration between CPU and GPU A: We've been doing that a long time in the PC space; what hasn't been done yet is in the DC and at scale. The key is figuring out the programming model that scales - at the moment we see them as scalar/vector/matrix, and it's all about combining them and building the programming model. Physical integration is also key, at high performance.
05:48PM EDT - Q: Does Intel plan to open source the Xe dGPU code, as with Gen11, or will it be a closed stack? A: We are pretty active in Linux open source. Xe drivers in Linux will be open source.
05:51PM EDT - Q: Does ISA matter in a future of accelerators? A: Great question. It's the central thesis of the talk. DSA - do you need an ISA, or not? My thesis is that the ISA is important for the general purpose, for the mass install base - architectural impact based on that hardware/software contract. Lots of us have worked on DSAs, but when you talk generality, today, ISA still matters. If you move the contract up the stack, is that in the form of an ISA? How does it look? It's a trillion dollar question. I'm not proposing that I have an answer, but my talk is that we are working on it, and we will share what we find through OneAPI, and in some ways it's a call to action for the whole community. It has to cover the whole industry, not just one vendor or architecture.
05:53PM EDT - Q: Security HW vs SW, direction in industry vs academia? A: Great question. I could have spent more time on security and Intel's vision if I had more time! It's super important. The attack surface area we are generating over all these layers of hierarchy is growing more than exponentially. It's scary! It's a big call to action for the community too. Security is getting harder as we move forward, not easier. The architecture opportunities and simplifications, in both hardware and software, are daunting.
05:56PM EDT - Q: ML revolution - libraries or GP stack? A: Great question. We already have special paths in TF and PyTorch - the inner loops have been phenomenally accelerated in the last few years. As we do the analysis on the workload, we are seeing the bottlenecks shifting around as we optimize. The algorithm rate of change is quite high - with the community and our customers, I have a lot of conversations about the generality of future approaches. It's whack-a-mole. Right now the need for a general-purpose software stack is potent and there are lots of active discussions. Is there an API that develops a better scalable contract? It's hard to tell.
06:00PM EDT - Q: Other approaches to wide purpose compute like OpenCL haven't succeeded. What makes OneAPI work? A: At one point there were abstractions of the GPU hardware, even with all the limitations a decade ago. I don't believe OpenCL really took a step back to look at the overall compute problem. If you go back more than two decades, in the work on high performance computing systems and languages there are a lot of golden nuggets and answers that sit on those infrastructures. That's one of the things we look at. Personally I have also been a big fan of the abstraction in Apple's Grand Central Dispatch. Swift and its concurrency models have made some amazing progress, and then there's Apache Spark too. If you look at the models in those software frameworks, there is something there for us as a hardware community to pay attention to. I won't say OpenCL is a great example of covering all forms of parallelism (dense, sparse, async, task), and memory heterogeneity is a big deal - how do we cover that? That's a harder problem in my opinion.
06:01PM EDT - Q: About memory power efficiency, how do you see the required 10x BW/power scaling? A: The opportunity I see is that we have to get compute closer to memory (or memory closer to compute). As I alluded to, we're doing some interesting things with new products, like the Rambo cache which we've announced. If you look at both capacity and latency at every level, you do see that 10x opportunity. 10x doesn't seem too hard, but memory is hard, because memory isn't just a hardware/tech problem, it's also a big business problem!
06:04PM EDT - That's the end of the Q&A. Now onto the third session. First up, a RISC-V talk