Hot Chips 2020 Live Blog: IBM's POWER10 Processor on Samsung 7nm (10:00am PT)
by Dr. Ian Cutress on August 17, 2020 1:00 PM EST - Posted in
- CPUs
- Enterprise CPUs
- IBM
- Live Blog
- POWER10
- Hot Chips 32
01:08PM EDT - Time for Power10! Bill Starke and Brian Thompto
01:09PM EDT - Bill is chief architect of POWER10
01:09PM EDT - Brian is chief core architect
01:09PM EDT - Power roadmap - Power is about the enterprise
01:09PM EDT - It's the building block for the world's most powerful supercomputers
01:10PM EDT - Financial systems, commercial, healthcare, governments
01:10PM EDT - Power10 is made smarter for everyone
01:10PM EDT - First hardware back in the labs
01:10PM EDT - On track to deliver systems in 12 months
01:10PM EDT - New abilities, ground-up rearchitecting for power efficiency
01:10PM EDT - maturing AI landscape
01:11PM EDT - AI acceleration in the processor core
01:11PM EDT - Integrating into enterprise workflows
01:11PM EDT - 18B transistors on Samsung 7nm, 602 mm2 die
01:11PM EDT - Two versions of the core: SMT4 and SMT8. This chip is the SMT8 version
01:12PM EDT - 16 physical cores, but 15 will be enabled. Improves economics of yield
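Why ship a 16-core die with only 15 cores enabled? A minimal sketch, assuming each core is independently defect-free with a hypothetical probability `P_CORE` (IBM gave no such figure), compares the chance that all 16 cores work with the chance that at least 15 do.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent cores are defect-free,
    when each core is defect-free with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-core yield chosen for illustration; not an IBM number.
P_CORE = 0.95

all_16_good = prob_at_least(16, 16, P_CORE)  # every core must work
at_least_15 = prob_at_least(15, 16, P_CORE)  # one bad core can be fused off

print(f"P(all 16 cores good)      = {all_16_good:.3f}")
print(f"P(at least 15 cores good) = {at_least_15:.3f}")
```

Under that assumed 95% per-core yield, tolerating one bad core lifts the usable fraction from roughly 44% to roughly 81% of dies. The model ignores defects outside the cores (caches, PHYs, fabric), so it only illustrates the direction of the effect.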
01:12PM EDT - High bandwidth PHYs, OMI and PowerAXON and PCIe G5
01:12PM EDT - Two packaging options: Single and Dual chip modules
01:12PM EDT - SCM allows for 16-socket, DCM is 4-socket
01:12PM EDT - Dual chip module is two 602 mm2 chips in one package
01:13PM EDT - 16-socket for big iron systems
01:13PM EDT - PowerAXON and OMI support 1TB/sec each
01:13PM EDT - 150 micron bumps
01:13PM EDT - optimized placement for packaging
01:14PM EDT - PowerAXON is for chip-to-chip connectivity
01:14PM EDT - Several new scaling capabilities
01:14PM EDT - OMI is OpenCAPI Memory Interface
01:14PM EDT - Grandchild of the Centaur memory buffer
01:15PM EDT - Tech agnostic - supports any media with OMI buffer
01:15PM EDT - Supports DDR4 at 410 GB/sec bandwidth per Power10 CPU
01:15PM EDT - Will support DDR5 when DDR5 is ready - no new system, just need new OMI buffer chip
01:15PM EDT - Also supports GDDR for up to 800 GB/sec
01:16PM EDT - Also supports storage class memory up to 2 TB
01:16PM EDT - PowerAXON supports direct attach SCM or ASIC/FPGA
01:17PM EDT - Memory Inception comes to Power10 - access memory from any socket in the cluster
01:17PM EDT - Full hardware load/store access to other server memory
01:17PM EDT - Only +150ns compared to accessing far memory within the same server
01:18PM EDT - Supports up to 2 PB of memory
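For scale, here is a quick sketch of how many address bits are needed to give every byte of 2 PB a distinct address. Treating "PB" as the decimal petabyte (10^15 bytes) is an assumption; the binary pebibyte gives the same answer.

```python
def address_bits(num_bytes: int) -> int:
    """Bits needed so every byte in num_bytes has a distinct address 0..num_bytes-1."""
    return (num_bytes - 1).bit_length()


PB = 10**15   # decimal petabyte (assumed interpretation)
PiB = 2**50   # binary pebibyte

print(address_bits(2 * PB), address_bits(2 * PiB))
```

Both interpretations need 51 bits. Real systems also reserve address space for I/O and other uses, so treat this as a lower bound rather than the actual physical address width.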
01:18PM EDT - Connect multiple 16-socket systems with Memory Inception
01:18PM EDT - Or memory-less servers borrowing memory from a big server
01:19PM EDT - Paging tables as routing tables
01:19PM EDT - Robust virtual channel management
01:19PM EDT - Allows 1000s of nodes to access memory across the whole system
01:19PM EDT - Pod-level memory resource pooling with extra gear
01:19PM EDT - Memory disaggregation becomes a reality.
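The "paging tables as routing tables" idea can be illustrated with a toy model. This is not IBM's implementation: the class names, the `node_id` field, the node numbers and the page size are all hypothetical. Each page-table entry records which server physically holds a page, so an ordinary load or store resolves either to local memory or to another server's memory, with no special remote-access API.

```python
from dataclasses import dataclass

PAGE_SIZE = 64 * 1024  # hypothetical page size for this sketch


@dataclass(frozen=True)
class PageEntry:
    node_id: int    # server that physically holds the page (hypothetical field)
    phys_page: int  # page frame number within that server's memory


class ToyFabric:
    """Toy model: page-table entries double as routing information."""

    def __init__(self, local_node: int) -> None:
        self.local_node = local_node
        self.page_table: dict[int, PageEntry] = {}
        # Simulated memory cells keyed by (node_id, physical address).
        self.memory: dict[tuple[int, int], int] = {}

    def map_page(self, virt_page: int, entry: PageEntry) -> None:
        self.page_table[virt_page] = entry

    def route(self, vaddr: int) -> tuple[int, int]:
        """Translate a virtual address to (node_id, physical address)."""
        entry = self.page_table[vaddr // PAGE_SIZE]  # KeyError stands in for a page fault
        return entry.node_id, entry.phys_page * PAGE_SIZE + vaddr % PAGE_SIZE

    def is_remote(self, vaddr: int) -> bool:
        return self.route(vaddr)[0] != self.local_node

    def load(self, vaddr: int) -> int:
        # Same load path for local and borrowed pages; only the target node differs.
        return self.memory.get(self.route(vaddr), 0)

    def store(self, vaddr: int, value: int) -> None:
        self.memory[self.route(vaddr)] = value


fabric = ToyFabric(local_node=0)
fabric.map_page(0, PageEntry(node_id=0, phys_page=10))  # local page
fabric.map_page(1, PageEntry(node_id=3, phys_page=7))   # page borrowed from node 3
fabric.store(PAGE_SIZE + 8, 42)                          # plain store into borrowed memory
print(fabric.route(PAGE_SIZE + 8), fabric.is_remote(PAGE_SIZE + 8), fabric.load(PAGE_SIZE + 8))
```

The point of the sketch is only that routing falls out of address translation; the real hardware also has to handle coherence, virtual channels and the extra latency mentioned above, none of which is modeled here.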
01:19PM EDT - Also 64 lanes of PCIe G5
01:20PM EDT - 2.2-4.4x socket performance compared to Power9
01:20PM EDT - *602mm2, correction from earlier
01:21PM EDT - Up to 8 threads per core
01:21PM EDT - +30% average perf against POWER9, +20% in ST
01:21PM EDT - 2.6x perf/watt improvement
01:21PM EDT - DCM is more efficient
01:22PM EDT - In SMT8 mode, 15 cores per chip. In SMT4 mode, 30 cores per chip
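Both configurations expose the same number of hardware threads per chip. A quick check using the figures above, plus a full 16-socket single-chip-module system under the assumption that every socket holds a chip with all 15 SMT8 cores enabled:

```python
# (cores per chip, threads per core), as quoted in the talk
CORES_AND_THREADS = {"SMT8": (15, 8), "SMT4": (30, 4)}
SCM_SOCKETS = 16  # maximum single-chip-module sockets, per the talk


def threads_per_chip(mode: str) -> int:
    cores, threads_per_core = CORES_AND_THREADS[mode]
    return cores * threads_per_core


for mode in CORES_AND_THREADS:
    per_chip = threads_per_chip(mode)
    print(f"{mode}: {per_chip} threads/chip, {per_chip * SCM_SOCKETS} in a full 16-socket system")
```

Both modes work out to 120 threads per chip, or 1920 across 16 fully populated sockets.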
01:22PM EDT - Core is modular
01:22PM EDT - Container based stack support over PowerVM hypervisor
01:23PM EDT - High performance nested hypervisors with enhanced security
01:23PM EDT - Power ISA 3.1
01:23PM EDT - 64-bit prefix instructions in a RISC-friendly way
01:23PM EDT - New opcode space for new instructions
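Prefixed instructions stay RISC-friendly because the instruction stream is still made of 32-bit words, and the length of each instruction can be read from a single field. A minimal length-decoding sketch, assuming the Power ISA 3.1 rule that prefix words use primary opcode 1 and are followed by a 32-bit suffix; alignment and other restrictions are ignored, and the example words are placeholders rather than meaningful programs.

```python
PREFIX_PRIMARY_OPCODE = 1  # Power ISA 3.1: prefix words carry primary opcode 1


def primary_opcode(word: int) -> int:
    """Top 6 bits of a 32-bit instruction word (IBM bit numbering 0:5)."""
    return (word >> 26) & 0x3F


def split_instructions(words: list[int]) -> list[tuple[int, ...]]:
    """Group a stream of 32-bit words into 4-byte and 8-byte (prefixed) instructions."""
    out: list[tuple[int, ...]] = []
    i = 0
    while i < len(words):
        if primary_opcode(words[i]) == PREFIX_PRIMARY_OPCODE:
            if i + 1 >= len(words):
                raise ValueError("prefix word without a suffix")
            out.append((words[i], words[i + 1]))  # prefix + suffix = one 64-bit instruction
            i += 2
        else:
            out.append((words[i],))  # ordinary 32-bit instruction
            i += 1
    return out


# Placeholder words: opcode 14, a prefix (opcode 1) plus suffix, then opcode 19.
stream = [0x38600001, 0x04000000, 0x38600000, 0x4E800020]
for insn in split_instructions(stream):
    print(len(insn) * 4, "bytes:", " ".join(f"{w:08x}" for w in insn))
```

A decoder only has to inspect the top six bits of a word to know whether it is looking at a 4-byte or 8-byte instruction, which keeps fetch and decode simple.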
01:24PM EDT - Optimizations for memory tiers
01:24PM EDT - Security and isolation
01:24PM EDT - Crypto perf for future algorithms already accelerated
01:24PM EDT - Secure containers supported at hardware and virtualization layers
01:24PM EDT - Full memory encryption
01:25PM EDT - Active management for enhanced performance and side-channel avoidance
01:25PM EDT - Here's a core diagram - this is half an SMT8 core
01:25PM EDT - Each SMT4 segment can do 2x512b and 4x128b per cycle
01:26PM EDT - 4x in mixed math acceleration
01:26PM EDT - 1.5x L1 cache, 4x L2, 4x TLB
01:26PM EDT - 1000 instructions in flight per SMT8 core
01:26PM EDT - L2 latency is 13.5 cycles
01:26PM EDT - L3 latency is 27.5 cycles
01:26PM EDT - New tag predictors
01:26PM EDT - Branch execution has been improved
01:27PM EDT - New instruction fusion opportunities
01:27PM EDT - Eliminates dependencies
01:27PM EDT - Fuses consecutive load/store instructions for double-wide load/store bandwidth
01:27PM EDT - Improved clock gating
01:27PM EDT - each design element was redesigned for performance and efficiency
01:28PM EDT - Redesigned major structures such as queues
01:28PM EDT - 1.3x perf at 0.5x power vs Power9
01:28PM EDT - = 2.6x perf/watt overall at the core level
01:28PM EDT - 3x perf/watt at socket level
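The core-level figure follows directly from the two ratios quoted above: perf/watt scales as the performance ratio divided by the power ratio. A minimal check:

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/watt versus a baseline, given perf and power ratios to that baseline."""
    return perf_ratio / power_ratio


core_gain = perf_per_watt_gain(1.3, 0.5)  # 1.3x perf at 0.5x power vs POWER9
print(f"core-level perf/watt: {core_gain:.1f}x")
```

This reproduces the 2.6x core-level number; the 3x socket-level figure depends on uncore and packaging details not given here, so it is not derived.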
01:29PM EDT - Also improved memory bandwidth
01:29PM EDT - 2x bytes from all sources: L1, L2, L3, OMI
01:29PM EDT - 4x 32B loads, 2x 32B stores per SMT8 core (Fusion required)
01:29PM EDT - OMI to one core - 256 GB/sec peak, 120 GB/s sustained, 3x L3 prefetch and mem prefetch extensions
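The per-core figures translate into bytes per cycle as follows. No clock speed was given in this live blog, so the 4 GHz value below is purely hypothetical and only shows how to convert bytes/cycle into GB/s.

```python
# Per-SMT8-core figures quoted in the talk (fusion required for the load/store rates).
LOADS_PER_CYCLE = 4
STORES_PER_CYCLE = 2
ACCESS_BYTES = 32
OMI_PEAK_GBPS = 256       # OMI to one core, peak
OMI_SUSTAINED_GBPS = 120  # OMI to one core, sustained

load_bytes_per_cycle = LOADS_PER_CYCLE * ACCESS_BYTES    # 128 B/cycle
store_bytes_per_cycle = STORES_PER_CYCLE * ACCESS_BYTES  # 64 B/cycle


def peak_load_bandwidth_gbps(clock_ghz: float) -> float:
    """Peak load bandwidth in GB/s: bytes per cycle times billions of cycles per second."""
    return load_bytes_per_cycle * clock_ghz


sustained_fraction = OMI_SUSTAINED_GBPS / OMI_PEAK_GBPS
HYPOTHETICAL_CLOCK_GHZ = 4.0  # illustrative only, not a POWER10 specification

print(f"loads: {load_bytes_per_cycle} B/cycle, stores: {store_bytes_per_cycle} B/cycle")
print(f"at {HYPOTHETICAL_CLOCK_GHZ} GHz: {peak_load_bandwidth_gbps(HYPOTHETICAL_CLOCK_GHZ):.0f} GB/s of loads")
print(f"OMI sustained/peak for one core: {sustained_fraction:.0%}")
```

The sustained OMI figure is a little under half of peak for a single core, which is why the prefetch extensions mentioned above matter.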
01:30PM EDT - 8 SIMD 128-bit engines per SMT8 core
01:30PM EDT - supports fixed, float, permute
01:30PM EDT - 4 512b engines per SMT8 core
01:30PM EDT - supports FP64, FP32, FP16, BF16, INT16, INT8, INT4
01:31PM EDT - New MMA (Matrix Math Accelerator) for enhanced inference acceleration
01:32PM EDT - Simple library update needed in most cases
01:32PM EDT - Implements data-reuse efficiency
01:32PM EDT - 3x inference latency reduction
01:32PM EDT - Improvements over POWER9
01:33PM EDT - Time scale for Power10 is that initial systems for IBM partners will be available Q4 2021
01:33PM EDT - (IBM usually does this - announce a core/product 12 months in advance)
01:33PM EDT - To allow for customers and developers to adjust
01:34PM EDT - Q&A time
01:35PM EDT - Q: PCIe Gen6? Will future Power10 enable this? A: No talk about our future products. We're glad that PCIe is speeding up, we always look at market conditions to create chips.
01:36PM EDT - Q: Read latency increase with OMI DIMM? A: less than +10ns
01:37PM EDT - Q: Did power delivery get upgraded, or still on-die LDOs? A: Go into detail at ISSCC. Still similar delivery platform of Power9
01:39PM EDT - Q: Do POWER and z work together? A: Yes, all the time. We peer review each other. We get questions about architecture differences - each product suits its client base. Extremely justified. We do the peer reviews, so we become experts in both. We share IP as well, like OMI, as well as other features. Also physical design etc. Lots of synergy, but also lots of differences
01:39PM EDT - That's a wrap. Next talk is ThunderX3 from Marvell
16 Comments
plopke - Monday, August 17, 2020
Aah hotchips one of the most fun times of a year to read Anandtech all tough i only vaguely understand half of the stuff being talked about xD
Jon Tseng - Monday, August 17, 2020
nb I thought he said 50-100ns worse latency for the funky off-server DMA (but I could be wrong!)
zamroni - Monday, August 17, 2020
Processor for FUD-ed DB2 database customers. I am sure the core banking applications can work with X86-64 version of DB2.
Desierz - Monday, August 17, 2020
They can, but obviously Power brings something that makes them preferable for some situations. I don't think they buy them because they are Power fanbois. ;)
Meteor2 - Thursday, August 20, 2020
We do! Well, really it's because it's cost-prohibitive to move to x86.
Cheesecake16 - Monday, August 17, 2020
2 Petabytes of addressable RAM
Lord of the Bored - Monday, August 17, 2020
Hell to the yes.
eastcoast_pete - Monday, August 17, 2020
Thanks Ian! Always like to read those updates, even though I am not in any danger of buying or leasing an IBM mainframe (:
Question: how are these chips and other components cooled? Are they intended to be liquid-cooled, and designed with that assumption?
Lastly, that memory-sharing technology is fascinating - is that a heritage from the Cell processor days?
Meteor2 - Thursday, August 20, 2020
Ours are water-cooled.
thunng8 - Monday, August 24, 2020
Most Power chips are air cooled. The only exception to this rule are the supercomputer clusters where they need to fit many chips in a small amount of space.
Power server are distinct from mainframes. Look at the z15 if you are interested in mainframe chip - they are liquid cooled.