Intel's Skylake GPU - Analyzing the Media Capabilities
by Ganesh T S on August 26, 2015 2:00 PM EST
At IDF in San Francisco last week, Intel provided us with lots of insights into Skylake, the microarchitecture behind the 6th generation Core series processors. Skylake marks the introduction of the Gen9 Intel HD Graphics technology. In advance of our full Skylake architecture analysis (coming soon), I wanted to get a head start and explain the media side (including Quick Sync and the image processing pipeline) of Skylake in a separate piece.
Media Capabilities and Quick Sync in Intel HD Graphics - A Brief History
Quick Sync has evolved through the last five years, starting with limited hardware acceleration and usage of the programmable EU array in Sandy Bridge. The second generation engine in Ivy Bridge moved to a hybrid hardware / software solution with rate control, motion estimation and intra estimation as well as mode decision happening in the programmable EU array. Usage of the EU array enabled tuning of the algorithms. Motion compensation, intra prediction, forward quantization and entropy coding were done in hardware in the MFX (multi-format codec engine). Haswell added JPEG / MJPEG decode to the MFX, a dedicated VQE (video quality engine) for low power video processing and a faster media sampler.
Around the time Broadwell was introduced, major transitions were taking place on the video codec front - HEVC adoption was picking up, and VP8 / VP9 were also gaining support. To tackle these changes and build on consumer feedback, Intel made major updates to the media block / Quick Sync engine late last year.
Broadwell was also the first microarchitecture to support two BSDs (bit stream decoders) in the GT3 variants. Each BSD allows a set of commands to decode one video stream.
Broadwell's updates (when compared to Haswell) are summarized in the slide below.
The detailed discussion of Broadwell's media capabilities above is relevant to the improvements made in Skylake.
Skylake's Gen9 Graphics
The Gen9 graphics engine comes in multiple sizes for different power budgets. There are three main variants, GT2, GT3/GT3e and GT4e. In the slide below, the important aspect to note is that the media processing hardware (Media FF - Media Fixed Function) resides in the 'Unslice'. While the GT2 comes with the minimum possible Media FF logic, the GT3 and GT3e come with additional hardware capabilities. This strategy is similar to what was adopted in Broadwell.
The Unslice can operate at a different voltage and frequency compared to the Slices. This is especially important for video decoding / processing where the Media FF can run at higher clocks for better performance while ensuring minimal power consumption. From the viewpoint of tools such as GPU-Z and HWiNFO, it will be interesting to see if real-time statistics on voltage and clocks can be gathered for both the Unslice and the Slices. For additional power saving, power gating can be used at the Slices level or the EU group level.
Amongst the media improvements made in Skylake, we have:
- An additional fixed function video encoder in the Quick Sync engine
- Additional codec support (both decode and encode): HEVC, VP8, MJPEG
- RAW imaging capabilities
Quick Sync in Skylake
Intel classifies the Quick Sync modes in Broadwell and previous generations as 'PG-Mode' (Processor Graphics). It is optimized for faster than real-time encoding and flexibility. The new mode, 'FF-Mode' (Fixed Function) is optimized for real-time H.264 encoding, with focus on lowering the latency and reducing the power consumption. Except for programmable rate control, all other aspects of the encoding algorithm are handled in the MFX itself. Since rate control is in the hands of the application software, it is possible to do a 2-pass adaptive mode even with the FF hardware.
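To make the two-pass idea concrete, here is a minimal sketch of application-side bit allocation. It is not Intel's rate control algorithm or any Quick Sync API: the function name allocate_bits and the per-frame complexity scores are hypothetical. It assumes a first pass has already produced one complexity score per frame, and that the second pass simply splits the clip's bit budget in proportion to those scores.

```python
def allocate_bits(complexities, target_bitrate_bps, fps):
    """Split a clip's bit budget across frames in proportion to complexity.

    complexities: per-frame scores from a first pass (hypothetical metric).
    Returns one bit budget per frame for the second pass.
    """
    if not complexities:
        return []
    # Total budget for the clip: bitrate multiplied by clip duration.
    total_bits = target_bitrate_bps * len(complexities) / fps
    total_complexity = sum(complexities)
    if total_complexity == 0:
        # Nothing distinguishes the frames, so fall back to an even split.
        return [total_bits / len(complexities)] * len(complexities)
    return [total_bits * c / total_complexity for c in complexities]


# Four frames at 4 fps and 8000 bps give the clip an 8000-bit budget.
budgets = allocate_bits([1.0, 3.0, 2.0, 2.0], target_bitrate_bps=8000, fps=4)
print(budgets)
```

A real encoder would also have to respect buffer constraints and per-frame minimums. The sketch only shows why keeping rate control in software leaves room for a second, better-informed pass.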
The new mode could enable a better user experience for features such as WiDi and screen recording. Note that Skylake offers developers the flexibility to use either the PG mode or the FF mode in their applications. PG mode still retains the TUx (Target Usage level) discussed in one of the above slides.
Skylake's MFX engine adds HEVC Main profile decode support (4Kp60 at up to 240 Mbps). Main10 decoding can be done with GPU acceleration. The Quick Sync PG Mode supports HEVC encoding (again, Main profile only, with support for up to 4Kp60 streams).
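For a sense of scale, the 4Kp60 / 240 Mbps figure can be compared against the uncompressed stream. The sketch below assumes that 4K means 3840x2160 and that Mbps means 10^6 bits per second; neither is spelled out above. HEVC Main profile content is 8-bit 4:2:0, which works out to 1.5 samples per pixel.

```python
def raw_bitrate_bps(width, height, fps, bit_depth=8, samples_per_pixel=1.5):
    """Uncompressed video bitrate in bits per second.

    samples_per_pixel=1.5 corresponds to 4:2:0 chroma subsampling
    (one luma sample per pixel plus two chroma samples per 2x2 block).
    """
    return width * height * samples_per_pixel * bit_depth * fps


raw = raw_bitrate_bps(3840, 2160, 60)   # assumed 4K resolution
compressed = 240e6                      # peak decode bitrate quoted above

print(f"raw stream: {raw / 1e9:.2f} Gbps")
print(f"compression ratio at 240 Mbps: {raw / compressed:.1f}:1")
print(f"average compressed frame: {compressed / 60 / 1e6:.1f} Mbit")
```

Under these assumptions the raw stream is just under 6 Gbps, so even the maximum supported bitrate corresponds to roughly 25:1 compression.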
The DXVA Checker screenshot (taken on an i7-6700K, a part with Intel HD Graphics 530 / GT2) for Skylake with driver version 10.18.15.4248 is reproduced below. HEVC_VLD_Main10 has a DXVA profile, but it is done partially in the GPU (as specified in the slide above). The VP8 DXVA profile doesn't seem to be activated yet. There are new DXVA profiles (enabled) for the SVC (scalable video coding) extension to H.264.
Video Post Processing & Miscellaneous Aspects
Additional improvements include a scaler and format converter (SFC) that can work with MFX and VQE (without using the EUs or the media sampler). This enables power-efficient rotation and color space conversion during media playback.
Yet another power-saving trick introduced in Skylake is media memory bandwidth compression. The compression is lossless and managed at the driver level.
Skylake's VQE also brings about new features with RAW image processing support (16-bit image pipeline), spatial denoising and local adaptive contrast enhancement (LACE). Power efficiency is also improved, with claims of the VQE consuming less than 50mW during operation.
The new fixed function hardware in the performance-sensitive stages enables even low power mobile Skylake parts to support 4Kp60 RAW video processing. LACE support is not available for 4K resolution on the Y-series Skylake parts, though.
In terms of display support, Skylake can drive up to three simultaneous displays. The supported resolutions are provided in the table below. At IDF, Intel was showing off the Skylake platform driving three 4K monitors simultaneously.
One of the disappointing aspects is the absence of a native HDMI 2.0 port with HDCP 2.2 support. Intel's solution is to add an LSPCon (Level Shifter - Protocol Converter) in the DP 1.2 path. Various solutions such as the MegaChips MCDP28 family of products exist for this purpose. According to one of the leaked Intel slides from earlier this year, the Alpine Ridge Thunderbolt 3 controller can also act as an LSPCon and provide an HDMI 2.0 output. At IDF, Intel indicated that we could see Alpine Ridge supporting HDMI 2.0 towards the end of the year (something corroborated unofficially by a few motherboard manufacturers).
The display sub-system also provides hardware support for Multi-plane Overlay (MPO), which allows alpha blending of multiple layers. This saves power by selectively disabling unneeded planes. Use cases include certain video playback scenarios and HUD (heads-up display) gaming. The table below lists the updated support for MPO as one moves from Broadwell to Skylake. The NV12 feature is particularly interesting from a media playback perspective - it is a video format that avoids conversion as video data moves between the decoder, post processing and the display blocks. With Skylake, post-decoded NV12 content can also be provided directly to an MPO display plane, and there is no need for the video post processor to do an NV12 to RGB conversion.
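For readers unfamiliar with NV12, the layout can be sketched in a few lines. NV12 is an 8-bit 4:2:0 format: a full-resolution luma (Y) plane followed by a single plane of interleaved U and V samples at half resolution in each direction. The helper names below are hypothetical, and the sketch assumes a tightly packed buffer with no row padding, which real driver surfaces usually do not guarantee.

```python
def nv12_layout(width, height):
    """Plane sizes and offsets (in bytes) of a tightly packed NV12 frame."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    y_size = width * height                      # one luma byte per pixel
    uv_size = (width // 2) * (height // 2) * 2   # one U and one V byte per 2x2 block
    return {
        "y_offset": 0,
        "y_size": y_size,
        "uv_offset": y_size,
        "uv_size": uv_size,
        "total": y_size + uv_size,
    }


def nv12_sample(frame, width, height, x, y):
    """Return the (Y, U, V) values that apply to pixel (x, y)."""
    layout = nv12_layout(width, height)
    luma = frame[y * width + x]
    # Each chroma row holds width // 2 interleaved U,V pairs, i.e. `width` bytes.
    uv_index = layout["uv_offset"] + (y // 2) * width + (x // 2) * 2
    return luma, frame[uv_index], frame[uv_index + 1]


nv12_bytes = nv12_layout(1920, 1080)["total"]
rgbx_bytes = 1920 * 1080 * 4   # assumed 8-bit RGB buffer padded to 4 bytes per pixel
print(nv12_bytes, rgbx_bytes)
```

At 12 bits per pixel, a 1080p NV12 frame is well under half the size of the assumed 32-bit RGB equivalent. That is part of why skipping the conversion step matters for both bandwidth and power.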
Intel indicated that the new Skylake MPO feature could save as much as 1.1W when playing back 1080p24 video on a 1440p panel - which is a substantial amount when mobile devices are considered. Power savings are also achieved by altering the core display clock based on the display configuration, number of displays and the resolution of each display.
Systems utilizing eDP with Windows 8.1 or later can also take advantage of hardware support for reducing refresh rate based on video content frame rate (for example, 24 fps video streams can be played after reducing the panel refresh rate to 48 Hz - eliminating 3:2 pull-down issues while also providing power savings). Obviously, the panel and TCON should support this.
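The pull-down arithmetic is easy to check with a short sketch. The function names and the list of supported refresh rates are hypothetical, and the code assumes integer frame rates (exact 24 fps rather than 23.976 fps). It computes how many refresh cycles each video frame stays on screen, then picks the lowest panel rate that is a whole multiple of the content rate.

```python
import math
from fractions import Fraction


def repeat_pattern(content_fps, refresh_hz, frames=4):
    """Refresh cycles spent on each of the first `frames` video frames."""
    ratio = Fraction(refresh_hz, content_fps)   # refresh cycles per video frame
    # Frame i is first shown on the first refresh tick at or after i * ratio.
    ticks = [math.ceil(i * ratio) for i in range(frames + 1)]
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def lowest_matching_refresh(content_fps, supported_hz):
    """Lowest supported rate that is an integer multiple of the content rate."""
    matches = [hz for hz in sorted(supported_hz) if hz % content_fps == 0]
    return matches[0] if matches else None


print(repeat_pattern(24, 60))   # uneven cadence at 60 Hz
print(repeat_pattern(24, 48))   # even cadence at 48 Hz
print(lowest_matching_refresh(24, [40, 48, 60]))   # hypothetical panel modes
```

At 60 Hz the frames alternate between three and two refresh cycles, which is the 3:2 cadence responsible for judder. At 48 Hz every frame gets exactly two cycles.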
Additional power saving can also be achieved on supported panels using Panel Self Refresh Media Buffer Optimization (PSR MBO). It is an Intel-developed optimization on top of the Panel Self Refresh feature of eDP 1.3.
The media-related changes in Skylake's Gen9 GPU are best summarized by the slide below.
Skylake brings a lot of benefits to content creators - particularly in terms of improvements to Quick Sync and additional image processing options (including real-time 4Kp60 RAW import). However, it is a mixed bag for HTPC users. While the additional video post processing options (such as LACE for adaptive contrast enhancement) can improve the quality of video playback, and the increase in graphics prowess can possibly translate to better madVR capabilities, two glaring aspects prove to be dampeners.
The first one is the absence of full hardware acceleration for HEVC Main10 decode. Netflix has opted to go with HEVC Main10 for its 4K streams. When Netflix finally enables 4K streaming on PCs, Skylake, unfortunately, is not going to be as power efficient a platform as it could have been.
The second is the absence of a native HDMI 2.0 / HDCP 2.2 video output. Even though an LSPCon solution is suggested by Intel, it undoubtedly increases the system cost. Sinks supporting this standard have become quite affordable. For less than $600, one can get a 4K Hisense TV with HDMI 2.0 / HDCP 2.2 capability. Unfortunately, Skylake is not going to deliver the most cost-effective platform to utilize the full capabilities of such a display.
MrSpadge - Thursday, August 27, 2015
The duty cycle optimization is a perfect match for Core M.
frankpc - Wednesday, August 26, 2015
Gigabyte now shows that the GA-Z170X-Gaming 7 complies with HDMI 2.0 and HDCP 2.2, and it uses the MegaChip LSPCon. It is stated above that the system cost will be higher with that device. Is $220 about as low as a MB will be priced? Or is that still too high?
Byte - Wednesday, August 26, 2015
You can wait for a combo deal, last 4 releases they had cpu and mobo for $50-$75 on the egg. Don't know about this time though, since the prices are all up for the i7 series
gue22 - Tuesday, September 1, 2015
From
of August 5 2015:
"For USB 3.1, GIGABYTE would seem to have an initial exclusive of Intel’s Alpine Ridge controller. This is a four-lane PCIe controller that supports full speed on two USB 3.1 ports as well as Power Delivery 2.0 support (up to 36W) on Type-C connectors. For the G1, we get a Type-A and a Type-C here. With HDMI 2.0 support through Alpine Ridge, we’re at a bit of a conundrum here – on the board in white letters it explicitly states HDMI 2.0, but none of the marketing materials from GIGABYTE I have actually use it in any as a marketable point. So we’re unsure if it is indeed HDMI 2.0 capable, what standard, if this is via the AR controller or if this is a separate LS-Pcon. Then again, this is a motherboard designed for discrete gaming cards rather than integrated graphics.
Edit: We can confirm that HDMI 2.0 is via a separate LS-PCon, although it will need a future firmware update before it can be used."
Reflex - Wednesday, August 26, 2015
I am curious if Skylake can drive more than one Miracast display, and if it counts against the display cap of 3. I found something interesting in my Win10 update: In Windows 8, my Surface Pro 3 i7 could drive 3 displays total via DP (using daisy chaining). If I connected to my tv via Miracast, one of the LCD's would get disabled. But in Windows 10 I can add a Miracast display as a fourth display in addition to my 3 directly connected displays.
I was surprised at this change. Anyone have any info on that?
veek - Wednesday, August 26, 2015
idiotic article filled with buzzwords - you'd think this was a PhD thesis paper except it's filled with marketing crap.
name99 - Thursday, August 27, 2015
Perhaps the problem is with the reader, not with the article?
You are welcome to not be interested in how these features are developing over time. (I'm certainly not especially interested in, for example, how Intel Enterprise features are developing.)
But to imagine that *no-one* is interested in these features is myopic, narcissistic, and, above all, foolish.
BrokenCrayons - Thursday, August 27, 2015
Yeah, agreed. While I personally don't find anything at all interesting about media capabilities beyond the basic idea of whether or not a video I want to watch (something rare anyway since I don't find it enjoyable to view video content in general no matter what screen or device its on) plays and if a device is on battery power at the time, that the battery isn't drained too quickly, there are other people who are very, very interested in this sort of article and find the details interesting and meaningful.
gue22 - Tuesday, September 1, 2015
With idiots it´s often like with cages and population in the zoo:
It´s topologically difficult to tell who´s inside and who´s outside!
ToTTenTranz - Thursday, August 27, 2015
4k60p at 240Mbps. That's twice the maximum bitrate of the UHD BluRay standard.
I can only imagine how good that would look, given the proper viewing equipment of course.