Decoding and Rendering Benchmarks

Our decoding and rendering benchmarks consist of standardized test clips (with varying codecs, resolutions and frame rates) played back through MPC-HC. GPU usage is tracked through GPU-Z logs, and power consumption at the wall is also reported. The former provides hints on whether frame drops could occur, while the latter indicates the efficiency of the platform for the most common HTPC task: video playback.
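As a rough illustration of how a GPU-Z sensor log can be reduced to the average and peak utilization figures discussed here, the following Python sketch parses a comma-separated log. The column name "GPU Load [%]", the padded layout and the sample rows are assumptions made for illustration (they are not measurements from this review), and the helper name summarize_gpu_load is hypothetical.

```python
import csv
import io


def summarize_gpu_load(log_text, column="GPU Load [%]"):
    """Return (average, peak) for one column of a GPU-Z style sensor log.

    Assumes a comma-separated log whose first row holds the column names,
    padded with spaces. The default column name is an assumption.
    """
    reader = csv.reader(io.StringIO(log_text))
    header = [name.strip() for name in next(reader)]
    idx = header.index(column)

    samples = []
    for row in reader:
        if len(row) <= idx:
            continue  # skip truncated rows
        try:
            samples.append(float(row[idx].strip()))
        except ValueError:
            continue  # skip blank or non-numeric cells

    if not samples:
        raise ValueError("no numeric samples found in column: " + column)
    return sum(samples) / len(samples), max(samples)


# Made-up sample rows, used only to show the expected layout.
SAMPLE_LOG = """\
        Date        , GPU Core Clock [MHz] , GPU Load [%] ,
2013-06-02 10:00:00 ,                350.0 ,           22 ,
2013-06-02 10:00:01 ,               1150.0 ,           41 ,
2013-06-02 10:00:02 ,               1150.0 ,           39 ,
"""

avg_load, peak_load = summarize_gpu_load(SAMPLE_LOG)
print(f"average GPU load: {avg_load:.1f}%, peak: {peak_load:.0f}%")
```

A peak that sits close to 100% for long stretches is the kind of hint mentioned above: the GPU has little headroom left, so frame drops become more likely.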

Enhanced Video Renderer (EVR) / Enhanced Video Renderer - Custom Presenter (EVR-CP)

The Enhanced Video Renderer is the default renderer made available by Windows 8. It is lean in its use of system resources, since most of the work is offloaded directly to the GPU drivers. EVR is mostly used in conjunction with native DXVA2 decoding, and it does not tax the GPU much even though hardware decoding is also taking place. Deinterlacing and other post-processing options were left at their default settings in the Intel HD Graphics Control Panel (these apply when EVR is chosen as the renderer).

EVR-CP is the default renderer used by MPC-HC. It is usually used in conjunction with MPC-HC's video decoders, some of which are DXVA-enabled. For our tests, however, we used the DXVA2 mode provided by the LAV Video Decoder. In addition to DXVA2 Native, we also used the QuickSync decoder developed by Eric Gur (an Intel applications engineer) and made available to the open source community. It makes use of the specialized decoder blocks available as part of the QuickSync engine in the GPU.

Power consumption shows a tremendous decrease across all streams compared to the passive Ivy Bridge HTPC. Admittedly, that build uses a 55W TDP Core i3-3225, but, as we will see later, the power consumption at full load for the Haswell build is very close to that of the Core i3-3225 build despite the lower TDP of the Core i7-4765T.

In general, using the QuickSync decoder results in higher power consumption because the decoded frames are copied back to DRAM before being sent to the renderer. With native DXVA decoding, the frames are passed directly to the renderer without the copy-back step. The odd one out in the power numbers is the interlaced VC-1 clip, where QuickSync decoding is more efficient than 'native DXVA2'. This is because the open source native DXVA2 decoders currently do not support interlaced VC-1 on Intel GPUs, so decoding falls back to software. The QuickSync decoder, on the other hand, is able to handle it with the VC-1 bitstream decoder in the GPU.
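To put a rough number on the copy-back step, the Python sketch below estimates how much decoded video data has to be moved per second. It assumes 8-bit 4:2:0 output in NV12 layout (1.5 bytes per pixel) and ignores surface alignment padding; the function names are hypothetical, and the result is a back-of-the-envelope figure rather than a measurement from this review.

```python
def nv12_frame_bytes(width, height):
    """Size of one NV12 frame: a full-resolution 8-bit luma plane plus an
    interleaved chroma plane at half resolution, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


def copyback_mb_per_s(width, height, fps):
    """Decoded data (in MB/s, 1 MB = 10^6 bytes) produced per second of
    playback. Every byte is read once and written once during copy-back."""
    return nv12_frame_bytes(width, height) * fps / 1e6


# Illustrative stream formats: (label, width, height, frames per second).
clips = [
    ("1080p23.976", 1920, 1080, 23.976),
    ("1080p60", 1920, 1080, 60),
    ("720p60", 1280, 720, 60),
]

for label, width, height, fps in clips:
    rate = copyback_mb_per_s(width, height, fps)
    print(f"{label}: about {rate:.1f} MB/s of decoded frames to copy")
```

Under these assumptions a 1080p60 stream produces roughly 187 MB/s of decoded frames. That extra memory traffic is one plausible contributor to the higher power draw, though the sketch does not account for any other differences between the two decode paths.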

The GPU utilization numbers follow a similar trend to the power consumption numbers. As discussed earlier, EVR is very lean on the GPU, and the utilization numbers bear this out. QuickSync appears to stress the GPU more, possibly because of the copy-back step for the decoded frames.

madVR

Videophiles often prefer madVR as their renderer because of the choice of scaling algorithms available, as well as myriad other features. In our recent Ivy Bridge HTPC review, we found that with DDR3-1600 DRAM, it was straightforward to get madVR working with the default scaling algorithms for all material at 1080p60 or lower. In the meantime, Mathias Rauen (developer of madVR) has added more features. To alleviate the ringing artifacts introduced by the Lanczos algorithm, an option to enable an anti-ringing filter was introduced. A more intensive scaling algorithm (Jinc) was also added. Unfortunately, enabling the anti-ringing filter with Lanczos or choosing any variant of Jinc resulted in a lot of dropped frames. Haswell's HD4600 is simply not powerful enough for these madVR features.

It is not possible to use native DXVA2 decoding with madVR because the decoded frames are not made available to an external renderer directly. (Update: It is possible to use DXVA2 Native with madVR since v0.85. Future HTPC articles will carry updated benchmarks.) To work around this issue, LAV Video Decoder offers three options: software decoding, QuickSync, and DXVA2 Copy-Back. With either of the latter two, the decoded frames are brought back to system memory for madVR to take over.

One of the interesting features integrated into recent madVR releases is the option to perform DXVA scaling. This is particularly interesting for HTPCs running Intel GPUs because the Intel HD Graphics engine uses dedicated hardware to implement support for the DXVA scaling API calls. AMD and NVIDIA apparently implement those calls using pixel shaders. To obtain a frame of reference, we repeated our benchmark process using DXVA2 scaling for both luma and chroma instead of the default settings.

One of the interesting aspects to note here is the fact that the power consumption numbers show a much larger shift towards the lower end when using DXVA2 scaling. This points to more power efficient updates in the GPU video post processing logic.

DXVA scaling results in much lower GPU usage for SD material in particular with a corresponding decrease in average power consumption too. Users with Intel GPUs can continue to enjoy other madVR features while giving up on the choice of a wide variety of scaling algorithms.

95 Comments

  • eio - Sunday, June 23, 2013 - link

    great example! very interesting.
    I agree with Montage that for most snapshots, HD4600 is significantly better than HD4000 for retaining much more texture, even for this frame 4 in 1080p.
    but in 720p HD4600 shows its trade-off of keeping more fine-grained texture: it looks like HD4600 has regressed in low-contrast, large-scale structural information.
    as you said, this type of regression can be more evident in video than snapshots.
  • eio - Sunday, June 23, 2013 - link

    another thing that surprises me is: x264 is a clear loser in this test. I don't understand why, what are the specific params that handbrake used to call x264?
  • nevcairiel - Monday, June 3, 2013 - link

    @ganeshts

    I'm curious, what did you use for DXVA2N testing of VC-1?
    LAV Video doesn't support VC-1 DXVA2 on Intel, at least on Ivy Bridge, and i doubt Haswell changed much (although it would be a nice surprise, i'll see for myself in a few days)
  • ganeshts - Monday, June 3, 2013 - link

    Hendrik,

    I made a note that DXVA2N for interlaced VC-1 has software fallback.

    That issue is still not fixed in Haswell. That is why you see QuickSync consuming lower power compared to DXVA2N for the interlaced VC-1 sample.
  • zilexa - Monday, June 3, 2013 - link

    To be honest, now that I have a near-perfect Raspberry setup, I would never buy a Core ix/AMD Ax HTPC anymore. Huge waste of money for almost unnoticeable image quality improvement.
    The Raspberry Pi will use max 6.5w, usually much lower. Speed in XBMC is no issue anymore, and it plays back all my movies just fine (Batman imax x264 rip 7-15MBps). I play mostly downloaded tv shows, streams and occasionally a movie. It also takes care of the whole download process in the background. So I don't even have a computer anymore at home. I sold my old AMD 780G based Silverstone M2 HTPC for €170 and it was the best decision ever.

    Still cool to read about the high end possibilities of HTPC/MadVR or actually just video playback and encoding, cos thats what this is really about. But I would never buy a system to be able to support this. HTPC in my opinion is to be in a lazy mode and able to playback your shows/movies/watch your photos and streams in good HD quality and audio.

    If you need HTPC, in my opinion there is no need for such an investment in a computer system which is meant for a huge variety of computing tasks.
  • jwcalla - Monday, June 3, 2013 - link

    It's going to depend on individual needs of course, and I think your Raspberry Pi is on the other end of the extreme, but otherwise I kind of have the same reaction. This has got to be an $800+ build here for an HTPC and then I begin to wonder if this is a practical approach.

    Owing to the fact that Intel's entire marketing strategy is to oversell to the consumer (i.e., sell him much more than he really needs), it seems that sometimes these reviews follow the strategy too closely. For an HTPC? Core i3 at the max. And even that's being generous. If one needs certain workloads like transcoding and such then maybe a higher end box is needed. But then I question if that kind of stuff is appropriate for an HTPC.
  • superjim - Monday, June 3, 2013 - link

    Playback a raw M2TS 1080p 60fps file on your Pi and get back to me.
  • phoenix_rizzen - Monday, June 3, 2013 - link

    How did you get around the "interface is not accelerated" issue on the RPi? I found it completely useless when trying to navigate the XBMC interface itself (you know, to select the show to watch). Sure, once the video was loaded, and processing moved over to the hardware decoder, things ran smooth as silk.

    I sold my RPi two weeks after receiving it due to this issue. Just wasn't worth the headaches. Since moved to a quad-core AthlonII running off an SSD with a fanless nVidia dGPU. So much nicer to work with.
  • vlado08 - Monday, June 3, 2013 - link

    What about Frame Rate Conversion (FRC) capability?
  • ericgl21 - Monday, June 3, 2013 - link

    Ganesh,

    Let's assume you have two 4K/60p video files playing in a loop at the same time for a duration of 3 hours.
    Is it possible that Iris or Iris Pro could play those two video streams at the same time, without dropping frames and without the processor throttling throughout the entire movie playback ?
    I mean, connecting two 4K TVs, one to the HDMI port and the other to the DisplayPort, and outputting each video to each TV. Would you say the Iris / Iris Pro is up to this task? Could you test this scenario?
