
What is MV-HEVC (Multiview High Efficiency Video Coding)?

MV-HEVC for Apple Vision Pro Virtual Reality
Matteo Naccari, PhD

Principal Video Codec Engineer, Visionular Inc.

The whole world loves entertainment. Across the globe, people constantly search for new and exciting experiences: new ways to be caught up in a story and new opportunities to make themselves part of it.

Think of Pokemon Go, app-based watch parties, multi-angle video capture of sporting events, and The Sphere in Las Vegas. There’s undoubtedly innovation happening on both the demand and supply side!

Nowadays it isn’t enough for people to watch a movie’s plot unfold from the outside. They seek a more profound relationship with what they are watching and want to become part of the story themselves – almost as if it were happening to them!

The passion for immersion is what has driven television far from its passive origins, where the interaction between the viewer and the story was in essence one way.

Virtual reality (VR) headsets, such as Oculus and Apple Vision Pro, almost let us reach out and touch the events unfolding in the news, or experience what children in Nicaragua or Congo are living through. However, a big obstacle on the path to making VR truly mainstream is the sheer amount of data that needs to be compressed before it can be delivered to these headsets.

Put simply, even a stereoscopic video with just a pair of views would in principle require twice the number of pixels to be stored and/or transmitted. Conventional video coding cannot efficiently handle the huge volume of data needed for high-quality 3D video captured from multiple cameras. This is precisely the problem that Multiview High Efficiency Video Coding (MV-HEVC) sets out to solve.

MV-HEVC is designed to bring a high-quality immersive experience to VR headsets, and it does so by exploiting the high spatial redundancy that exists between the different views of a 3D video.

This blog post examines the technical elements of MV-HEVC. We look at how it builds on the fundamental tools of HEVC to create a more efficient and effective coding structure for multiview video. We also consider the specific challenges of a multiview setup and what they mean in practice for VR video streaming.


Apple Vision Pro – revolutionizing VR video streaming! Image credit: Apple

What is MV-HEVC?

The MV-HEVC syntax is an extension to the High Efficiency Video Coding (HEVC) standard. It is designed to efficiently compress multiple video views of the same scene that are captured from different vantage points.

When depth-based information is also available and needs to be transmitted, the 3D extension of HEVC (3D-HEVC) is employed, which uses dedicated coding tools to compress the specific statistics associated with depth data.

These two extensions (MV and 3D) constitute the HEVC-based toolset to enable the delivery of 3D video applications.

The basic gain in compression achieved with MV-HEVC comes from removing the so-called inter-view redundancy that’s naturally present among multiple views, and we’ll understand the mechanics of this process in the next section.

How does MV-HEVC Work?

MV-HEVC’s USP is inter-view prediction.

This distinguishes it from the traditional approach of compressing each view (or camera) independently. Unlike single-view compression, MV-HEVC can predict the frames of one camera’s view from the frames of another camera’s view.

By exploiting the correlations both within a single view and across views, MV-HEVC can form better predictions and thereby reduce the bit rate even further.


Left and right views of the 3-D stereoscopic version of the popular Big Buck Bunny video. Source

Inter-view Prediction in MV-HEVC

Predicting and exploiting inter-view redundancies is essential to achieving a higher compression ratio while maintaining good video quality, at bitrates reasonable enough for multiview video to be transmitted over the internet.

  • Prediction means the video encoder can use the data of one view to generate (predict) the corresponding block in another view; note that this is prediction for compression, not acquisition or 3D reconstruction.
  • Since the encoder receives both views, it can use either of them to predict blocks in the other view, and this is what allows the inter-view redundancy to be removed – see the sketch right after this list.
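
To make this concrete, here is a minimal sketch in Python (using NumPy, and assuming two rectified greyscale views stored as 2-D arrays – an illustration of the principle, not the normative HEVC prediction process). A block of the right view is predicted by searching the reconstructed left view for its best horizontal match; only the resulting offset (the disparity) and the residual would then need to be coded.

```python
import numpy as np

def predict_block_from_other_view(left_rec, right, y, x, bs=16, max_disp=64):
    """Illustrative inter-view prediction of one block of the right view
    from the reconstructed left view (toy block matching, not HEVC)."""
    target = right[y:y + bs, x:x + bs].astype(np.int32)
    best_sad, best_dx = None, 0
    # Search a horizontal window; for rectified stereo the true disparity
    # is essentially a horizontal shift between the two views.
    for dx in range(-max_disp, max_disp + 1):
        xx = x + dx
        if xx < 0 or xx + bs > left_rec.shape[1]:
            continue
        candidate = left_rec[y:y + bs, xx:xx + bs].astype(np.int32)
        sad = np.abs(target - candidate).sum()
        if best_sad is None or sad < best_sad:
            best_sad, best_dx = sad, dx
    prediction = left_rec[y:y + bs, x + best_dx:x + best_dx + bs].astype(np.int32)
    residual = target - prediction  # only this (plus the disparity) gets coded
    return best_dx, prediction, residual
```

The better the match found in the other view, the smaller the residual, and the fewer bits are spent on that block.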

MV-HEVC enables inter-view prediction by inserting the reconstructed frame from one view into the reference picture buffer of the view currently being encoded, so that ordinary motion compensation can be used to perform inter-view prediction.

In the simple case of two views, one serves as the base layer (e.g. the left one) and the other (i.e. the right one) predicts from it. Performing inter-view prediction with the coding tools of Version 1 of HEVC has the key advantage that MV-HEVC can be implemented simply by extending the high-level syntax of the standard, as described in the following section.

Finally, we note that when inter-view prediction is selected for a given coding area, the motion vectors employed are disparity vectors, quantifying the spatial offset of the same area between two different views.
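
The following toy Python sketch illustrates this mechanism (the data structures and function names are invented for illustration and are not HEVC syntax): the decoded base-view picture with the same picture order count is placed into the reference list of the dependent view, next to its own temporal references, so that ordinary motion compensation can reach across views – and any motion vector pointing at that entry acts as a disparity vector.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DecodedPicture:
    poc: int        # picture order count (display order)
    layer_id: int   # 0 = base (left) view, 1 = dependent (right) view

def build_ref_list_for_dependent_view(dpb: List[DecodedPicture], current_poc: int):
    """Toy reference-list construction for a dependent-view picture.

    Temporal references come from the same layer (earlier POCs); the
    inter-view reference is the base-layer picture with the *same* POC.
    Simplified for illustration; the real HEVC process is more involved.
    """
    temporal = [p for p in dpb if p.layer_id == 1 and p.poc < current_poc]
    inter_view = [p for p in dpb if p.layer_id == 0 and p.poc == current_poc]
    return temporal + inter_view

# Example: the DPB holds the left-view picture at POC 8 and an earlier
# right-view picture at POC 4 while the right-view picture at POC 8 is coded.
dpb = [DecodedPicture(poc=8, layer_id=0), DecodedPicture(poc=4, layer_id=1)]
refs = build_ref_list_for_dependent_view(dpb, current_poc=8)
print([(p.layer_id, p.poc) for p in refs])   # [(1, 4), (0, 8)]
```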


Left and right views of the 3-D stereoscopic version of the popular Elephant’s Dream video. Source.

Cross-layer Prediction in MV-HEVC

This kind of prediction is key in scalable video coding solutions, where frames or groups of frames are divided into layers and spatial/temporal prediction occurs across these layers.

As an example,

  • Spatial scalability provides that a frame is composed of as many layers as the number of image resolutions the application scenario requires.
  • Layers are sorted in ascending order of pixel count, whereby the smallest-resolution layer is denoted the base layer whilst the others are the enhancement layers.
  • Accordingly, pixels from a given layer L (i.e. resolution) may be predicted from those associated with layer L − 1, therefore using data across layers, hence the name cross-layer prediction.

Note: A little thought should convince the reader that inter-view prediction may be seen as a special case of cross-layer prediction, where the pixels of one view are predicted from those of another view.
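
A short sketch makes the note above concrete (nearest-neighbour upsampling is used here purely for illustration; SHVC defines proper resampling filters). For spatial scalability, the base-layer frame is upsampled to the enhancement resolution before serving as a predictor; when the two layers are two views at the same resolution, the resampling step disappears and the prediction becomes inter-view prediction.

```python
import numpy as np

def upsample_nearest(frame, factor=2):
    """Nearest-neighbour upsampling of a base-layer frame (illustration only)."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

def cross_layer_prediction(lower_layer_frame, enhancement_frame, factor=1):
    """Predict an enhancement-layer frame from the layer below it.

    factor > 1  -> spatial scalability: resolutions differ, so upsample first.
    factor == 1 -> multiview case: the 'lower layer' is simply the other view.
    The residual is what the enhancement layer actually has to code.
    """
    if factor > 1:
        prediction = upsample_nearest(lower_layer_frame, factor)
    else:
        prediction = lower_layer_frame
    residual = enhancement_frame.astype(np.int32) - prediction.astype(np.int32)
    return prediction, residual
```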

The following figure shows a two-view (stereoscopic) video arranged so that the left view is associated with the base layer (Layer 0) whilst the right view is the enhancement layer (Layer 1). Inter-frame and inter-view predictions are highlighted too.


Example of prediction structure arrangement for a stereoscopic video where the left view serves as base layer (Layer 0) and the right one is the enhancement one (Layer 1). Adapted from here.

High level syntax extensions for MV-HEVC

As already mentioned, the design of MV-HEVC (as well as of the scalable extension of HEVC, SHVC) was carried out by using some pre-existing hooks in the HEVC syntax and by extending some elements such as parameter sets and slice headers.

Such an approach allowed codec vendors to speed up the development of scalable extensions of HEVC (such as MV-HEVC) by greatly re-using hardware and/or software components designed for Version 1 solutions. A summary of the main high level syntax extensions introduced is as follows (more details can be found in [1] and [2]):

  • Video Parameter Set (VPS): Additional syntax is present to signal the type of scalability (e.g. spatial or Multiview), profiles used by the different layers, decoded picture buffer size, etc.
  • Sequence Parameter Set (SPS) and Picture Parameter Set (PPS): Additional syntax is present to signal constraints on the extent of disparity vectors and the picture order count reset for different views (i.e. layers).
  • Slice header: Additional syntax is present to signal whether a picture will be discarded because it is not used for inter-frame or inter-view prediction or which pictures in the lower layers will be used by the current layer.
  • Supplemental Enhancement Information (SEI) messages: A few new messages have been introduced for specific application scenarios associated with multiview content. A notable one is the 3D reference displays information message, required when multiview clips are displayed on Apple Vision Pro devices.
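
As a rough mental model of what these extensions convey (the field names below are illustrative stand-ins, not the normative syntax element names), the extended VPS can be thought of as a small descriptor listing the layers, the type of scalability each carries, the profile each uses, and which layers each one predicts from:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LayerInfo:
    layer_id: int
    scalability_type: str                    # e.g. "multiview" or "spatial"
    profile: str                             # profile used by this layer
    direct_ref_layers: List[int] = field(default_factory=list)  # layers it predicts from

@dataclass
class VpsExtensionSummary:
    """Conceptual summary of the information carried by the extended VPS."""
    max_layers: int
    layers: List[LayerInfo]
    dpb_size_per_layer: List[int]

# A stereoscopic configuration: base (left) view plus one dependent (right) view.
stereo_vps = VpsExtensionSummary(
    max_layers=2,
    layers=[
        LayerInfo(layer_id=0, scalability_type="multiview", profile="Main"),
        LayerInfo(layer_id=1, scalability_type="multiview",
                  profile="Multiview Main", direct_ref_layers=[0]),
    ],
    dpb_size_per_layer=[4, 4],
)
```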

Advantages of using MV-HEVC

By exploiting inter-view redundancies, MV-HEVC can greatly reduce the bitrate compared to single-view HEVC, i.e. encoding each view separately as independent (simulcast) streams.

By reducing bitrates by a noticeable amount, MV-HEVC saves storage space and thus directly helps content providers reduce their OPEX (storage and CDN). Smaller file sizes and bitrates also allow the video to be transmitted faster over the internet and to load faster on devices.
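
As a back-of-the-envelope illustration (the figures below are assumptions chosen for the example, not measured results), if inter-view prediction shaves around 30% off the dependent view, a stereoscopic stream that would need twice the single-view bitrate in simulcast ends up noticeably cheaper:

```python
# Purely illustrative numbers: simulcast codes each view independently,
# while MV-HEVC saves bits on the dependent view via inter-view prediction.
base_view_kbps = 8000            # assumed HEVC bitrate for one view
dependent_view_saving = 0.30     # assumed inter-view gain; varies with content

simulcast_kbps = 2 * base_view_kbps
mv_hevc_kbps = base_view_kbps + base_view_kbps * (1 - dependent_view_saving)
overall_saving = 1 - mv_hevc_kbps / simulcast_kbps

print(f"Simulcast: {simulcast_kbps} kbps, MV-HEVC: {mv_hevc_kbps:.0f} kbps "
      f"({overall_saving:.0%} overall saving)")   # ~15% in this example
```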

MV-HEVC is also backwards compatible: if the end device does not support multiview decoding, the decoder can fall back to decoding the base layer using standard HEVC.

This is a win-win for both the content provider and the end-user!


Left and right views of a video captured by Visionular using an iPhone 15 Pro and compressed using the Multiview extension of Visionular’s Aurora5 HEVC codec.

Considerations for Implementing MV-HEVC

While MV-HEVC adoption is still in its early stages, many codec vendors (including Visionular) have released MV-HEVC implementations for early adopters to start using.

Whether you are a codec developer or a content provider, there are a few points that you need to take into consideration as you build your MV-HEVC strategy for VR streaming.

Playback Support

Currently, MV-HEVC lacks broad support across platforms and devices, but this is expected at such an early stage, as industry interest moves from the R&D labs to trials and deployments with content providers.

In many instances, it’s necessary to transcode to/from MV-HEVC to obtain the needed compatibility to make a project truly cross-platform.

We’ve also heard news about Dolby introducing support for MV-HEVC in its Dolby Vision Profile 20. This is great to hear, as it’s an indication of the willingness to work towards building the ecosystem needed to support video playback and streaming in the virtual world!

Content-Creation Process

The process of creating content is altered when using MV-HEVC: both capture and composition need to be revisited. This might necessitate working closely with content creators and training them to prepare the content in a way that meshes well with the MV-HEVC workflow.

Interestingly, you can capture stereoscopic content using the iPhone 15 Pro and watch it in 3D using an Apple Vision Pro.

Encoding Complexity

The process is slowed down by having to compress multiple views for each frame. However, the high spatial correlation among different views – exploited during inter-view prediction (as mentioned above) – can also be used to speed up compression by re-using some decisions made in the base layer view (e.g. the coding tree unit partitioning and/or coding mode, i.e. inter or intra).

Also, the rate-control module needs to take advantage of the nature of multiview content: one of the views may be compressed with fewer bits (i.e. at lower quality), knowing that the human brain will compensate for the quality difference through the well-known binocular rivalry effect.
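
A toy sketch of these two ideas follows, with invented parameter names and an assumed QP offset – it is not how any particular encoder (Aurora5 included) is implemented, just an illustration of the principle: the dependent view’s partitioning search is seeded with the co-located base-view CTU decisions, and its QP is raised so that it spends fewer bits.

```python
def dependent_view_encoder_hints(base_view_ctu_decisions, base_qp, qp_offset=3):
    """Derive speed/rate hints for the dependent view from the base view.

    1. Seed the dependent view's partitioning search with the co-located
       base-view CTU decisions so that fewer splits need to be evaluated.
    2. Encode the dependent view at a higher QP (fewer bits), relying on
       the viewer's binocular perception to mask the quality asymmetry.
    """
    hints = []
    for ctu in base_view_ctu_decisions:
        hints.append({
            "ctu_addr": ctu["ctu_addr"],
            "seed_split_depth": ctu["split_depth"],  # start the search here, refine locally
            "seed_mode": ctu["mode"],                # "inter" or "intra"
            "qp": base_qp + qp_offset,               # coarser quantization for this view
        })
    return hints

# Example: decisions for two CTUs of the already-encoded left (base) view.
base_decisions = [
    {"ctu_addr": 0, "split_depth": 2, "mode": "inter"},
    {"ctu_addr": 1, "split_depth": 0, "mode": "intra"},
]
print(dependent_view_encoder_hints(base_decisions, base_qp=30))
```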

Wrapping Up ...

These are the early days of MV-HEVC and streaming in the VR world. We are excited to see how the world of VR streaming evolves and are looking forward to playing a major role in it using our AI-driven video compression technology.

References

  1. G. Tech, Y. Chen, K. Müller, J.-R. Ohm, A. Vetro, and Y.-K. Wang, “Overview of the Multiview and 3D extensions of High Efficiency Video Coding”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 35 – 49, January 2016.
  2. ITU-T H.265 and ISO/IEC 23008-2, “High Efficiency Video Coding”, Version 9, September 2023.
