Analysis of AV1 Coding Tools using libaom

Grant Hsu

AV1 Architect & Video Codec Tech Lead at Visionular

This article has two parts. In part one, I analyze selected AV1 coding tools in the libaom open-source encoder. In part two, we will explore which tools are the most useful in the pursuit of better coding performance (speed) and efficiency (bitrate savings), again using libaom.

As video engineers and codec practitioners know, a coding tool must show a reasonable advantage in bitrate efficiency to be adopted. However, we must also consider the tool's processing cost, since some tools yield efficiency benefits yet remain impractical for a given use case.

Digging into libaom as the case study for analyzing AV1’s coding tools, we can see in Table 1 that no fewer than 35 control flags can be toggled on or off in libaom, while Table 2 shows the libaom parameters used for testing.

Partition Types

Figure 1 shows the partition types of AV1 compared to those of VP9. In Table 4 below, the baseline is the configuration in which only square partitions, from 128×128 down to 4×4, are activated.

At CPU level 3, all the AV1 partition types together generate the largest coding gain, but the rectangular partition types that contribute most of that gain are the ones already present in VP9. The newly added AB and 1:4 partitions do not generate significantly larger gains, so turning them off is worth considering, especially for fast speed levels or very low delay (real-time) modes.
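The grouping of partition types discussed above can be sketched in Python. This is an illustration only: the partition names follow the AV1 naming convention, but the helper `allowed_partitions` and its three switches are hypothetical and merely mirror the idea of enabling or disabling the rectangular, AB, and 1:4 groups. The corresponding encoder options in libaom should be checked against the help output of your own build.

```python
# AV1 partition types, grouped the way the article discusses them.
SQUARE = {"PARTITION_NONE", "PARTITION_SPLIT"}
RECT = {"PARTITION_HORZ", "PARTITION_VERT"}  # rectangular types also found in VP9
AB = {
    "PARTITION_HORZ_A", "PARTITION_HORZ_B",
    "PARTITION_VERT_A", "PARTITION_VERT_B",
}
ONE_TO_FOUR = {"PARTITION_HORZ_4", "PARTITION_VERT_4"}


def allowed_partitions(rect=True, ab=True, one_to_four=True):
    """Return the set of partition types the search may evaluate.

    Hypothetical helper: square partitions are always allowed (the baseline),
    and each keyword switch adds one of the optional groups.
    """
    allowed = set(SQUARE)
    if rect:
        allowed |= RECT
    if ab:
        allowed |= AB
    if one_to_four:
        allowed |= ONE_TO_FOUR
    return allowed


# A fast configuration that keeps only the VP9-style partitions.
fast = allowed_partitions(ab=False, one_to_four=False)
print(sorted(fast))
```

With both new groups disabled, the search space shrinks from ten partition types to four, which is where the encoder-time saving would come from.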

Single Motion Modes

Table 5 lists the three main single motion modes found in libaom: warped motion, OBMC (overlapped block motion compensation), and inter-intra. One motion mode excluded from our analysis is global motion, whose pre-analysis is turned off in libaom CPU3, mainly as a trade-off between coding efficiency and encoder complexity.

On the other hand, inter-intra is usually considered a compound mode, since it combines inter and intra prediction to encode one block. Because this mode uses only one motion vector per block, we list it among the single motion modes.

From Table 5, for the objective-1-fast test set, warped motion is the coding tool with the best cost-performance ratio: a coding gain of 1.07% can be achieved using just 2% of the CPU’s resources.
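The cost-performance ratio mentioned here is simple arithmetic, sketched below in Python. The function name `gain_per_cost` is hypothetical, and reading the 2% figure as the tool's CPU cost is an assumption. Only the warped motion numbers quoted above are used.

```python
def gain_per_cost(bitrate_saving_pct, cpu_cost_pct):
    """Coding gain (percent) obtained per percent of CPU cost.

    Hypothetical metric for ranking coding tools; a higher value means a
    better trade-off between efficiency and complexity.
    """
    if cpu_cost_pct <= 0:
        raise ValueError("CPU cost must be positive")
    return bitrate_saving_pct / cpu_cost_pct


# Warped motion on objective-1-fast: 1.07% gain for 2% CPU cost.
warped_motion_ratio = gain_per_cost(1.07, 2.0)
print(f"{warped_motion_ratio:.3f} % gain per % CPU")
```

Computing the same ratio for each tool in a results table gives a quick ranking of which tools to keep at a given speed level.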

Figure 2 shows frame samples from the two video clips in the objective-1-fast test set that contributed most to the 1.07% gain for warped motion. Both videos contain many scenes with rotational motion. Specifically, the Netflix clip in Figure 2(a) provides a gain of ~4%, whereas the blue_sky clip in Figure 2(b) yields a ~8% gain.

Note that with a different test set, the above coding tools may show completely different cost-performance numbers.

Thus, it is completely reasonable that different coding tools may be selected to adapt to different content scenarios.

Compound Modes

Compared to single motion modes, there are many more compound modes that have been included in the AV1 coding standard. In AV1, for each superblock, there are 28 single motion modes, whereas the number of compound modes increases to 128.

In essence, any pair of references can be combined to form a compound mode.

libaom CPU3 includes many speedup algorithms, and most of them come from the following simple idea: check the results of the single motion evaluation, then determine which compound modes are worth evaluating and, further, which modes under those specific compound modes should be considered. Compound modes that are unlikely to produce a noticeable coding gain are skipped.
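The pruning idea can be sketched as follows in Python. This is not libaom's actual heuristic: the function `compound_candidates`, the `keep` threshold, and the rate-distortion costs in the example are all hypothetical, chosen only to show how single-reference results can limit which reference pairs are tried.

```python
from itertools import combinations


def compound_candidates(single_rd_cost, keep=3):
    """Pick reference pairs worth a compound search.

    single_rd_cost maps a reference frame name to the best rate-distortion
    cost found during the single-reference search (lower is better).
    Only the `keep` best references are paired; every other pair is skipped.
    """
    ranked = sorted(single_rd_cost, key=single_rd_cost.get)
    survivors = ranked[:keep]
    return list(combinations(survivors, 2))


# Hypothetical single-reference results for one block.
costs = {"LAST": 100, "GOLDEN": 140, "ALTREF": 120, "BWDREF": 300}
pairs = compound_candidates(costs, keep=3)
print(pairs)
```

Here the poorly performing reference is dropped before pairing, so three pairs are evaluated instead of the six that four references would otherwise produce.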

In particular, one compound coding tool, the one-sided compound, uses a pair of references from the same prediction direction; that is, both references are positioned on the same side of the current frame.

As shown in Figure 3, in the High Delay mode, libaom generally uses a golden frame (GF) group, also referred to as a GOP, comprising 16 frames.

Figure 3 shows how a hierarchical structure can be adopted in many cases. Every one of the 16 frames except the ALTREF frame, the very last frame in the current GOP, has two-sided references. Therefore, the two-sided compound rather than the one-sided compound should be favored, as it usually achieves larger coding gains.

For ALTREF, all reference frames are positioned at a distance of more than 16 frames away in the forward direction. Hence, for the High Delay scenario, a one-sided compound does not contribute much to the coding efficiency for all the frames within the GOP.

For very low-delay (zero-latency) scenarios, all frames have their references positioned in the same direction: forward prediction only, without any bidirectionally predicted frames. The use of a one-sided compound can then be considered.
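The distinction between one-sided and two-sided compounds can be sketched in a few lines of Python. This is only an illustration: the function name `compound_side` and the use of plain display-order integers are assumptions made here, not libaom code, and the frame positions in the example are hypothetical.

```python
def compound_side(cur_order, ref_a_order, ref_b_order):
    """Classify a reference pair relative to the current frame.

    All arguments are display-order positions. A reference with a smaller
    position than the current frame lies in the past, a larger one in the
    future. The references are assumed to differ from the current frame.
    """
    a_is_past = ref_a_order < cur_order
    b_is_past = ref_b_order < cur_order
    return "one-sided" if a_is_past == b_is_past else "two-sided"


# Frame 8 in a 16-frame hierarchical GOP, referencing frames 0 and 16.
print(compound_side(8, 0, 16))   # two-sided

# Low-delay frame 8, where only earlier frames are available as references.
print(compound_side(8, 7, 4))    # one-sided
```

With these assumed positions, a frame in the middle of a hierarchical GOP can pair a past and a future reference, while a low-delay frame only has past references to pair.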
