Making Sense of PSNR, SSIM, VMAF

Zoe Liu

President & Co-Founder, Visionular

Summary: In this article, we talk about PSNR, SSIM, and VMAF – how they work, their drawbacks, and where they are used in the video compression and streaming industry.

As video traffic on the Internet increases each year, and with the continual rise in viewer experience and quality expectations, the need for streaming services and video platforms to deliver higher resolutions at ever-increasing quality has never been greater. This leads to the requirement for more efficient video codec standards and better methods to assess the resulting video quality. Although many video quality evaluation methods exist, the most widely accepted are PSNR, SSIM, and VMAF.

This article will discuss the concepts behind how each of these quality metrics functions, with a focus on PSNR avg. MSE and PSNR avg. log compared to VMAF. Our opinions are based on real-world experiences from our internal experiments and those of more than 50 customers.

Without further ado, let’s dive into an overview of the three most common objective quality metrics, starting with PSNR.

How does PSNR (Peak Signal to Noise Ratio) work?

In technical terms, PSNR is the ratio between the maximum possible power of a signal (the peak pixel value of the original frame) and the power of the noise that corrupts it (the error introduced in the compressed frame), expressed in decibels.

However, in simpler terms, we can say that PSNR compares the difference in the pixel values of two images and is commonly used to measure the quality of lossy compression codecs.
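The pixel-difference idea above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: it treats a frame as a flat list of luma values, the names `mse` and `psnr` and the sample pixel values are made up for the example, and it uses the usual definition PSNR = 10 * log10(MAX^2 / MSE), where MAX is 255 for 8-bit video.

```python
import math

def mse(reference, distorted):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

def psnr(reference, distorted, bit_depth=8):
    """PSNR in dB; returns infinity when the frames are identical (MSE == 0)."""
    peak = (1 << bit_depth) - 1          # 255 for 8-bit content
    error = mse(reference, distorted)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)

# Hypothetical 8-pixel "frames", used only to exercise the functions.
ref = [52, 55, 61, 66, 70, 61, 64, 73]
dist = [54, 55, 60, 66, 69, 63, 64, 72]
print(f"MSE  = {mse(ref, dist):.3f}")
print(f"PSNR = {psnr(ref, dist):.2f} dB")
```

A real tool would compute this per plane (Y, U, V) over full frames, but the arithmetic is the same.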

PSNR is an attractive option for the following reasons:

  1. It is simple to compute, making PSNR more usable in real-world applications.
  2. PSNR has a long history of usage, making it easy to compare the performance of new algorithms to those previously evaluated.
  3. PSNR is easy to apply for the use case of encoding optimization.

When comparing video codecs or encoder implementations, PSNR serves as an approximation of how humans perceive reconstruction quality. Typical PSNR values in lossy image and video compression are between 30 and 50 dB for a bit depth of 8 bits (higher means better quality). For a video sequence, there are two primary ways to calculate PSNR: PSNR avg. MSE and PSNR avg. log.

PSNR avg. MSE is calculated by first taking the arithmetic mean of the per-frame MSE values and then converting that mean to decibels.

PSNR avg. log is computed by calculating the PSNR of each frame and then taking the arithmetic mean of those per-frame PSNR values. Multiple tests show that PSNR avg. log is less accurate because it weights the final score toward frames with higher quality.
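The two averaging orders can be compared side by side with a short sketch. It assumes the per-frame MSE values are already known and greater than zero; the function names and the four sample MSE values (three clean frames and one poor frame) are hypothetical.

```python
import math

def psnr_from_mse(mse, peak=255):
    """Convert an MSE value (> 0) to PSNR in dB for the given peak value."""
    return 10.0 * math.log10(peak ** 2 / mse)

def psnr_avg_mse(frame_mses, peak=255):
    """Average the MSE over all frames first, then convert to dB."""
    mean_mse = sum(frame_mses) / len(frame_mses)
    return psnr_from_mse(mean_mse, peak)

def psnr_avg_log(frame_mses, peak=255):
    """Convert each frame to dB first, then average the dB values."""
    per_frame = [psnr_from_mse(m, peak) for m in frame_mses]
    return sum(per_frame) / len(per_frame)

# Hypothetical sequence: three clean frames and one noticeably worse frame.
frame_mses = [2.0, 2.0, 2.0, 50.0]
print(f"avg. MSE: {psnr_avg_mse(frame_mses):.2f} dB")
print(f"avg. log: {psnr_avg_log(frame_mses):.2f} dB")
```

Because the logarithm is concave, the avg. log score is never lower than the avg. MSE score for the same frames, and the gap widens as frame quality becomes more uneven.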

How good is PSNR in estimating video quality?

While PSNR is simple to calculate, it is important to know its strengths and weaknesses in order to use it effectively.

For example, if we have a video sequence with two frames, the first with a PSNR score of 99 dB and the second with a PSNR score of 50 dB, the two frames will be subjectively indistinguishable. However, including the 99 dB frame in a PSNR avg. log calculation raises the final average score considerably, giving the false impression that the sequence is subjectively of higher quality.

This matters because the HVS (human visual system) is highly sensitive to frames whose quality is noticeably worse than the rest. When watching a video with a poor-quality frame, the viewer will remember that frame even though it represents a small percentage of the total encoded frames.

PSNR avg. MSE is weighted toward low-quality frames, making the calculation better aligned to how humans “see” video in motion.
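To put numbers on the two-frame example (99 dB and 50 dB), the per-frame MSE can be recovered from each PSNR value as MSE_i = MAX^2 * 10^(-PSNR_i / 10), so the peak value MAX cancels out when the averaged MSE is converted back to decibels. A worked sketch of both averages:

```latex
\begin{align*}
\mathrm{PSNR}_{\text{avg.\,log}} &= \frac{99 + 50}{2} = 74.5\ \text{dB} \\[4pt]
\mathrm{PSNR}_{\text{avg.\,MSE}} &= -10 \log_{10}\!\left(\frac{10^{-99/10} + 10^{-50/10}}{2}\right)
  \approx -10 \log_{10}\!\left(5.0 \times 10^{-6}\right) \approx 53.0\ \text{dB}
\end{align*}
```

The avg. MSE result stays close to the weaker frame (50 dB), while the avg. log result is pulled more than 20 dB higher by a frame whose extra quality nobody can see.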

SSIM (Structural Similarity Index Measure)

SSIM is a full-reference image quality evaluation index that measures image similarity from three aspects: luminance (brightness), contrast, and structure. SSIM values typically fall in the range [0, 1], where a larger value means less image distortion. SSIM is often compared against other metrics, including PSNR, MSE, and other perceptual image and video quality metrics. In testing, SSIM often outperforms MSE-based metrics.
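The way the three aspects combine can be illustrated with a simplified sketch. Note the assumptions: this computes a single SSIM value over one whole block of pixels, whereas practical implementations compute SSIM over small local windows and average the results; the function name `ssim_global` and the sample pixel values are hypothetical; the stabilizing constants use the commonly cited defaults K1 = 0.01 and K2 = 0.03.

```python
def ssim_global(x, y, bit_depth=8):
    """Single-window SSIM over two equally sized, non-empty pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same size")
    n = len(x)
    peak = (1 << bit_depth) - 1
    c1 = (0.01 * peak) ** 2              # stabilizes the luminance term
    c2 = (0.03 * peak) ** 2              # stabilizes the contrast/structure term

    mu_x = sum(x) / n                    # mean intensity (luminance)
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n   # variance (contrast)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n  # structure

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical textured block and a fully flattened (over-blurred) version of it.
textured = [10, 200, 10, 200]
flattened = [105, 105, 105, 105]
print(f"identical: {ssim_global(textured, textured):.4f}")
print(f"flattened: {ssim_global(textured, flattened):.4f}")
```

The flattened block has the same mean brightness as the original, so the score drops purely because contrast and structure were lost, which is the kind of blurring damage the metric is designed to expose.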

SSIM is an excellent choice for applications like:

  • estimation of content-dependent distortion,
  • capturing and measuring the impact of noise,
  • capturing blurring artifacts.

SSIM is particularly good at measuring the subjective loss introduced by coding. For example, when encoding using x264 with SSIM and with AQ (adaptive quantization) turned off, x264 spends fewer bits on smooth areas that contain only minor detail, whereas AQ allocates the bit rate better across macroblocks. In this case, PSNR and VMAF can misjudge the actual quality, whereas SSIM correlates better with subjective viewing.

What is probably clear by now is that there is no single “winning” quality metric, and even with SSIM’s positive attributes, there are specific applications where the metric may fail, such as:

  • assessing the quality of Super-Resolution algorithms,
  • detecting and capturing spatial and rotational shifts,
  • capturing changes in brightness, contrast, hue, and saturation.

With this understanding of SSIM, let us now move on to VMAF and dive into it!

VMAF (Video Multimethod Assessment Fusion)

Video Multimethod Assessment Fusion (VMAF) is an objective full-reference video quality metric developed by Netflix with the University of Southern California, the University of Nantes IPI/LS2N lab, and the University of Texas at Austin Laboratory for Image and Video Engineering (LIVE).

VMAF is becoming a popular choice for the evaluation of quality when comparing different video codecs, video encoder implementations, encoding settings, and transmission standard variants. VMAF addresses the video evaluation situation in which traditional quality metrics cannot fully reflect all scene characteristics.

Why is VMAF so popular?

VMAF is unique and popular because it predicts subjective video quality based on a reference and a distorted video sequence. In other words, it can mimic how a human reacts to the videos, rather than an “objective” computer.

While PSNR and SSIM are commonly used and simple to calculate, they do not fully reflect the subjective perception of human viewers and thus provide limited value on their own in real-world applications.

This is where VMAF shows a tremendous advantage, as illustrated below.

Clearly, the image on the right looks better to human eyes, and as we can see, the corresponding VMAF score is also higher!

But, wait! Is the image on the right really much better in quality?

The comparison shows that the image on the right has been enhanced to reveal additional detail, so the smaller characters are sharper while the VMAF score is greatly improved. But is this resulting VMAF improvement correlated to a better match for the original quality?

So, it turns out that VMAF can be tricked or hacked! Although the VMAF value has been raised, image quality has not improved.

Although VMAF has flaws, it is a valuable and, in some cases, superior quality metric.

Take a look at the images above.

  • The right image does not use “deblocking filters” and shows mosaic artifacts.
  • The left image, which uses “deblocking filters”, produces a lower VMAF score because of the visible blurring.

And, this is the goal of VMAF – matching human perception!

Because VMAF accounts for both enhancement and degradation of image quality relative to the source, it is gaining considerable usage in video streaming.

So which is the best? PSNR, VMAF, or SSIM?

The main point of this article is not to announce a winning quality metric!

What’s important is to select the evaluation standard, whether objective or subjective, that most closely fits the type of measurement desired and the application being tested. Making decisions based on a single data set can lead to wrong conclusions. For this reason, we suggest that everyone use VMAF combined with PSNR and SSIM for the best combination of subjective and objective video quality evaluation.
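One practical way to collect all three metrics for the same pair of files is FFmpeg, which provides `psnr`, `ssim`, and `libvmaf` filters. The sketch below only builds and prints the commands. It assumes an FFmpeg build with libvmaf enabled and two files of matching resolution and frame rate; the helper name `build_metric_commands` and the file names are hypothetical.

```python
import shlex

def build_metric_commands(distorted, reference):
    """Return one FFmpeg command per metric.

    The distorted (encoded) file is passed as the first input and the
    reference (source) file as the second, which is the order these
    filters expect.
    """
    filters = {"psnr": "psnr", "ssim": "ssim", "vmaf": "libvmaf"}
    return {
        name: ["ffmpeg", "-i", distorted, "-i", reference,
               "-lavfi", filter_name, "-f", "null", "-"]
        for name, filter_name in filters.items()
    }

# Hypothetical file names; replace with a real encode and its source.
commands = build_metric_commands("encoded.mp4", "source.mp4")
for name, cmd in commands.items():
    print(f"{name}: {shlex.join(cmd)}")
```

Each command decodes both inputs, discards the output, and prints the metric summary in FFmpeg's log, so the three scores can be read together before drawing a conclusion. Passing a command to `subprocess.run` would execute it.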

The goal of Visionular has always been to improve image quality while pushing the bounds on performance without losing the compression advantages of the H.264/AVC, HEVC, and AV1 standards. Our team wakes up every day to do this. It’s a hard job that comes with great rewards, as we see some of the world’s largest video services and platforms delight their customers with our video encoding solutions.
