
Video Encoding on a GPU – Explained in Detail


In this article, we talk about video encoding on GPUs: why GPUs are suited to video compression, and how GPU-based encoding compares with CPU-based video compression.

Video can be compressed using either software or hardware solutions. Hardware encoders are machines dedicated to video compression (built on FPGAs, ASICs, etc.). Software encoders are programs that run on a machine's CPUs or GPUs. A lot of video compression is done on CPUs because of their versatility, their complex instruction sets, and, frankly, their wide availability.

However, as video resolutions and bitrates increase, the limitations of CPUs for encoding video become apparent. These limitations show up across codecs such as H.264/AVC, HEVC, and AV1.

Firstly, complexity: encoding complex videos at 1080p or higher resolutions can require a vast amount of processing power and encoding time. As a result, processing several live channels simultaneously may be difficult within the limits of a CPU.

Secondly, variability: a CPU encoder usually takes a variable length of time to encode a video, so encoding times are not entirely predictable. Sizing an encoding platform for a live application therefore requires some headroom, some adaptability in the encoder algorithm, or both.

CPUs shine at tasks that must run one step at a time (sequential tasks). GPUs (Graphics Processing Units), by contrast, are built for parallel processing: a GPU can run many jobs at the same time. This is a good match for video encoding, because many parts of an encoding algorithm can be broken into pieces and completed simultaneously.

Furthermore, these parallel tasks complete in predictable amounts of time. By placing such tasks on the GPU, we can noticeably improve encoding speed and predictability. However, doing so comes with challenges and potential costs that must be addressed.

In this article, we look at how a GPU works and why certain video compression tasks are well suited to running on a GPU. We also cover the challenges involved in accelerating an encoder in this way, and how GPU encoding is used in a range of applications.

Example of a GPU

Why is a GPU suitable for Video Encoding and Decoding?

CPUs are fantastic at running complex algorithms, but they are less good at running one small algorithm on a million different data points at the same time. For video, we want a computing platform that excels at exactly this kind of problem. In other words, we want great "parallel processing" of data: break a video into little pieces and apply a small algorithm to all of these data points (or arrays) at once.
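To make "one small algorithm, many data points" concrete, here is a minimal Python sketch. It is an illustration of the pattern, not GPU code. It cuts a frame into fixed-size blocks and applies the same small kernel (here, the block mean) to each block independently. The names `split_into_blocks` and `block_mean` are hypothetical. Because no block reads another block's data, each call could be handed to its own GPU thread. Here `concurrent.futures` merely mimics that map-over-blocks pattern.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(frame, block_size):
    """Cut a 2-D frame (list of rows) into square blocks in row-major order.
    Assumes the frame dimensions are multiples of block_size."""
    height, width = len(frame), len(frame[0])
    blocks = []
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            blocks.append([row[bx:bx + block_size] for row in frame[by:by + block_size]])
    return blocks

def block_mean(block):
    """The 'small algorithm': identical code for every block, touching only that block."""
    total = sum(sum(row) for row in block)
    return total / (len(block) * len(block[0]))

# A toy 16x16 luma frame whose pixel value is simply x + y.
frame = [[x + y for x in range(16)] for y in range(16)]
blocks = split_into_blocks(frame, 8)

# Blocks are independent, so execution order does not matter; map() still returns
# results in input order. (Python threads only illustrate the pattern here.)
with ThreadPoolExecutor() as pool:
    means = list(pool.map(block_mean, blocks))

print(means)  # [7.0, 15.0, 15.0, 23.0]
```

On a GPU, the same structure becomes one thread (or thread group) per block, with the frame held in GPU memory.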

With this understanding, we can begin to look at how a GPU supports massively parallel processing.

1) Numerous cores: In contrast to CPUs, today's GPUs incorporate thousands of cores designed to perform tiny workloads all at once. Consider two widely available consumer GPUs. The NVIDIA GeForce RTX 3080 has over 8,700 CUDA cores, and the AMD Radeon RX 6800 XT has 4,608 stream processors. That is a huge amount of computing power for parallel processing!

CPU vs. GPU cores. Image credit: VMWare

2) High Bandwidth Memory: Video data arrives as a stream of individual frames. These frames must be moved from disk into system RAM, and then into GPU memory, before the GPU cores can work on them. Data movement can therefore become a bottleneck for high-speed, parallel computing. GPUs mitigate this with high-bandwidth memory (HBM or GDDR), as opposed to the conventional DDR memory that CPUs use. With high-bandwidth memory, data moves much faster within the GPU, so tasks are processed and turned around faster.

3) Streaming Multiprocessors (SMs): GPU cores are grouped into Streaming Multiprocessors, and each SM can run many threads simultaneously. More SMs generally means more work can run at once, which increases encoding throughput.

4) Memory Bandwidth: The rate at which data moves between GPU memory and the cores also matters. Higher memory bandwidth, measured in GB/s, ensures that data reaches the cores fast enough to keep the GPU fully utilized.
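The memory numbers behind points 2 and 4 are easy to estimate. The sketch below computes the size of one uncompressed planar YUV frame and the memory traffic a stream generates. The function names are hypothetical. `reads_per_frame` is an assumed knob standing in for how many times an encoder touches the same pixels (motion search, filtering, and so on); it is not a measured value.

```python
def raw_frame_bytes(width, height, bit_depth=8, chroma="4:2:0"):
    """Bytes in one uncompressed planar YUV frame.
    Samples wider than 8 bits are assumed to be stored in 2 bytes each."""
    # Luma plane plus chroma planes, relative to luma size.
    chroma_factor = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}[chroma]
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    return int(width * height * chroma_factor * bytes_per_sample)

def stream_bandwidth_gbps(width, height, fps, reads_per_frame=1, **fmt):
    """GB/s of memory traffic if every frame is read `reads_per_frame` times."""
    return raw_frame_bytes(width, height, **fmt) * fps * reads_per_frame / 1e9

print(raw_frame_bytes(1920, 1080))  # 3110400 bytes for one 8-bit 4:2:0 1080p frame
# A 10-bit 4K60 stream whose pixels are (hypothetically) read 10 times each:
print(round(stream_bandwidth_gbps(3840, 2160, 60, reads_per_frame=10, bit_depth=10), 2))  # 14.93
```

Multiply this by the number of channels on one device, and it becomes clear why both on-card bandwidth and host-to-GPU transfers matter.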

Once you understand these architectural features, as well as the specific tasks involved in video encoding, you can see why GPUs are a good fit for a complex and time-sensitive workload like video compression.

In addition to general-purpose cores, most recent GPUs include dedicated hardware built specifically for video compression and video processing. Examples include NVIDIA's NVENC encoder (paired with the NVDEC decoder) and AMD's VCE encoder (succeeded by VCN on newer AMD GPUs). These fixed-function units take over most of the video encoding workload from the CPU and are designed to run complex encoding algorithms quickly. Moving work onto these on-chip encoders and decoders also leaves the GPU's general-purpose cores free for other tasks.

For this reason, the term "GPU video encoding" can be confusing. It often means offloading work onto these dedicated units rather than decomposing the encoder into tasks that run on the general-purpose GPU cores.
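In everyday tooling such as FFmpeg, "GPU encoding" usually means selecting one of these fixed-function encoders. The sketch below only builds an argument list and does not run anything. It assumes an NVIDIA GPU and an FFmpeg build compiled with NVENC/NVDEC (CUDA) support. The helper name `nvenc_command`, the file names, and the 5M bitrate are hypothetical. `h264_nvenc` and `hevc_nvenc` are FFmpeg's NVENC encoders. `-hwaccel cuda` together with `-hwaccel_output_format cuda` decodes on the GPU and keeps decoded frames in GPU memory.

```python
import shlex

def nvenc_command(src, dst, codec="h264", bitrate="5M"):
    """Build an FFmpeg argument list that decodes and encodes on an NVIDIA GPU.
    Assumes FFmpeg was built with NVDEC/NVENC (CUDA) support."""
    encoder = {"h264": "h264_nvenc", "hevc": "hevc_nvenc"}[codec]
    return [
        "ffmpeg",
        "-hwaccel", "cuda",                # decode on the GPU (NVDEC)
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", src,
        "-c:v", encoder,                   # encode on the dedicated NVENC unit
        "-b:v", bitrate,
        "-c:a", "copy",                    # pass audio through untouched
        dst,
    ]

cmd = nvenc_command("input.mp4", "output.mp4", codec="hevc")
print(shlex.join(cmd))
# To actually run it (needs a suitable GPU and FFmpeg build):
#   import subprocess; subprocess.run(cmd, check=True)
```

Here the general-purpose GPU cores do almost none of the encoding work, which is exactly the ambiguity in the term described above.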

In fact, a wide variety of architectures exist in which work is shared in different ways between the dedicated units, the general-purpose GPU cores, and even the CPU. This is because some processes do not parallelize well, or because parallelizing them reduces the quality of the decisions the encoder can make.

GPUs offer a significant advantage over CPUs in speed and efficiency, which matters for faster encoding times and high-quality video production workflows.

With this understanding, let us next look at which tasks GPUs are best suited for.

Examples of Video Encoding Tasks Well-Suited for GPUs

Generally, a job that you would want to move onto a GPU core:

  • is clearly defined (in terms of inputs and outputs);
  • needs only a small, self-contained piece of data; and
  • performs calculations that are independent of those running on other cores.

A task that meets all of these requirements can be spread across thousands of GPU cores and processed in parallel.

Given that we are discussing video compression, let us point out some tasks and algorithms that are especially well suited to being farmed out to a GPU's cores:

1) Motion Estimation: For efficient compression, an encoder searches a previous (or future) frame for the region that best matches each block of the current frame. It then only has to code a motion vector and the small remaining difference, instead of the block itself. Motion estimation (ME) is a repetitive calculation that can fully exploit the parallel processing power of GPUs. Thousands of cores can analyze different regions of a frame in parallel, which speeds up the process dramatically.

2) DCT: One effective step in compressing a video stream is applying a Discrete Cosine Transform (DCT) to blocks of pixels; in modern codecs, these are blocks of the prediction residual. The DCT is a mathematical operation that transforms spatial data (pixel values) into frequency data, concentrating most of a block's energy into a few coefficients. This concentration is what enables efficient lossy compression in the frequency domain. GPUs are especially good at performing large numbers of operations simultaneously, which makes them well suited to the computationally intensive, highly parallelizable DCT.

3) Quantization: Quantization divides each DCT coefficient by a step size and rounds or truncates the result to an integer, discarding precision and shrinking the video data even further. What looks like a simple trick involves a huge number of calculations across every frame. Because each coefficient is quantized independently, GPUs handle this efficiently with their parallel processors.

4) Loop filtering: Modern codecs include multiple in-loop filters that remove artefacts after quantization: deblocking to remove visible block edges, plus de-ringing and smoothing filters. These filters are controlled by various local parameters, but the core loop-filtering processes apply the same set of kernels everywhere in the frame.

Many such operations can be offloaded to a GPU for rapid processing, and video codec engineers spend significant time optimizing their implementations to run on GPUs. Video encoders adapt these processes heavily within each frame. Once the parameters are known, however, the processes can (with the right architecture) be offloaded to the GPU all together.
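Motion estimation, the first task listed above, can be illustrated with its simplest form: exhaustive ("full search") block matching that uses the sum of absolute differences (SAD) as the cost. This is a minimal sketch. The function names, the 4x4 block size, and the ±2-pixel search range are illustrative assumptions. Real encoders use smarter search patterns, sub-pixel refinement, and rate-aware costs. Every candidate displacement, and every block, is evaluated independently, which is what makes ME a natural fit for GPU cores.

```python
import random

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n) for i in range(n)
    )

def full_search(cur, ref, bx, by, n=4, search=2):
    """Test every displacement within +/-search pixels; return (best_mv, best_cost)."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

# Synthetic test: a random reference frame, and a current frame whose content at
# (x, y) comes from (x - 2, y + 1) in the reference (edges clamped).
rng = random.Random(0)
W = H = 16
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[min(y + 1, H - 1)][max(x - 2, 0)] for x in range(W)] for y in range(H)]

mv, cost = full_search(cur, ref, bx=6, by=6)
print(mv, cost)  # (-2, 1) 0
```

A GPU implementation would typically assign blocks (or candidate vectors) to threads and reduce the SAD values in parallel.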
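For the DCT described in item 2, the textbook orthonormal 2-D DCT-II of an N x N block is X[u][v] = α(u)·α(v)·Σ_y Σ_x f[y][x]·cos((2x+1)vπ/2N)·cos((2y+1)uπ/2N), with α(0) = √(1/N) and α(k) = √(2/N) otherwise. The Python sketch below evaluates that formula directly, purely for illustration; the name `dct_2d` is hypothetical. Real codecs use fast integer approximations of this transform. Note that every output coefficient depends only on the input block, so all coefficients of all blocks can be computed in parallel.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed directly from the definition."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0: a flat block puts all its energy in the DC coefficient
```

Smooth image regions behave similarly, leaving most high-frequency coefficients near zero, which is what quantization then exploits.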
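Quantization (item 3) can be sketched as a uniform scalar quantizer with a rounding offset. This is a simplified model under stated assumptions: real codecs derive the step size from a quantization parameter (QP) and may apply per-frequency scaling. The names `quantize` and `dequantize` are hypothetical. Setting `rounding=0` gives the plain truncation mentioned above, while `rounding=0.5` gives ordinary rounding. Each coefficient is handled on its own, so the work parallelizes trivially.

```python
def quantize(coeffs, qstep, rounding=0.5):
    """Uniform scalar quantization: |c| / qstep plus a rounding offset, truncated
    toward zero, with the sign restored. rounding=0 is plain truncation."""
    def q(c):
        level = int(abs(c) / qstep + rounding)
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coeffs]

def dequantize(levels, qstep):
    """Decoder-side reconstruction: scale the integer levels back up."""
    return [[lv * qstep for lv in row] for row in levels]

coeffs = [[800.0, -37.0], [12.0, 3.0]]       # a tiny 2x2 example block
levels = quantize(coeffs, qstep=16)
print(levels)                                # [[50, -2], [1, 0]]
print(dequantize(levels, 16))                # [[800, -32], [16, 0]]
print(quantize(coeffs, qstep=16, rounding=0))  # [[50, -2], [0, 0]]
```

The gap between `coeffs` and the dequantized values is the information that lossy compression discards. Small coefficients collapse to zero, which the entropy coder can then represent very cheaply.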
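For loop filtering (item 4), the sketch below is a deliberately toy deblocking-style filter. It is not the filter of H.264, HEVC, AV1, or any other codec; the function name, threshold, and filter taps are illustrative assumptions. At each vertical block edge, a small step between neighbouring pixels is treated as a likely coding artefact and softened, while a large step is treated as a real image edge and left alone. The function reads only the unfiltered input and writes to a separate output, so every edge in the frame could be filtered in parallel.

```python
def smooth_vertical_edges(frame, block_size=8, threshold=10):
    """Toy deblocking: at each vertical block edge, if the two pixels either side
    differ by less than `threshold`, pull both toward their average."""
    out = [row[:] for row in frame]          # never filter in place
    width = len(frame[0])
    for y, row in enumerate(frame):
        for edge in range(block_size, width, block_size):
            p, q = row[edge - 1], row[edge]  # last pixel of left block, first of right
            if abs(p - q) < threshold:       # small step: probably a blocking artefact
                avg = (p + q) / 2
                out[y][edge - 1] = (p + avg) / 2
                out[y][edge] = (q + avg) / 2
    return out

frame = [
    [100] * 8 + [104] * 8,   # slight step at the block boundary: gets smoothed
    [100] * 8 + [200] * 8,   # strong step: treated as a real edge and kept
]
filtered = smooth_vertical_edges(frame)
print(filtered[0][7], filtered[0][8])  # 101.0 103.0
print(filtered[1][7], filtered[1][8])  # 100 200
```

Real in-loop filters use adaptive strengths and longer filter taps, but the structure is the same: one small kernel applied at every edge.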

Benefits of GPU-based Encoding

Using GPUs for video encoding has many benefits compared with using CPUs alone. Let us see why!

Incredibly Fast Encoding: Depending on the workload, GPU encoding can cut encoding times by a factor of 5-10 compared with CPUs, reducing the time to production or publication of your latest content. As soon as the video arrives from the production house, you can encode it quickly and push it to your end users!

Cost Reduction: Owing to the large number of cores on offer, you can fit several live channels on a single GPU (for example, around 10 channels with 4 SD profiles each). This reduces the amount of hardware you need to purchase to run your business.

Decreased Costs via Efficiency: Faster encoding on a GPU shortens the video encoding stage of your media pipeline. At scale, the time saved translates into large cost savings and a faster return on investment. The same argument applies to 24×7 live channels, several of which can be packed onto a single GPU.

Applications in the Real World

GPU-accelerated transcoding has many practical applications in the real world.

Video Transcoding on GPUs:

GPU-accelerated transcoding platforms and video applications convert billions of hours of video into various formats and resolutions to suit many devices and bandwidth conditions. Broadcasters and live-channel providers commonly deliver multiple live channels from a single GPU. For example, 4-5 channels with a couple of HD profiles and 3-4 SD profiles can all be transcoded and packaged into HLS in real time.

Video Editing on GPUs:

Adobe Premiere Pro and DaVinci Resolve use GPU power to speed up rendering and export. This lets video editors work much faster while getting results of the same or better quality.

CDNs Use GPUs for Transcoding:

GPU transcoding is also valuable for content delivery networks (CDNs). CDNs can run GPUs to transcode videos for subscribers in real time, making high-quality video available on as many devices and networks as possible.

Challenges and Future Directions

While GPU-accelerated transcoding has its benefits, there are still hurdles to overcome before companies can take full advantage of it.

Cost

One of the most notable challenges is getting started – high-performance GPUs are expensive and companies must carefully evaluate their ROI before moving over to a GPU-based workflow.

Video Quality and Compression Efficiency

While a GPU is great at number crunching in parallel, many of the encoder decisions that contribute most to compression efficiency cannot easily be parallelized. For example, Mode Decision chooses the encoding mode (inter or intra prediction) for a block, and it depends on the decisions made for previously encoded blocks. Such inter-block and inter-frame dependencies can limit the compression efficiency achievable on a GPU compared with a CPU.
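Consider a simplified model to see how such dependencies cap parallelism. In this model, which is an illustrative assumption rather than the exact rule of any particular codec, each block's decision needs its left and top-right neighbours to be finished first. Blocks can then only be processed in diagonal "waves", similar to the idea behind HEVC's wavefront parallel processing. The hypothetical `wavefront_schedule` below groups blocks into such waves. It then counts how many blocks are ever available at once for a 1080p frame split into 64x64 blocks.

```python
def wavefront_schedule(cols, rows):
    """Group blocks into waves that can run in parallel when block (x, y) must wait
    for its left neighbour (x-1, y) and top-right neighbour (x+1, y-1).
    Under that rule, block (x, y) can start at wave x + 2*y."""
    waves = {}
    for y in range(rows):
        for x in range(cols):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[t] for t in sorted(waves)]

# 1920x1080 split into 64x64 blocks: 30 columns, 17 rows (the last row is partial).
waves = wavefront_schedule(cols=30, rows=17)
print(sum(len(w) for w in waves))    # 510 blocks in total
print(len(waves))                    # 62 waves that must run one after another
print(max(len(w) for w in waves))    # at most 15 blocks are independent at any moment
```

Even in the widest wave, only 15 blocks can be decided at once, far fewer than the thousands of cores a GPU offers. This is one reason such decisions are hard to move onto a GPU without approximations that can cost compression efficiency.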

Programming skills

GPU-based transcoding is also complex. It requires a niche skill set to program and to deliver the savings that companies expect from moving to a GPU-based workflow.

It is true that GPUs can deliver enormous performance improvements. To achieve the best possible outcome, however, a company needs programmers with a deep understanding of both video compression and GPU programming. Optimal quality and optimal performance can only be achieved together by offloading tasks in the right way, using intelligent algorithms that ensure quality is not lost in the parallelization process. This is a stumbling block for companies looking to use GPUs for video transcoding.

However, with careful design and programming that harnesses the GPU's massive parallel processing capacity, we can achieve an astounding leap in video compression performance and speed.

If you are interested in video compression and codecs, you will find our overview of the MV-HEVC video codec interesting for sure!

Visionular focuses on the use of AI in video compression (for H.264/AVC, HEVC, and AV1), which enables our video codecs to deliver the highest possible compression efficiency at the best encoding speeds! Our tech is used by the largest content providers globally to deliver the best user experience and to reduce their OPEX by 25-30% at least!

To learn more, please contact us for a 1:1 with a technical expert or sign up for a free trial.
