Advanced Optimization Techniques in Video Compression

Bo Zhang

Solutions Architect at Visionular

In today’s world, videos are more than just a medium; they are an omnipresent form of communication, education, entertainment, and business.

From social media reels to high-definition films, the digital era is replete with video data. However, this immense volume of visual data poses significant challenges related to storage, bandwidth, and playback quality.

Enter video compression – the art and science of reducing the size of digital video files without noticeably compromising viewing quality. The goal is to lower storage and delivery costs while improving playback quality.

Video compression’s importance stems from the pressing need for efficient storage and swift transmission, particularly at a time when streaming reigns supreme.

At the heart of video compression lies a series of complex algorithms and operations. These algorithms are tasked with the job of recognizing and eliminating redundant data, preserving only the essential components that contribute most significantly to the video’s quality. It is a balancing act between conserving storage and retaining quality, requiring an intricate understanding of video data and human perception.

The Need for Optimization in Software-based Compression

Video compression can be bifurcated into two primary categories: hardware and software-based compression. While both have their merits, they differ significantly in methodology and application.

Hardware video compression is typically achieved using dedicated equipment or chips designed exclusively for this task. It is usually faster because the hardware is purpose-built, but it can be less flexible when adapting to newer algorithms or standards.

In contrast, software video compression capitalizes on general-purpose computer hardware and specialized software. These software-driven approaches provide flexibility, scalability, and adaptability to emerging standards. Moreover, with software, updates can be rolled out seamlessly, ensuring the encoding techniques remain at the forefront.

In this blog, we look at the challenges to high-speed software-based video compression and provide ideas on how video can be compressed faster on modern general-purpose hardware. These ideas stem from research and development done by Visionular’s engineers and we hope this helps the video community at large.

Let’s go!

Clean Coding and Software Design Principles

In software development, the quality and efficiency of code can be critical for both a business’s success and a customer’s satisfaction. The way software is constructed can significantly impact its performance, scalability, and maintainability, especially in a complex, deep-tech field such as video compression.

And here is how –  

Efficiency in Video Compression Algorithms: Clean code makes the algorithms easier to read and understand. When complex, math-heavy compression algorithms are expressed clearly in code, bottlenecks are easier to find and optimize, which improves the efficiency of the encoding and decoding process.

Modular Encapsulation: video compression tasks can be broken down into modular components, each focusing on specific tasks, such as motion estimation, quantization, or entropy coding. This modularity ensures that components can be updated, replaced, re-used, or improved without affecting the entire system.  

For example, creating the final bitstream (NAL units, SPS, PPS, and so on) is independent of whether the encoder runs in an 8-bit or 10-bit configuration. Similarly, C++ templates can minimize the code required for motion estimation while supporting both 8-bit and 10-bit modes and different pixel formats such as YUV420 and YUV422.
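
As a rough illustration of the template idea, the sketch below writes one block-matching cost function and instantiates it for both bit depths. The function name `block_sad` and its parameters are hypothetical, and it assumes 10-bit samples are stored in 16-bit containers; a production encoder would be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Pixel is std::uint8_t for 8-bit video and std::uint16_t for 10-bit video
// (assumption: 10-bit samples live in 16-bit containers).
// Strides are measured in pixels, not bytes.
template <typename Pixel>
std::uint32_t block_sad(const Pixel* cur, std::ptrdiff_t cur_stride,
                        const Pixel* ref, std::ptrdiff_t ref_stride,
                        int width, int height) {
    assert(width > 0 && height > 0);
    std::uint32_t sad = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int diff = static_cast<int>(cur[x]) - static_cast<int>(ref[x]);
            sad += static_cast<std::uint32_t>(std::abs(diff));
        }
        cur += cur_stride;  // advance both pointers to the next row
        ref += ref_stride;
    }
    return sad;
}
```

Calling `block_sad<std::uint8_t>(...)` or `block_sad<std::uint16_t>(...)` then selects the bit depth at compile time, so the loop body is written only once.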

Scalability: As video resolutions grow and new formats emerge, a clean and object-oriented codebase can be more easily scaled to accommodate these changes. A well-structured system can be expanded to handle 4K, 8K, or even more data-intensive formats without a complete overhaul. 

Performance Optimization: In video compression, performance is paramount. Clean, well-organized code is more likely to be efficient, reducing the computational resources required for encoding and decoding.  

In conclusion, for engineers delving into video compression, adopting clean coding isn’t just a best practice; it is a prerequisite for creating efficient, scalable, and maintainable compression systems.  

The resource-intensive demands of video data make these practices not just beneficial but essential for success. 

Strategic Coupling in Video Compression

“Coupling” in the context of software design refers to the degree of interdependence between different modules or components of a system.  

Advocating for coupling might seem counterintuitive given the emphasis on modular and independent design in modern software engineering (i.e., decoupled architectures). 

However, in the realm of video compression, strategic coupling becomes not just relevant but critically important, and here’s why: 

  • Predictive Coding: Modern compression techniques frequently employ predictive coding, in which upcoming frames are predicted from preceding ones. SAD, or Sum of Absolute Differences, is a common metric used in motion estimation to measure how closely a block in the current frame matches a candidate block in a reference frame. Given how motion estimation works, there are many overlapping computations when analyzing adjacent blocks. Here’s how tight coupling helps in making motion estimation efficient – 

    a. When calculating the SAD for a block, we are comparing it to a reference block in a previous frame. Now, when moving to an adjacent block, many of the comparisons made for the previous block remain relevant.
    b. Instead of recalculating the entire SAD for the new block, we can reuse a significant portion of the calculations from the previous block and only compute the differences for the new pixels.
    c. By reusing these calculations, the process becomes much more efficient, reducing the computational load and speeding up the motion estimation process. This practice not only conserves processing resources but also hastens the video compression process without compromising accuracy.
  • Data Flow Efficiency: Coupling ensures that relevant data, such as motion vectors, macroblock information, or quantization parameters, flow smoothly between related components of the compression system. This smooth data flow is vital to maintain the speed and efficiency of the compression process. 
  • Memory Efficiency: Video compression is memory-intensive, and coupling helps by reducing unnecessary data duplication and avoiding repeated transfers of entire video frames between disk and system RAM. By strategically sharing data between related processes, the algorithm can minimize its memory footprint, leading to faster operations and lower resource usage. 
  • Optimization and Feedback Loops: Coupled systems in video compression can adapt in real-time. For instance, if a particular encoding strategy results in a higher-than-expected bitrate for a segment, a coupled system might dynamically adjust the quantization or prediction strategy for subsequent segments, optimizing quality and compression ratio. 
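
The SAD reuse described under Predictive Coding can take several forms. The sketch below shows one of them under stated assumptions: the displacement between the two frames is fixed (the caller passes a `ref` pointer that is already offset), and SADs are wanted for every overlapping window across a strip of pixels. The name `sliding_sad` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns the SAD of every `block`-wide window across a strip that is
// `block` rows tall and `width` pixels wide. Result has width - block + 1 entries.
std::vector<std::uint32_t> sliding_sad(const std::uint8_t* cur,
                                       const std::uint8_t* ref,
                                       int width, int stride, int block) {
    assert(block > 0 && width >= block && stride >= width);

    // Step 1: per-column sums of absolute differences; each pixel is visited once.
    std::vector<std::uint32_t> col(static_cast<std::size_t>(width), 0);
    for (int y = 0; y < block; ++y) {
        for (int x = 0; x < width; ++x) {
            const int diff = static_cast<int>(cur[y * stride + x]) -
                             static_cast<int>(ref[y * stride + x]);
            col[x] += static_cast<std::uint32_t>(std::abs(diff));
        }
    }

    // Step 2: the first window sums `block` columns in full.
    std::vector<std::uint32_t> sad(static_cast<std::size_t>(width - block + 1));
    std::uint32_t running = 0;
    for (int x = 0; x < block; ++x) running += col[x];
    sad[0] = running;

    // Step 3: each later window reuses the previous sum, dropping the column
    // that left the window and adding the one that entered it.
    for (int x = 1; x + block <= width; ++x) {
        running = running - col[x - 1] + col[x + block - 1];
        sad[x] = running;
    }
    return sad;
}
```

The two window-level modules share the column sums instead of each recomputing them, which is the kind of deliberate coupling this section argues for.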

To put it succinctly, while decoupling promotes modularity and independent functionality, strategic coupling in video compression taps into the interconnected nature of video data, ensuring efficiency and quality. 

Harnessing the Power of Advanced Instruction Sets in Video Compression

Processing speed is very important in video compression, especially if the end user is running an encoder for live streaming or real-time communications. To carry out every step in present-day video codecs (such as AV1, HEVC, or AVC) quickly enough, the power of modern CPUs and their specialized instruction sets becomes indispensable.

SSE and AVX2 in x86/x64 Architectures

Let’s first dive into the significance of instruction sets like SSE (Streaming SIMD Extensions) and AVX2 (Advanced Vector Extensions 2) for x86/x64 architectures.  

SIMD stands for Single Instruction, Multiple Data. As the name suggests, this design philosophy enables a single instruction to process multiple data elements concurrently. 

Imagine a scenario where a video encoder needs to calculate the difference between corresponding pixels in two video frames, a common operation in motion estimation. Without SIMD, the CPU would iterate through each pixel pair sequentially. However, with SSE or AVX2, the CPU can process multiple pixel pairs simultaneously, drastically speeding up the computation. 

The sheer volume of data involved in video compression makes these parallel operations not just beneficial but essential. Encoding a mere second of high-definition video can involve processing millions of pixels. Thus, the parallel processing capabilities of SSE and AVX2 prove invaluable. 
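
To make the pixel-difference scenario concrete, here is a minimal sketch of a 16x16 SAD that uses the SSE2 intrinsic `_mm_sad_epu8` to compare 16 pixel pairs per instruction, with a scalar fallback for builds without SSE2. The function names and the `SKETCH_HAVE_SSE2` macro are hypothetical; an AVX2 version would follow the same pattern with 32 pixels per instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// SAD over one row of 16 8-bit pixels.
std::uint32_t sad_row16(const std::uint8_t* a, const std::uint8_t* b) {
#ifdef SKETCH_HAVE_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // One instruction yields two partial sums, one per 64-bit half.
    const __m128i s = _mm_sad_epu8(va, vb);
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(s)) +
           static_cast<std::uint32_t>(_mm_extract_epi16(s, 4));
#else
    // Scalar fallback: one pixel pair per iteration.
    std::uint32_t sum = 0;
    for (int i = 0; i < 16; ++i)
        sum += static_cast<std::uint32_t>(std::abs(int(a[i]) - int(b[i])));
    return sum;
#endif
}

// SAD over a 16x16 block; stride is the distance between rows in pixels.
std::uint32_t sad_16x16(const std::uint8_t* cur, const std::uint8_t* ref,
                        int stride) {
    assert(stride >= 16);
    std::uint32_t sum = 0;
    for (int y = 0; y < 16; ++y)
        sum += sad_row16(cur + y * stride, ref + y * stride);
    return sum;
}
```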

NEON and ARM Architectures for Mobile Video Compression

While SSE and AVX2 are mainstays in desktop and server environments, the mobile world, dominated by ARM architectures, requires its own set of optimizations. This is where the NEON instruction set comes into play. 

NEON, like its counterparts in the x86/x64 realm, allows for efficient parallel processing of data. Given the power constraints and performance needs of mobile devices, optimizing video compression is even more critical. Videos on mobile platforms need to be efficiently compressed for playback, transmission (like in video calls), or even editing on the go. 

Let’s dissect some of the operations NEON optimizes: 

  • Intra-prediction: This is about predicting the pixel values within a single frame based on neighboring pixel values. NEON can process these predictions for multiple pixels or blocks simultaneously, making the intra-frame compression faster. 
  • Motion compensation: In inter-frame compression, predicting the content of one frame based on another is crucial. This relies on motion vectors, which denote how blocks of pixels move between frames, and on fetching or interpolating the reference pixels those vectors point to. NEON can expedite these calculations by processing many pixels of a block concurrently. 
  • Entropy coding: This involves representing data using fewer bits based on its frequency. In video compression, this step is vital to shrink the video size further after pixel-based optimizations. The core coding loop is largely sequential, but NEON can speed up the data-parallel work around it, such as scanning and preparing coefficients, which makes the overall step faster. 
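
As one small example of the intra-prediction case, the sketch below fills a 16x16 block with the rounded average of its 16 top and 16 left neighbors (commonly called DC prediction). It uses NEON intrinsics when the compiler targets NEON and falls back to scalar code otherwise. The function name `intra_dc_16x16` is hypothetical, and it assumes the caller has already gathered the left neighbors into a contiguous array.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// top:  16 reconstructed pixels directly above the block
// left: 16 reconstructed pixels to the left, gathered contiguously (assumption)
// dst:  output block, `stride` pixels between rows
void intra_dc_16x16(const std::uint8_t* top, const std::uint8_t* left,
                    std::uint8_t* dst, int stride) {
    assert(stride >= 16);
#if defined(__ARM_NEON)
    // Widen and add neighbors pairwise: 16 x u8 -> 8 x u16 -> 4 x u32 -> 2 x u64.
    uint16x8_t s16 = vpaddlq_u8(vld1q_u8(top));
    s16 = vpadalq_u8(s16, vld1q_u8(left));
    const uint32x4_t s32 = vpaddlq_u16(s16);
    const uint64x2_t s64 = vpaddlq_u32(s32);
    const std::uint32_t sum = static_cast<std::uint32_t>(
        vgetq_lane_u64(s64, 0) + vgetq_lane_u64(s64, 1));
    // Broadcast the rounded mean and store one full row per instruction.
    const uint8x16_t dc = vdupq_n_u8(static_cast<std::uint8_t>((sum + 16) >> 5));
    for (int y = 0; y < 16; ++y) vst1q_u8(dst + y * stride, dc);
#else
    // Scalar fallback: one pixel at a time.
    std::uint32_t sum = 0;
    for (int i = 0; i < 16; ++i) sum += top[i] + left[i];
    const std::uint8_t dc = static_cast<std::uint8_t>((sum + 16) >> 5);
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) dst[y * stride + x] = dc;
#endif
}
```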


Advanced instruction sets, whether SSE, AVX2, or NEON, enable encoders to tackle the vast amounts of data inherent to videos with efficiency and speed.  

Thread Pool Optimization

In computing, especially when it comes to processes that demand high parallelism like video compression, the act of managing tasks involves spawning (creating) and killing (terminating) threads.  

Threads are sequences of instructions that can be executed independently.  

  • As tasks arise, new threads are spawned to handle them. 
  • Once the tasks are completed, these threads are killed or terminated. 

However, the constant cycle of spawning and killing threads is computationally intensive and can introduce latency—a challenge particularly evident in the demanding field of video compression.


Instead of constantly creating and terminating threads, thread pool optimization relies on a pre-initialized set of threads. This pool of ready threads waits in the background, primed for deployment as tasks arise.  

By eliminating the overhead of thread creation and termination, video encoding can occur more swiftly and efficiently, managing the considerable data involved without unnecessary delays.  
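
A minimal sketch of such a pool, using only the C++ standard library, might look like the following. The class name `ThreadPool` and its interface are hypothetical and kept deliberately small; a real encoder would likely add futures, task batching, or core affinity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    // Threads are created once, up front, and reused for every task.
    explicit ThreadPool(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Finishes all queued tasks, then joins the workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();  // wake one idle worker
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

An encoder could submit one task per tile or per row of blocks, for example, and pay the thread creation cost only once at startup.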


One Step Deeper: Priority Scheduling

But the real magic unfolds when these threads are not just numerous but smartly managed. In video compression, not all tasks are equal in priority – some tasks may be more time-sensitive or critical to the encoding process than others.

This is where priority scheduling logic comes into play.

Priority scheduling logic ensures that threads are not distributed haphazardly but are assigned based on the urgency and importance of each task. This makes certain that crucial aspects of the encoding process receive the computational resources they require promptly.

For example, certain operations like motion estimation, which predicts motion between frames, may be deemed more critical than others during the encoding process. Similarly, reference frames (I and P frames) that later frames depend on must be processed first and are given higher priority.

Priority scheduling gets such operations the resources they need quickly, which speeds up the entire encoding process.
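
One simple way to express this ordering is a priority queue of tasks, sketched below with `std::priority_queue`. The names `EncodeTask` and `PriorityTaskQueue`, and the convention that a larger number means more urgent, are hypothetical. The queue is shown single-threaded for clarity; in a pool it would sit behind a mutex in place of a plain first-in, first-out queue.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct EncodeTask {
    int priority;             // larger value runs first (hypothetical convention)
    std::uint64_t sequence;   // submission order, used to break ties
    std::function<void()> work;
};

// Orders the heap so the highest priority is on top; among equal
// priorities, the task submitted earliest wins.
struct LowerPriority {
    bool operator()(const EncodeTask& a, const EncodeTask& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;
    }
};

class PriorityTaskQueue {
public:
    void push(int priority, std::function<void()> work) {
        queue_.push(EncodeTask{priority, next_sequence_++, std::move(work)});
    }

    bool empty() const { return queue_.empty(); }

    // Removes and runs the most urgent task.
    void run_next() {
        assert(!queue_.empty());
        // top() returns a const reference, so copy the callable before popping.
        std::function<void()> work = queue_.top().work;
        queue_.pop();
        work();
    }

private:
    std::priority_queue<EncodeTask, std::vector<EncodeTask>, LowerPriority> queue_;
    std::uint64_t next_sequence_ = 0;
};
```

With this in place, a task for an I frame pushed with priority 3 runs before a B-frame task pushed earlier with priority 1.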


Video compression demands the highest standards of precision and efficiency.  

Through the integration of clean coding principles, strategic coupling, and advanced instruction sets such as SSE, AVX2, and NEON, software optimization pushes the encoding process toward peak performance.

Thread pool optimization and priority scheduling further ensure that these strategies are executed seamlessly.  

Every decision, from code structuring to thread management, directly impacts the encoding quality and speed. Thus, a dedicated focus on software optimization not only ensures technical excellence but also delivers the best possible video output.