Video Encoding on ARM vs. Intel x86 – Which Is Better?

In this article, we look at video encoding on ARM processors, using benchmarks that show why they are well suited to video compression from both a performance and a pricing standpoint, and compare the results with video compression on Intel x86 (CISC) processors.

When you build out your video encoding infrastructure and architecture, two factors are critical –

  • how fast the encoding gets done,
  • and the compute cost of getting it done.

If you pay less and get a machine with poor specifications, encoding takes longer; if you spend too much on your infrastructure, the ROI isn’t there. This is a big trade-off that you need to tackle up front!

And, when it comes to speed, one must examine both the algorithms at work and the machines that run them, specifically the processors.

With that said, two processor families are popular for video compression: Intel x86 and ARM. They perform differently because of their inherent architectural differences and instruction sets.

And publicly available prices show that these two processors are also priced very differently!

So, which architecture should you choose for your video encoding tasks?

In this article, we’ll examine the Intel x86 and ARM processors and walk through a set of benchmarks to show which is better suited for video compression and offers a better ROI.

The Difference between ARM and Intel Processors

First up, let’s look at the ARM processors.

The ARM architecture is a popular choice for microcontrollers and embedded systems due to its low power consumption and high performance, and ARM processors are found in a wide variety of devices.

The ARM processor is built upon the Reduced Instruction Set Computer (RISC) architecture with the idea that, in most cases, you can get by with a small, simple set of instructions. By contrast, complex instruction set computer (CISC) architectures such as x86 provide a much larger set of instructions, many of which perform multi-step operations.

ARM’s RISC approach allows it to perform as well as or even better than processors using CISC architectures, with some potentially considerable power savings along the way.

ARM processors have a great reputation for using very little power in comparison to their Intel and AMD counterparts, making them perfect for when batteries need to last a long time. Low-power, small packages, intended for use in battery-powered devices, are where ARM really stands out.

And, as proof of its capabilities, Apple has moved its laptops and high-performance computers to an ARM-based architecture using its own M1, M2, and M3 chips.

The ARM architecture’s scalability is apparent in how it spans everything from small, real-time embedded systems to far more powerful cloud (and hypervisor-layer) servers.

And now, let’s look at the Intel x86 processors.

Intel’s x86 processors are Complex Instruction Set Computers (CISC) that offer a wide range of instructions, making them a great choice when you need to perform intricate calculations. AMD is also a large supplier of x86-compatible CPUs.

However, the trade-off is that they consume a lot of power and tend to run much hotter. They are the go-to chips for PCs and other devices that need a lot of horsepower, and they get the job done in a very small amount of time.

For many years now, the x86 line of processors has set the standard for desktops, laptops, and servers worldwide.

And, because these Intel processors are so widely used, you can expect most software to be optimized for the x86 instruction set, whether through code optimization or by hand-tuning and rewriting code with SIMD intrinsics.

What is the advantage of using ARM processors?

The advantages of using ARM processors are many – the processors are relatively small and use very little power. And when they do use that power, it is more efficiently distributed throughout the chip than is the case with the larger and more power-hungry instruction set architectures (ISAs) found in Intel and AMD x86 processors.

ARM’s RISC design approach results in real power savings. Power efficiency pays off in two ways: it reduces your electricity bill, and it lowers system maintenance and management costs because it directly affects system power, cooling, and reliability.

There are cloud computing options that use ARM processors, like the AWS c7g.xlarge-arm. These are well suited to cloud workloads because they offer good performance at a relatively low price.

ARM’s power efficiency and performance make it appropriate for edge computing, which calls for computing “at the edge” of the network, where it is closest to the data.

Video Encoding Benchmark on x86 and ARM

To provide a comprehensive analysis, we used our Aurora5 (HEVC) encoder, which is optimized for both Intel x86 and ARM processors.

The codebase for the Aurora series of encoders (for H.264/AVC, HEVC, and AV1) is written so that the encoder detects the underlying processor and executes the matching set of intrinsics. Several key sections of the encoder are written with ARM-specific and Intel-specific intrinsics, which makes the Aurora encoders exceptionally performant.
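
The detect-then-dispatch pattern described above can be sketched in a few lines. This is a minimal illustration in Python, not Aurora's actual code: the names `detect_isa`, `select_kernel`, and `sad_reference` are hypothetical, and every entry in the dispatch table points to the same pure-Python reference routine, standing in for what would be NEON or AVX2 intrinsic implementations in a native encoder.

```python
import platform
from typing import Callable, Dict, Optional, Sequence


def detect_isa(machine: Optional[str] = None) -> str:
    """Map a machine string (as returned by platform.machine()) to an ISA family."""
    name = (machine or platform.machine()).lower()
    if name in ("aarch64", "arm64"):
        return "arm"
    if name in ("x86_64", "amd64"):
        return "x86"
    return "generic"


def sad_reference(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical dispatch table. In a native encoder, the "arm" and "x86"
# entries would be separate functions written with ISA-specific intrinsics;
# here they all reuse the reference routine (an assumption for illustration).
KERNELS: Dict[str, Callable[[Sequence[int], Sequence[int]], int]] = {
    "arm": sad_reference,
    "x86": sad_reference,
    "generic": sad_reference,
}


def select_kernel(machine: Optional[str] = None) -> Callable[[Sequence[int], Sequence[int]], int]:
    """Pick the kernel once, based on the detected processor family."""
    return KERNELS[detect_isa(machine)]


if __name__ == "__main__":
    sad = select_kernel()
    print(detect_isa(), sad([10, 20, 30], [12, 18, 33]))
```

Selecting the kernel once at startup, rather than branching on every call, keeps the hot loop free of architecture checks.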

For the benchmark, we compared encoding on three AWS instances –

  • c7g.2xlarge-arm,
  • c7i.2xlarge-x86, and
  • c5.2xlarge-x86.

We also used two different versions of our Aurora5 encoder: with and without Content Adaptive Encoding (CAE), which uses AI in video compression and includes our Aurora AI-based pre-analysis and pre-processing module.

The benchmarking parameters were:

  • Resolution: 720p and 1080p
  • Encoding settings: slow-speed preset, CRF=26.5, I-frame interval of 5 seconds
  • Cores: 16 in each instance
  • CPU occupancy: about 1315.6% for 1080p video and about 1094.6% for 720p video
  • Threading: the experiments are run with 8 threads
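
To put the occupancy figures in more familiar terms, the sketch below converts them into an approximate number of busy cores and a fraction of the instance. It assumes the percentages follow the common `top`-style convention in which 100% equals one fully busy core; the article does not state how they were measured, and the function names are illustrative.

```python
def busy_cores(occupancy_percent: float) -> float:
    """Approximate number of fully busy cores, assuming 100% == one core."""
    return occupancy_percent / 100.0


def utilization(occupancy_percent: float, cores: int) -> float:
    """Fraction of the instance's total CPU capacity in use."""
    return occupancy_percent / (100.0 * cores)


if __name__ == "__main__":
    # Occupancy values and the core count are taken from the list above.
    for label, occupancy in (("1080p", 1315.6), ("720p", 1094.6)):
        print(
            f"{label}: {busy_cores(occupancy):.1f} busy cores, "
            f"{utilization(occupancy, 16):.1%} of 16 cores"
        )
```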

Performance Analysis

In this section, we’ll dive into the data presented in the tables, focusing on the performance and cost-efficiency of ARM-based instances compared to x86-based instances.

Let’s start by explaining the table headers to help you interpret the information.

  1. Machine: This column lists the type of AWS EC2 instance used for encoding. We are comparing c7g.2xlarge-arm (ARM-based), c7i.2xlarge-x86 (Intel x86-based), and c5.2xlarge-x86 (Intel x86-based) instances.
  2. Speed (Duration / Encoding Time): Speed is calculated as the duration of the video divided by the time it takes to encode that video. A higher speed indicates better performance. For example, a speed of 1 or larger means the machine encodes the video at least as fast as real time.
  3. Time to encode 1 minute of video (seconds): This column shows the actual time in seconds it takes to encode 1 minute of video. Lower values indicate faster encoding. For example, c7g.2xlarge-arm takes 60 seconds to encode 1 minute of 720p video.
  4. AWS Price to encode 1 minute of video: This column calculates the cost to encode 1 minute of video, based on the per-second price and the time taken. For instance, encoding 1 minute of 720p video on a c7g.2xlarge-arm instance costs $0.004833.
  5. Savings vs. c7i.2xlarge-x86: This column shows the percentage cost savings of using the ARM-based c7g.2xlarge-arm instance compared to the Intel x86-based c7i.2xlarge-x86 instance. The tables report it as a negative percentage, so -25.27% means the ARM instance costs 25.27% less.
  6. Savings vs. c5.2xlarge-x86: This column shows the percentage cost savings of using the ARM-based c7g.2xlarge-arm instance compared to the Intel x86-based c5.2xlarge-x86 instance, reported in the same way.
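
The column definitions above reduce to a few lines of arithmetic. The following is a minimal sketch rather than the script used for the benchmark: the function names are illustrative, and the hourly prices ($0.29 for c7g.2xlarge, $0.357 for c7i.2xlarge) are assumptions back-derived from the per-minute prices quoted in this article, so check current AWS pricing before reusing them.

```python
def speed(video_seconds: float, encode_seconds: float) -> float:
    """Speed = video duration / encoding time (1.0 means real time)."""
    return video_seconds / encode_seconds


def seconds_per_minute_of_video(speed_value: float) -> float:
    """Encoding time, in seconds, for 60 seconds of video at a given speed."""
    return 60.0 / speed_value


def price_per_minute_of_video(encode_seconds: float, hourly_price_usd: float) -> float:
    """Cost = encoding time x per-second instance price."""
    return encode_seconds * hourly_price_usd / 3600.0


def savings_percent(candidate_cost: float, baseline_cost: float) -> float:
    """Negative values mean the candidate is cheaper than the baseline."""
    return (candidate_cost - baseline_cost) / baseline_cost * 100.0


if __name__ == "__main__":
    # Assumed on-demand hourly prices (USD), back-derived from the article.
    arm = price_per_minute_of_video(seconds_per_minute_of_video(1.00), 0.29)
    intel = price_per_minute_of_video(seconds_per_minute_of_video(0.92), 0.357)
    print(f"ARM: ${arm:.6f}  x86: ${intel:.6f}  savings: {savings_percent(arm, intel):.2f}%")
```

Keeping speed, time, price, and savings as separate functions makes it easy to substitute your own instance prices or measured encode times.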

720p Encoding Performance on ARM and Intel x86

First up, let’s compare the performance and cost of encoding 720p video on ARM and Intel x86. These tests do not use our Content Adaptive Encoding technology.

| Machine | Speed (Duration / Encoding Time) | Time to Encode 1 Minute (seconds) | AWS Price for 1 Minute Video | Savings vs. c7i.2xlarge-x86 | Savings vs. c5.2xlarge-x86 |
| --- | --- | --- | --- | --- | --- |
| c7g.2xlarge-arm | 1.00 | 60.00 | $0.004833 | -25.27% | -43.71% |
| c7i.2xlarge-x86 | 0.92 | 65.22 | $0.006467 | | |
| c5.2xlarge-x86 | 0.66 | 90.91 | $0.008586 | | |

Figure 1: Performance comparison of ARM vs. Intel x86 processors for encoding 720p videos without CAE. It’s clear that ARM processors deliver the best speed to price trade-off.
  1. Performance (Speed and Time): The c7g.2xlarge-arm instance has a speed of 1.00, meaning it can encode 1 minute of video in 60 seconds, which is real-time performance. The c7i.2xlarge-x86 instance is slightly slower with a speed of 0.92, taking 65.22 seconds to encode the same video. The c5.2xlarge-x86 instance has a lower speed of 0.66, taking 90.91 seconds.
  2. Cost Efficiency (AWS Price for 1 Minute Video): The c7g.2xlarge-arm instance costs $0.004833 to encode 1 minute of 720p video, which is cheaper than the c7i.2xlarge-x86 at $0.006467 and the c5.2xlarge-x86 at $0.008586. This demonstrates the cost savings of ARM-based instances.
  3. Savings: Using the c7g.2xlarge-arm instance provides a 25.27% cost savings compared to the c7i.2xlarge-x86 and a 43.71% savings compared to the c5.2xlarge-x86. This highlights the cost efficiency of ARM instances for 720p video encoding.

1080p Encoding Performance on ARM and Intel x86

Next, let’s compare the performance and cost of encoding 1080p video on ARM and Intel x86. These tests do not use our Content Adaptive Encoding technology.

| Machine | Speed (Duration / Encoding Time) | Time to Encode 1 Minute (seconds) | AWS Price for 1 Minute Video | Savings vs. c7i.2xlarge-x86 | Savings vs. c5.2xlarge-x86 |
| --- | --- | --- | --- | --- | --- |
| c7g.2xlarge-arm | 0.44 | 136.36 | $0.010985 | -26.15% | -45.72% |
| c7i.2xlarge-x86 | 0.40 | 150.00 | $0.014875 | | |
| c5.2xlarge-x86 | 0.28 | 214.29 | $0.020238 | | |

Figure 2: Performance comparison of ARM vs. Intel x86 processors for encoding 1080p videos without CAE. It’s clear that ARM processors deliver the best speed to price trade-off.
  1. Performance (Speed and Time): The c7g.2xlarge-arm instance has a speed of 0.44, taking 136.36 seconds to encode 1 minute of 1080p video. The c7i.2xlarge-x86 instance has a speed of 0.40, taking 150 seconds, while the c5.2xlarge-x86 instance is slower with a speed of 0.28, taking 214.29 seconds.
  2. Cost Efficiency (AWS Price for 1 Minute Video): The c7g.2xlarge-arm instance costs $0.010985 to encode 1 minute of 1080p video, which is lower than the c7i.2xlarge-x86 at $0.014875 and the c5.2xlarge-x86 at $0.020238. This shows the cost benefits of using ARM instances.
  3. Savings: The c7g.2xlarge-arm instance offers a 26.15% cost savings compared to the c7i.2xlarge-x86 and a 45.72% savings compared to the c5.2xlarge-x86 for 1080p video encoding.

720p Encoding Performance (with Content Adaptive Encoding) on ARM and Intel x86

Now, let’s compare the performance and cost of encoding 720p video on ARM and Intel x86. These tests use our Content Adaptive Encoding technology.

| Machine | Speed (Duration / Encoding Time) | Time to Encode 1 Minute (seconds) | AWS Price for 1 Minute Video | Savings vs. c7i.2xlarge-x86 | Savings vs. c5.2xlarge-x86 |
| --- | --- | --- | --- | --- | --- |
| c7g.2xlarge-arm | 0.94 | 63.83 | $0.005142 | -24.82% | -41.02% |
| c7i.2xlarge-x86 | 0.87 | 68.97 | $0.006839 | | |
| c5.2xlarge-x86 | 0.65 | 92.31 | $0.008718 | | |

Figure 3: Performance comparison of ARM vs. Intel x86 processors for encoding 720p videos with CAE. It’s clear that ARM processors deliver the best speed to price trade-off.
  1. Performance (Speed and Time): The c7g.2xlarge-arm instance has a speed of 0.94, taking 63.83 seconds to encode 1 minute of 720p video with CAE. The c7i.2xlarge-x86 instance has a speed of 0.87, taking 68.97 seconds, while the c5.2xlarge-x86 instance is slower with a speed of 0.65, taking 92.31 seconds.
  2. Cost Efficiency (AWS Price for 1 Minute Video): The c7g.2xlarge-arm instance costs $0.005142 to encode 1 minute of 720p video with CAE, which is lower than the c7i.2xlarge-x86 at $0.006839 and the c5.2xlarge-x86 at $0.008718. This shows the cost benefits of using ARM instances.
  3. Savings: The c7g.2xlarge-arm instance offers a 24.82% cost savings compared to the c7i.2xlarge-x86 and a 41.02% savings compared to the c5.2xlarge-x86 for 720p video encoding with CAE.

1080p Encoding Performance (with Content Adaptive Encoding) on ARM and Intel x86

Finally, let’s compare the performance and cost of encoding 1080p video on ARM and Intel x86. These tests use our Content Adaptive Encoding technology.

| Machine | Speed (Duration / Encoding Time) | Time to Encode 1 Minute (seconds) | AWS Price for 1 Minute Video | Savings vs. c7i.2xlarge-x86 | Savings vs. c5.2xlarge-x86 |
| --- | --- | --- | --- | --- | --- |
| c7g.2xlarge-arm | 0.41 | 146.34 | $0.011789 | -26.69% | -41.75% |
| c7i.2xlarge-x86 | 0.37 | 162.16 | $0.016081 | | |
| c5.2xlarge-x86 | 0.28 | 214.29 | $0.020238 | | |

Figure 4: Performance comparison of ARM vs. Intel x86 processors for encoding 1080p videos with CAE. It’s clear that ARM processors deliver the best speed to price trade-off.
  1. Performance (Speed and Time): The c7g.2xlarge-arm instance has a speed of 0.41, taking 146.34 seconds to encode 1 minute of 1080p video with CAE. The c7i.2xlarge-x86 instance has a speed of 0.37, taking 162.16 seconds, while the c5.2xlarge-x86 instance is slower with a speed of 0.28, taking 214.29 seconds.
  2. Cost Efficiency (AWS Price for 1 Minute Video): The c7g.2xlarge-arm instance costs $0.011789 to encode 1 minute of 1080p video with CAE, which is lower than the c7i.2xlarge-x86 at $0.016081 and the c5.2xlarge-x86 at $0.020238. This shows the cost benefits of using ARM instances.
  3. Savings: The c7g.2xlarge-arm instance offers a 26.69% cost savings compared to the c7i.2xlarge-x86 and a 41.75% savings compared to the c5.2xlarge-x86 for 1080p video encoding with CAE.

Summary of Cost and Performance Benefits of ARM vs. Intel x86

The data demonstrates that ARM-based instances (c7g.2xlarge-arm) provide significant cost savings and competitive performance compared to Intel x86-based instances (c7i.2xlarge-x86 and c5.2xlarge-x86).

For various video encoding tasks, ARM instances offer:

  • 25-46% Cost Savings: Significant reduction in encoding costs.
  • Comparable or Better Performance: Similar or faster encoding times, especially compared to older x86 instances.
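
The 25-46% range can be re-derived from the per-minute prices reported in the four tables of this article. The sketch below is one way to do that check; the dictionary layout and function names are illustrative, and the prices are copied from the tables rather than fetched from AWS.

```python
from typing import Dict, Tuple

# USD cost to encode 1 minute of video, copied from the four tables.
COST_PER_MINUTE_USD: Dict[str, Dict[str, float]] = {
    "720p": {"c7g.2xlarge": 0.004833, "c7i.2xlarge": 0.006467, "c5.2xlarge": 0.008586},
    "1080p": {"c7g.2xlarge": 0.010985, "c7i.2xlarge": 0.014875, "c5.2xlarge": 0.020238},
    "720p + CAE": {"c7g.2xlarge": 0.005142, "c7i.2xlarge": 0.006839, "c5.2xlarge": 0.008718},
    "1080p + CAE": {"c7g.2xlarge": 0.011789, "c7i.2xlarge": 0.016081, "c5.2xlarge": 0.020238},
}


def savings_percent(arm_cost: float, x86_cost: float) -> float:
    """Percentage saved by the ARM instance relative to an x86 instance."""
    return (1.0 - arm_cost / x86_cost) * 100.0


def savings_range(costs: Dict[str, Dict[str, float]]) -> Tuple[float, float]:
    """Smallest and largest ARM savings across all tests and x86 baselines."""
    values = [
        savings_percent(row["c7g.2xlarge"], row[baseline])
        for row in costs.values()
        for baseline in ("c7i.2xlarge", "c5.2xlarge")
    ]
    return min(values), max(values)


if __name__ == "__main__":
    low, high = savings_range(COST_PER_MINUTE_USD)
    print(f"ARM savings range: {low:.2f}% to {high:.2f}%")
```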

Conclusion

Before we end this article, here are some important points to note –

Threading: All the data in this discussion comes from encoding with 8 threads. When we used only 4 threads, we observed that the speed and time performance of the ARM and x86 processors was similar; that is, the encoding speed (or the encoding time for a 1-minute video) is comparable between ARM and Intel x86. Even so, ARM’s lower pricing keeps it a competitive option. With 8 threads, by contrast, the ARM processor clearly outshines its x86 counterpart in speed and time performance as well as pricing.

Video dataset: The outcome also hinges on the configuration of our encoders and the videos used to benchmark them. We used 50 sequences from YouTube’s publicly available UGC dataset for this benchmark, representing a wide variety of video characteristics.

The optimization potential is clear from the data in this article, and it becomes apparent that if we fully optimize for the ARM architecture, there’s a huge opportunity for encoders to take advantage of compute efficiency.

Countering the widespread notion that ARM is cheaper but less capable than Intel x86, our investigation shows that ARM can be both cost-effective and highly capable. Our findings indicate that, especially when the encoder is optimized for the ARM architecture, ARM can handle demanding video compression tasks with ease.

Visionular focuses on the use of AI in video compression (for H.264/AVC, HEVC, and AV1) that enables our video codecs to deliver the highest possible compression efficiency at the best encoding speeds!

Our tech is used by the largest content providers globally to deliver the best user experience and to reduce their OPEX by 25 – 30% at least!
