Raspberry Pi 4 CPU MHz Throttling Performance Effects

Roy Longbottom


Contents


Introduction
Video Playback
OpenGL Benchmark
Main Drive Benchmark
LAN Benchmark
LAN and CPU Benchmarks
Copying Files To Windows PC
Core Utilisation Variations


Summary

When running stress tests on a Pi 4, without a cooling fan attached, CPU temperature can increase, leading to clock speed throttling in stages, normally between 1500, 1000, 750 and 600 MHz. In turn, this leads to slower performance, proportional to the clock speed reduction, for programs limited by processor speed. The following series of tests was run at the two extremes of 1500 and 600 MHz, via Raspbian, and performance was measured, with monitoring of CPU MHz, voltage and temperatures.

Video Playback - These tests were run using BBC iPlayer with data transfers via LAN. Unlike with a WiFi connection, no buffering was indicated at either MHz setting but, at 600 MHz, image quality (pixel dimensions) was worse when viewing complex images, though the same with plain backgrounds.

OpenGL Benchmark - Performance was the same or worse, at 600 MHz, depending on whether graphics or CPU speed was the limiting factor.

Main Drive Benchmark - Writing and reading large files, average data transfer speed was around 6% faster at the higher MHz setting.

LAN Benchmark - Again transferring large files, as for the drive benchmark, but with increased CPU time. Gigabit speeds were demonstrated at the higher MHz, some 25% faster than at 600 MHz.

LAN Plus CPU Benchmarks - Using the same LAN benchmark plus a single threaded processor test, network speeds were the same as before but the CPU benchmark performance was proportional to MHz settings.

Copying Files From Pi 4 USB 3 Drive Via LAN To Windows PC - Transferring 1.1 GB files, at three quarters gigabit speeds at 1500 MHz, data transfers were 70% faster than at 600 MHz, where CPU time was particularly important.

Remember - The measurements of performance at 600 MHz represent the extreme deviations from unthrottled operation, unlikely to be seen in most environments, running the applications considered here.


Introduction

When running stress tests on a Pi 4, without a cooling fan attached, CPU temperature can increase, leading to clock speed throttling in stages, normally between 1500, 1000, 750 and 600 MHz. In turn, this leads to slower performance, proportional to the clock speed reduction, for programs limited by processor speed.

I decided that it would be useful to obtain some idea of the effects on other activities that have different workload profiles. The first problem was to find a way of running continuously at a constant low speed. Initially, I used the uncontrollable hair dryer treatment, where throttling reduced the indicated CPU clock to 429 MHz at 88°C, with the remarkable Pi 4 continuing its processing activity.

Fortunately, I found that setting the frequency scaling governor to powersave resulted in a constant 600 MHz. Along with using the performance setting, for 1500 MHz, I ran the following tests at both frequencies to determine speed or throughput changes. These were run in conjunction with the bcmstat performance monitor, particularly to identify CPU utilisation of individual cores.
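
The text does not say how the governor was switched, so the following is only a minimal sketch of one way to inspect and change it, assuming the standard Linux cpufreq sysfs files and that the cpu0 setting is representative of all four cores. The function names are illustrative, not part of any tool used in these tests, and writing the governor on a real system needs root privileges.

```python
from pathlib import Path

# Assumption: standard Linux cpufreq sysfs directory for the first core.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_governor(cpufreq_dir=CPUFREQ_DIR):
    """Return the active governor name, e.g. 'powersave' or 'performance'."""
    return (Path(cpufreq_dir) / "scaling_governor").read_text().strip()


def read_mhz(cpufreq_dir=CPUFREQ_DIR):
    """Return the current clock in MHz (the kernel file holds kHz)."""
    khz = int((Path(cpufreq_dir) / "scaling_cur_freq").read_text())
    return khz / 1000


def set_governor(name, cpufreq_dir=CPUFREQ_DIR):
    """Select a governor by writing its name to the sysfs file."""
    (Path(cpufreq_dir) / "scaling_governor").write_text(name + "\n")
```

Calling set_governor("powersave") and then read_mhz() would be expected to report 600 on the setup described here, and set_governor("performance") 1500, but that depends on the firmware and kernel configuration in use.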

Following are examples of the main CPU details quoted, with an added average for cpu0 to cpu3, which is the same as total CPU utilisation. Then %idle = 100 - %total. A complication is that adding the percentages for the first seven columns, less %idle, does not always give the same value as %total.


 %user %nice  %sys %idle %iowt %irq %s/irq %total cpu0  cpu1  cpu2  cpu3 av 0to3

  1.86     0 11.00 58.97 12.56    0   3.75  41.03 60.87 10.61 34.25 58.29  41.01
  0.96     0  2.65 73.89 23.11    0      0  26.11  1.80  100   1.80  0.84  26.11 
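
As a check on the relationships just described, this short sketch recomputes them from the two example rows. The dictionary keys are simply labels chosen here for the bcmstat columns.

```python
# The two example rows above, as percentages.
ROWS = [
    {"user": 1.86, "nice": 0, "sys": 11.00, "idle": 58.97, "iowt": 12.56,
     "irq": 0, "sirq": 3.75, "total": 41.03,
     "cores": [60.87, 10.61, 34.25, 58.29]},
    {"user": 0.96, "nice": 0, "sys": 2.65, "idle": 73.89, "iowt": 23.11,
     "irq": 0, "sirq": 0, "total": 26.11,
     "cores": [1.80, 100, 1.80, 0.84]},
]


def core_average(row):
    """Average of cpu0 to cpu3, expected to match %total."""
    return sum(row["cores"]) / len(row["cores"])


def total_from_idle(row):
    """%total derived as 100 - %idle."""
    return 100 - row["idle"]


def sum_of_busy_columns(row):
    """Sum of the first seven columns, leaving out %idle."""
    return sum(row[k] for k in ("user", "nice", "sys", "iowt", "irq", "sirq"))


for row in ROWS:
    print(f"av 0to3 {core_average(row):6.2f}  "
          f"100-%idle {total_from_idle(row):6.2f}  "
          f"%total {row['total']:6.2f}  "
          f"column sum {sum_of_busy_columns(row):6.2f}")
```

For the first row the column sum comes to about 29, well short of the 41.03 %total, which is the complication mentioned above.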

During the tests, CPU temperatures, MHz and voltage were also noted, the former not increasing that much, with the others continuing at constant values.

Note - The measurements of performance at 600 MHz represent the extreme deviations from unthrottled operation, unlikely to be seen in most environments, running the applications considered here.



Video Playback

When using BBC iPlayer, with LAN connection, and displaying a TV programme with complex images (lions and grass), the player indicated a data transfer speed of 3900 kbps and image size of 960 x 540, at a CPU frequency of 1500 MHz. Here, CPU utilisation of all cores approached 50% and the received bytes per second were in line with the indicated kbps.

My main HD TV played the complex programme at 1920 x 1080 pixels, but reverted to 960 x 540 with input from the Pi 4.

Inferior quality was indicated using 600 MHz, at 1700 kbps and size 704 x 396, with near double CPU utilisation. The performance statistics were somewhat strange, in that the measured data reading speed appeared to be much higher than that at 1500 MHz. Were errors causing retransmission?

Another programme with a snow background appeared to run at the same quality at 1500 and 600 MHz. I think that this supports the claim that the same performance can be obtained at a lower clock speed. In this case, the performance requirements would be the same Frames Per Second and image quality. Then the longer execution time, from the slower instruction speeds, needs to remain less than the frame time.


                          Average Values from bcmstat

 ARM  Bytes Per Second
 MHz   RX B/s   TX B/s %user  %sys %idle %iowt %s/irq %total cpu0 cpu1 cpu2 cpu3

1500  463,603    8,933    29    14    53     0     0     47    52   46   46   45    
 600  921,810   11,166    48    27    19     0     0     81    83   77   81   82
   



OpenGL Benchmark

Below are Frames Per Second Speeds measured by the benchmark, with VSYNC disabled, avoiding clamping maximum display rate at 60 FPS. The results at this window size indicate effectively the same performance at 1500 and 600 MHz for the first four test functions, but 1500 MHz more than twice as fast for the more complex kitchen displays.

First sight of the total utilisation figures can suggest the opposite effects, being higher at 600 MHz for the first batch and similar for the others. In fact, the benchmark program has no built-in multithreading, leading to most of the processing time using a single core, but not the same one over a period. Most of the value obtained by multiplying %total by 4 represents utilisation of that core. With the first tests, the time to display the images is far greater than the CPU time used. Then, with the CPU limited kitchen displays, both configurations were effectively running at 100% CPU utilisation (of one core), leading to the frame time being much longer on the 600 MHz setup.


      Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
 CPU    Pixels        Few      All      Few      All  Kitchen  Kitchen
 MHz  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

1500  1920  1080     56.9     55.4     52.5     48.8     30.6     20.2
 600  1920  1080     55.9     54.6     51.5     48.4     12.9      9.0

           ARM
    Test   MHz    %user  %sys %idle %iowt %s/irq %total cpu0  cpu1  cpu2  cpu3

      1   1500        4     3    92     0     0      8    11     9     9     4
      6   1500       24     3    73     0     0     27     6    23    76     2
      1    600        8     5    86     0     0     14    15    18    13    12
      6    600       26     4    70     0     0     30     7    54    55     6
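
The reasoning above can be sketched numerically. This assumes that tests 1 and 6 in the utilisation table correspond to the first and last FPS columns (Coloured Objects Few and Texture Kitchen), which the tables do not state explicitly.

```python
CORES = 4


def single_core_equivalent(total_percent, cores=CORES):
    """Scale %total (an average over all cores) back up to one core."""
    return total_percent * cores


def frame_time_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


# (MHz, test, FPS, %total) taken from the two tables above.
# Assumption: test 1 = Coloured Objects Few, test 6 = Texture Kitchen.
SAMPLES = [
    (1500, 1, 56.9, 8),
    (1500, 6, 20.2, 27),
    (600, 1, 55.9, 14),
    (600, 6, 9.0, 30),
]

for mhz, test, fps, total in SAMPLES:
    print(f"{mhz:5d} MHz test {test}: "
          f"{frame_time_ms(fps):6.1f} ms per frame, "
          f"{single_core_equivalent(total):4d}% of one core")
```

On these figures the kitchen test is at or above 100% of one core at both clock speeds, while test 1 uses a third to a half of one core, consistent with graphics rather than CPU speed being the limit there.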
   



Main Drive Benchmark

Selecting large files, most of the time is spent on sequential writing and reading. So, just this section is considered. Performance measured by the benchmark shows that using the higher clock speed produced slightly faster results.

This time, utilisation details are averages over three sample 30 second periods. Note that the relatively high values for single core activity are due to time spent waiting for I/O. Real utilisation (user and system) is quite low, but around twice as high at 600 MHz, reflecting the difference in CPU MHz.


                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

1500 MHz 
 512    18.64    18.86    18.39    42.71    42.73    42.67
1024    18.59    18.59    18.60    42.65    42.67    42.67
 600 MHz
 512    17.81    17.86    17.97    40.02    39.90    39.82
1024    18.04    18.07    18.11    39.90    40.10    40.02


 MHz    %user   %sys %idle %iowt %s/irq %total cpu0  cpu1  cpu2  cpu3

1500      0.9    1.7    73    24     0     27     2    93     9     2
1500      0.9    1.8    73    24     0     27    49     2    54     1
1500      0.8    1.8    73    23     0     27     7    28    71     2

 600      1.9    3.5    71    22     0     29    30    75     8     2
 600      1.8    4.2    71    21     0     29    34    32    20    29
 600      1.8    3.7    71    21     0     29    21    61    23     9
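
To put a figure on "slightly faster", the sketch below averages the six write and six read results at each clock speed. Pooling the 512 MB and 1024 MB rows is a choice made here for illustration, not necessarily how the summary figure was derived.

```python
from statistics import mean

# MBytes/second from the table above, 512 MB and 1024 MB rows pooled.
WRITE = {1500: [18.64, 18.86, 18.39, 18.59, 18.59, 18.60],
         600: [17.81, 17.86, 17.97, 18.04, 18.07, 18.11]}
READ = {1500: [42.71, 42.73, 42.67, 42.65, 42.67, 42.67],
        600: [40.02, 39.90, 39.82, 39.90, 40.10, 40.02]}


def percent_faster(results, fast=1500, slow=600):
    """How much faster the mean speed is at `fast` MHz than at `slow` MHz."""
    return (mean(results[fast]) / mean(results[slow]) - 1) * 100


print(f"Write: {percent_faster(WRITE):.1f}% faster at 1500 MHz")
print(f"Read:  {percent_faster(READ):.1f}% faster at 1500 MHz")
```

The gain is larger for reading than for writing, both being small compared with the 2.5 times difference in clock speed.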
   



LAN Benchmark

The LAN benchmark is the same as that used for the above drive tests, for writing and reading large files. The tests are for accessing a Windows based PC. The bcmstat TX B/s and RX B/s measurements and Windows Task Manager reports confirmed the writing and reading speeds provided below. The later bcmstat CPU utilisation results are for average writing and reading over all large files (as they were quite similar).

In this case, with higher speed measurements and CPU utilisation than during drive tests, performance degradation at 600 MHz was more significant, estimated as an average of around 20%. There, 60% total utilisation indicates more than two cores in continuous use.


       ------------------ MBytes/Second ------------------
  MB   Write1   Write2   Write3    Read1    Read2    Read3

1500 MHz
 512   110.32    91.41   110.53   107.83    99.65   107.70
1024   112.08   111.89   111.59   109.38   104.58   108.37

 600 MHz
 512    70.35    51.76    79.38    92.76   100.62    95.26
1024    84.79    83.52    81.84    97.44    96.58    93.85


 MHz    %user   %sys %idle %iowt %s/irq %total cpu0  cpu1  cpu2  cpu3

1500      1.4   18.3  53.2   6.9   11.8    47    82    28    41    36
 600      2.5   25.6  40.2   6.4   19.6    60    90    60    48    41
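
One way of arriving at an estimate of this kind is shown below, treating writes and reads separately and then averaging the two. This is a sketch of the arithmetic under that assumption, not necessarily the calculation used for the figure quoted above.

```python
from statistics import mean

# MBytes/second from the LAN table above, both file sizes pooled.
LAN_WRITE = {1500: [110.32, 91.41, 110.53, 112.08, 111.89, 111.59],
             600: [70.35, 51.76, 79.38, 84.79, 83.52, 81.84]}
LAN_READ = {1500: [107.83, 99.65, 107.70, 109.38, 104.58, 108.37],
            600: [92.76, 100.62, 95.26, 97.44, 96.58, 93.85]}


def percent_degradation(results, high=1500, low=600):
    """Loss of mean speed at `low` MHz relative to `high` MHz."""
    return (1 - mean(results[low]) / mean(results[high])) * 100


write_loss = percent_degradation(LAN_WRITE)
read_loss = percent_degradation(LAN_READ)
print(f"Write {write_loss:.0f}%, read {read_loss:.0f}%, "
      f"average {(write_loss + read_loss) / 2:.0f}%")
```

Most of the loss is on writing, with reading much less affected.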
   



LAN and CPU Benchmarks

This was a repeat of the LAN tests, also running a CPU benchmark, using a single thread, at the same time. LAN performance was again degraded by an average of around 20%, with that for the CPU benchmark in line with the clock speed difference.


       ----------------------- MBytes/Second ----------------------
  MB   Write1   Write2   Write3    Read1    Read2    Read3  CPU Test

1500 MHz
 512   110.48   111.43   111.18   109.97    94.91    97.20     5950
1024   111.62   112.28   114.25   107.49   101.86   111.01

 600 MHz
 512    52.44    57.23    40.36    90.69    92.02    95.19     2364
1024    84.80    71.79    98.81    98.19   102.36   101.84

                          
 MHz    %user   %sys %idle %iowt %s/irq %total cpu0  cpu1  cpu2  cpu3

1500     23.9   18.1  29.9   7.9    9.9    70    82    77    57    64
 600     26.4   25.9  19.4   5.5   18.7    81    89    74    79    80
   



Copying Pi 4 USB 3 Files To Windows PC Via LAN

Tests were carried out copying 1.1 GB files from a USB 3 flash drive on the Pi 4, via LAN, to a Windows based PC, to see if the performance implications were different from those of the LAN benchmarks. Below are average bcmstat results at 1500 and 600 MHz, the copying time being based on the number of one second samples when data was being transmitted. With overheads, time and MB/second details confirmed data volumes.

Performance degradation at 600 MHz, based on MB/second copying speed, was 40%, compared with 60% in MHz. CPU utilisation and data transfer speed were lower than those for the LAN benchmark.


   MHz Secs MB/sec %user  %sys %idle %iowt %s/irq %total cpu0  cpu1  cpu2  cpu3
                                                                                
  1500   17   71.9   1.7  11.2  74.0   4.4   2.5    26    71     6    10    18
   600   29   42.4   2.9  18.8  66.7   4.4   1.5    33    73    17    27    16
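
The percentages and the data volume check can be reproduced from the two table rows, as in this small sketch (the function name is just illustrative):

```python
def percent_reduction(high, low):
    """Percentage drop going from `high` to `low`."""
    return (1 - low / high) * 100


# (MHz, seconds, MB/sec) from the table above.
COPIES = [(1500, 17, 71.9), (600, 29, 42.4)]

speed_drop = percent_reduction(71.9, 42.4)
clock_drop = percent_reduction(1500, 600)
print(f"Copy speed down {speed_drop:.0f}%, clock down {clock_drop:.0f}%")

for mhz, secs, mb_per_sec in COPIES:
    # Seconds times MB/second should come out near the file size plus overheads.
    print(f"{mhz:5d} MHz: about {secs * mb_per_sec:.0f} MB transmitted")
```

Both runs work out at a little over 1200 MB transmitted, so the timing and speed figures agree with each other on the volume of data moved.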
   



Core Utilisation Variations

With bcmstat, the minimum sampling period is one second. As this is a relatively long time, it is not possible to determine whether cores are executing instructions at the same time. For example, four cores each at 25% utilisation could reflect only one core being in use at any instant, with the Operating System switching between cores to share the load. The %total values provided by bcmstat are averages of those for all cores.
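
The ambiguity can be illustrated with a toy model, assuming a hypothetical 10 millisecond scheduling slice within each one second sample. It is not a model of the real Linux scheduler.

```python
SLICES = 100  # hypothetical 10 ms slices making up a one second sample


def per_core_utilisation(slices, cores=4):
    """Percentage of slices in which each core was busy.

    `slices` holds, for each time slice, the set of cores that were busy.
    """
    busy = [0] * cores
    for active in slices:
        for core in active:
            busy[core] += 1
    return [100 * b / len(slices) for b in busy]


# One continuously running thread, moved to the next core every slice.
migrating = [{n % 4} for n in range(SLICES)]

# Four cores all busy together for the first quarter of the second, then idle.
bursty = [{0, 1, 2, 3} if n < SLICES // 4 else set() for n in range(SLICES)]

print(per_core_utilisation(migrating))  # [25.0, 25.0, 25.0, 25.0]
print(per_core_utilisation(bursty))     # [25.0, 25.0, 25.0, 25.0]
```

Both patterns give identical one second figures, although only the second has cores working at the same time.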

Below are all the recorded bcmstat utilisation records for the file copying tests, at one second intervals, the values being rounded down to make the variations clearer. This time the total shown is the sum for all cores, not the average. At 1500 MHz, it looks as though nearly 100% of one CPU is used continuously. Carrying this over to the 600 MHz results leads to a much longer time to copy the files. Then there is other activity that means that more than one core is being used at the same time.


                 1500 MHz                        600 MHz

 Seconds  cpu0  cpu1  cpu2  cpu3  Total   cpu0  cpu1  cpu2  cpu3  Total

       1    60    28    24    10    123     61    65    40    39    205
       2    87     6    14    13    120     75    47    11    25    158
       3    83     9     4     9    105     78     9    14    28    129
       4    86     4     5     4    100     95    16    24     8    143
       5    86     6     5     2    100     55     9    10    55    130
       6    86     3     8     1     99     53    57     9     8    128
       7    86     5     7     1    100     95    10     5    12    122
       8    86     4     8     2    100     54     6    14    59    133
       9    86     2     9     2    100     61    50    11     6    129
      10    85     4     8     2     99     97     5    14     4    120
      11    64     5     7    27    102     52    10    62     8    132
      12    51     1    14    35    101     57     6    59    10    133
      13    48     3     7    46    104     56     6    60    13    135
      14    48     2     7    47    104     52     9    61    13    135
      15    50     4     8    46    108     87     8    16    13    124
      16    41     4    16    37     98     97     7    10     7    121
      17                                    97     8    10     6    121
      18                                    95     9    12     8    123
      19                                    80     9    17    24    131
      20                                    95    14     8     5    123
      21                                    77    22    21     9    129
      22                                    54    12    62     7    135
      23                                    54    11    55    11    131
      24                                    61    10    55     6    133
      25                                    74    10    32    11    128
      26                                    82    19    17    13    131

   
The other extremes, from above, are for video playback, indicated for sample periods below. In this case, it could be expected that displayed frames per second were the same at both CPU MHz settings. The 600 MHz results indicate that more than two cores were in use at the same time. At 1500 MHz, average CPU time used is 1.7 CPU seconds per displayed second. At 600 MHz (times 15/6), this suggests 4.25 CPU seconds per second, impossible with four cores, indicating that there must be some performance degradation. This appeared to be in the form of poorer quality of displayed images (that might not be noticed).

                 1500 MHz                        600 MHz

 Seconds  cpu0  cpu1  cpu2  cpu3  Total   cpu0  cpu1  cpu2  cpu3  Total

       1    41    45    36    42    165     79    77    80    72    308
       2    38    43    33    44    159     73    53    75    68    269
       3    40    38    34    44    156     62    48    64    58    232
       4    47    55    42    48    192     81    69    84    84    319
       5    45    54    38    45    181     86    70    88    85    329
       6    39    49    32    40    160     84    67    80    80    312
       7    40    44    36    41    161     76    60    78    77    290
       8    49    57    40    45    191     74    66    76    65    281
       9    44    46    40    47    176     67    54    63    60    244
      10    38    45    31    41    154     71    48    61    59    239

 Average                            170                             282
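
The CPU seconds argument above can be followed through using the Total columns of this table, as in the sketch below.

```python
from statistics import mean

CORES = 4

# Total columns (sum of the four core percentages) from the table above.
TOTALS_1500 = [165, 159, 156, 192, 181, 160, 161, 191, 176, 154]
TOTALS_600 = [308, 269, 232, 319, 329, 312, 290, 281, 244, 239]


def cpu_seconds_per_second(totals):
    """Mean summed utilisation, expressed as CPU seconds per elapsed second."""
    return mean(totals) / 100


used_1500 = cpu_seconds_per_second(TOTALS_1500)
used_600 = cpu_seconds_per_second(TOTALS_600)

# Same work at 600 MHz, if CPU time scaled directly with clock speed.
needed_600 = used_1500 * 1500 / 600

print(f"Used at 1500 MHz:   {used_1500:.2f} CPU seconds per second")
print(f"Needed at 600 MHz:  {needed_600:.2f} (only {CORES} available)")
print(f"Used at 600 MHz:    {used_600:.2f}")
```

The requirement scaled to 600 MHz exceeds the four CPU seconds per second that four cores can supply, while the measured figure is well below that limit, in keeping with less work being done per second at the lower clock speed.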
   
