Raspberry Pi 4 CPU MHz Throttling Performance Effects
Roy Longbottom
  %user  %nice   %sys  %idle  %iowt   %irq %s/irq %total   cpu0   cpu1   cpu2   cpu3  av 0to3
   1.86      0  11.00  58.97  12.56      0   3.75  41.03  60.87  10.61  34.25  58.29    41.01
   0.96      0   2.65  73.89  23.11      0      0  26.11   1.80    100   1.80   0.84    26.11
During the tests, CPU temperature, MHz and voltage were also noted. Temperature did not increase by much, and MHz and voltage remained at constant values.
Note - The performance measurements at 600 MHz represent the extreme deviation from unthrottled operation, which is unlikely to be seen in most environments when running the applications considered here.
When using BBC iPlayer over a LAN connection to display a TV programme with complex images (lions and grass), the player indicated a data transfer speed of 3900 kbps and an image size of 960 x 540 at a CPU frequency of 1500 MHz. Here, CPU utilisation of all cores approached 50%, and the received Bytes per second were similar to the indicated kbps.
My main HD TV played the complex programme at 1920 x 1080 pixels, but reverted to 960 x 540 with input from the Pi 4.
Inferior quality was indicated at 600 MHz, with 1700 kbps and size 704 x 396, at nearly double the CPU utilisation. The performance statistics were somewhat strange, in that the measured data receiving speed appeared to be much higher than that at 1500 MHz. Were errors causing retransmission?
Another programme, with a snow background, appeared to run at the same quality at 1500 and 600 MHz. I think that this supports the claim that the same performance can be obtained at a lower clock speed. In this case, the performance requirements are the same Frames Per Second and image quality. The additional, slower instruction execution then needs to complete in less than the frame time.
Average Values from bcmstat

  ARM   Bytes Per Second
  MHz    RX B/s   TX B/s  %user   %sys  %idle  %iowt %s/irq %total   cpu0   cpu1   cpu2   cpu3
 1500   463,603    8,933     29     14     53      0      0     47     52     46     46     45
  600   921,810   11,166     48     27     19      0      0     81     83     77     81     82
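The comparison between the kbps indicated by the player and the Bytes per second received can be checked with a simple conversion. The following is only an illustrative sketch, not part of the original tests: the function name is hypothetical, the figures are copied from the text and table above, and it assumes that 1 kbps means 1000 bits per second.

```python
# Figures copied from the iPlayer readings and the bcmstat averages above.
samples = {
    1500: {"rx_bytes_per_s": 463_603, "player_kbps": 3900},
    600: {"rx_bytes_per_s": 921_810, "player_kbps": 1700},
}


def bytes_per_s_to_kbps(rate):
    """Convert Bytes per second to kilobits per second (assumes 1 kbit = 1000 bits)."""
    return rate * 8 / 1000


for mhz, s in samples.items():
    received = bytes_per_s_to_kbps(s["rx_bytes_per_s"])
    ratio = received / s["player_kbps"]
    print(f"{mhz} MHz: received {received:.0f} kbps, "
          f"player {s['player_kbps']} kbps, ratio {ratio:.2f}")
```

On these figures the two rates are close at 1500 MHz, but at 600 MHz the received rate is more than four times the rate indicated by the player, which is the oddity noted above.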
Below are Frames Per Second speeds measured by the benchmark, with VSYNC disabled to avoid clamping the maximum display rate at 60 FPS. The results at this window size indicate effectively the same performance at 1500 and 600 MHz for the first four test functions, but 1500 MHz more than twice as fast for the more complex kitchen displays.
At first sight, the total utilisation figures might suggest the opposite effects, being higher at 600 MHz for the first batch and similar for the others. In fact, the benchmark program has no built-in multithreading, so most of the processing time uses a single core, but not the same one over a period. Most of the value obtained by multiplying %total by 4 represents utilisation of that core. With the first tests, the time to display the images is far greater than the CPU time used. Then, with the CPU-limited kitchen displays, both configurations were effectively running at 100% CPU utilisation (of one core), leading to a much longer frame time on the 600 MHz setup.
        Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
  CPU     Pixels           Few      All      Few      All  Kitchen  Kitchen
  MHz   Wide   High        FPS      FPS      FPS      FPS      FPS      FPS
 1500   1920   1080       56.9     55.4     52.5     48.8     30.6     20.2
  600   1920   1080       55.9     54.6     51.5     48.4     12.9      9.0

          ARM
 Test     MHz  %user   %sys  %idle  %iowt %s/irq %total   cpu0   cpu1   cpu2   cpu3
    1    1500      4      3     92      0      0      8     11      9      9      4
    6    1500     24      3     73      0      0     27      6     23     76      2
    1     600      8      5     86      0      0     14     15     18     13     12
    6     600     26      4     70      0      0     30      7     54     55      6
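The reasoning about frame time and single core loading can be illustrated with a little arithmetic. This is a minimal sketch with hypothetical function names, using figures copied from the tables above. It assumes that Test 1 and Test 6 in the utilisation table correspond to the first and last FPS columns, and it treats %total multiplied by 4 as an upper estimate of the load on the one busy core, since other activity is included.

```python
CORES = 4  # the Pi 4 has four CPU cores


def frame_time_ms(fps):
    """Time available per frame, in milliseconds, at a given FPS."""
    return 1000.0 / fps


def single_core_equivalent(total_pct, cores=CORES):
    """%total is an average over all cores; scale it to one core's worth."""
    return total_pct * cores


# (FPS, %total) per clock speed; test-to-column mapping is an assumption.
results = {
    "Test 1, coloured few": {1500: (56.9, 8), 600: (55.9, 14)},
    "Test 6, texture kitchen": {1500: (20.2, 27), 600: (9.0, 30)},
}

for test, by_mhz in results.items():
    for mhz, (fps, total) in by_mhz.items():
        print(f"{test} at {mhz} MHz: {frame_time_ms(fps):.1f} ms/frame, "
              f"about {single_core_equivalent(total)}% of one core")
```

For the first test, the single core estimate stays well under 100% at both speeds, so the frame time barely changes. For the kitchen test, both estimates are at or above 100%, and the frame time more than doubles at 600 MHz.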
With large files selected, most of the time is spent on sequential writing and reading, so only this section is considered. Performance measured by the benchmark shows that the higher clock speed produced slightly faster results.
This time, utilisation details are averages over three sample 30 second periods. Note that the relatively high values for single core activity are due to time spent waiting for I/O. Real utilisation (user and system) is quite low, but does reflect the difference in CPU MHz.
                                MBytes/Second
             MB  Write1  Write2  Write3   Read1   Read2   Read3
 1500 MHz   512   18.64   18.86   18.39   42.71   42.73   42.67
           1024   18.59   18.59   18.60   42.65   42.67   42.67
  600 MHz   512   17.81   17.86   17.97   40.02   39.90   39.82
           1024   18.04   18.07   18.11   39.90   40.10   40.02

  MHz  %user   %sys  %idle  %iowt %s/irq %total   cpu0   cpu1   cpu2   cpu3
 1500    0.9    1.7     73     24      0     27      2     93      9      2
 1500    0.9    1.8     73     24      0     27     49      2     54      1
 1500    0.8    1.8     73     23      0     27      7     28     71      2
  600    1.9    3.5     71     22      0     29     30     75      8      2
  600    1.8    4.2     71     21      0     29     34     32     20     29
  600    1.8    3.7     71     21      0     29     21     61     23      9
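To show how the real utilisation relates to the clock speeds, the %user and %sys figures from the three samples can be summed and averaged. This is a rough sketch with a hypothetical helper name, using only the values in the table above.

```python
from statistics import mean

# (%user, %sys) for the three 30 second samples at each clock speed.
samples = {
    1500: [(0.9, 1.7), (0.9, 1.8), (0.8, 1.8)],
    600: [(1.9, 3.5), (1.8, 4.2), (1.8, 3.7)],
}


def mean_busy_pct(rows):
    """Average of %user + %sys, i.e. time actually executing instructions."""
    return mean(user + system for user, system in rows)


busy = {mhz: mean_busy_pct(rows) for mhz, rows in samples.items()}
busy_ratio = busy[600] / busy[1500]
clock_ratio = 1500 / 600

for mhz, pct in busy.items():
    print(f"{mhz} MHz: mean %user + %sys = {pct:.1f}%")
print(f"busy ratio {busy_ratio:.2f} versus clock ratio {clock_ratio:.2f}")
```

The busy percentage roughly doubles at 600 MHz, somewhat less than the 2.5 times clock ratio, while %iowt remains the dominant non-idle component at both speeds.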
The LAN benchmark is the same as that used for the above drive tests, writing and reading large files, here accessing a Windows based PC. The bcmstat TX B/s and RX B/s measurements, and Windows Task Manager reports, confirmed the writing and reading speeds provided below. The bcmstat CPU utilisation results that follow are averages for writing and reading over all large files (as they were quite similar).
In this case, with higher speed measurements and CPU utilisation than during the drive tests, performance degradation at 600 MHz was more significant, estimated at an average of around 20%. There, 60% total utilisation indicates more than two cores in continuous use, as 60% of four cores is equivalent to 2.4 cores.
             ------------------ MBytes/Second ------------------
             MB  Write1  Write2  Write3   Read1   Read2   Read3
 1500 MHz   512  110.32   91.41  110.53  107.83   99.65  107.70
           1024  112.08  111.89  111.59  109.38  104.58  108.37
  600 MHz   512   70.35   51.76   79.38   92.76  100.62   95.26
           1024   84.79   83.52   81.84   97.44   96.58   93.85

  MHz  %user   %sys  %idle  %iowt %s/irq %total   cpu0   cpu1   cpu2   cpu3
 1500    1.4   18.3   53.2    6.9   11.8     47     82     28     41     36
  600    2.5   25.6   40.2    6.4   19.6     60     90     60     48     41
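One way to arrive at the estimate of around 20% is to compare the mean of all twelve write and read speeds at each clock speed. This is a sketch under that assumption (the averaging method actually used is not stated), with a hypothetical function name and the values copied from the table above.

```python
from statistics import mean

# MBytes/Second: Write1-3 and Read1-3 for the 512 MB and 1024 MB files.
lan_mb_per_s = {
    1500: [110.32, 91.41, 110.53, 107.83, 99.65, 107.70,
           112.08, 111.89, 111.59, 109.38, 104.58, 108.37],
    600: [70.35, 51.76, 79.38, 92.76, 100.62, 95.26,
          84.79, 83.52, 81.84, 97.44, 96.58, 93.85],
}


def degradation_pct(fast, slow):
    """Percentage drop in mean speed from the fast to the slow setup."""
    return (1 - mean(slow) / mean(fast)) * 100


lan_loss = degradation_pct(lan_mb_per_s[1500], lan_mb_per_s[600])

# %total is an average over four cores, so 60% is 2.4 cores' worth of work.
cores_in_use = 60 * 4 / 100

print(f"mean LAN degradation at 600 MHz: {lan_loss:.1f}%")
print(f"core equivalents at 60% total: {cores_in_use}")
```

Averaging writes and reads together hides the fact that writing is affected more than reading, so separate means would be a reasonable refinement.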
This was a repeat of the LAN tests, with a single-threaded CPU benchmark running at the same time. LAN performance was again degraded by an average of around 20%, with that of the CPU benchmark in line with the clock speed difference.
             ----------------------- MBytes/Second ----------------------
             MB  Write1  Write2  Write3   Read1   Read2   Read3  CPU Test
 1500 MHz   512  110.48  111.43  111.18  109.97   94.91   97.20      5950
           1024  111.62  112.28  114.25  107.49  101.86  111.01
  600 MHz   512   52.44   57.23   40.36   90.69   92.02   95.19      2364
           1024   84.80   71.79   98.81   98.19  102.36  101.84

  MHz  %user   %sys  %idle  %iowt %s/irq %total   cpu0   cpu1   cpu2   cpu3
 1500   23.9   18.1   29.9    7.9    9.9     70     82     77     57     64
  600   26.4   25.9   19.4    5.5   18.7     81     89     74     79     80
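The statement that the CPU benchmark result is in line with the clock speed difference can be checked by comparing two ratios. This is a small sketch with a hypothetical helper name; the scores are the CPU Test values in the table above, taken as reported, whatever their units.

```python
cpu_test_score = {1500: 5950, 600: 2364}  # CPU Test column, as reported


def slow_to_fast_ratio(values_by_mhz):
    """Ratio of the 600 MHz figure to the 1500 MHz figure."""
    return values_by_mhz[600] / values_by_mhz[1500]


score_ratio = slow_to_fast_ratio(cpu_test_score)
clock_ratio = 600 / 1500

print(f"CPU Test ratio {score_ratio:.3f}, clock ratio {clock_ratio:.3f}")
```

The two ratios agree to within about one percent, which is what would be expected of a single thread that is limited purely by CPU speed.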
Tests were carried out copying 1.1 GB files from a USB 3 flash drive on the Pi 4, via LAN, to a Windows based PC, to see if the performance implications differed from those of the LAN benchmarks. Below are average bcmstat results at 1500 and 600 MHz, the copying time being based on the number of one second samples in which data was being transmitted. Allowing for overheads, the time and MB/second details confirmed the data volumes.
Performance degradation at 600 MHz, based on MB/second copying speed, was 40%, compared with a 60% reduction in MHz. CPU utilisation and data transfer speed were lower than those for the LAN benchmark.
  MHz   Secs MB/sec  %user   %sys  %idle  %iowt %s/irq %total   cpu0   cpu1   cpu2   cpu3
 1500     17   71.9    1.7   11.2   74.0    4.4    2.5     26     71      6     10     18
  600     29   42.4    2.9   18.8   66.7    4.4    1.5     33     73     17     27     16
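The 40% and 60% figures, and the check on data volumes, follow from the table values. The following is a minimal sketch, with hypothetical names, that reproduces the arithmetic from the Secs and MB/sec columns only.

```python
# Secs and MB/sec copied from the bcmstat averages above.
copy_results = {
    1500: {"secs": 17, "mb_per_s": 71.9},
    600: {"secs": 29, "mb_per_s": 42.4},
}


def reduction_pct(full, throttled):
    """Percentage reduction going from the full to the throttled value."""
    return (1 - throttled / full) * 100


speed_loss = reduction_pct(copy_results[1500]["mb_per_s"],
                           copy_results[600]["mb_per_s"])
clock_loss = reduction_pct(1500, 600)

# Data volume implied by each run: seconds multiplied by MB/second.
volume_mb = {mhz: r["secs"] * r["mb_per_s"] for mhz, r in copy_results.items()}

print(f"copy speed reduction {speed_loss:.0f}%, clock reduction {clock_loss:.0f}%")
for mhz, mb in volume_mb.items():
    print(f"{mhz} MHz: about {mb:.0f} MB transferred")
```

Both runs imply a little over 1200 MB including overheads, consistent with the same 1.1 GB of files being copied in each case.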
Using bcmstat, the minimum sampling period is one second. As this is a relatively long time, it is not possible to determine whether cores are executing instructions at the same time. For example, four cores each at 25% utilisation could represent only one core's worth of continuous work, with the Operating System switching between cores to share the load. The %total values provided by bcmstat are averages of those for all cores.
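The difference between an average over all cores and the amount of work in core equivalents can be made concrete as follows. This is an illustrative sketch with hypothetical function names; the per-core figures are the four at 25% example and the cpu0 to cpu3 averages from the file copying results above.

```python
def average_pct(cores):
    """Average utilisation over all cores, as reported in %total."""
    return sum(cores) / len(cores)


def core_equivalents(cores):
    """Summed utilisation expressed as a number of fully busy cores."""
    return sum(cores) / 100


cases = {
    "four cores at 25%": [25, 25, 25, 25],
    "file copy, 1500 MHz": [71, 6, 10, 18],
    "file copy, 600 MHz": [73, 17, 27, 16],
}

for name, cores in cases.items():
    print(f"{name}: average {average_pct(cores):.1f}%, "
          f"{core_equivalents(cores):.2f} core equivalents")
```

A one second sample cannot show whether those core equivalents were spent on one core at a time or on several cores simultaneously; the sum only gives the total amount of work.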
Below are all the recorded bcmstat utilisation records for the file copying tests, at one second intervals, with the values rounded down to make the variations clearer. This time the total shown is the sum for the four cores, not the average. At 1500 MHz, it looks as though nearly 100% of one CPU is used continuously. Carrying this over to the 600 MHz results leads to a much longer time to copy the files.
Then there is other activity that means that more than one core is being used at the same time.
                  1500 MHz                         600 MHz
Seconds  cpu0  cpu1  cpu2  cpu3  Total    cpu0  cpu1  cpu2  cpu3  Total
      1    60    28    24    10    123      61    65    40    39    205
      2    87     6    14    13    120      75    47    11    25    158
      3    83     9     4     9    105      78     9    14    28    129
      4    86     4     5     4    100      95    16    24     8    143
      5    86     6     5     2    100      55     9    10    55    130
      6    86     3     8     1     99      53    57     9     8    128
      7    86     5     7     1    100      95    10     5    12    122
      8    86     4     8     2    100      54     6    14    59    133
      9    86     2     9     2    100      61    50    11     6    129
     10    85     4     8     2     99      97     5    14     4    120
     11    64     5     7    27    102      52    10    62     8    132
     12    51     1    14    35    101      57     6    59    10    133
     13    48     3     7    46    104      56     6    60    13    135
     14    48     2     7    47    104      52     9    61    13    135
     15    50     4     8    46    108      87     8    16    13    124
     16    41     4    16    37     98      97     7    10     7    121
     17                                     97     8    10     6    121
     18                                     95     9    12     8    123
     19                                     80     9    17    24    131
     20                                     95    14     8     5    123
     21                                     77    22    21     9    129
     22                                     54    12    62     7    135
     23                                     54    11    55    11    131
     24                                     61    10    55     6    133
     25                                     74    10    32    11    128
     26                                     82    19    17    13    131
                  1500 MHz                         600 MHz
Seconds  cpu0  cpu1  cpu2  cpu3  Total    cpu0  cpu1  cpu2  cpu3  Total
      1    41    45    36    42    165      79    77    80    72    308
      2    38    43    33    44    159      73    53    75    68    269
      3    40    38    34    44    156      62    48    64    58    232
      4    47    55    42    48    192      81    69    84    84    319
      5    45    54    38    45    181      86    70    88    85    329
      6    39    49    32    40    160      84    67    80    80    312
      7    40    44    36    41    161      76    60    78    77    290
      8    49    57    40    45    191      74    66    76    65    281
      9    44    46    40    47    176      67    54    63    60    244
     10    38    45    31    41    154      71    48    61    59    239
Average                            170                              282