Raspberry Pi 3B and 3B+ High Performance Linpack and Error Tests
Roy Longbottom
Table 2 provides results from runs under Raspbian Stretch on a Raspberry Pi 3B+, with N values of 1000, 2000, 4000 and 8000 defined in file HPL.dat (and NB set to 256), as used for earlier runs. Variations in these parameters (such as N=8192 and NB=128) made little difference in performance. Three results from tests that use all four cores are provided to show some possible variations. All tests ran without any wrong results or system crashes.
Table 2 3B+ Results using Raspbian Stretch
Original HPL taskset ATLAS HPL input params ATLAS HPL taskset
N Cores MFLOPS SumCheck MFLOPS SumCheck MFLOPS SumCheck
1000 1 76 79 78 0.0052233 1066 1057 0.0066595 31 0.0069506
1000 2 177 149 172 0.0052233 1244 1220 0.0066480 237 0.0069506
1000 4 2586 2637 2650 0.0052233 1504 1458 0.0069506 1496 0.0069506
4 2608 2606 2661 0.0052233 1479 1512 0.0069506 1504 0.0069506
4 2451 2660 2642 0.0052233 1481 1440 0.0069506 1499 0.0069506
2000 1 226 227 226 0.0044702 1330 1331 0.0042812 118 0.0050602
2000 2 518 430 519 0.0044702 1767 1768 0.0043077 755 0.0050602
2000 4 3844 4047 3997 0.0044702 2434 2463 0.0050602 2448 0.0050602
4 3906 4046 4015 0.0044702 2469 2461 0.0050602 2479 0.0050602
4 3862 4066 4056 0.0044702 2390 2449 0.0050602 2461 0.0050602
4000 1 626 623 623 0.0029620 1474 1475 0.0033552 392 0.0028653
4000 2 1228 1253 1251 0.0029620 2397 2282 0.0033594 1310 0.0028653
4000 4 4966 5205 5238 0.0029620 3199 3435 0.0028653 3376 0.0028653
4 5169 5249 5154 0.0029620 3426 3401 0.0028653 3369 0.0028653
4 5004 5232 5202 0.0029620 3327 3328 0.0028653 3385 0.0028653
8000 1 1182 1168 1167 0.0025941 1571 1568 0.0022596 786 0.0024910
8000 2 1957 2200 2149 0.0025941 2654 2635 0.0022581 1782 0.0024910
8000 4 5769 5735 5813 0.0025941 3894 3815 0.0024910 3010 0.0024910
4 5795 5417 5835 0.0025941 3962 4076 0.0024910 3293 0.0024910
4 5809 5063 5607 0.0025941 4063 4022 0.0024910 3294 0.0024910
The following results (columns marked XXX) were from tests on an RPi 3B+ in a FLIRC case, at N=8000 using 4 cores, after three or four runs. They demonstrated that excessive temperatures were not produced.
However, unlike under Raspbian Jessie, under Stretch the CPU clock was reduced from 1400 MHz to 1200 MHz at temperatures above 60°C, rather than at the 70°C originally specified for the Pi 3B+. As described here, this change was included as the Stretch default to avoid problems with unstable boards (or inadequate cooling). The solution was to include temp_soft_limit=70 in the /boot/config.txt file. Results using this limit change are included below.
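As a quick way to confirm which limit a system will boot with, the following is a minimal sketch that reads the setting back from a config.txt-style file. The function name is made up for illustration, and it assumes that the last uncommented temp_soft_limit line is the one that takes effect; conditional sections in the file are not considered.

```shell
#!/bin/sh
# Sketch only: report the temp_soft_limit value found in a config.txt-style file.
# soft_limit_from_config is a hypothetical helper name, not a standard tool.
soft_limit_from_config() {
    file=$1
    # Keep uncommented "temp_soft_limit=NN" lines and take the last one
    # (assumption: a later entry overrides an earlier one).
    value=$(sed -n 's/^[[:space:]]*temp_soft_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$file" | tail -n 1)
    # Print "default" when the file does not set the option at all.
    echo "${value:-default}"
}

# Example on a Pi: soft_limit_from_config /boot/config.txt
```

A result of "default" would mean that the Stretch behaviour described above (clock reduction from 60°C) can be expected on a 3B+.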
In case it means anything, note the differences in recorded voltages, where even using Jessie, a slight increase was indicated above 60°C. Then, under Stretch, it was decreased by 7.5% to run at the lower MHz, then increased slightly for the constant 1400 MHz, all being lower than that measured with the Jessie tests.
The MFLOPS speeds are representative of likely performance differences without and with the config.txt change.
Table 3 Inconsistent Four Core Performance (samples over test periods)
------- Stretch Original ------- --------- Stretch ATLAS --------- Jessie Original
MFLOPS 5063 5607 3120 3914 5311
XXX limit=70 XXX limit=70
All 1400 MHz All 1400 MHz All 1400 MHz
Volts MHz °C Volts °C Volts MHz °C Volts °C Volts °C
1.3500 1400 53.7 1.3563 49.4 1.3500 1400 56.9 1.3563 53.7 1.3875 53.7
1.3500 1400 57.5 1.3563 49.9 1.3500 1400 60.1 1.3563 55.8 1.3875 56.9
1.3500 1399 58.0 1.3563 55.9 1.2563 1200 60.1 1.3563 58.0 1.3875 58.0
1.3500 1400 59.1 1.3563 59.1 1.2563 1200 60.1 1.3563 60.7 1.3875 58.5
1.3500 1400 59.1 1.3563 60.1 1.2563 1200 60.7 1.3563 62.3 1.3875 59.1
1.2563 1200 59.6 1.3563 61.2 1.2563 1200 60.1 1.3563 62.3 1.3938 60.1
1.3500 1400 59.1 1.3563 62.3 1.2563 1200 60.1 1.3563 62.3 1.3938 60.7
1.3500 1200 60.1 1.3563 62.3 1.2563 1200 60.1 1.3563 62.8 1.3938 60.7
1.2563 1200 59.1 1.3563 62.8 1.2563 1200 60.1 1.3563 63.4 1.3938 61.2
1.2563 1200 60.1 1.3563 63.4 1.2563 1200 60.1 1.3563 62.8 1.3938 62.3
1.3500 1200 59.6 1.3563 63.9 1.2563 1199 60.7 1.3563 63.4 1.3938 62.3
Note that these issues apply to my particular systems, other users having reported different behaviour.
Table 4 Old Pi 3B Performance, Errors and Crashes
Stretch Original HPL Jessie Original HPL Stretch ATLAS HPL
Normal Volts+ Normal Volts+ Normal Volts+
N Cores MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck
1000 1 80 OK 79 OK 82 OK 78 OK 970 OK 978 OK
1000 2 178 OK 159 OK 158 OK 177 OK 1124 OK 1097 OK
1000 4 2468 NO* 2477 OK 2494 NO* 2512 OK 1390 NO* 1385 OK
4 2426 NO* 2479 OK 2464 OK 2527 OK 1385 OK 1374 OK
4 2445 OK 2524 OK 2499 OK 1387 OK 1362 OK
2000 1 221 OK 222 OK 219 OK 225 OK 1139 OK 1142 OK
2000 2 506 OK 439 OK 472 OK 496 OK 1609 OK 1613 OK
2000 4 CRASH 3727 OK 3804 NO* 3799 OK 2159 NO* 2226 OK
4 3775 NO* 3799 OK 3747 OK 3810 OK 2259 NO* 2260 OK
4 3782 OK 3826 NO* 3797 OK 2267 NO* 2215 OK
4000 1 602 OK 598 OK 601 OK 601 OK 1294 OK 1292 OK
4000 2 1298 OK 1311 OK 1310 OK 1057 OK 2039 OK 2025 OK
4000 4 CRASH 4831 OK CRASH 4864 OK ERROR 3077 OK
4 CRASH 4813 OK CRASH 4873 OK CRASH 3119 OK
4 4736 OK 4777 OK 3142 OK
8000 1 1085 OK 1088 OK 1094 OK 1096 OK 1358 OK 1354 OK
8000 2 1996 OK 2009 OK 2036 OK 2027 OK 2278 OK 2355 OK
8000 4 CRASH 5056 NO* CRASH CRASH ERROR 3514 OK
4 CRASH 3620 OK
4 3665 OK
NO* SumCheck such as 86232467 5841191 or 1281583765 12822
OK See 3B+ Sumcheck results
ERROR Fatal error indication
CRASH Frozen display reboot required
Following were from tests on the older RPi 3B in a FLIRC case, at N=8000 via 4 cores, after three or four runs. The samples are at around 4 second intervals.
The first example indicates that a system crash does not appear to be caused by a high temperature. The other two, for completely successful runs, have the config.txt “over_voltage=2” setting (note the higher voltage), with constant voltage and MHz. For these, CPU temperatures are high but do not reach the point where CPU MHz is throttled.
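Samples of this kind can be collected with the vcgencmd utility supplied with Raspbian. The following is a rough sketch, not the monitoring program used for these tests: the function names are invented for illustration, and the parsing assumes the usual vcgencmd reply formats (frequency(NN)=Hz, volt=N.NNNNV and temp=NN.N'C).

```shell
#!/bin/sh
# Sketch only: sample ARM MHz, core volts and temperature via vcgencmd.
# The parse helpers take the raw reply text, so they can be checked off-device.

hz_to_mhz() {    # "frequency(45)=1200000000" -> 1200 (truncated to whole MHz)
    echo "$1" | awk -F= '{ printf "%d\n", $2 / 1000000 }'
}

volts_value() {  # "volt=1.2688V" -> 1.2688
    echo "$1" | sed 's/^volt=//; s/V$//'
}

temp_value() {   # "temp=57.5'C" -> 57.5
    echo "$1" | sed "s/^temp=//; s/'C\$//"
}

sample_once() {  # One line of MHz, volts and degrees C (needs vcgencmd)
    printf '%s MHz, %s V, %s C\n' \
        "$(hz_to_mhz "$(vcgencmd measure_clock arm)")" \
        "$(volts_value "$(vcgencmd measure_volts core)")" \
        "$(temp_value "$(vcgencmd measure_temp)")"
}

sample_loop() {  # sample_loop COUNT INTERVAL_SECONDS
    i=0
    while [ "$i" -lt "$1" ]; do
        sample_once
        i=$((i + 1))
        sleep "$2"
    done
}

# Example on a Pi, 20 samples at 4 second intervals: sample_loop 20 4
```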
Table 5 Raspberry Pi 3B High Temperatures
Original HPL CRASH Original HPL ATLAS HPL
Normal Volts Over Volts Over Volts
MHz Volts °C MHz Volts °C MHz Volts °C
1200 1.2563 44.5 1200 1.3188 52.6 1200 1.3188 59.1
1200 1.2563 48.3 1200 1.3188 61.2 1200 1.3188 61.2
1200 1.2563 50.5 1200 1.3188 65.5 1200 1.3188 65.5
1200 1.2563 51.5 1200 1.3188 68.8 1200 1.3188 65.5
1200 1.2563 51.5 1200 1.3188 70.9 1200 1.3188 67.7
1200 1.2563 53.2 1200 1.3188 71.4 1199 1.3188 67.7
1200 1.3188 73.1 1200 1.3188 68.8
1200 1.3188 74.1 1200 1.3188 69.3
1200 1.3188 75.2 1200 1.3188 70.4
1200 1.3188 76.8 1199 1.3188 70.4
1200 1.3188 76.3 1200 1.3188 70.9
1200 1.3188 77.4 1200 1.3188 71.4
1200 1.3188 78.4 1200 1.3188 72.5
1200 1.3188 79.0 1200 1.3188 72.5
1200 1.3188 79.0 1200 1.3188 72.0
1200 1.3188 79.5 1200 1.3188 73.1
1200 1.3188 79.5 1200 1.3188 73.1
1200 1.3188 79.5 1199 1.3188 73.6
1200 1.3188 75.2 1200 1.3188 73.6
1200 1.3188 71.4 1200 1.3188 73.6
Four copies of the stress test program were run at the same time, along with another that measures CPU MHz, core voltage and temperature. In my case, the tests could be run, via Raspbian Stretch, using the following commands in a script file, but separate terminal windows had to be opened, for Raspbian Jessie, and individual commands used. The main programs were run using 40k words or 160k bytes each, with total address accesses greater than the shared L2 cache size. Then section 2 is specified, for 8 floating point operations per data word, running for a minimum of 15 minutes. On starting, the number of passes is calibrated to produce 15 second tests and a final numeric result for checking purposes, identical data and calculations being used for each data word. These results are displayed and logged on an ongoing basis.
As all the programs cannot be started at the same time, later running times cannot be a constant 15 seconds per pass, and this can be affected by multiprocessing overheads and CPU clock speed reductions. This also introduces complications in synchronising MFLOPS speed calculations with the measured MHz. Furthermore, the way in which the Operating System handles an over-utilised L2 cache can change the running time. In some cases, the OS appears to improve access efficiency to produce higher measured MFLOPS, even when CPU MHz has decreased. The main concern is to use the same calculation passes for each of these logged tests, enabling numeric results to be verified. The variations are reflected in the following log entries. If wrong numeric results are identified, only the first one is reported, to avoid excessive log entries. The sumcheck is dependent on the number of passes and is likely to be different for each of the four tests.
lxterminal --geometry=80x15 -e ./RPiHeatMHzVolts passes 63, seconds 15 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 5 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 6 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 7 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 8
Seconds
15.0 1200 scaling MHz, 1200 ARM MHz, core volt=1.2688V, temp=57.5'C
Pass 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
1 40000 8 94400 15.05 2007 0.549930990 Yes
Error Example
36 40000 8 94400 19.16 1577 See later No
At End
First Unexpected Results
test1 40000 8 94400 word 24113 was 0.578973 not 0.549931
Data comparison failures only occurred during minute 7 at 80.6°C.
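The MFLOPS figures in the log lines above appear to follow from the other columns: words × operations per word × repeat passes, divided by the elapsed seconds. The sketch below is not the benchmark's own code; the function names are made up for illustration, and the 512 KB L2 cache size mentioned in the comments is an assumption about the Pi 3 family, not a figure taken from these tests.

```shell
#!/bin/sh
# Sketch only: reproduce the logged MFLOPS from the other log columns.

mflops() {  # mflops WORDS OPS_PER_WORD PASSES SECONDS
    awk -v w="$1" -v o="$2" -v p="$3" -v s="$4" \
        'BEGIN { printf "%.0f\n", (w * o * p) / s / 1000000 }'
}

working_set_kb() {  # working_set_kb WORDS COPIES, for 4 byte words
    echo $(( $1 * 4 * $2 / 1024 ))
}

# Pass 1 above:      mflops 40000 8 94400 15.05   prints 2007
# Error example:     mflops 40000 8 94400 19.16   prints 1577
# Four 40k word copies: working_set_kb 40000 4    prints 625 (KB),
# which is more than an assumed 512 KB shared L2 cache.
```

The same passes taking 19.16 instead of 15.05 seconds accounts for the lower MFLOPS on the error example line.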
Single Test Running
Pass 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
1 40000 8 128000 15.05 2722 0.541245401 Yes
2 40000 8 128000 14.23 2879 0.541245401 Yes
3 Tests Running 4 Tests Running
ARM Total ARM Total
Minute MHz Volts °C MFLOPS MHz Volts °C MFLOPS
0 1200 1.2688 47.2 1200 1.2688 56.4
1 1199 1.2688 62.3 8360 1200 1.2688 73.6 7743
2 1200 1.2688 65.5 8548 1200 1.2688 75.8 7990
3 1199 1.2688 67.1 8545 1200 1.2688 77.9 7918
4 1200 1.2688 68.2 8514 1200 1.2688 79.0 7949
5 1200 1.2688 68.8 8481 1200 1.2688 79.5 7775
6 1200 1.2688 68.8 8503 1141 1.2688 80.6 8052
7 1200 1.2688 70.9 8537 1141 1.2688 80.6 7930 ERROR
8 1200 1.2688 72.0 8533 1141 1.2688 80.6 7908
9 1200 1.2688 71.4 8528 1087 1.2688 80.6 7870
10 1200 1.2688 72.0 8535 1141 1.2688 81.1 7725
11 1200 1.2688 73.1 8503 1141 1.2688 81.1 7891
12 1200 1.2688 73.6 8491 1140 1.2688 81.1 7795
The second test started at a higher temperature and suffered from MHz throttling, as expected, starting at 80°C. This time numerous errors were detected. Voltages were the same as in the first test, but recorded MFLOPS were much higher, with no significant changes on throttling, probably due to better organisation of L2 cached data, providing a higher hit rate.
The system was rebooted before the third test, with the recorded voltage surprisingly higher. No errors were observed with the temperatures increasing to the MHz throttling level, but MFLOPS were again lower.
Before the fourth test, the system was powered off then on, after a delay, to start at the lowest temperature. This time the voltage was restored to the Test 1 level, no MHz throttling was indicated, but data comparison failures were detected. MFLOPS were slightly better than during Test 3.
Test 1 - errors Test 2 - errors
Start at Sat Feb 16 16:09:03 2019 Start at Sat Feb 16 16:28:45 2019
Total Total
Minute MHz Volts °C MFLOPS Errors MHz Volts °C MFLOPS Errors
0 1200 1.2625 42.4 1200 1.2625 52.6
1 1200 1.2625 61.8 6516 0 1200 1.2625 73.1 8171 0
2 1200 1.2625 65.0 6231 0 1200 1.2625 75.8 8142 2
3 1200 1.2625 67.7 6325 0 1200 1.2625 76.8 7843 2
4 1200 1.2625 69.3 6314 0 1200 1.2625 77.9 8042 5
5 1200 1.2625 70.9 6321 0 1200 1.2625 79.5 8178 5
6 1200 1.2625 72.0 6335 1 1195 1.2625 80.6 8002 3
7 1199 1.2625 73.1 6313 0 1141 1.2625 80.6 8019 3
8 1200 1.2625 73.1 6264 1 1141 1.2625 81.1 8075 3
9 1199 1.2625 74.1 6251 0 1087 1.2625 80.6 8046 2
10 1200 1.2625 75.2 6266 0 1087 1.2625 80.6 7967 0
11 1200 1.2625 75.2 6405 1 1087 1.2625 80.6 7894 0
12 1200 1.2625 76.3 6368 0 1140 1.2625 81.1 7844 0
13 1200 1.2625 73.6 6349 2 1087 1.2625 81.1 7931 0
14 1200 1.2625 69.8 6777 2 1199 1.2625 78.4 7886 0
15 1200 1.2625 64.5 1200 1.2625 68.8
Total 7 25
Min 1199 1.2625 42.4 6231 1087 1.2625 52.6 7843
Max 1200 1.2625 76.3 6777 1200 1.2625 81.1 8171
Start at Sat Feb 16 17:03:02 2019 Start at Sat Feb 16 20:58:47 2019
Test 3 Reboot time 17:03 - no errors Test 4 Power off/on time 20:58 - errors
0 1200 1.275 48.3 1200 1.2625 40.8
1 1200 1.275 69.3 6835 0 1200 1.2625 61.2 7199 0
2 1200 1.275 72.5 6857 0 1200 1.2625 65.0 7037 0
3 1200 1.275 74.7 7015 0 1200 1.2625 67.1 7041 0
4 1199 1.275 75.8 6860 0 1200 1.2625 68.8 6997 0
5 1200 1.275 76.8 6658 0 1200 1.2625 70.4 6904 0
6 1200 1.275 77.4 7011 0 1200 1.2625 72.0 6830 0
7 1199 1.275 79.0 7016 0 1199 1.2625 73.1 6980 0
8 1200 1.275 79.5 6731 0 1200 1.2625 73.6 7133 1
9 1200 1.275 79.5 6659 0 1200 1.2625 74.7 7188 1
10 1194 1.275 79.5 6712 0 1200 1.2625 75.2 7185 2
11 1200 1.275 80.6 6949 0 1200 1.2625 75.8 7256 0
12 1195 1.275 81.1 6936 0 1200 1.2625 77.4 7200 2
13 1199 1.275 78.4 6813 0 1200 1.2625 74.1 7342 0
14 1200 1.275 72.5 0 1200 1.2625 69.8 1
15 1200 1.275 60.1 1200 1.2625 63.4
Total 0 7
Min 1194 1.275 48.3 6658 1199 1.2625 40.8 6830
Max 1200 1.275 81.1 7016 1200 1.2625 77.4 7342
Two pairs of MP tests were run, to show the effects of increasing temperature on repeating the procedure. The first pair used the Pi as delivered, with CPU voltage and MHz reducing at 60°C (this would have happened earlier without the FLIRC case). For the second pair, the system was booted with the limit=70 change, reported above for the HPL benchmark. Voltage and MHz were then constant until 70°C was reached. Minimum MFLOPS indicated the improvement. The maximum speeds shown, and some seen from as early as 10 minutes into the run, were probably affected by higher L2 cache hit rates and some programs finishing earlier.
1 Program 4 Byte Ops/ Repeat Seconds MFLOPS First All
Pass Words Word Passes Results Same
1 40000 8 150400 15.03 3203 0.540749788 Yes
2 40000 8 150400 14.27 3373 0.540749788 Yes
4 Core Test 1 ---------------- Test 2 ---------------- Test 3 ---- Test 4 -----------
limit=70 limit=70
All 1400 MHz All 1400 MHz
All 1.3563 V
Minute MHz Volts °C MFLOPS MHz Volts °C MFLOPS °C MFLOPS Volts °C MFLOPS
0 1400 1.3563 45.1 1400 1.3563 53.7 40.2 1.3563 52.1
1 1400 1.3563 55.3 8849 1400 1.2500 60.1 8378 51.5 9408 1.3563 62.3 8504
2 1400 1.3563 57.5 9176 1200 1.2500 60.1 8232 54.2 9430 1.3563 64.5 8623
3 1400 1.3563 58.5 9170 1200 1.2500 61.2 8247 56.4 9382 1.3563 65.5 8638
4 1200 1.3563 60.1 9075 1200 1.2500 61.2 8227 56.9 9414 1.3563 66.6 8628
5 1200 1.3563 59.6 8956 1200 1.2500 61.8 8218 59.1 9390 1.3563 67.7 8630
6 1200 1.3563 60.1 8573 1200 1.2500 61.2 8116 60.7 9410 1.3563 67.7 8447
7 1200 1.2500 60.1 8707 1200 1.2500 62.3 8098 61.2 9471 1.3563 68.8 8214
8 1200 1.2500 60.1 8626 1200 1.2500 62.3 8174 62.3 9419 1.3563 68.8 8203
9 1200 1.2500 60.1 8261 1200 1.2500 62.3 8145 63.4 9427 1.3563 69.8 8251
10 1200 1.2500 60.1 8610 1200 1.2500 62.8 8223 64.5 9569 1.3563 69.8 8505
11 1200 1.2500 61.2 8558 1200 1.2500 62.3 9488 66.6 11228 1.3563 69.8 9417
12 1200 1.2500 60.7 8624 1200 1.2500 63.4 9624 64.5 1.3563 70.4 9651
13 1200 1.2500 61.2 8483 1200 1.2500 63.4 9636 59.1 1.2500 70.4 9853
14 1200 1.2500 61.2 1200 1.2500 61.2 53.7 1.3563 64.5 9760
15 1400 1.3563 56.9 1400 1.3563 55.8 52.6 1.3563 58.5
Min 1200 1.2500 45.1 8261 1200 1.2500 53.7 8098 40.2 9382 1.2500 52.1 8203
Max 1400 1.3563 61.2 9416 1400 1.3563 63.4 9636 66.6 11228 1.3563 70.4 9853
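Besides watching MHz and volts, the firmware can be asked whether the soft temperature limit or throttling has been applied, using vcgencmd get_throttled. This was not part of the tests reported here; the following is a hedged sketch of decoding the reply, with the function name invented for illustration and the bit meanings taken from the commonly documented layout (bits 0 to 3 for current state, bits 16 to 19 for events since boot).

```shell
#!/bin/sh
# Sketch only: decode a "throttled=0x..." reply from vcgencmd get_throttled.
# decode_throttled is a hypothetical helper name.

decode_throttled() {  # decode_throttled "throttled=0x80008"
    bits=$(( ${1#throttled=} ))
    [ "$bits" -eq 0 ] && { echo "no flags set"; return 0; }
    [ $((bits & 0x1)) -ne 0 ] && echo "under-voltage now"
    [ $((bits & 0x2)) -ne 0 ] && echo "ARM frequency capped now"
    [ $((bits & 0x4)) -ne 0 ] && echo "throttled now"
    [ $((bits & 0x8)) -ne 0 ] && echo "soft temperature limit active now"
    [ $((bits & 0x10000)) -ne 0 ] && echo "under-voltage has occurred"
    [ $((bits & 0x20000)) -ne 0 ] && echo "ARM frequency capping has occurred"
    [ $((bits & 0x40000)) -ne 0 ] && echo "throttling has occurred"
    [ $((bits & 0x80000)) -ne 0 ] && echo "soft temperature limit has occurred"
    return 0
}

# Example on a Pi: decode_throttled "$(vcgencmd get_throttled)"
```

On a 3B+ running at the default 60°C limit, the soft temperature limit flags would be the ones expected to appear during runs like Test 1 and Test 2, if the firmware in use reports them.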
Using Jessie, again no errors were observed on the 3B+. Temperatures were similar to the first Stretch tests, but 1400 MHz was recorded continuously. Voltages then appeared to increase slightly above 60°C. Measured MFLOPS performance appeared to be slower than when using Stretch, maybe due to a different arrangement of cached data.
Start at Sat Feb 16 23:12:35 Start at Sat Feb 16 23:36:47
Total Total
Minute MHz Volts °C MFLOPS Errors MHz Volts °C MFLOPS Errors
0 1400 1.3375 41.9 1400 1.3375 46.2
1 1400 1.3375 50.5 7827 0 1400 1.3375 56.4 7557 0
2 1400 1.3375 53.7 7862 0 1400 1.3375 58.5 7380 0
3 1400 1.3375 54.8 7956 0 1400 1.3375 60.1 7570 0
4 1400 1.3375 56.4 7951 0 1400 1.3438 60.1 7409 0
5 1400 1.3375 56.9 7916 0 1400 1.3438 61.2 7402 0
6 1399 1.3375 57.5 8042 0 1400 1.3438 62.3 7447 0
7 1400 1.3375 59.6 7931 0 1400 1.3438 62.3 7448 0
8 1400 1.3438 59.1 7841 0 1400 1.3438 63.4 7438 0
9 1400 1.3438 60.7 7800 0 1400 1.3438 63.4 7453 0
10 1400 1.3438 61.2 7972 0 1400 1.3438 64.5 7463 0
11 1400 1.3438 61.8 7996 0 1400 1.3438 63.9 0
12 1400 1.3438 62.3 7857 0 1400 1.3438 63.4 0
13 1400 1.3438 61.8 0 1400 1.3438 63.4 0
14 1400 1.3375 58.0 0 1400 1.3438 63.4 0
15 1400 1.3375 55.3 1400 1.3438 60.7
Total 0 0
Min 1399 1.3375 41.9 7800 1400 1.3375 46.2 7380
Max 1400 1.3438 62.3 8042 1400 1.3438 64.5 7570
Considering the old Pi 3B, running stand alone, the 64 bit benchmark indicated performance 28% faster than the 32 bit version. As shown in the following results (at 4 minute intervals), this version did not demonstrate the same throughput improvement using four cores. Maximum recorded CPU temperatures were not as high, but data comparison errors were detected after warming up, noting that the recorded voltage was 1.2625 volts, the same as when failures occurred at 32 bits. On rebooting, following a system crash, the voltage was indicated as slightly higher, and no errors were detected (if that means anything). Running times were also more variable, with some tests finishing early. With fewer than four cores in use, improved throughput would be due to little or no out of cache accesses.