Contents
Summary
In 2016, I ran a precompiled version of the High Performance Linpack (HPL) Benchmark on my Raspberry Pi 3B. As seen by others, this showed that the program could produce wrong and inconsistent numeric results, and also cause system crashes. I decided to repeat this exercise via a later Raspbian release, running a later program compiled to use ATLAS, an alternative set of Basic Linear Algebra Subprograms, and to provide comparisons with the newer Raspberry Pi 3B+.
The ATLAS based benchmark had to be built from scratch, taking an unbelievable 14 hours on the Pi 3B+, including hundreds of MFLOPS speed tuning calculations.
I also decided to see if I could reproduce the failures using my stress testing programs, particularly running four copies of one that carries out floating point calculations. At the time of writing this report, I was unable to find a 64 bit HPL benchmark for the Raspberry Pi but decided, should failures occur in the 32 bit stress tests, to also run the 64 bit versions of my stress tests under Gentoo.
Voltage might be particularly important, as a claimed solution to the original Pi 3 failures was to set an over voltage parameter in the boot configuration file.
Voltage changes again came into play with the later Raspbian Stretch release and Pi 3B+, where CPU MHz was reduced at a lower temperature than previously encountered.
It should be noted that the results reproduced here are for my particular systems and may not be representative of other Raspberry Pi configurations. To identify possible causes of failures, my CPU MHz, core volts and temperature monitoring program was run at the same time as the other benchmarks. Both the Pi 3B and 3B+ boards were installed in FLIRC cases, which provide efficient cooling arrangements.
HPL Tests - Four data sizes were used, controlled by N at 1000, 2000, 4000 and 8000, running via 1, 2 and 4 cores (possibly not intended to be used that way). My program that monitors CPU MHz, core voltage and temperature was normally run at the same time.
Result sumchecks were noted, expected to be different, depending on N, but constant, independent of the number of cores used.
HPL Older Pi 3B - Initially, running both the original HPL benchmark and the ATLAS version, via Raspbian Jessie and Stretch Operating Systems, gave rise to wrong numeric sumchecks or system crashes at all data N sizes. The only failures noted were on using all four cores, but performance using fewer cores often did not achieve anticipated levels. Using the later recommended over voltage setting, errors and crashes only occurred at the largest N=8000 setting, the exception being via Stretch and ATLAS, where performance could be expected to be slower than with the original HPL (see 3B+), and at lower maximum CPU temperatures. Also, some of the failures were noted after short running times, when recorded temperatures were low.
HPL 3B+ - There were no sumcheck failures or system crashes using the Pi 3B+. Sumchecks were identical using 1, 2 and 4 cores, except when using an alternative procedure to specify how many should be used. There were no high temperatures (FLIRC case effect?) but, unexpectedly, using Stretch, core voltage and CPU MHz reduced at 60°C, producing slower performance that could be no better than that from the old Pi 3B. The default throttling temperature had been lowered from 70°C with Stretch; the 70°C limit still applied under Jessie, with a continuous 1400 MHz, temperature well under 70°C, and the highest voltage. The fix for Stretch was another booting option, with which the MHz and voltage were constant and temperatures reasonable. These tests confirmed that, at N=8000, the ATLAS version was slower than the original HPL benchmark.
Stress Tests - I ran my floating point stress tests, nominally in 15 minute sessions, using four independent programs, attempting to reproduce the HPL Pi 3B failures. After limited testing, I found that wrong numeric calculations and system crashes could occur when each program used 160 KB of data, overfilling the shared L2 cache.
The performance impact of the latter was confirmed by running 1, 3 and 4 copies of the program, where three improved throughput by nearly three times, but total throughput was slower using all four cores.
Pi 3B - These sessions were run without the power boost, the main tests being under Jessie, with Stretch essentially producing the same performance, but at slightly different recorded voltages.
Four test sessions produced rather strange results, all starting at 1200 MHz with some throttling at 80°C, mainly at a constant 1.2625 volts. Test 1 had 7 errors. Test 2 was hotter, with MHz throttling, but higher total MFLOPS and 25 errors. Test 3, after rebooting, had more typical speed, a higher 1.2750 volts and throttling, but no errors. Test 4, after power off/on, produced results similar to Test 1.
Pi 3B+ - No data comparison failures were detected on this system. Initially, under Stretch, performance was degraded due to CPU MHz being reduced, as with HPL, from 1400 to 1200, with voltage reduced from 1.3563 to 1.2500, at 60°C. After implementing the 70°C limit option, CPU MHz was a constant 1400, with voltage briefly indicating 1.2500 near the end of the tests.
Using Jessie, temperatures were similar to the first Stretch tests, but 1400 MHz was recorded continuously. Then, voltages appeared to increase slightly, from 1.3375 to 1.3438, above 60°C. Measured MFLOPS performance appeared to be slower than when using Stretch, maybe due to a different arrangement of cached data.
64 Bit Version - On the old Pi 3B, running stand alone, the 64 bit benchmark indicated performance 28% faster than the 32 bit version, but was slower running four copies of the program, becoming faster after repetitive runs (fewer RAM accesses?). Then, above 74°C at 1.2625 volts, data comparison failures were detected, the system crashing at 76.3°C. Following power off/on, there were no errors, but with slow performance, not reaching 74°C.
There were no detected errors using the Pi 3B+. As during the HPL Jessie tests, no excessive temperatures were recorded, voltage increased at 60°C, and MHz was a constant 1400, but four core MFLOPS were particularly slow.
Background
In 2016 I ran a precompiled version of High Performance Linpack (HPL) on my Raspberry Pi 3. As seen by others, this indicated that the program could produce wrong and inconsistent numeric results, and also cause system crashes. I ran with commands that specified the use of 1, 2 and 4 CPU cores, the failures only occurring using the latter. See
details in Raspberry Pi Forum,
The errors were indicated in topic
Pi3 incorrect results under load (possibly heat related),
Here, it is indicated that the area and frequency of failures can vary considerably from one Raspberry Pi 3 to another, some appearing to be due to heating effects and others to power fluctuations. For the latter, a config.txt power setting “over_voltage=2” was said to avoid the problem, but not so in other cases. In my case, the power setting reduced the failure rate.
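For reference, that fix amounts to a single line appended to the boot configuration; a sketch of the entry (over_voltage steps the core voltage up in 0.025 V increments):

```ini
# /boot/config.txt addition suggested in the forum topic
over_voltage=2
```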
Instructions for using the benchmark are (at this time) available from:
this howtoforge.com tutorial,
The current document provides details of how to download, compile and run the benchmark, a procedure that I preferred, hopefully including compile options that use later technology than was available for the earlier pre-compiled version. Unfortunately, this required installing MPICH2 (message-passing for distributed-memory applications), where the specified link failed to provide the package.
The benchmarks were run on both the Raspberry Pi 3B and 3B+. Both systems were installed in FLIRC aluminium cases, which have a built-in heatsink. These lead to much lower CPU temperatures than standard plastic varieties, as demonstrated in
Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and Stress Tests.pdf
from ResearchGate and
Raspberry Pi 2 and 3 Stress Tests (htm)
archived copy.
These stress tests were repeated, using different parameters, successfully reproducing incorrect numeric results and system crashes similar to those observed on running the HPL programs.
Original Pre-Compiled Version
Later comments in the howtoforge article provided a reminder of how to download and run the pre-compiled version. I wanted this to see if the errors occurred running on a Raspberry Pi 3B+ (my original copy was no longer available). This one uses a different MPICH, appropriate instructions being:
sudo apt-get install libmpich-dev
wget http://web.eece.maine.edu/~vweaver/junk/pi3_hpl.tar.gz
tar -xvzf pi3_hpl.tar.gz
chmod +x xhpl
./xhpl
The ./xhpl run command automatically uses all four Pi 3B CPU cores. Besides this, I ran tests using taskset 0x00000001 ./xhpl and taskset 0x00000003 ./xhpl to run using 1 and 2 cores. These are important to show that the provided sumcheck indicates the same numeric results on varying the number of processors used.
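The taskset values are hexadecimal CPU affinity bit masks, one bit per core, so 0x00000001 selects core 0 only, 0x00000003 cores 0 and 1, and 0xF (the effective default) all four. A hypothetical sketch, not part of the benchmark, counting the cores a mask enables:

```shell
# taskset masks are CPU affinity bit maps, one bit per core
count_cores() {
  local mask=$(( $1 )) n=0
  while [ "$mask" -gt 0 ]; do
    n=$(( n + (mask & 1) ))   # count each set bit, i.e. each enabled core
    mask=$(( mask >> 1 ))
  done
  echo "$n"
}
count_cores 0x1   # core 0 only        -> 1
count_cores 0x3   # cores 0 and 1      -> 2
count_cores 0xF   # all four Pi cores  -> 4
```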
ATLAS - Compiling Version Using Different BLAS (unbelievably slow to produce)
Another version of HPL was found, available from
Building HPL and ATLAS for the Raspberry Pi,
where ATLAS provides an alternative set of Basic Linear Algebra Subprograms.
The build was completed after 14 hours on a Pi 3B+, and included hundreds of MFLOPS tuning calculations. To little avail: the resultant performance was worse than the original and it still produced errors on the old Model 3B.
The command to run this version, which uses all four cores, is as follows, along with an example of that used initially to exercise fewer cores:
mpiexec -f nodes-1pi ./xhpl
mpiexec -f nodes-1pi taskset 0x00000003 ./xhpl
Further probing indicated that the supplied file nodes-1pi had four identical “localhost” statements, to use four cores, but this was dependent on the P and Q values, each of 2, in file HPL.dat. Changing these to one localhost and 1,1 for P,Q, then two localhosts and 2,1, provided an alternative way to run using 1 or 2 cores. As shown later, these produced different execution speeds and numeric results to those using taskset.
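As a sketch of the two-core arrangement just described (the layout is assumed from the standard HPL.dat format), two localhost entries pair with a 2×1 process grid:

```ini
# nodes-1pi - one line per MPI process, so two lines for a two-core run
localhost
localhost

# HPL.dat extract - the process grid must match: P x Q = 2 x 1
2            Ps
1            Qs
```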
Performance Expectations
My expectations of variance in performance are reflected in Intel Atom HPL results, provided in a
Raspberry Pi Forum topic
This shows performance increasing by more than 1.5 times but less than twice on doubling the number of cores used, and performance increasing on using larger data N sizes. Numeric result sumchecks are also shown, varying on increasing data N size, but constant irrespective of the number of threads used at a particular size. In other circumstances, performance can be degraded if use of higher level slower memory is required, or extended running time leads to reduced CPU MHz due to high temperatures.
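The MFLOPS figures HPL reports come from the standard LU factorisation operation count, 2N³/3 + 2N², divided by the measured run time, which is one reason larger N values (more computation per memory access) normally produce higher speeds. A sketch of the calculation, where the 60 second timing is an invented illustration:

```shell
# HPL speed = LU operation count / measured time
hpl_mflops() {
  awk -v n="$1" -v t="$2" \
    'BEGIN { printf "%d\n", (2 * n^3 / 3 + 2 * n^2) / (t * 1e6) }'
}
hpl_mflops 8000 60   # N=8000 solved in 60 seconds -> 5691
```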
These relationships are confirmed by results from my
MP MFLOPS benchmark in MultiThreading Benchmarks.pdf,
as shown in Table 1 below, measuring performance using 1, 2, 4 and 8 threads, the first set being for normal operation. The others were run to see what happens if the number of CPU cores used is restricted to one and two, via taskset parameters. In the one core case, speeds using 1 to 8 threads were reasonably constant and similar to the earlier one thread test (maybe a slight overhead). With two cores, 2, 4 and 8 thread speeds were effectively the same. For all three runs, numeric sumchecks were identical.
Example Benchmark Running Multiple Threads Using 1, 2 and 4 Cores
When CPU speed limited, and using all cores, doubling the number of threads, up to four, improved MFLOPS throughput by nearly twice.
Specifying that only one core should be used, produced constant performance with variable numbers of threads, defined by the program.
Specifying two cores produced constant performance on using 2, 4, and 8 threads.
Note constant sumchecks.
Table 1 MP-MFLOPS Benchmark
Add and Multiply instructions using 1, 2, 4, 8 Threads
Each thread deals with a dedicated segment of data.
Identical data is initialised in each word and the same
calculations applied. This leads to constant sumchecks,
independent of the number of threads used. Sumchecks
vary due to different number of calculations, depending
on data size. Third column MFLOPS depend on RAM speed.
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
All Cores Command ./MP-MFLOPSPiA7g6
MFLOPS
1T 213 213 186 783 798 779
2T 410 424 340 1555 1587 1506
4T 681 731 372 3064 3008 2941
8T 783 788 407 3083 3135 2849
Results x 100000
1T 76406 97075 99969 66015 95363 99951
2T 76406 97075 99969 66015 95363 99951
4T 76406 97075 99969 66015 95363 99951
8T 76406 97075 99969 66015 95363 99951
1 Core Command taskset 0x00000001 ./MP-MFLOPSPiA7g6
MFLOPS
1T 210 209 182 784 783 760
2T 209 208 182 783 783 757
4T 209 208 182 783 782 757
8T 208 209 181 781 782 758
Results x 100000
1T 76406 97075 99969 66015 95363 99951
2T 76406 97075 99969 66015 95363 99951
4T 76406 97075 99969 66015 95363 99951
8T 76406 97075 99969 66015 95363 99951
2 Cores Command taskset 0x00000003 ./MP-MFLOPSPiA7g6
MFLOPS
1T 206 208 182 795 783 759
2T 407 416 334 1568 1566 1506
4T 410 395 343 1567 1540 1509
8T 372 370 344 1562 1535 1506
Results x 100000
1T 76406 97075 99969 66015 95363 99951
2T 76406 97075 99969 66015 95363 99951
4T 76406 97075 99969 66015 95363 99951
8T 76406 97075 99969 66015 95363 99951
Raspberry Pi 3B+ HPL Results
Table 2 provides results run via Raspbian Stretch on a Raspberry Pi 3B+, with N values of 1000, 2000, 4000 and 8000 defined in file HPL.dat (and 256 NBs), as used for earlier runs. Variations in these parameters (such as 8192 and 128) made little difference in performance. Three results from tests that use all four cores are provided to show some possible variations. All tests ran without any wrong results or system crashes.
The original HPL program run command included taskset, to restrict the number of cores used. This produced consistent sumchecks at a particular N size. Particularly at the smaller data sizes, expected performance differences between tests using 1, 2 and 4 cores are not demonstrated, but the results can be useful for other comparisons.
The three sets of tests produced similar MFLOPS speeds. The third one followed a boot including a new parameter that reduced CPU MHz at a higher temperature (see Table 3).
The first ATLAS HPL tests were run using the taskset parameter (last 2 columns). The performance pattern was similar to that from the original HPL benchmark but, in spite of a later compilation and all those MFLOPS calibration tests in setting it up, speed was much slower. The N=8000 performance was affected by the lower MHz (see Table 3).
The other ATLAS tests made use of the nodes-1pi and HPL.dat parameters, mentioned earlier, to control the number of cores used. This produced different sumchecks using 1 and 2 cores, possibly invalidating the apparent higher performance.
Table 2 3B+ Results using Raspbian Stretch
Original HPL taskset ATLAS HPL input params ATLAS HPL taskset
N Cores MFLOPS SumCheck MFLOPS SumCheck MFLOPS SumCheck
1000 1 76 79 78 0.0052233 1066 1057 0.0066595 31 0.0069506
1000 2 177 149 172 0.0052233 1244 1220 0.0066480 237 0.0069506
1000 4 2586 2637 2650 0.0052233 1504 1458 0.0069506 1496 0.0069506
4 2608 2606 2661 0.0052233 1479 1512 0.0069506 1504 0.0069506
4 2451 2660 2642 0.0052233 1481 1440 0.0069506 1499 0.0069506
2000 1 226 227 226 0.0044702 1330 1331 0.0042812 118 0.0050602
2000 2 518 430 519 0.0044702 1767 1768 0.0043077 755 0.0050602
2000 4 3844 4047 3997 0.0044702 2434 2463 0.0050602 2448 0.0050602
4 3906 4046 4015 0.0044702 2469 2461 0.0050602 2479 0.0050602
4 3862 4066 4056 0.0044702 2390 2449 0.0050602 2461 0.0050602
4000 1 626 623 623 0.0029620 1474 1475 0.0033552 392 0.0028653
4000 2 1228 1253 1251 0.0029620 2397 2282 0.0033594 1310 0.0028653
4000 4 4966 5205 5238 0.0029620 3199 3435 0.0028653 3376 0.0028653
4 5169 5249 5154 0.0029620 3426 3401 0.0028653 3369 0.0028653
4 5004 5232 5202 0.0029620 3327 3328 0.0028653 3385 0.0028653
8000 1 1182 1168 1167 0.0025941 1571 1568 0.0022596 786 0.0024910
8000 2 1957 2200 2149 0.0025941 2654 2635 0.0022581 1782 0.0024910
8000 4 5769 5735 5813 0.0025941 3894 3815 0.0024910 3010 0.0024910
4 5795 5417 5835 0.0025941 3962 4076 0.0024910 3293 0.0024910
4 5809 5063 5607 0.0025941 4063 4022 0.0024910 3294 0.0024910
The following (XXX) were from tests on an RPi 3B+ in a FLIRC case, at N=8000 via 4 cores, after three or four runs. They demonstrated that excessive temperatures were not produced.
However, unlike running via Raspbian Jessie, using Stretch, CPU MHz was reduced from 1400 to 1200 above temperatures of 60°C, compared with 70°C as originally specified for the Pi 3B+. As described
here,
this change was included as the Stretch default to avoid problems using unstable boards (or inadequate cooling). The solution was to include temp_soft_limit=70 in the /boot/config.txt file. Results using this limit change are included below.
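The change amounts to one line; a sketch of the /boot/config.txt addition:

```ini
# /boot/config.txt - restore the 70°C soft throttle point on the Pi 3B+
temp_soft_limit=70
```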
In case it means anything, note the differences in recorded voltages where, even using Jessie, a slight increase was indicated above 60°C. Then, under Stretch, voltage was decreased by 7.5% to run at the lower MHz, then increased slightly for the constant 1400 MHz, all being lower than that measured with the Jessie tests.
The MFLOPS speeds are representative of likely performance differences without and with the config.txt change.
Table 3 Inconsistent Four Core Performance (samples over test periods)
------- Stretch Original ------- --------- Stretch ATLAS --------- Jessie Original
MFLOPS 5063 5607 3120 3914 5311
XXX limit=70 XXX limit=70
All 1400 MHz All 1400 MHz All 1400 MHz
Volts MHz °C Volts °C Volts MHz °C Volts °C Volts °C
1.3500 1400 53.7 1.3563 49.4 1.3500 1400 56.9 1.3563 53.7 1.3875 53.7
1.3500 1400 57.5 1.3563 49.9 1.3500 1400 60.1 1.3563 55.8 1.3875 56.9
1.3500 1399 58.0 1.3563 55.9 1.2563 1200 60.1 1.3563 58.0 1.3875 58.0
1.3500 1400 59.1 1.3563 59.1 1.2563 1200 60.1 1.3563 60.7 1.3875 58.5
1.3500 1400 59.1 1.3563 60.1 1.2563 1200 60.7 1.3563 62.3 1.3875 59.1
1.2563 1200 59.6 1.3563 61.2 1.2563 1200 60.1 1.3563 62.3 1.3938 60.1
1.3500 1400 59.1 1.3563 62.3 1.2563 1200 60.1 1.3563 62.3 1.3938 60.7
1.3500 1200 60.1 1.3563 62.3 1.2563 1200 60.1 1.3563 62.8 1.3938 60.7
1.2563 1200 59.1 1.3563 62.8 1.2563 1200 60.1 1.3563 63.4 1.3938 61.2
1.2563 1200 60.1 1.3563 63.4 1.2563 1200 60.1 1.3563 62.8 1.3938 62.3
1.3500 1200 59.6 1.3563 63.9 1.2563 1199 60.7 1.3563 63.4 1.3938 62.3
Raspberry Pi 3B HPL Results (With Errors and Crashes)
Besides Raspbian Stretch, the original HPL code was run via the earlier Jessie Operating System, with essentially identical performance. Using normal voltage settings, all tests ran successfully using 1 and 2 cores, and all failed utilising all four cores, in some rather random cases producing invalid final numeric results, or causing a system crash (frozen display) that needed a reboot. Of particular note, runs producing wrong numeric results indicated the same MFLOPS speeds as successful runs. The benchmarks were also run using the over volts setting, when the only failures were at N=8000. The ATLAS version was also run, with the same failures as normal runs, but did achieve complete success using over voltage.
Table 5, below, shows CPU temperatures, core voltage and MHz recorded during some of the tests.
Note that these issues apply to my particular systems, other users having reported different behaviour.
Table 4 Old Pi 3B Performance, Errors and Crashes
Stretch Original HPL Jessie Original HPL Stretch ATLAS HPL
Normal Volts+ Normal Volts+ Normal Volts+
N Cores MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck MFLOPS Sumck
1000 1 80 OK 79 OK 82 OK 78 OK 970 OK 978 OK
1000 2 178 OK 159 OK 158 OK 177 OK 1124 OK 1097 OK
1000 4 2468 NO* 2477 OK 2494 NO* 2512 OK 1390 NO* 1385 OK
4 2426 NO* 2479 OK 2464 OK 2527 OK 1385 OK 1374 OK
4 2445 OK 2524 OK 2499 OK 1387 OK 1362 OK
2000 1 221 OK 222 OK 219 OK 225 OK 1139 OK 1142 OK
2000 2 506 OK 439 OK 472 OK 496 OK 1609 OK 1613 OK
2000 4 CRASH 3727 OK 3804 NO* 3799 OK 2159 NO* 2226 OK
4 3775 NO* 3799 OK 3747 OK 3810 OK 2259 NO* 2260 OK
4 3782 OK 3826 NO* 3797 OK 2267 NO* 2215 OK
4000 1 602 OK 598 OK 601 OK 601 OK 1294 OK 1292 OK
4000 2 1298 OK 1311 OK 1310 OK 1057 OK 2039 OK 2025 OK
4000 4 CRASH 4831 OK CRASH 4864 OK ERROR 3077 OK
4 CRASH 4813 OK CRASH 4873 OK CRASH 3119 OK
4 4736 OK 4777 OK 3142 OK
8000 1 1085 OK 1088 OK 1094 OK 1096 OK 1358 OK 1354 OK
8000 2 1996 OK 2009 OK 2036 OK 2027 OK 2278 OK 2355 OK
8000 4 CRASH 5056 NO* CRASH CRASH ERROR 3514 OK
4 CRASH 3620 OK
4 3665 OK
NO* SumCheck such as 86232467 5841191 or 1281583765 12822
OK See 3B+ Sumcheck results
ERROR Fatal error indication
CRASH Frozen display reboot required
The following were from tests on the older RPi 3B in a FLIRC case, at N=8000 via 4 cores, after three or four runs. The samples are at around 4 second intervals.
The first example indicates that a system crash does not appear to be caused by a high temperature. The other two, for completely successful runs, have the config.txt “over_voltage=2” setting (note the higher voltage), with constant voltage and MHz. For these, CPU temperatures are high but do not reach the point where CPU MHz is throttled.
Table 5 Raspberry Pi 3B High Temperatures
Original HPL CRASH Original HPL ATLAS HPL
Normal Volts Over Volts Over Volts
MHz Volts °C MHz Volts °C MHz Volts °C
1200 1.2563 44.5 1200 1.3188 52.6 1200 1.3188 59.1
1200 1.2563 48.3 1200 1.3188 61.2 1200 1.3188 61.2
1200 1.2563 50.5 1200 1.3188 65.5 1200 1.3188 65.5
1200 1.2563 51.5 1200 1.3188 68.8 1200 1.3188 65.5
1200 1.2563 51.5 1200 1.3188 70.9 1200 1.3188 67.7
1200 1.2563 53.2 1200 1.3188 71.4 1199 1.3188 67.7
1200 1.3188 73.1 1200 1.3188 68.8
1200 1.3188 74.1 1200 1.3188 69.3
1200 1.3188 75.2 1200 1.3188 70.4
1200 1.3188 76.8 1199 1.3188 70.4
1200 1.3188 76.3 1200 1.3188 70.9
1200 1.3188 77.4 1200 1.3188 71.4
1200 1.3188 78.4 1200 1.3188 72.5
1200 1.3188 79.0 1200 1.3188 72.5
1200 1.3188 79.0 1200 1.3188 72.0
1200 1.3188 79.5 1200 1.3188 73.1
1200 1.3188 79.5 1200 1.3188 73.1
1200 1.3188 79.5 1199 1.3188 73.6
1200 1.3188 75.2 1200 1.3188 73.6
1200 1.3188 71.4 1200 1.3188 73.6
Stress Tests Using All Four Cores
As High Performance Linpack errors were only shown when using all four Raspberry Pi 3 cores, it was decided to determine if failures could be identified using my floating point stress testing programs. Descriptions and results of these can be found in
Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and Stress Tests.pdf
from ResearchGate and
Raspberry Pi 2 and 3 Stress Tests (htm)
archived copy.
Four copies of the stress test program were run at the same time, along with another that measures CPU MHz, core voltage and temperature. In my case, the tests could be run, via Raspbian Stretch, using the following commands in a script file but, for Raspbian Jessie, separate terminal windows had to be opened and individual commands used. The main programs were each run using 40K words or 160K bytes, with the total addresses accessed greater than the shared L2 cache size.
Then section 2 is specified, for 8 floating point operations per data word, running for a minimum of 15 minutes. On starting, the number of passes is calibrated to produce 15 second tests and a final numeric result for checking purposes, identical data and calculations being used for each data word. These results are displayed and logged on an ongoing basis.
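The sizing can be checked with simple arithmetic: the Pi 3 family shares a 512 KB L2 cache between the four cores, so four copies of a 160 KB working set cannot all stay cache resident:

```shell
# Four 40,000 word (4 bytes per word) working sets versus the shared 512 KB L2
echo $(( 40000 * 4 ))       # bytes per program:  160000
echo $(( 4 * 40000 * 4 ))   # all four programs:  640000
echo $(( 512 * 1024 ))      # shared L2 cache:    524288
```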
As all the programs cannot be started at the same time, later running times cannot be a constant 15 seconds per pass, and this can be affected by multiprocessing overheads and CPU clock speed reductions. This also introduces complications in synchronising MFLOPS speed calculations with the measured MHz. Furthermore, the way in which the Operating System handles an over utilised L2 cache can change the running time. In some cases, the OS appears to improve access efficiency to produce higher measured MFLOPS, even when CPU MHz has decreased. The main concern is to use the same calculation passes for each of these logged tests, enabling numeric results to be verified.
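The start-up calibration can be sketched as a simple scaling of a short trial run (the trial figures below are invented for illustration; the real programs do this internally):

```shell
# Scale a timed trial run up to the 15 second test interval
calibrate_passes() {
  local trial_passes=$1 trial_ms=$2 target_ms=15000
  echo $(( trial_passes * target_ms / trial_ms ))
}
calibrate_passes 1000 159   # 1000 trial passes took 0.159 seconds -> 94339
```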
The variations are reflected in the following log entries. If wrong numeric results are identified, only the first one is reported, to avoid excessive log entries.
The sumcheck is dependent on the number of passes and is likely to be different for each of the four tests.
lxterminal --geometry=80x15 -e ./RPiHeatMHzVolts passes 63, seconds 15 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 5 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 6 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 7 &
lxterminal --geometry=80x15 -e ./burninfpuPi2 Kwds 40 Sect 2 Mins 15 Log 8
Seconds
15.0 1200 scaling MHz, 1200 ARM MHz, core volt=1.2688V, temp=57.5'C
Pass 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
1 40000 8 94400 15.05 2007 0.549930990 Yes
Error Example
36 40000 8 94400 19.16 1577 See later No
At End
First Unexpected Results
test1 40000 8 94400 word 24113 was 0.578973 not 0.549931
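The separate monitoring program reads the firmware counters; a minimal sketch of equivalent sampling, assuming the standard Raspbian vcgencmd utility (this is not the actual RPiHeatMHzVolts source):

```shell
# Sample ARM MHz, core voltage and temperature, as logged in the tables
sample_pi_stats() {
  local mhz volts temp
  mhz=$(( $(vcgencmd measure_clock arm | cut -d= -f2) / 1000000 ))
  volts=$(vcgencmd measure_volts core | cut -d= -f2)
  temp=$(vcgencmd measure_temp | cut -d= -f2)
  echo "$mhz MHz  $volts  $temp"
}
# e.g. one sample every 4 seconds:  while true; do sample_pi_stats; sleep 4; done
```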
Pi 3B Raspbian Stretch Stress Tests (With Errors)
Results below are from running using a single CPU core, then three and four cores. Performance using three cores was around three times that of a warmed-up single core; voltage and CPU MHz were constant, but with temperatures up to 73.6°C. Running to utilise all four cores, shortly after the three-core session, still indicated the same voltage, but temperatures reached 81.1°C, with associated CPU clock throttling, down to 1087 MHz. Note that recorded MFLOPS were less than when using three cores, the likely influence of increased out-of-cache accesses. Also note that the MFLOPS measurements are approximate, based on averages over slightly different intervals.
Data comparison failures only occurred during minute 7 at 80.6°C.
Single Test Running
Pass 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
1 40000 8 128000 15.05 2722 0.541245401 Yes
2 40000 8 128000 14.23 2879 0.541245401 Yes
3 Tests Running 4 Tests Running
ARM Total ARM Total
Minute MHz Volts °C MFLOPS MHz Volts °C MFLOPS
0 1200 1.2688 47.2 1200 1.2688 56.4
1 1199 1.2688 62.3 8360 1200 1.2688 73.6 7743
2 1200 1.2688 65.5 8548 1200 1.2688 75.8 7990
3 1199 1.2688 67.1 8545 1200 1.2688 77.9 7918
4 1200 1.2688 68.2 8514 1200 1.2688 79.0 7949
5 1200 1.2688 68.8 8481 1200 1.2688 79.5 7775
6 1200 1.2688 68.8 8503 1141 1.2688 80.6 8052
7 1200 1.2688 70.9 8537 1141 1.2688 80.6 7930 ERROR
8 1200 1.2688 72.0 8533 1141 1.2688 80.6 7908
9 1200 1.2688 71.4 8528 1087 1.2688 80.6 7870
10 1200 1.2688 72.0 8535 1141 1.2688 81.1 7725
11 1200 1.2688 73.1 8503 1141 1.2688 81.1 7891
12 1200 1.2688 73.6 8491 1140 1.2688 81.1 7795
Pi 3B, Raspbian Jessie Stress Tests (With Errors)
The first test below ran without CPU MHz being throttled, and wrong numeric results were detected with temperature above 71°C. There were more errors than in the Stretch tests, in spite of the lower temperature, along with lower recorded MFLOPS speeds. A possible influence was the lower recorded core voltage.
The second test started at a higher temperature and suffered from MHz throttling, as expected, starting at 80°C. This time numerous errors were detected. Voltages were the same as the first test, but recorded MFLOPS were much higher, with no significant changes on throttling, probably due to better organisation of L2 cached data, providing a higher hit rate.
The system was rebooted before the third test, with the recorded voltage surprisingly higher. No errors were observed, with temperatures increasing to MHz throttling level, but MFLOPS were again lower.
Before the fourth test, the system was powered off then on, after a delay, to start at the lowest temperature. This time the voltage was restored to the Test 1 level, no MHz throttling was indicated, but data comparison failures were detected. MFLOPS were slightly better than during Test 3.
Test 1 - errors Test 2 - errors
Start at Sat Feb 16 16:09:03 2019 Start at Sat Feb 16 16:28:45 2019
Total Total
Minute MHz Volts °C MFLOPS Errors MHz Volts °C MFLOPS Errors
0 1200 1.2625 42.4 1200 1.2625 52.6
1 1200 1.2625 61.8 6516 0 1200 1.2625 73.1 8171 0
2 1200 1.2625 65.0 6231 0 1200 1.2625 75.8 8142 2
3 1200 1.2625 67.7 6325 0 1200 1.2625 76.8 7843 2
4 1200 1.2625 69.3 6314 0 1200 1.2625 77.9 8042 5
5 1200 1.2625 70.9 6321 0 1200 1.2625 79.5 8178 5
6 1200 1.2625 72.0 6335 1 1195 1.2625 80.6 8002 3
7 1199 1.2625 73.1 6313 0 1141 1.2625 80.6 8019 3
8 1200 1.2625 73.1 6264 1 1141 1.2625 81.1 8075 3
9 1199 1.2625 74.1 6251 0 1087 1.2625 80.6 8046 2
10 1200 1.2625 75.2 6266 0 1087 1.2625 80.6 7967 0
11 1200 1.2625 75.2 6405 1 1087 1.2625 80.6 7894 0
12 1200 1.2625 76.3 6368 0 1140 1.2625 81.1 7844 0
13 1200 1.2625 73.6 6349 2 1087 1.2625 81.1 7931 0
14 1200 1.2625 69.8 6777 2 1199 1.2625 78.4 7886 0
15 1200 1.2625 64.5 1200 1.2625 68.8
Total 7 25
Min 1199 1.2625 42.4 6231 1087 1.2625 52.6 7843
Max 1200 1.2625 76.3 6777 1200 1.2625 81.1 8171
Start at Sat Feb 16 17:03:02 2019 Start at Sat Feb 16 20:58:47 2019
Test 3 Reboot time 17:03 - no errors Test 4 Power off/on time 20:58 - errors
0 1200 1.275 48.3 1200 1.2625 40.8
1 1200 1.275 69.3 6835 0 1200 1.2625 61.2 7199 0
2 1200 1.275 72.5 6857 0 1200 1.2625 65.0 7037 0
3 1200 1.275 74.7 7015 0 1200 1.2625 67.1 7041 0
4 1199 1.275 75.8 6860 0 1200 1.2625 68.8 6997 0
5 1200 1.275 76.8 6658 0 1200 1.2625 70.4 6904 0
6 1200 1.275 77.4 7011 0 1200 1.2625 72.0 6830 0
7 1199 1.275 79.0 7016 0 1199 1.2625 73.1 6980 0
8 1200 1.275 79.5 6731 0 1200 1.2625 73.6 7133 1
9 1200 1.275 79.5 6659 0 1200 1.2625 74.7 7188 1
10 1194 1.275 79.5 6712 0 1200 1.2625 75.2 7185 2
11 1200 1.275 80.6 6949 0 1200 1.2625 75.8 7256 0
12 1195 1.275 81.1 6936 0 1200 1.2625 77.4 7200 2
13 1199 1.275 78.4 6813 0 1200 1.2625 74.1 7342 0
14 1200 1.275 72.5 0 1200 1.2625 69.8 1
15 1200 1.275 60.1 1200 1.2625 63.4
Total 0 7
Min 1194 1.275 48.3 6658 1199 1.2625 40.8 6830
Max 1200 1.275 81.1 7016 1200 1.2625 77.4 7342
Pi 3B+, Raspbian Stretch Stress Tests (No Errors or Crashes)
The first example results below are from running only one copy of the stress testing program. This indicates that four cores, using all L2 cache based data, could achieve more than 12 single precision GFLOPS. The other details are from running four copies of the program, where some RAM accesses are inevitable, resulting in slower performance. The main consideration is that no final data comparison errors were detected.
Two pairs of MP tests were run, to show the effects of increasing temperature on repeating the procedure. The first pair were from using the Pi as delivered, with CPU voltage and MHz reducing at 60°C (this would have occurred sooner without the FLIRC case). For the second pair, the system was booted with the limit=70 change, reported above for the HPL benchmark. Then voltage and MHz were constant until 70°C was reached. Minimum MFLOPS indicated the improvement. Maximum speeds, some shown after as little as 10 minutes, were probably affected by higher L2 cache hit rates and some programs finishing earlier.
1 Program 4 Byte Ops/ Repeat Seconds MFLOPS First All
Pass Words Word Passes Results Same
1 40000 8 150400 15.03 3203 0.540749788 Yes
2 40000 8 150400 14.27 3373 0.540749788 Yes
4 Core Test 1 ---------------- Test 2 ---------------- Test 3 ---- Test 4 -----------
limit=70 limit=70
All 1400 MHz All 1400 MHz
All 1.3563 V
Minute MHz Volts °C MFLOPS MHz Volts °C MFLOPS °C MFLOPS Volts °C MFLOPS
0 1400 1.3563 45.1 1400 1.3563 53.7 40.2 1.3563 52.1
1 1400 1.3563 55.3 8849 1400 1.2500 60.1 8378 51.5 9408 1.3563 62.3 8504
2 1400 1.3563 57.5 9176 1200 1.2500 60.1 8232 54.2 9430 1.3563 64.5 8623
3 1400 1.3563 58.5 9170 1200 1.2500 61.2 8247 56.4 9382 1.3563 65.5 8638
4 1200 1.3563 60.1 9075 1200 1.2500 61.2 8227 56.9 9414 1.3563 66.6 8628
5 1200 1.3563 59.6 8956 1200 1.2500 61.8 8218 59.1 9390 1.3563 67.7 8630
6 1200 1.3563 60.1 8573 1200 1.2500 61.2 8116 60.7 9410 1.3563 67.7 8447
7 1200 1.2500 60.1 8707 1200 1.2500 62.3 8098 61.2 9471 1.3563 68.8 8214
8 1200 1.2500 60.1 8626 1200 1.2500 62.3 8174 62.3 9419 1.3563 68.8 8203
9 1200 1.2500 60.1 8261 1200 1.2500 62.3 8145 63.4 9427 1.3563 69.8 8251
10 1200 1.2500 60.1 8610 1200 1.2500 62.8 8223 64.5 9569 1.3563 69.8 8505
11 1200 1.2500 61.2 8558 1200 1.2500 62.3 9488 66.6 11228 1.3563 69.8 9417
12 1200 1.2500 60.7 8624 1200 1.2500 63.4 9624 64.5 1.3563 70.4 9651
13 1200 1.2500 61.2 8483 1200 1.2500 63.4 9636 59.1 1.2500 70.4 9853
14 1200 1.2500 61.2 1200 1.2500 61.2 53.7 1.3563 64.5 9760
15 1400 1.3563 56.9 1400 1.3563 55.8 52.6 1.3563 58.5
Min 1200 1.2500 45.1 8261 1200 1.2500 53.7 8098 40.2 9382 1.2500 52.1 8203
Max 1400 1.3563 61.2 9416 1400 1.3563 63.4 9636 66.6 11228 1.3563 70.4 9853
Pi 3B+, Raspbian Jessie Stress Tests (No Errors or Crashes)
Using Jessie, no errors were observed with the 3B+ either. Temperatures were similar to the first Stretch tests, but 1400 MHz was recorded continuously. Voltages appeared to increase slightly above 60°C. Measured MFLOPS performance appeared slower than under Stretch, possibly due to a different arrangement of cached data.
Start at Sat Feb 16 23:12:35 Start at Sat Feb 16 23:36:47
Total Total
Minute MHz Volts °C MFLOPS Errors MHz Volts °C MFLOPS Errors
0 1400 1.3375 41.9 1400 1.3375 46.2
1 1400 1.3375 50.5 7827 0 1400 1.3375 56.4 7557 0
2 1400 1.3375 53.7 7862 0 1400 1.3375 58.5 7380 0
3 1400 1.3375 54.8 7956 0 1400 1.3375 60.1 7570 0
4 1400 1.3375 56.4 7951 0 1400 1.3438 60.1 7409 0
5 1400 1.3375 56.9 7916 0 1400 1.3438 61.2 7402 0
6 1399 1.3375 57.5 8042 0 1400 1.3438 62.3 7447 0
7 1400 1.3375 59.6 7931 0 1400 1.3438 62.3 7448 0
8 1400 1.3438 59.1 7841 0 1400 1.3438 63.4 7438 0
9 1400 1.3438 60.7 7800 0 1400 1.3438 63.4 7453 0
10 1400 1.3438 61.2 7972 0 1400 1.3438 64.5 7463 0
11 1400 1.3438 61.8 7996 0 1400 1.3438 63.9 0
12 1400 1.3438 62.3 7857 0 1400 1.3438 63.4 0
13 1400 1.3438 61.8 0 1400 1.3438 63.4 0
14 1400 1.3375 58.0 0 1400 1.3438 63.4 0
15 1400 1.3375 55.3 1400 1.3438 60.7
Total 0 0
Min 1399 1.3375 41.9 7800 1400 1.3375 46.2 7380
Max 1400 1.3438 62.3 8042 1400 1.3438 64.5 7570
64 Bit Gentoo RPi 3B Stress Tests (With Errors)
As far as I was aware, a 64 bit HPL benchmark was not available for the Raspberry Pi 3 at the time of writing this report. But I wondered if the same MHz and voltage variations and errors might occur, as shown by my stress tests. To see, the 64 bit version was run, under Gentoo, using the same parameters as at 32 bits: 40K words (160K bytes) with 8 floating point operations per data word.
Further details of the benchmark can be found at ResearchGate in
Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and Stress Tests.pdf
along with the benchmark execution codes (burninfpuPi64 and RPiHeatMHzVolts64G) in
Rpi3-64-Bit-Benchmarks.tar.gz.
Considering the old Pi 3B, running stand alone, the 64 bit benchmark indicated performance 28% faster than the 32 bit version. As shown in the following results (at 4 minute intervals), this version did not demonstrate the same throughput improvement using four cores. Maximum recorded CPU temperatures were not as high, but data comparison errors were detected after warming up, noting that recorded power measurements were at 1.2625 volts, the same as when failures occurred at 32 bits. After rebooting, following a system crash, voltage was indicated as slightly higher, and no errors were detected (if that means anything). Running times were also more variable, with some tests finishing early. With fewer than four cores in use, improved throughput would be due to little or no out of cache accesses.
Single Core Average 3688 MFLOPS
Test 1 Test 2 Test 3
Total Total Total
Seconds MHz °C MFLOPS MHz °C MFLOPS MHz Volts °C MFLOPS Errors
0 1200 44.0 1200 47.2 1200 1.2625 46.2
240 1200 63.4 3506 1200 67.1 5151 1200 1.2625 68.2 6716 0
480 1200 66.6 3489 1200 70.4 5145 1200 1.2625 72.0 6698 0
720 1200 69.8 3962 1200 72.0 5149 1200 1.2625 74.1 6709 6
960 1200 66.1 >7391 1200 72.5 >7367 1200 1.2625 74.1 >7630 5
Test 4 Test 5 Power Off/On
Total Total
Seconds MHz Volts °C MFLOPS Errors MHz Volts °C MFLOPS Errors
0 1200 1.2625 56.9 1200 1.2688 47.2
240 1200 1.2625 74.1 6248 2 1200 1.2688 68.2 4680 0
480 1200 1.2625 76.3 6246 8 1200 1.2688 71.4 4679 0
720 1200 1.2625 CRASH 3 1200 1.2688 73.6 4342 0
960 1200 1200 1.2688 69.3 >7200 0
64 Bit Gentoo RPi 3B+ Stress Tests (No Errors or Crashes)
As with the limited 32 bit Pi 3B+ tests, no data comparison failures were detected during the six tests reported below. The 64 bit benchmark was again faster running one copy but, in these limited tests, was slower on programs using all four cores. Slight increases in voltages were again indicated above 60°C, as during the Jessie HPL tests; maximum temperatures were similar and there was no CPU MHz throttling. Note that recorded voltages changed on rebooting.
1 Core 2 Cores 3 Cores
Average MFLOPS 4316 8273 7462
Test 1 Test 2 Test 3
Total Total Total
Seconds MHz Volts °C MFLOPS MHz Volts °C MFLOPS MHz Volts °C MFLOPS
0 1400 1.3375 39.2 1400 1.3375 47.8 1400 1.3375 50.5
240 1400 1.3375 52.1 5980 1400 1.3375 58.0 4298 1400 1.3438 61.2 5570
480 1400 1.3375 54.8 5980 1400 1.3438 60.1 4290 1400 1.3438 62.8 5595
720 1400 1.3375 58.5 5994 1400 1.3438 61.8 4788 1400 1.3438 63.9 5522
960 1400 1.3438 60.1 >7400 1400 1.3438 60.1 >8770 1400 1.3438 65.0 5545
Test 4 Test 5 Reboot Test 6 Power Off/On
0 1400 1.3375 54.8 1400 1.3500 47.2 1400 1.3500 46.2
240 1400 1.3438 63.4 4477 1400 1.3500 58.5 5817 1400 1.3500 59.6 6327
480 1400 1.3438 64.5 4441 1400 1.3563 61.2 5805 1400 1.3563 61.8 6341
720 1400 1.3438 64.5 4505 1400 1.3563 64.5 6500 1400 1.3563 64.5 7183
960 1400 1.3438 63.4 >5720 1400 1.3563 65.0 >9500 1400 1.3563 62.3 >10000