Raspberry Pi 4B 64 Bit Benchmarks and Stress Tests
Whetstone Benchmark

                            ------MFLOPS------  ---------------MOPS----------------
System           MHz  MWIPS      1      2      3    COS    EXP  FIXPT     IF  EQUAL

Pi 3B+          1400   1071    383    403    328   20.9   12.4   1704    N/A   1357
Pi 4B           1500   2269    522    534    398   54.8   39.8   2487    N/A    997
Pi4/3B+         1.07   2.12   1.36   1.32   1.21   2.63   3.21   1.46    N/A   0.73
Pi 4B 32b       1500   1884    516    478    310   54.7   27.1   2498   2247    999
64b/32b         1.00   1.20   1.01   1.12   1.28   1.00   1.47   1.00    N/A   1.00
===========================================================================
gcc 9
Pi 3B+          1400   1482    384    404    329   27.4   28.2   1712   2042   1362
Pi 4B           1500   2330    522    533    398   60.4   40.3   2493   2984    997
Pi4/3B+         1.07   1.57   1.36   1.32   1.21   2.21   1.43   1.46   1.46   0.73
gcc 9/6 Pi 4B   1.00   1.03   1.00   1.00   1.00   1.10   1.01   1.00    N/A   1.00

Dhrystone Benchmark
Using the same 64 bit program, the Pi 4 was more than twice as fast as the Pi 3B+. On the Pi 4, the 64 bit version was 52% faster than the 32 bit compilation.
The gcc 9 compilations led to no real difference in performance.
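The DMIPS/MHz column and the ratio rows in the Dhrystone table are simple quotients of the figures shown. Below is a minimal C sketch of that arithmetic. It assumes nothing beyond the printed values, and the function names are hypothetical rather than taken from the benchmark source.

```c
#include <assert.h>
#include <stdio.h>

/* DMIPS per MHz: normalises a Dhrystone rating by CPU clock speed. */
double dmips_per_mhz(double dmips, double mhz)
{
    assert(mhz > 0.0);
    return dmips / mhz;
}

/* Ratio rows such as Pi4/3B+ and 64b/32b divide one result by another. */
double speed_ratio(double faster, double slower)
{
    assert(slower > 0.0);
    return faster / slower;
}

/* Figures copied from the gcc 6 rows of the Dhrystone table. */
void print_dhrystone_summary(void)
{
    printf("Pi 3B+ DMIPS/MHz %.2f\n", dmips_per_mhz(4028.0, 1400.0)); /* 2.88 */
    printf("Pi 4B  DMIPS/MHz %.2f\n", dmips_per_mhz(8176.0, 1500.0)); /* 5.45 */
    printf("Pi4/3B+ DMIPS    %.2f\n", speed_ratio(8176.0, 4028.0));   /* 2.03 */
    printf("64b/32b DMIPS    %.2f\n", speed_ratio(8176.0, 5366.0));   /* 1.52 */
}
```

The 2.03 and 1.52 ratios are the "more than twice as fast" and "52% faster" figures quoted above.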
                        Compiled   DMIPS
System           MHz       DMIPS    /MHz

Pi 3B+          1400        4028    2.88
Pi 4B           1500        8176    5.45
Pi4/3B+         1.07        2.03
Pi 4B 32b       1500        5366    3.58
64b/32b         1.00        1.52
===============================
gcc 9
Pi 3B+          1400        3896    2.78
Pi 4B           1500        8190    5.46
Pi4/3B+         1.07        2.10
gcc 9/6 Pi 4B   1.00        1.00

Linpack Benchmark
The Pi 3B+ 32 bit results are also provided for clarification. My results were highlighted in the MagPi magazine on the announcement of the Pi 4, particularly the 2 GFLOPS 32 bit NEON speed. See raspberry-pi-4-specs-benchmarks.
At 64 bits, Pi 4/3B+ performance ratios were generally higher than those from the earlier benchmarks. As could be expected, performance using NEON Intrinsic Functions, which is virtually compiler independent, was similar at 32 bits and 64 bits. The main 64 bit gain was with the compiled single precision version, which obtained the same performance as the NEON Intrinsics version.
The new gcc 9 compilations produced the same performance as the older versions, within the variations normally seen on this benchmark.
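Linpack spends most of its time in a vector update of the form y = y + a*x (the BLAS saxpy/daxpy kernel). This is the loop the compiler has to vectorise to match the NEON Intrinsics version. The sketch below is an illustration, not the benchmark's source. It shows a plain C single precision kernel and uses the conventional Linpack operation count, 2n³/3 + 2n², to turn a solve time into MFLOPS. That this benchmark uses the same count is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = y[i] + a * x[i]: the saxpy kernel at the heart of Linpack.
   A simple counted loop like this is a candidate for gcc's
   auto-vectoriser at -O3. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Conventional Linpack operation count for an n x n system:
   about 2n^3/3 for the LU factorisation plus 2n^2 for the solves. */
double linpack_flops(double n)
{
    return 2.0 * n * n * n / 3.0 + 2.0 * n * n;
}

/* MFLOPS for one solve that took 'seconds' to complete. */
double linpack_mflops(double n, double seconds)
{
    assert(seconds > 0.0);
    return linpack_flops(n) / (seconds * 1.0e6);
}
```

With gcc, adding -fopt-info-vec to an -O3 compile should report whether a loop like saxpy was vectorised. That is one way to check the compiled versus NEON Intrinsics comparison made above.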
                        ------ MFLOPS ------
System           MHz       DP       SP  SP NEON

Pi 3B+          1400    396.6    562.1    604.2
Pi 4B           1500   1059.9   1977.8   1968.6
Pi4/3B+         1.07     2.67     3.52     3.26
Pi 4B 32b       1500    760.2    921.6   2010.5
64b/32b         1.00     1.39     2.15     0.98
Pi 3B+ 32       1400    210.5    225.2    562.5
Pi4/3B+         1.07     3.61     4.09     3.57
=======================================
gcc 9
Pi 3B+          1400    396.2    571.3    566.7
Pi 4B           1500   1110.6   2052.4   1887.5
Pi4/3B+         1.07     2.80     3.59     3.33
gcc 9/6 Pi 4B   1.00     1.05     1.04     0.96

Livermore Loops Benchmark
All the ratings indicate reasonably significant performance gains of Pi 4 over Pi 3B+ and 64 bits over 32 bits. Results from the 24 kernels indicate some higher gains. Also note the maximum speed of 2.49 GFLOPS (Double Precision).
The speed of the original Raspberry Pi could be rated as 4.5 times faster than the Cray 1 supercomputer (Cray 1 Geomean 11.9 MFLOPS); see my quote in Raspberry Pi Benchmarks.htm. Now, one core of the Raspberry Pi 4B, at 64 bits, produces performance equivalent to 61 Cray 1 supercomputers.
There were some performance differences in gcc 9 results but average speeds were quite similar.
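The Overall Ratings summarise the kernel speeds with several kinds of average. Below is a small C sketch of the arithmetic, harmonic and geometric means, plus the Cray 1 comparison, taking the 11.9 MFLOPS Cray 1 Geomean quoted above. The function names are hypothetical. The geometric mean is found by bisection only so that the sketch needs no maths library. Applying these functions to the 24 kernel figures printed below need not reproduce the Overall Ratings exactly, because the benchmark's own summary may be computed over a different set of measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean ("Average"). */
double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Harmonic mean ("Harmean"): dominated by the slowest kernels. */
double harmean(const double *v, size_t n)
{
    assert(n > 0);
    double inv = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(v[i] > 0.0);
        inv += 1.0 / v[i];
    }
    return (double)n / inv;
}

/* Geometric mean ("Geomean"): the g for which the product of v[i]/g is 1.
   The product falls as g rises and g lies between min and max, so
   bisection converges; working with ratios keeps the running product
   far smaller than the raw product of MFLOPS values. */
double geomean(const double *v, size_t n)
{
    assert(n > 0);
    double lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    for (int iter = 0; iter < 200; iter++) {
        double g = 0.5 * (lo + hi);
        double prod = 1.0;
        for (size_t i = 0; i < n; i++)
            prod *= v[i] / g;
        if (prod > 1.0)
            lo = g;     /* g is below the geometric mean */
        else
            hi = g;
    }
    return 0.5 * (lo + hi);
}

/* Cray 1 equivalents, using 11.9 MFLOPS as the Cray 1 Geomean. */
double cray1_equivalents(double geomean_mflops)
{
    return geomean_mflops / 11.9;
}
```

For example, the Pi 4B 64 bit Geomean of 730.3 MFLOPS gives 730.3 / 11.9 ≈ 61.4, the 61 Cray 1 figure quoted above.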
Overall Ratings - MFLOPS

System           MHz  Maximum  Average  Geomean  Harmean  Minimum

Pi 3B+ 64b      1400    737.7    319.4    284.7    250.6     91.6
Pi 4B 64b       1500   2490.5      892    730.3    603.3    212.4
Pi4/3B+         1.07     3.38     2.79     2.57     2.41     2.32
Pi 4B 32b       1500   1800.2    635.1    519.0    416.1    155.3
64b/32b         1.00     1.38     1.40     1.41     1.45     1.37
======================================================
gcc 9
Pi 3B+          1400   1000.7    347.8    308.0    275.2    117.3
Pi 4B           1500   2744.5    962.5    768.2    596.2    132.1
Pi4/3B+         1.07     2.74     2.77     2.49     2.17     1.13
gcc 9/6 Pi 4B   1.00     1.10     1.08     1.05     0.99     0.62

MFLOPS of 24 Kernels

Pi 3B+       540   296   539   527   226   175   738   428   484   251   169   245
             127   161   291   258   440   520   333   280   310    93   362   209
Pi 4B       2026   997   987   948   372   739  2033  2491  1980   758   495   875
             220   404   811   710   753  1124   444   397  1061   414   822   283
Pi 4B/      3.75  3.37  1.83  1.80  1.65  4.23  2.76  5.83  4.09  3.02  2.92  3.57
Pi 3B+      1.73  2.51  2.79  2.75  1.71  2.16  1.33  1.42  3.43  4.48  2.27  1.36  Min 1.33 Max 5.83
Pi 4B 32     746   964   988   943   212   538  1169  1800  1032   469   214   186
             159   335   778   623   732  1034   320   350   489   360   749   187
64b/32b     2.72  1.03  1.00  1.00  1.76  1.37  1.74  1.38  1.92  1.62  2.31  4.70
            1.38  1.20  1.04  1.14  1.03  1.09  1.39  1.13  2.17  1.15  1.10  1.51  Min 1.00 Max 4.70
===========================================================================
gcc 9
Pi 3B+       565   320   319   535   227   207  1001   581   541   234   171   248
             121   160   293   280   456   547   337   287   367   190   386   209
Pi 4B       2146   989   970   965   390   785  2386  2479  1879   632   500   973
             134   423   814   670   726  1177   450   397  1675   561   818   283
Pi 4B/      3.80  3.09  3.04  1.80  1.72  3.80  2.38  4.27  3.48  2.70  2.93  3.93
Pi 3B+      1.10  2.65  2.78  2.39  1.59  2.15  1.33  1.39  4.56  2.95  2.12  1.35  Min 1.10 Max 4.56
gcc 9/6     1.06  0.99  0.98  1.02  1.05  1.06  1.17  1.00  0.95  0.83  1.01  1.11
Pi 4B       0.61  1.05  1.00  0.94  0.96  1.05  1.01  1.00  1.58  1.35  1.00  1.00  Min 0.61 Max 1.58

Fast Fourier Transforms Benchmarks
There were gains all round on the Pi 4 compared with the 3B+. They were mainly between 3 and 4 times with the optimised version (FFT3), and smaller with FFT1, which depends more on data transfer speed.
On the Pi 4, performance from the 32 bit compilation was often similar to that at 64 bits. This is probably due to much of the data being read on a skipped sequential basis, not good for vectorisation.
The Pi 4B/3B+ performance gains were similar using both gcc 9 and gcc 6 compiled programs, but the gcc 9 compilation produced some faster FFT1 speeds, as shown in the Pi 4B gcc 9/6 comparisons.
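The FFT1 and FFT3 source is not reproduced in this report. As an illustration only, the generic iterative radix-2 FFT below (not the benchmark's code) shows why FFT data tends to be read in a skipped sequential pattern. After a scattered bit-reversal reorder, each butterfly stage pairs elements a different distance apart and steps through the twiddle table with a changing stride. The caller passes in the twiddle factors, an assumption made so that the sketch needs no maths library calls. The function name is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* In-place iterative radix-2 FFT (decimation in time).
   n must be a power of two; tw[k] = exp(-2*pi*i*k/n) for k < n/2
   is supplied by the caller. */
void fft_radix2(double complex *x, const double complex *tw, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    /* Bit-reversal reordering: scattered, non-sequential swaps. */
    for (size_t i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            double complex t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }

    /* Each stage pairs elements 'half' apart and reads the twiddle
       table with stride n/len, so memory is visited in skipped
       sequential blocks whose spacing changes from stage to stage. */
    for (size_t len = 2; len <= n; len <<= 1) {
        size_t half = len >> 1;
        size_t tw_step = n / len;
        for (size_t start = 0; start < n; start += len) {
            for (size_t k = 0; k < half; k++) {
                double complex u = x[start + k];
                double complex v = x[start + k + half] * tw[k * tw_step];
                x[start + k] = u + v;
                x[start + k + half] = u - v;
            }
        }
    }
}
```

Loops with a varying stride like these are harder to vectorise than plain sequential loops, which is consistent with the limited 64 bit gains noted above.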
Gentoo 64b Pi 3B+

  Size          FFT1              FFT3
     K       SP       DP       SP       DP

     1     0.13     0.15     0.15     0.17
     2     0.29     0.39     0.32     0.38
     4     0.76     1.13     0.79     0.85
     8     1.93     2.66     1.77     1.94
    16     4.02     5.51     4.69     5.14
    32     9.50    25.11     9.51    13.67
    64    42.53   110.21    25.30    32.25
   128   151.08   257.41    57.68    76.71
   256   355.88   589.07   129.47   174.85
   512   819.91  1324.89   297.80   390.74
  1024  1746.23  2943.08   641.50   863.82

Gentoo 64b Pi 4B                                    Pi4/3B+

  Size          FFT1              FFT3        FFT1          FFT3
     K       SP       DP       SP       DP     SP     DP     SP     DP

     1     0.04     0.04     0.04     0.04   3.30   3.62   3.60   4.13
     2     0.08     0.14     0.11     0.09   3.81   2.88   2.82   4.03
     4     0.25     0.38     0.19     0.22   3.05   2.93   4.13   3.86
     8     0.79     1.31     0.46     0.50   2.45   2.04   3.87   3.87
    16     2.15     2.91     1.15     1.09   1.87   1.89   4.07   4.71
    32     5.71     6.76     2.48     3.18   1.66   3.71   3.83   4.30
    64    15.22    51.00     5.43     9.29   2.79   2.16   4.66   3.47
   128    83.47   151.95    16.28    24.75   1.81   1.69   3.54   3.10
   256   231.24   362.64    39.13    57.28   1.54   1.62   3.31   3.05
   512   561.16   765.18    90.20   133.21   1.46   1.73   3.30   2.93
  1024  1250.51  1878.44   213.35   303.39   1.40   1.57   3.01   2.85

Raspbian 32b Pi 4B                                  64b/32b

  Size          FFT1              FFT3        FFT1          FFT3
     K       SP       DP       SP       DP     SP     DP     SP     DP

     1     0.04     0.04     0.06     0.05   0.99   0.96   1.44   1.18
     2     0.08     0.12     0.13     0.11   1.04   0.89   1.14   1.18
     4     0.32     0.37     0.27     0.24   1.28   0.96   1.42   1.09
     8     0.77     0.97     0.58     0.55   0.98   0.74   1.26   1.09
    16     1.69     2.01     1.49     1.35   0.78   0.69   1.29   1.24
    32     4.37     4.89     2.96     3.63   0.77   0.72   1.19   1.14
    64     9.12    26.55     7.46    10.75   0.60   0.52   1.37   1.16
   128    55.52   160.11    17.93    26.03   0.67   1.05   1.10   1.05
   256   305.92   423.06    41.16    55.06   1.32   1.17   1.05   0.96
   512   833.10   854.88    86.93   120.53   1.48   1.12   0.96   0.90
  1024  1617.49  1875.52   190.28   266.60   1.29   1.00   0.89   0.88
===========================================================================
Gentoo Pi 3B+ gcc 9                       Gentoo Pi 4B gcc 9

  Size          FFT1              FFT3              FFT1              FFT3
     K       SP       DP       SP       DP       SP       DP       SP       DP

     1     0.15     0.16     0.15     0.14     0.04     0.04     0.04     0.04
     2     0.34     0.39     0.31     0.31     0.08     0.13     0.08     0.09
     4     0.89     1.00     0.82     0.79     0.19     0.33     0.19     0.21
     8     2.19     2.70     1.66     1.89     0.71     0.74     0.46     0.46
    16     4.32     5.94     4.88     5.32     1.63     2.06     1.17     1.09
    32    12.47    24.05     9.59    14.82     3.73     4.03     2.44     3.09
    64    66.46   116.11    26.53    36.64     7.92    27.12     5.46     9.06
   128   169.06   268.02    63.65    84.00    43.28   100.75    16.09    22.00
   256   401.86   600.72   141.83   195.69   192.57   254.20    37.08    49.76
   512   853.48  1266.96   329.26   435.23   590.20   651.24    82.54   110.23
  1024  1966.69  2808.07   721.36   981.82  1463.15  1749.37   202.20   251.71

      Pi 4B/3B+                   Pi 4B gcc 9/6

  Size    FFT1          FFT3          FFT1          FFT3
     K     SP     DP     SP     DP     SP     DP     SP     DP

     1   3.53   3.77   3.63   3.78   0.97   0.98   1.02   1.18
     2   4.39   3.05   3.97   3.64   1.00   1.06   1.46   1.08
     4   4.75   3.03   4.23   3.81   1.34   1.16   0.98   1.06
     8   3.06   3.62   3.62   4.10   1.10   1.76   1.00   1.09
    16   2.65   2.89   4.16   4.89   1.32   1.41   0.98   1.00
    32   3.34   5.97   3.93   4.79   1.53   1.68   1.02   1.03
    64   8.39   4.28   4.85   4.04   1.92   1.88   0.99   1.03
   128   3.91   2.66   3.96   3.82   1.93   1.51   1.01   1.12
   256   2.09   2.36   3.82   3.93   1.20   1.43   1.06   1.15
   512   1.45   1.95   3.99   3.95   0.95   1.17   1.09   1.21
  1024   1.34   1.61   3.57   3.90   0.85   1.07   1.06   1.21

BusSpeed Benchmark
Most data transfers were 2.0 to 2.5 times faster on the Pi 4, including from RAM, and somewhat higher with L2 cache based data.
The 64 bit version still deals with 32 bit words but transferred data somewhat quicker than the 32 bit program, as shown by the Pi 4 results.
Results from the gcc 9 compilations were virtually the same as those from gcc 6.
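The column headings refer to reading every 32nd, 16th, 8th, 4th and 2nd 32 bit word, then all words. Below is a minimal sketch of that access pattern. It assumes an AND-combining read loop similar to the MP-BusSpd source shown later in this report; the single core program's exact code is not reproduced here. The function name is hypothetical, and the real benchmark unrolls its loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read every 'inc'-th 32 bit word of a[0..n-1] and AND the words
   together, so the loads cannot be optimised away.
   inc = 32, 16, 8, 4, 2 mirror the Inc32..Inc2 columns;
   inc = 1 corresponds to Read All. */
uint32_t read_words(const uint32_t *a, size_t n, size_t inc)
{
    assert(inc > 0);
    uint32_t acc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i += inc)
        acc &= a[i];
    return acc;
}
```

With larger increments, fewer words are used from each cache line fetched. That is consistent with the much lower Inc32 and Inc16 speeds from RAM in the tables below.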
Gentoo 64b Pi 3B+
BusSpeed armv8 64 Bit Fri Aug 16 12:53:43 2019

   Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3819   4253   4622   5041   5089   3870
      32   1234   1328   2067   3158   4082   3674
      64    681    704   1325   2208   3350   3602
     128    638    646   1214   2070   3238   3625
     256    592    617   1165   1991   3164   3622
     512    295    309    640    985   2085   2790
    1024    108    120    271    525   1070   1636
    4096     98    123    249    486    881   1840
   16384    121    114    246    480    977   1642
   65536    121    124    248    409    989   1864

Gentoo 64b Pi 4B                                      Inc2  Rd All
                                                    4B/3B+  4B/3B+
      16   4999   5042   5665   5885   5891   8217    1.16    2.12
      32   1578   2105   3283   4339   5154   7507    1.26    2.04
      64    585    911   1855   3085   5163   7918    1.54    2.20
     128    590    932   1888   3110   5161   7874    1.59    2.17
     256    598    934   1908   3056   5265   7883    1.66    2.18
     512    603    939   1822   3019   5124   7716    2.46    2.77
    1024    319    482   1060   1885   3283   5721    3.07    3.50
    4096    209    253    503   1006   2009   4111    2.28    2.23
   16384    209    261    520   1041   2071   4115    2.12    2.51
   65536    203    263    489   1011   2023   4036    2.05    2.17

Raspbian 32b Pi 4B                                  Rd All
                                                   64b/32b
      16   3836   4049   4467   5885   4641   5858    1.14
      32    761   1473   2594   3216   3960   4780    1.01
      64    409    801   1684   2422   3745   3940    0.95
     128    406    803   1202   1914   3037   5377    1.32
     256    415    700   1165   2481   4789   5137    1.27
     512    392    760   1243   2455   3764   4264    1.38
    1024    230    256    623   1061   2455   3501    1.59
    4096    197    214    454    938   1852   3195    1.80
   16384    138    215    445    897   1724   3210    1.91
   65536    174    215    398    744   1655   3130    1.61
=====================================================================
Gentoo 64b Pi 3B+ gcc 9
BusSpeed 64 Bit gcc 9 Thu Sep 26 12:51:15 2019
BusSpeed armv8 64 Bit Fri Aug 16 12:53:43 2019

   Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3860   4283   4677   4901   5022   3591
      32   2228   2433   2989   4740   4912   3629
      64    700    697   1299   2200   3310   3348
     128    637    636   1208   2064   3151   3396
     256    597    600   1161   1945   3105   3377
     512    232    194    500    884   1629   2350
    1024    118    131    159    440    692   1682
    4096     91     99    197    463    923   1878
   16384    119    117    200    392    775   1606
   65536    101    105    238    464    873   1876

Gentoo 64b Pi 4B                                    Rd All  Rd All
                                                    4B/3B+ gcc 9/6
      16   4815   5060   5573   5808   5741   8935    2.49    1.09
      32   1534   1828   2967   4254   4930   7825    2.16    1.04
      64    792   1007   1988   3269   4844   8062    2.41    1.02
     128    730    950   1881   3133   5007   8162    2.40    1.04
     256    733    955   1901   3128   5071   8236    2.44    1.04
     512    737    952   1885   3139   5058   8237    3.51    1.07
    1024    374    539   1047   1884   3177   5537    3.29    0.97
    4096    235    255    497    990   1975   3386    1.80    0.82
   16384    239    263    501    913   1984   3973    2.47    0.97
   65536    239    237    502    995   1984   3971    2.12    0.98

MemSpeed Benchmark
Results are provided below for the Gentoo 64 bit version on the Pi 3B+ and Pi 4B, and the Raspbian 32 bit variety on the Pi 4B, then a sample of relative performance, covering data from L1 cache, L2 cache and RAM.
The Pi 4B recorded gains greater than the 7% CPU MHz difference all round over the Pi 3B+. The most impressive were with L2 cache based data and the more intensive floating point calculations. On the Pi 4B, the 64 bit and 32 bit compilations ran at similar speeds with RAM based data and on some integer tests, but the 64 bit version was significantly faster with cache based floating point calculations.
Many Pi 4B/3B+ comparisons were similar, but the gcc 9 compilation gave rise to a number of changes compared with the older version. The gcc 6 version was slightly faster with some double precision calculations. However, gcc 9 produced speed increases of between 1.3 and 2.6 times with integers and single precision, and the single precision maximum rose from 3.5 to 5.5 GFLOPS.
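The Max MFLOPS figures in the tables below are consistent with the peak x[m]=x[m]+s*y[m] speed divided by the element size. For the Pi 4B, 15719 / 8 ≈ 1965 (Dble) and 14042 / 4 ≈ 3511 (Sngl); the gcc 9 figures 13527 / 8 ≈ 1691 and 22116 / 4 ≈ 5529 fit the same pattern. That relationship is inferred from the tables rather than documented here. It follows if MB/s counts the bytes of x and y read and each element costs 2 flops. The C sketch shows double precision versions of the loops named in the column headings, plus that conversion, with hypothetical function names.

```c
#include <assert.h>
#include <stddef.h>

/* x[m]=x[m]+s*y[m]: 2 flops per element, reading x and y. */
void mem_mul_add(double *x, const double *y, double s, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + s * y[m];
}

/* x[m]=x[m]+y[m] */
void mem_add(double *x, const double *y, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + y[m];
}

/* x[m]=y[m] */
void mem_copy(double *x, const double *y, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = y[m];
}

/* Counting 2 * element_bytes read per element against 2 flops gives
   MFLOPS = MB/s / element_bytes (an inference from the tables). */
double mflops_from_mbps(double mbps, size_t element_bytes)
{
    assert(element_bytes > 0);
    return mbps / (double)element_bytes;
}
```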
Memory Reading Speed Test armv8 64 Bit by Roy Longbottom
Start of test Fri Aug 16 12:48:51 2019

  Memory x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes   Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

Gentoo 64b Pi 3B+
       8   4813   2897   4350   6180   3954   4831   5378   4324   4324
      16   4540   2900   4356   6213   3961   4838   5401   4344   4333
      32   4184   2780   4047   5540   3721   4483   5421   4285   4316
      64   3784   2678   3803   4776   3547   4171   4925   4087   4051
     128   3613   2694   3842   4731   3562   4188   4967   4087   4103
     256   3133   2652   3800   4626   3493   4027   4967   4093   4096
     512    670    882   1630   2913   2422   2718   3101   3141   2780
    1024    587    774   1017   1310   1287   1184   1105   1526   1543
    2048    555    746    917   1143   1131   1043   1071   1007   1128
    4096    545    691   1130   1039   1015   1140   1045   1087    892
    8192    537    795   1139    980   1133   1148    887    854    922
Max MFLOPS  602    725

Gentoo 64b Pi 4B
       8  15530  13973  12509  15570  14025  15534  11417   9308   7798
      16  15719  14042  12750  15745  14200  15660  11753   9447   7890
      32  14062  12228  11435  14052  12699  12855  11864   9459   7937
      64  12195  11344  10698  12211  11705  12025   8872   8752   7904
     128  12172  11360  10755  12166  11862  11975   8569   8460   7913
     256  12228  11369  10697  12123  11790  12082   8073   8222   7896
     512  11269  10738  10206  10985  11164  11590   8017   6280   6557
    1024   3407   2635   3281   3396   3242   2979   3765   3947   4029
    2048   1525   1832   1838   1851   1607   1838   2819   2790   2770
    4096   1407   1851   1859   1861   1666   1840   2485   2487   2410
    8192   1913   1914   1922   1528   1895   1891   2496   2234   2489
Max MFLOPS 1965   3511

Comparison 64b Pi4/3B+
       8   3.23   4.82   2.88   2.52   3.55   3.22   2.12   2.15   1.80
      16   3.46   4.84   2.93   2.53   3.58   3.24   2.18   2.17   1.82
     256   3.90   4.29   2.82   2.62   3.38   3.00   1.63   2.01   1.93
     512  16.82  12.17   6.26   3.77   4.61   4.26   2.59   2.00   2.36
    1024   5.80   3.40   3.23   2.59   2.52   2.52   3.41   2.59   2.61
    4096   2.58   2.68   1.65   1.79   1.64   1.61   2.38   2.29   2.70
    8192   3.56   2.41   1.69   1.56   1.67   1.65   2.81   2.62   2.70

Raspbian 32b Pi 4B
       8   8459   4766  13344   8303   4768  15553   7806   9926   9927
      16   7142   3918   8649   7103   4094   9309   7899  10086  10056
      32   7969   4490  10339   7941   4532  11627   7758  10070  10048
      64   8126   4602   9909   8114   4617  11069   7425   8021   8070
     128   8302   4651   9623   8311   4657  10836   7374   8049   7934
     256   8319   4663   9627   8360   4666  10768   7530   7922   7925
     512   8088   4629   9453   8239   4650  10696   5023   7904   7949
    1024   3581   3113   3618   3577   3150   3675   5358   2431   1560
    2048   1338   1808   1780   1811   1832   1773   2131    950    956
    4096   1881   1880   1852   1879   1664   1336   1988    984   1054
    8192   1890   1901   1884   1729   1319   1367   2252   1018   1021
Max MFLOPS 1057   1192

Comparison Pi 4B 64b/32b
       8   1.84   2.93   0.94   1.88   2.94   1.00   1.46   0.94   0.79
      16   2.20   3.58   1.47   2.22   3.47   1.68   1.49   0.94   0.78
     256   1.47   2.44   1.11   1.45   2.53   1.12   1.07   1.04   1.00
     512   1.39   2.32   1.08   1.33   2.40   1.08   1.60   0.79   0.82
    1024   0.95   0.85   0.91   0.95   1.03   0.81   0.70   1.62   2.58
    4096   0.75   0.98   1.00   0.99   1.00   1.38   1.25   2.53   2.29
    8192   1.01   1.01   1.02   0.88   1.44   1.38   1.11   2.19   2.44
=====================================================================
Gentoo 64b Pi 3B+ gcc 9
Memory Reading Speed Test 64 Bit gcc 9 by Roy Longbottom
Start of test Thu Sep 26 12:43:02 2019

  Memory x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes   Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       8   4565   5140   7847   5439   5827   7928   6161   4288   4334
      16   4445   5145   7942   5362   5829   7941   6207   4358   4310
      32   4094   4853   7251   4750   5396   7250   6139   4312   4303
      64   3767   4748   7008   4320   5309   6954   5461   4097   4100
     128   3912   4799   7319   4442   5486   7325   5328   4133   4134
     256   3838   4824   6934   4400   5426   7247   5354   3844   4010
     512   2570   3661   3826   2773   3975   4912   3302   2532   3017
    1024    878   2120   2228    938   2182   2239   1098   1215   1361
    2048    848   1961   2046   1016   2008   2033    758    805    814
    4096    856   1961   2040   1007   1984   2036    839    863    856
    8192    885   1940   1956   1013   1921   1957    844    865    868
Max MFLOPS  571   1286

Gentoo 64b Pi 4B
       8  13385  21854  24413  13416  23402  24404  11630   9316   9315
      16  13527  22116  24712  13551  23675  24722  11800   9447   9446
      32  12170  19681  21716  12164  21047  21740  11403   9511   9514
      64  11402  19074  20086  11613  20057  20101   9317   8651   8663
     128  11770  20334  21119  12124  21389  21087   8003   8136   8136
     256  11740  20281  21115  12029  21384  21111   8098   8184   8015
     512  11671  20255  20873  12058  21561  21072   7721   6684   6929
    1024   2818   7728   5968   3957   7839   7831   4691   3610   3832
    2048   1884   3436   3743   1880   3578   3281   2597   2717   2696
    4096   1284   2399   2555   1446   3802   3625   2420   2630   2632
    8192   1913   3759   3459   1937   3798   3772   2468   2482   2482
Max MFLOPS 1691   5529

Comparison 64b Pi4/3B+
       8   2.93   4.25   3.11   2.47   4.02   3.08   1.89   2.17   2.15
      16   3.04   4.30   3.11   2.53   4.06   3.11   1.90   2.17   2.19
     256   3.06   4.20   3.05   2.73   3.94   2.91   1.51   2.13   2.00
     512   4.54   5.53   5.46   4.35   5.42   4.29   2.34   2.64   2.30
    1024   3.21   3.65   2.68   4.22   3.59   3.50   4.27   2.97   2.82
    4096   1.50   1.22   1.25   1.44   1.92   1.78   2.88   3.05   3.07
    8192   2.16   1.94   1.77   1.91   1.98   1.93   2.92   2.87   2.86

Comparison Pi4B gcc 9/6
       8   0.86   1.56   1.95   0.86   1.67   1.57   1.02   1.00   1.19
      16   0.86   1.57   1.94   0.86   1.67   1.58   1.00   1.00   1.20
     256   0.96   1.78   1.97   0.99   1.81   1.75   1.00   1.00   1.02
     512   1.04   1.89   2.05   1.10   1.93   1.82   0.96   1.06   1.06
    1024   0.83   2.93   1.82   1.17   2.42   2.63   1.25   0.91   0.95
    4096   0.91   1.30   1.37   0.78   2.28   1.97   0.97   1.06   1.09
    8192   1.00   1.96   1.80   1.27   2.00   1.99   0.99   1.11   1.00

NeonSpeed Benchmark
Unlike on the Pi 3B+, compiled code on the Pi 4 was no longer slower than code produced via Intrinsic Functions. This led to performance gains of up to more than five times.
Except using L1 cache based data, performance was essentially the same using 32 bit and 64 bit benchmarks.
With the gcc 9 compilation, the Pi 4B continued to be significantly faster than the 3B+. Comparing Pi 4B gcc 9 and gcc 6 results, performance was essentially the same when NEON Intrinsic Functions were used. As with MemSpeed, however, normally compiled code was faster with gcc 9, in this case by around 80% on average.
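NeonSpeed compares normally compiled loops ("Norm") with versions written using NEON Intrinsic Functions ("Neon"). Below is a minimal sketch of the Float v=v+s*v pair. It assumes the operation is x[i] = x[i] + s*y[i] on float arrays; the benchmark's actual source and names may differ, and the function names here are hypothetical. The intrinsic path is compiled only where the compiler defines __ARM_NEON. Elsewhere the function falls back to the plain loop, so the sketch also builds on non-ARM machines.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* "Norm": plain C that the compiler may or may not vectorise. */
void vec_mla_norm(float *x, const float *y, float s, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + s * y[i];
}

/* "Neon": four floats per operation via NEON intrinsics, then a
   scalar loop for any leftover elements (or for all of them on
   builds without NEON). */
void vec_mla_neon(float *x, const float *y, float s, size_t n)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vx = vmlaq_n_f32(vx, vy, s);   /* vx + vy * s */
        vst1q_f32(x + i, vx);
    }
#endif
    for (; i < n; i++)
        x[i] = x[i] + s * y[i];
}
```

When gcc vectorises the "Norm" loop as effectively as the hand-written version, the two columns converge. The Pi 4B gcc 9/6 averages for the Norm and Neon columns (1.95 and 1.00) suggest this is where the newer compiler gained.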
NEON Speed Test armv8 64 Bit V 1.0 Fri Aug 16 2019

   Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

Gentoo 64b Pi 3B+
      16   2715   5110   3945   4826   5426   5598
      32   2528   4326   3569   4191   4596   4661
      64   2491   4153   3494   4068   4407   4429
     128   2537   4228   3583   4120   4461   4473
     256   2526   4265   3614   4140   4480   4514
     512   1917   2830   2545   2579   2896   2964
    1024   1166   1299   1152   1257   1205   1229
    4096   1022   1135   1132   1122   1130   1100
   16384   1080   1026   1131   1016   1064   1094
   65536    996   1120   1061    831   1110   1069

Gentoo 64b Pi 4B
      16  13982  16424  12505  15239  16065  17193
      32   9554  10753   8981   9657  10970  11025
      64  10658  11833  10274  10722  12110  12134
     128  10657  11887  10337  10680  11994  11973
     256  10709  11970  10360  10774  12003  12083
     512  10147  11441   9733  10209  11264  11532
    1024   2964   3222   2876   3216   3270   2942
    4096   1734   1712   1729   1772   1586   1728
   16384   1592   1922   1818   1923   1926   1667
   65536   1970   1736   1997   1747   1884   2021

Comparison 64b Pi4/3B+
      16   5.15   3.21   3.17   3.16   2.96   3.07
     256   4.24   2.81   2.87   2.60   2.68   2.68
     512   5.29   4.04   3.82   3.96   3.89   3.89
   65536   1.98   1.55   1.88   2.10   1.70   1.89

Raspbian 32b Pi 4B
      16   9677  10072   8905   9358   9776  10473
      32  10149  10330   9364   9539   9988  10543
      64  10948  11708  10466  10568  11318  11994
     128  10484  11232  10410  10104  11200  11792
     256  10509  11369  10428  10264  11273  11842
     512  10406  11066  10134  10054  11075  11467
    1024   3069   3202   3159   3166   3204   3203
    4096   1721   1910   1908   1882   1903   1900
   16384   2023   2009   2008   1965   2032   2013
   65536   2073   2074   2074   2073   2068   2064

Comparison Pi 4B 64b/32b
      16   1.44   1.63   1.40   1.63   1.64   1.64
     256   1.02   1.05   0.99   1.05   1.06   1.02
     512   0.98   1.03   0.96   1.02   1.02   1.01
   65536   0.95   0.84   0.96   0.84   0.91   0.98
=====================================================================
Gentoo 64b Pi 3B+ gcc 9
NEON Speed Test 64 Bit gcc 9 Thu Sep 26 12:45:07 2019

   Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   5118   5461   6218   5298   6024   6011
      32   4894   4980   5886   4855   5431   5445
      64   4713   4557   5669   4452   4868   4867
     128   4824   4703   5814   4598   4995   4946
     256   4857   4750   5815   4643   5028   4964
     512   3694   2652   4265   2675   3003   3007
    1024   2085   1135   2204   1132   1128   1077
    4096   2008   1021   2070   1033   1056   1036
   16384   1912   1061   2042    958   1065   1047
   65536   1783   1062   1873    769   1080   1081

Gentoo 64b Pi 4B
      16  21046  14555  16698  13502  14565  16970
      32  17797  12061  14509  10785  12282  13112
      64  19517  10860  15252   9981  10793  11419
     128  19839  10936  15468  10120  11001  11579
     256  20094  10838  15603  10229  10885  11566
     512  20076  10846  15469  10185  10943  11667
    1024   7016   3040   6826   3211   3417   3548
    4096   3945   1940   3599   1950   1768   1937
   16384   3394   2017   3386   1963   1848   2014
   65536   3484   2043   3839   1765   2060   2049

Comparison 64b Pi4/3B+
      16   4.11   2.67   2.69   2.55   2.42   2.82
      32   3.64   2.42   2.47   2.22   2.26   2.41
      64   4.14   2.38   2.69   2.24   2.22   2.35
     128   4.11   2.33   2.66   2.20   2.20   2.34
     256   4.14   2.28   2.68   2.20   2.16   2.33
     512   5.43   4.09   3.63   3.81   3.64   3.88
    1024   3.36   2.68   3.10   2.84   3.03   3.29
    4096   1.96   1.90   1.74   1.89   1.67   1.87
   16384   1.78   1.90   1.66   2.05   1.74   1.92
   65536   1.95   1.92   2.05   2.30   1.91   1.90

Comparison Pi4B gcc 9/6
      16   1.51   0.89   1.34   0.89   0.91   0.99
      32   1.86   1.12   1.62   1.12   1.12   1.19
      64   1.83   0.92   1.48   0.93   0.89   0.94
     128   1.86   0.92   1.50   0.95   0.92   0.97
     256   1.88   0.91   1.51   0.95   0.91   0.96
     512   1.98   0.95   1.59   1.00   0.97   1.01
    1024   2.37   0.94   2.37   1.00   1.04   1.21
    4096   2.28   1.13   2.08   1.10   1.11   1.12
   16384   2.13   1.05   1.86   1.02   0.96   1.21
   65536   1.77   1.18   1.92   1.01   1.09   1.01
 Average   1.95   1.00   1.73   1.00   0.99   1.06

MultiThreading Benchmarks
As with the single core version, the average Pi 4 MWIPS gain over the Pi 3B+ was just over 2 times. The 64 bit advantage over 32 bit speed was smaller this time, with the 32 bit version being somewhat faster on some floating point calculations.
Most of the important Pi 4B gcc 9 results were virtually the same as those from the earlier gcc 6 compilations. The exception was that the Pi 4B/3B+ COS and EXP gains were smaller, because the gcc 9 Pi 3B+ speeds for these functions were faster.
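Each thread row can be read as a speedup over the single thread rate. Below is a small sketch of that arithmetic, with hypothetical helper names. For the Pi 4B 64 bit MWIPS column, 9476 / 2395 ≈ 3.96 times at four threads, close to ideal on four cores. Eight threads share the same four cores and add little, as the doubled 10.17 second overall time suggests.

```c
#include <assert.h>

/* Speedup of an n-thread run relative to one thread, from throughput
   figures such as MWIPS. */
double mp_speedup(double rate_n_threads, double rate_1_thread)
{
    assert(rate_1_thread > 0.0);
    return rate_n_threads / rate_1_thread;
}

/* Parallel efficiency: speedup divided by the number of threads that
   can actually run at once (no more than the number of cores). */
double mp_efficiency(double speedup, int threads, int cores)
{
    assert(threads > 0 && cores > 0);
    int active = threads < cores ? threads : cores;
    return speedup / (double)active;
}
```

Efficiency slightly above 1.0 at eight threads (9834 / 2395 / 4 ≈ 1.03) reflects small run to run variations rather than extra hardware parallelism.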
         MWIPS  MFLOPS  MFLOPS  MFLOPS    Cos    Exp  Fixpt     If  Equal
Threads               1       2       3   MOPS   MOPS   MOPS   MOPS   MOPS

Gentoo RPi 3B+ 64 Bit
      1   1152     383     383     328   23.2   13.0    N/A   2721   1365
      2   2312     767     767     657   46.5   26.0    N/A   5461   2738
      4   4580    1506    1526    1304   92.0   51.6    N/A  10777   5449
      8   4788    1815    1961    1382   95.0   53.3    N/A  13827   5811
Overall Seconds 4.96 1T, 4.95 2T, 5.05 4T, 10.07 8T

Gentoo RPi 4B 64 Bit
      1   2395     536     538     397   60.8   39.0    N/A   4483    997
      2   4784    1062    1079     794  121.2   77.9    N/A   8932   1990
      4   9476    2125    2080    1568  240.8  155.3    N/A  17718   3962
      8   9834    2631    2744    1630  243.6  160.1    N/A  22265   4053
Overall Seconds 4.99 1T, 5.01 2T, 5.12 4T, 10.17 8T

Comparison 64b Pi4/3B+
      1   2.08    1.40    1.41    1.21   2.62   3.00    N/A   1.65   0.73
      2   2.07    1.39    1.41    1.21   2.61   3.00    N/A   1.64   0.73
      4   2.07    1.41    1.36    1.20   2.62   3.01    N/A   1.64   0.73
      8   2.05    1.45    1.40    1.18   2.56   3.00    N/A   1.61   0.70

Raspbian RPi 4B 32 Bit
      1   2059     673     680     311   55.6   33.1   7462   2245    995
      2   4117    1342    1391     624  110.7   65.9  14887   4467   1986
      4   7910    2652    2722    1180  208.5  132.6  29291   8952   3832
      8   8652    3057    2971    1268  233.2  149.6  38368  11923   3942
Overall Seconds 4.99 1T, 5.01 2T, 5.29 4T, 10.71 8T

Comparison Pi 4B 64b/32b
      1   1.16    0.80    0.79    1.28   1.09   1.18    N/A   2.00   1.00
      2   1.16    0.79    0.78    1.27   1.09   1.18    N/A   2.00   1.00
      4   1.20    0.80    0.76    1.33   1.15   1.17    N/A   1.98   1.03
      8   1.14    0.86    0.92    1.28   1.04   1.07    N/A   1.87   1.03
===========================================================================
         MWIPS  MFLOPS  MFLOPS  MFLOPS    Cos    Exp  Fixpt     If  Equal
Threads               1       2       3   MOPS   MOPS   MOPS   MOPS   MOPS

Gentoo 64b Pi 3B+ gcc 9
      1   1500     381     384     328   27.2   28.1   5098   2049   1368
      2   3001     766     762     656   54.5   56.5  10130   4102   2737
      4   5940    1488    1528    1304  107.8  111.5  19741   7665   5423
      8   5987    1528    1666    1267  107.4  117.9  25862   9518   5666
Overall Seconds 4.98 1T, 4.98 2T, 5.16 4T, 10.30 8T

Gentoo 64b Pi 4B gcc 9
      1   2364     530     532     395   60.6   40.0   7426   2242    996
      2   4724    1060    1052     789  121.0   80.4  14853   4476   1994
      4   9413    2103    2112    1579  241.0  159.5  29161   8638   3968
      8   9848    2671    2453    1644  247.0  168.1  37385  11636   4108
Overall Seconds 5.00 1T, 5.01 2T, 5.07 4T, 10.20 8T

Comparison 64b Pi4/3B+
      1   1.58    1.39    1.38    1.20   2.23   1.42   1.46   1.09   0.73
      2   1.57    1.38    1.38    1.20   2.22   1.42   1.47   1.09   0.73
      4   1.58    1.41    1.38    1.21   2.24   1.43   1.48   1.13   0.73
      8   1.64    1.75    1.47    1.30   2.30   1.43   1.45   1.22   0.72

Comparison Pi4B gcc 9/6
      1   0.99    0.99    0.99    1.00   1.00   1.03    N/A   0.50   1.00
      2   0.99    1.00    0.97    0.99   1.00   1.03    N/A   0.50   1.00
      4   0.99    0.99    1.02    1.01   1.00   1.03    N/A   0.49   1.00
      8   1.00    1.02    0.89    1.01   1.01   1.05    N/A   0.52   1.01
The single thread speeds were similar to the earlier Dhrystone results, with RPi 4B ratings around twice as fast as those for the Pi 3B+. The single thread Pi 4B 64 bit/32 bit speed ratio was also similar to that during the single core tests.
As indicated for the earlier gcc 6 results, this benchmark produces inconsistent performance and does not provide a good example of multithreading but, in this case, gcc 6 and gcc 9 results were similar, with a reasonably high Pi 4B/3B+ performance gain.
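The VAX MIPS figures in the log below are consistent with the usual Dhrystone convention of dividing Dhrystones per second by 1757, the Dhrystone rate of the VAX 11/780. For example, 14579147 / 1757 ≈ 8298. Below is a minimal sketch of that conversion, with hypothetical function names.

```c
#include <assert.h>

/* VAX MIPS (DMIPS) rating: Dhrystones per second divided by 1757,
   the VAX 11/780 Dhrystone rate taken as 1 MIPS. */
double vax_mips(double dhrystones_per_second)
{
    assert(dhrystones_per_second >= 0.0);
    return dhrystones_per_second / 1757.0;
}

/* Round to the nearest whole rating without needing libm. */
long vax_mips_rounded(double dhrystones_per_second)
{
    return (long)(vax_mips(dhrystones_per_second) + 0.5);
}
```

Applied to the four Dhrystones per Second values in the log, this gives 8298, 7683, 7870 and 7978, the VAX MIPS ratings shown.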
Example Results Log File

MP-Dhrystone Benchmark 64 Bit gcc 9 Thu Sep 26 11:46:22 2019

                    Using 1, 2, 4 and 8 Threads

Threads                          1         2         4         8
Seconds                       0.55      1.19      2.31      4.57
Dhrystones per Second     14579147  13499628  13827400  14017880
VAX MIPS rating               8298      7683      7870      7978

Internal pass count correct all threads

End of test Thu Sep 26 11:46:31 2019

#############################################################

Comparisons

Threads                          1         2         4         8
VAX MIPS rating Pi 3B+ 6      4207      6804      7401      7415
VAX MIPS rating Pi 4B 64      8880      7828      8303      8314
VAX MIPS rating Pi 4B 32      5539      5739      6735      7232
Pi 4B/3B+ 64 bits             2.11      1.15      1.12      1.12
Pi 4B 64 bits/32 bits         1.60      1.36      1.23      1.15
=======================================================
Gentoo gcc 9
VAX MIPS rating Pi 3B+ 6      4062      6504      8242      8343
VAX MIPS rating Pi 4B 64      8298      7683      7870      7978
Pi 4B/3B+ 64 bits             2.04      1.18      0.95      0.96
Pi 4B gcc 9/6                 0.93      0.98      0.95      0.96
This benchmark uses the same NEON Intrinsic Functions as the single core program. Without threads (the None column), speeds at N = 100 were similar to the single core version, but they decreased with larger data sizes that involve RAM accesses.
The full logged output is shown for the first entry, to demonstrate error checking facilities. The sumchecks were identical from the Pi 3B+ and Pi 4B at Gentoo 64 bits, but those from the Raspbian 32 bit test were different, as shown below. Ignoring the slow threaded results, performance ratios of CPU speed limited tests were similar to the single core version.
At least for the unthreaded tests, the gcc 9 results for the Pi 4B were mainly within 10% of those from gcc 6.
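The MA value of 1.19209290e-07 is single precision machine epsilon (FLT_EPSILON in C). The NR figures are consistent with the classic Linpack normalised residual, RE / (N × norma × normx × MA), where norma is the largest matrix element and normx the largest solution element. In the classic Linpack code the right hand side is generated so that the exact solution is all ones, which is why X0 and XN report x[0]-1 and x[n-1]-1. Below is a minimal sketch of both calculations, assuming that formula applies here. In the check, norma ≈ 2 and normx ≈ 1 are assumptions that happen to reproduce the N = 100 figure of 1.97 from RE = 4.69621336e-05. The function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Single precision machine epsilon: halve eps until 1 + eps rounds to 1,
   then step back once. volatile forces each result to be stored as a
   float. The result should equal FLT_EPSILON from <float.h>. */
float machep_float(void)
{
    volatile float eps = 1.0f;
    volatile float sum;
    do {
        eps = eps / 2.0f;
        sum = 1.0f + eps;
    } while (sum != 1.0f);
    return eps * 2.0f;
}

/* Normalised residual as in the classic Linpack benchmark:
   resid / (n * norma * normx * eps), with resid = max |b - A*x|. */
double norm_resid(double resid, int n, double norma, double normx, double eps)
{
    assert(n > 0 && norma > 0.0 && normx > 0.0 && eps > 0.0);
    return resid / ((double)n * norma * normx * eps);
}
```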
Example Results Log File

Linpack Single Precision MultiThreaded Benchmark
64 Bit NEON Intrinsics, Fri Aug 23 00:45:54 2019

MFLOPS 0 to 4 Threads, N 100, 500, 1000

Threads      None        1        2        4

N 100      642.56    66.69    66.05    65.54
N 500      479.48   274.36   274.85   269.07
N 1000     363.77   316.17   310.37   316.71

NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1

N                100              500             1000

NR             1.97             5.40            13.51
RE   4.69621336e-05   6.44138840e-04   3.22485110e-03
MA   1.19209290e-07   1.19209290e-07   1.19209290e-07
X0  -1.31130219e-05   5.79357147e-05  -3.08930874e-04
XN  -1.30534172e-05   3.51667404e-05   1.90019608e-04

Thread 0 - 4
       Same Results     Same Results     Same Results

####################################################

Comparisons

Threads      None        1        2        4

Gentoo Pi 3B+ 64 Bits
N 100      642.56    66.69    66.05    65.54
N 500      479.48   274.36   274.85   269.07
N 1000     363.77   316.17   310.37   316.71

Gentoo 64b Pi 4B
N 100      2252.7     97.3     97.4     97.4
N 500      1628.2    665.2    646.6    674.4
N 1000      399.9    406.8    405.8    399.5

Comparison 64b Pi4/3B+
N 100        3.51     1.46     1.48     1.49
N 500        3.40     2.42     2.35     2.51
N 1000       1.10     1.29     1.31     1.26

Raspbian 32b Pi 4B
N 100      1921.5    108.7    101.9    102.5
N 500      1548.8    530.2    714.4    733.1
N 1000      399.9    378.1    364.8    398.2

Comparison Pi 4B 64b/32b
N 100        1.17     0.89     0.96     0.95
N 500        1.05     1.25     0.91     0.92
N 1000       1.00     1.08     1.11     1.00
========================================
gcc 9
MFLOPS 0 to 4 Threads, N 100, 500, 1000

Threads      None        1        2        4

Gentoo 64b Pi 3B+ gcc 9
N 100       641.6     63.0     62.3     61.9
N 500       326.6    229.3    222.6    227.0
N 1000      320.1    275.0    274.3    275.2

Gentoo 64b Pi 4B gcc 9
N 100      2076.2     98.6     96.6     96.2
N 500      1327.1    631.9    632.5    639.2
N 1000      394.6    375.3    382.3    375.7

Comparison 64b Pi4/3B+
N 100        3.24     1.57     1.55     1.55
N 500        4.06     2.76     2.84     2.82
N 1000       1.23     1.36     1.39     1.37

Comparison Pi4B gcc 9/6
N 100        0.92     1.01     0.99     0.99
N 500        0.82     0.95     0.98     0.95
N 1000       0.99     0.92     0.94     0.94

####################################################

32 bit numeric results

N                100              500             1000

NR             2.17             5.42             9.50
RE   5.16722466e-05   6.46698638e-04   2.26586126e-03
MA   1.19209290e-07   1.19209290e-07   1.19209290e-07
X0  -2.38418579e-07  -5.54323196e-05  -1.26898289e-04
XN  -5.06639481e-06  -4.70876694e-06   1.41978264e-04
Comparisons are provided for RdAll, at 1, 2 and 4 threads. Pi 4B/3B+ performance ratios were similar to those for the single core tests. There was an exception with two threads on the Pi 4, using RAM at 64 bits. This was probably due to caching effects and was not seen in subsequent repeated tests.
Particularly note that performance was significantly better using the 32 bit Raspbian compiler. The examples of disassembly below show that the 64 bit code used scalar operations on 32 bit w registers, while the 32 bit code benefited from 128 bit q registers for Single Instruction Multiple Data (SIMD) operation. Compile options are included below. Alternatives were also tried on the Pi 4B, but they failed to produce SIMD code.
Even so, most of the gcc 9 compiled RdAll tests were significantly faster than those produced by gcc 6.
MP-BusSpd armv8 64 Bit Fri Aug 23 00:47:43 2019

     MB/Second Reading Data, 1, 2, 4 and 8 Threads

Gentoo 64b Pi 3B+
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3  1T   3138   2822   3044   2383   1708   1737
        2T   5354   4865   5647   4519   3303   3361
        4T   7922   7504   9717   6794   6216   6597
        8T   5125   4159   6987   6696   5350   5195
 122.9  1T    640    666   1191   1864   1627   1712
        2T   1008   1018   1926   3496   3268   3387
        4T    962   1042   2157   4259   6427   4372
        8T   1031   1047   2147   3952   6317   6514
 12288  1T    124    114    260    527   1016   1363
        2T    137    138    275    487    946   2182
        4T    105    118    240    409    975   2158
        8T    108    117    236    504   1077   2051

Gentoo 64b Pi 4B                                       RdAll
                                                      4B/3B+
  12.3  1T   4864   4879   5378   4379   4115   4221    2.43
        2T   8159   6924   9179   8006   7689   7837    2.33
        4T  12677  11531  14850  12554  13807  14794    2.24
        8T   7398   6927  10881  11675  11497  13075    2.52
 122.9  1T    665    926   1869   2714   3557   4152    2.43
        2T    610    696   1549   4898   7188   8184    2.42
        4T    476    865   1885   4107   8058  14617    3.34
        8T    474    883   1848   3919   7939  13633    2.09
 12288  1T    202    210    514   1044   2033   3616    2.65
        2T    258    425    853   1551   3693   6228    2.85
        4T    217    346    497   1024   2181   3789    1.76
        8T    220    275    540   1030   1937   3577    1.74

Raspbian 32b Pi 4B                                     RdAll
                                                     64b/32b
  12.3  1T   5263   5637   5809   5894   5936  13445    0.31
        2T   9412  10020  10567  11454  11604  24980    0.31
        4T  16282  15577  16418  21222  20000  45530    0.32
        8T  11600  13285  16070  18579  20593  36837    0.35
 122.9  1T    739    956   1888   3153   5008   9527    0.44
        2T    629   1158   1568   5058   9509  16489    0.50
        4T    600   1093   2134   4527   8732  16816    0.87
        8T    593   1104   2121   4382   8629  17158    0.79
 12288  1T    238    258    518   1005   2001   4029    0.90
        2T    278    228    453   1690   1826   3628    1.72
        4T    269    257    740   1019   1790   4145    0.91
        8T    233    292    532    926   2186   3581    1.00
===================================================================
     MB/Second Reading Data, 1, 2, 4 and 8 Threads

    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

Gentoo 64b Pi 3B+ gcc 9
  12.3  1T   3453   4178   4428   3543   3584   2335
        2T   5594   7732   8086   6856   6924   4654
        4T   9065  12522  13157  12942  13415   9209
        8T   6661  10770  13266  11955  12573   8478
 122.9  1T    640    646   1197   1970   2909   2272
        2T   1030   1012   2006   3671   5784   4528
        4T   1001   1041   2145   4266   8337   6729
        8T   1043   1061   2123   4005   8133   8572
 12288  1T    114    104    241    444    932   1352
        2T    126    122    253    370   1005   1997
        4T    104    138    197    471   1133   1745
        8T    102     96    231    466    796   1893

Gentoo 64b Pi 4B gcc 9                                 RdAll   Pi 4B
                                                      4B/3B+ gcc 9/6
  12.3  1T   5573   5750   5057   5646   5800   9129    3.91    2.16
        2T   7191   9038  10035  11020  11125  17757    3.82    2.27
        4T   7023  12144  14591  17681  20490  29184    3.17    1.97
        8T   7553  11837  12565  15640  18546  30517    3.60    2.33
 122.9  1T    672    922   1864   3092   4744   7741    3.41    1.86
        2T    577    947   2100   3051   8780  14975    3.31    1.83
        4T    519    983   1884   3980   8701  18139    2.70    1.24
        8T    515    951   1913   4181   8797  16899    1.97    1.24
 12288  1T    230    261    499   1016   1678   3873    2.86    1.07
        2T    276    225    418    925   1929   5629    2.82    0.90
        4T    258    267    579    802   1749   5758    3.30    1.52
        8T    214    213    538   1069   2145   4680    2.47    1.31
Source Code - 64 AND instructions in main loop

    for (i=start; i<end; i=i+64)
    {
        andsum1[t] = andsum1[t] & array[i   ] & array[i+1 ] & array[i+2 ] & array[i+3 ]
                                & array[i+4 ] & array[i+5 ] & array[i+6 ] & array[i+7 ]
        To
                                & array[i+56] & array[i+57] & array[i+58] & array[i+59]
                                & array[i+60] & array[i+61] & array[i+62] & array[i+63];
    }

Pi 32 Bit Raspbian Compile

    gcc mpbusspd2.c cpuidc.c -lpthread -lm -lrt -O3 -mcpu=cortex-a7 -mfloat-abi=hard -mfpu=neon-vfpv4 -o MP-BusSpd2PiA7

Pi 64 Bit Gentoo Compile

    gcc mpbusspd2.c -lpthread -lm -lrt -O3 -march=armv8-a -no-pie -o MP-BusSpd2Pi64g9

Parameters also tried

    -march=armv8-a+crc -mtune=cortex-a72 -ftree-vectorize -O2 -pipe -fomit-frame-pointer

Pi 32 Bit Disassembly           Pi 64 Bit Disassembly

vld1.32 {q6}, [lr]              ldp w30, w17, [x0, 52]
vld1.32 {q7}, [r6]              and w18, w18, w30
vand q10, q10, q6               and w1, w1, w18
vld1.32 {q6}, [r0]              ldp w18, w30, [x0, 60]
vand q9, q9, q7                 and w17, w17, w18
vand q12, q12, q6               and w1, w1, w17
vld1.32 {q7}, [ip]              ldp w17, w18, [x0, 68]
vld1.32 {q6}, [r7]              and w30, w30, w17
add r1, r3, #96                 and w1, w1, w30
add r6, r3, #144                ldp w30, w17, [x0, 76]
vand q11, q11, q7               and w18, w18, w30
vand q14, q14, q6               and w1, w1, w18
vld1.32 {q7}, [r1]              ldp w18, w30, [x0, 84]
vld1.32 {q6}, [r6]              and w17, w17, w18
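The disassembly above suggests that gcc, at 64 bits, kept the unrolled AND chain as scalar operations. One formulation worth trying, offered as an untested sketch rather than a verified fix, is to write the reduction as a simple loop over all elements. GCC's vectoriser generally handles integer AND reductions in that shape, and compiling with -fopt-info-vec reports whether it succeeded. Accumulating in a local variable may also help, because it removes any need for the compiler to allow for andsum1[t] overlapping array. The function name is hypothetical, and the element type is assumed to be 32 bit, matching the w registers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AND together array[start..end-1], accumulating in a local variable.
   The single-statement loop body is a reduction that gcc may vectorise
   at -O3, unlike the explicitly unrolled 64-term expression. */
uint32_t and_reduce(const uint32_t *array, size_t start, size_t end)
{
    assert(start <= end);
    uint32_t acc = 0xFFFFFFFFu;
    for (size_t i = start; i < end; i++)
        acc &= array[i];
    return acc;
}
```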
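At 64 bits, the Pi 4B provided variable gains over the Pi 3B+. On the Pi 4B itself, the 64 bit version was generally slower than the 32 bit version, particularly with cached data, where the 32 bit compilation used NEON vand instructions and the 64 bit compilation scalar ldp and and instructions.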
Some moderate Pi4/3B+ performance gains were produced using gcc 9, but the older version was, possibly, a little faster.
MB/Second Using 1, 2, 4 and 8 Threads

            Serial  Serial  Random  Random      Serial  Serial  Random  Random
KB+Thread     Read    RdWr    Read    RdWr        Read    RdWr    Read    RdWr

Gentoo Pi 4B 64 Bits

  12.3  1T    5922    7871    5892    7857
        2T   11856    7882   11902    7923
        4T   22964    7821   22276    7832
        8T   23225    7751   22082    7717
 122.9  1T    5827    7276    2052    1921
        2T   10965    7258    1754    1924
        4T   10969    7232    1848    1929
        8T   10896    7158    1834    1909
 12288  1T    3879    1052     188     170
        2T    4848     935     218     168
        4T    4684     943     332     170
        8T    3982    1049     340     171

          Gentoo Pi 3B+ 64 Bits               Raspbian Pi 4B 32 Bits

  12.3  1T    4901    3587    4912    3585        5860    7905    5927    7657
        2T    8749    3564    8719    3556       11747    7908   11182    7746
        4T   17108    3504   17160    3505       21416    7626   17382    7731
        8T   16885    3475   16650    3485       20649    7528   20431    7378
 122.9  1T    3921    3339    1010     974        5479    7269    1826    1923
        2T    7360    3350    1814     972       10355    6964    1667    1920
        4T   12199    3313    2281     969        9808    7177    1715    1908
        8T   12089    3313    2279     968       11677    7058    1697    1919
 12288  1T    2024     828      83      67        3438    1271     179     152
        2T    2169     820     142      67        4176    1204     213     167
        4T    2178     818     154      67        4227    1117     337     161
        8T    2219     821     161      67        3479    1093     287     168

          4 Thread Pi 4B/3B+ 64 Bits          4 Thread Pi 4B 64 bits/32 bits

  12.3  4T    1.34    2.23    1.30    2.23        1.07    1.03    1.28    1.01
 122.9  4T    0.90    2.18    0.81    1.99        1.12    1.01    1.08    1.01
 12288  4T    2.15    1.15    2.16    2.54        1.11    0.84    0.99    1.06

===================================================================

MB/Second Using 1, 2, 4 and 8 Threads

            Serial  Serial  Random  Random      Serial  Serial  Random  Random
KB+Thread     Read    RdWr    Read    RdWr        Read    RdWr    Read    RdWr

          Gentoo 64b Pi 3B+ gcc 9             Gentoo 64b Pi 4B gcc 9

  12.3  1T    4886    3581    4878    3590        5737    6884    5763    7537
        2T    8723    3550    8724    3550       11536    7592   10238    6898
        4T   16836    3498   17531    3509       21084    7575   15160    7390
        8T   15777    3459   16783    3466       20089    7339   15311    7200
 122.9  1T    3913    3346     987     972        5739    7231    2006    1906
        2T    7285    3339    1753     964       10662    7217    1742    1896
        4T   12354    3344    2350     972       10376    6741    1815    1812
        8T   11841    3333    2300     962       10298    6937    1823    1848
 12288  1T    1795     761      69      60        3477     905     181     162
        2T    1915     735     118      60        3750     794     215     164
        4T    2452     730     128      59        4669     968     259     162
        8T    1805     755     137      60        3419     981     301     157

          4 Thread Comparison                 4 Thread Comparison
          64b Pi4/3B+                         Pi4B gcc 9/6

  12.3  4T    1.25    2.17    0.86    2.11        0.92    0.97    0.68    0.94
 122.9  4T    0.84    2.02    0.77    1.86        0.95    0.93    0.98    0.94
 12288  4T    1.90    1.33    2.02    2.75        1.00    1.03    0.78    0.95
There can be wide variations in speeds, caused by the short running times and by effects such as variations in cached data. To help in interpreting results, comparisons are provided for results using one, two and four threads. These indicate that, with cache based data, the Pi 4B was more than 3.5 times faster than the Pi 3B+ at two operations per word, but less so at 32 operations.
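Each comparison figure is a ratio of two MFLOPS results, rounded to two decimal places. A minimal C sketch of that calculation follows; the function name `speed_ratio_x100` is hypothetical, and it returns the ratio scaled by 100 so that results can be compared exactly with the two-decimal figures. It is checked against the single precision, 1 thread, 12.8 KB results in the tables that follow: Pi 4B 64 bit 2908, Pi 3B+ 64 bit 792 and Pi 4B 32 bit 987 MFLOPS.

```c
#include <assert.h>

/* Ratio of two speeds, scaled by 100 and rounded to the nearest integer,
   so a ratio of 3.67 is returned as 367. Hypothetical helper for checking
   the comparison rows; assumes both speeds are positive. */
int speed_ratio_x100(double system_speed, double baseline_speed)
{
    assert(system_speed > 0.0 && baseline_speed > 0.0);
    return (int)(100.0 * system_speed / baseline_speed + 0.5);
}
```

For example, `speed_ratio_x100(2908, 792)` gives 367, matching 3.67 in the Pi 4B/3B+ 64 Bits row, and `speed_ratio_x100(2908, 987)` gives 295, matching 2.95 in the Pi 4B 64 bits/32 bits row.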
The 64 bit and 32 bit comparisons were, no doubt, influenced by the particular compiler versions used, and this is reflected in the main disassembled code shown below, for 32 operations per word. The 32 bit compile included -mfpu=neon-vfpv4, but the compiler did not generate NEON instructions, resulting in scalar operation using single word s registers. I have another version, compiled with -funsafe-math-optimizations, that includes NEON instructions and provides similar performance to the 64 bit version, but with more sumcheck differences.
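As an illustration of the kind of loop involved, the following C function performs 32 floating point operations per word: 11 additions of a constant, 11 multiplications and 10 additions or subtractions combining the terms. That mix is an assumption inferred from the instruction counts given further below (11 add, 1 multiply and 10 fused multiply-add instructions in the 32 bit SP code), not the benchmark source; the function name and constants are illustrative. The GCC documentation for -mfpu states that gcc's auto-vectorizer does not generate NEON floating point code unless -funsafe-math-optimizations is also specified, because NEON does not fully implement IEEE 754 (denormals are treated as zero), which is consistent with the scalar 32 bit code described above.

```c
#include <assert.h>
#include <stddef.h>

/* One pass performing 32 floating point operations per word:
   11 additions (x + constant), 11 multiplications and 10 additions or
   subtractions combining the terms. c[] holds 22 illustrative constants. */
void kernel32(float *x, size_t n, const float c[22])
{
    assert(x != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        float v = x[i];
        x[i] = (v + c[0])  * c[1]  - (v + c[2])  * c[3]
             + (v + c[4])  * c[5]  - (v + c[6])  * c[7]
             + (v + c[8])  * c[9]  - (v + c[10]) * c[11]
             + (v + c[12]) * c[13] - (v + c[14]) * c[15]
             + (v + c[16]) * c[17] - (v + c[18]) * c[19]
             + (v + c[20]) * c[21];
    }
}
```

On 64 bit ARM, gcc's default -ffp-contract=fast (outside strict ISO C modes such as -std=c11) allows each multiply followed by an add or subtract to be contracted into a single fused instruction, of the fmla and fmls kind seen in the 64 bit disassembly.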
The benchmark compiled to use NEON Intrinsic Functions does not include any that specify fused multiply and add operations, reducing maximum possible speed. The 64 bit compiler converts the functions to include fused instructions, providing the fastest speeds.
The main compiler independent feature that gives a clear advantage to 64 bit operation is that, at 32 bits, the CPU does not support double precision SIMD (NEON) operation, with single word d registers being compiled. On the other hand, the performance gain does not appear to meet the potential. This suggests that there are other limiting factors; see the disassembly below.
It is difficult to judge the relative performance of gcc 9 and gcc 6, probably due to the short running times. The former appears to be more than 10% faster running the single precision tests, for which the disassembled instructions look the same as those shown below, but in a different sequence.
Single Precision

MP-MFLOPS armv8 64Bit Thu Aug 22 19:50:10 2019

FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word          32 Ops/Word              2 Ops/Word          32 Ops/Word
KB    12.8    128  12800   12.8    128  12800        12.8    128  12800   12.8    128  12800

---- Gentoo Pi 4B 64 Bits MFLOPS ---
1T    2908   2854    459   5778   5734   5405
2T    5700   5311    457  10935  11212   7968
4T   10375   5588    490  18181  21842   7637
8T    9675   8460    511  20128  20567   8568

--- Gentoo Pi 3B+ 64 Bits MFLOPS ---                 -- Raspbian Pi 4B 32 Bits MFLOPS -
1T     792    806    373   1780   1783   1724         987    993    606   2816   2794   2804
2T    1482   1596    382   3542   3509   3380        1823   1837    567   5610   5541   5497
4T    2861   2742    429   5849   7013   5465        2119   3349    647   9884  10702   9081
8T    2770   2877    429   6434   6700   6101        3136   3783    609  10230  10504   9240

Comparisons
--------- Pi 4B/3B+ 64 Bits --------                 ------ Pi 4B 64 bits/32 bits -----
1T    3.67   3.54   1.23   3.25   3.22   3.14        2.95   2.87   0.76   2.05   2.05   1.93
2T    3.85   3.33   1.20   3.09   3.20   2.36        3.13   2.89   0.81   1.95   2.02   1.45
4T    3.63   2.04   1.14   3.11   3.11   1.40        4.90   1.67   0.76   1.84   2.04   0.84

MP-MFLOPS Continued Below

===========================================================================

MP-MFLOPS 64 Bit gcc 9 Thu Sep 26 12:36:54 2019

FPU Add & Multiply using 1, 2, 4 and 8 Threads

---- Gentoo 64b Pi 3B+ gcc 9 ----                     ----- Gentoo 64b Pi 4B gcc 9 ----
        2 Ops/Word          32 Ops/Word              2 Ops/Word          32 Ops/Word
KB    12.8    128  12800   12.8    128  12800        12.8    128  12800   12.8    128  12800

1T     827    805    371   3232   3157   2802        3162   3072    468   6754   6714   6340
2T    1608   1567    360   6420   6423   5286        6498   6029    496  13329  12397   7623
4T    1764   3142    400  11240  12355   6029       11709   6141    529  24825  25055   8723
8T    2548   2575    381  10813  11755   5827       10828   8158    493  19452  22190   8426

Comparisons
........... 64b Pi4/3B+ ..........                    .......... Pi4B gcc 9/6 ..........
1T    3.82   3.82   1.26   2.09   2.13   2.26        1.09   1.08   1.02   1.17   1.17   1.17
2T    4.04   3.85   1.38   2.08   1.93   1.44        1.14   1.14   1.09   1.22   1.11   0.96
4T    6.64   1.95   1.32   2.21   2.03   1.45        1.13   1.10   1.08   1.37   1.15   1.14

###########################################################################

Double Precision

MP-MFLOPS armv8 64Bit Double Precision Thu Aug 22 19:51:42 2019

FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word          32 Ops/Word              2 Ops/Word          32 Ops/Word
KB    12.8    128  12800   12.8    128  12800        12.8    128  12800   12.8    128  12800

---- Gentoo Pi 4B 64 Bits MFLOPS ---
1T    1464   1386    225   3398   3386   3182
2T    2837   2792    228   6720   6741   4547
4T    5172   3414    251  10405  12762   4763
8T    4774   4353    275  11506  12118   4865

--- Gentoo Pi 3B+ 64 Bits MFLOPS ---                 -- Raspbian Pi 4B 32 Bits MFLOPS -
1T     415    386    206   1400   1403   1333        1187   1220    309   2682   2714   2701
2T     820    813    209   2804   2767   2597        2420   2416    282   5379   5415   4780
4T    1328   1323    212   5433   5340   2465        4665   2381    317  10256  10336   5242
8T    1343   1308    214   5090   5006   3280        4385   3114    310   9721  10340   5131

Comparisons
--------- Pi 4B/3B+ 64 Bits --------                 ------ Pi 4B 64 bits/32 bits -----
1T    3.53   3.59   1.09   2.43   2.41   2.39        1.23   1.14   0.73   1.27   1.25   1.18
2T    3.46   3.43   1.09   2.40   2.44   1.75        1.17   1.16   0.81   1.25   1.24   0.95
4T    3.89   2.58   1.18   1.92   2.39   1.93        1.11   1.43   0.79   1.01   1.23   0.91

===========================================================================

MP-MFLOPS 64 Bit gcc 9 Double Precision Thu Sep 26 22:05:10 2019

FPU Add & Multiply using 1, 2, 4 and 8 Threads

---- Gentoo 64b Pi 3B+ gcc 9 ----                     ----- Gentoo 64b Pi 4B gcc 9 ----
        2 Ops/Word          32 Ops/Word              2 Ops/Word          32 Ops/Word
KB    12.8    128  12800   12.8    128  12800        12.8    128  12800   12.8    128  12800

1T     384    350    127   1582   1546   1372         657    663    183   3283   3358   3169
2T     753    753    184   3109   3157   2645        3203   2690    223   6573   6353   4535
4T    1346   1330    194   4228   6099   3067        5799   3866    292  12432  12665   4906
8T    1234   1340    201   4888   5748   3190        5322   4583    269  10738   8891   4521

Comparisons
........... 64b Pi4/3B+ ..........                    .......... Pi4B gcc 9/6 ..........
1T    1.71   1.89   1.44   2.08   2.17   2.31        0.45   0.48   0.81   0.97   0.99   1.00
2T    4.25   3.57   1.21   2.11   2.01   1.71        1.13   0.96   0.98   0.98   0.94   1.00
4T    4.31   2.91   1.51   2.94   2.08   1.60        1.12   1.13   1.16   1.19   0.99   1.03

MP-MFLOPS Continued Below

NEON Single Precision

MP-MFLOPS NEON Intrinsics 64 Bit Thu Aug 22 19:52:48 2019

FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word          32 Ops/Word              2 Ops/Word          32 Ops/Word
KB    12.8    128  12800   12.8    128  12800        12.8    128  12800   12.8    128  12800

---- Gentoo Pi 4B 64 Bits MFLOPS ---
1T    3311   3192    535   6442   6548   6198
2T    4607   6186    552  13030  13012   8468
4T    6279   5725    562  23798  24128   9374
8T    7815  12044    486  22725  21712   9395

--- Gentoo Pi 3B+ 64 Bits MFLOPS --                  -- Raspbian Pi 4B 32 Bits MFLOPS -
1T     830    823    406   2989   2986   2792        2491   2399    615   4325   4285   4261
2T    1575   1498    414   5981   5872   5445        5629   5520    591   8602   8463   8308
4T    2217   2650    431  11661  11644   6061       10580   5594    553  16991  16493   9124
8T    2733   3197    437  10505  10637   6708        7047  10785    513  14325  16219   8867

Comparisons
--------- Pi 4B/3B+ 64 Bits --------                 ------ Pi 4B 64 bits/32 bits -----
1T    3.99   3.88   1.32   2.16   2.19   2.22        1.33   1.33   0.87   1.49   1.53   1.45
2T    2.93   4.13   1.33   2.18   2.22   1.56        0.82   1.12   0.93   1.51   1.54   1.02
4T    2.83   2.16   1.30   2.04   2.07   1.55        0.59   1.02   1.02   1.40   1.46   1.03

===========================================================================

MP-MFLOPS NEON Intrinsics 64 Bit gcc 9 Thu Sep 26 22:02:00 2019

FPU Add & Multiply using 1, 2, 4 and 8 Threads

---- Gentoo 64b Pi 3B+ gcc 9 ----                     ----- Gentoo 64b Pi 4B gcc 9 ----
        2 Ops/Word          32 Ops/Word              2 Ops/Word          32 Ops/Word
KB    12.8    128  12800   12.8    128  12800        12.8    128  12800   12.8    128  12800

1T     769    765    354   3009   2967   2638        1233   1313    507   6451   6428   6224
2T    1315   1324    293   5863   5990   5097        6307   4824    389  12559  12784   7612
4T    1750   2647    380  10081  11250   5748        8101   5186    531  24762  24708   7902
8T    2180   2664    392   9719  11010   6368        6782   8444    504  22598  24113   7979

........... 64b Pi4/3B+ ..........                    .......... Pi4B gcc 9/6 ..........
1T    1.60   1.72   1.43   2.14   2.17   2.36        0.37   0.41   0.95   1.00   0.98   1.00
2T    4.80   3.64   1.33   2.14   2.13   1.49        1.37   0.78   0.70   0.96   0.98   0.90
4T    4.63   1.96   1.40   2.46   2.20   1.37        1.29   0.91   0.94   1.04   1.02   0.84
SP NEON 24.1 GFLOPS - 6.55 1 core      DP 12.7 GFLOPS - 3.39 1 core

.L41:                                  .L84:
ldr q1, [x1]                           ldr q16, [x2, x0]
ldr q0, [sp, 64]                       add w3, w3, 1
fadd v18.4s, v20.4s, v1.4s             cmp w3, w6
fadd v17.4s, v22.4s, v1.4s             fadd v15.2d, v16.2d, v14.2d
fadd v0.4s, v0.4s, v1.4s               fadd v17.2d, v16.2d, v12.2d
fadd v16.4s, v24.4s, v1.4s             fmul v15.2d, v15.2d, v13.2d
fadd v7.4s, v26.4s, v1.4s              fmls v15.2d, v17.2d, v11.2d
fadd v6.4s, v28.4s, v1.4s              fadd v17.2d, v16.2d, v10.2d
fadd v5.4s, v30.4s, v1.4s              fmla v15.2d, v17.2d, v9.2d
fmul v0.4s, v0.4s, v19.4s              fadd v17.2d, v16.2d, v8.2d
fadd v4.4s, v10.4s, v1.4s              fmls v15.2d, v17.2d, v31.2d
fadd v3.4s, v12.4s, v1.4s              fadd v17.2d, v16.2d, v30.2d
fadd v2.4s, v14.4s, v1.4s              fmla v15.2d, v17.2d, v29.2d
fadd v1.4s, v8.4s, v1.4s               fadd v17.2d, v16.2d, v28.2d
fmls v0.4s, v21.4s, v18.4s             fmls v15.2d, v17.2d, v0.2d
fmla v0.4s, v23.4s, v17.4s             fadd v17.2d, v16.2d, v27.2d
fmls v0.4s, v25.4s, v16.4s             fmla v15.2d, v17.2d, v26.2d
fmla v0.4s, v27.4s, v7.4s              fadd v17.2d, v16.2d, v25.2d
fmls v0.4s, v29.4s, v6.4s              fmls v15.2d, v17.2d, v24.2d
fmla v0.4s, v31.4s, v5.4s              fadd v17.2d, v16.2d, v23.2d
fmls v0.4s, v9.4s, v1.4s               fmla v15.2d, v17.2d, v22.2d
fmla v0.4s, v4.4s, v11.4s              fadd v17.2d, v16.2d, v21.2d
fmls v0.4s, v3.4s, v13.4s              fadd v16.2d, v16.2d, v19.2d
fmla v0.4s, v2.4s, v15.4s              fmls v15.2d, v17.2d, v20.2d
str q0, [x1], 16                       fmla v15.2d, v16.2d, v18.2d
cmp x1, x0                             str q15, [x2, x0]
bne .L41                               add x0, x0, 16
                                       bcc .L84

                      32 bit   64 bit   32 bit   64 bit   32 bit    64 bit
                          SP       SP       DP       DP  NEON SP   NEON SP

Maximum GFLOPS          10.7     21.8     10.3     12.7     17.0      24.1
Instructions Total        27       39       26       27       67        27
Floating point            22       32       22       32       32        22
FP operations Total       32      128       32       64      128       128
Add or subtract           11       44       11       22       21        44
Multiply                   1        4        1        2       11         4
Fused                     20       80       20       40        0        80
FP registers used         32        4       32       25       16        32

Instruction examples
                Add                           Multiply                      Fused
32 bit SP       fadds s16, s23, s2            fnmuls s16, s3, s16           vfma.f32 s16, s29, s9
64 bit SP       fadd v15.4s, v16.4s, v15.4s   fmul v15.4s, v15.4s, v17.4s   fmla v15.4s, v17.4s, v0.4s
32 bit DP       faddd d25, d17, d15           fmuld d16, d16, d5            vfma.f64 d16, d22, d28
64 bit DP       fadd v15.2d, v16.2d, v14.2d   fmul v15.2d, v15.2d, v13.2d   fmla v15.2d, v17.2d, v22.2d
32 bit NEON SP  vadd.f32 q9, q8, q14          vmul.f32 q9, q9, q12          N/A
64 bit NEON SP  fadd v1.4s, v8.4s, v1.4s      fmul v0.4s, v0.4s, v19.4s     fmla v0.4s, v4.4s, v11.4s
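The FP operation totals above can be checked from the instruction mix: a vector instruction works on all its lanes (4 for .4s single precision, 2 for .2d double precision, 1 for scalar), and a fused multiply-add or multiply-subtract counts as two operations. A small C helper, hypothetical and written only to check this arithmetic, reproduces 128 operations per pass for the 64 bit NEON SP loop (11 fadd, 1 fmul and 10 fmla/fmls on 4 lanes), 64 for the 64 bit DP loop (the same mix on 2 lanes) and 32 for the scalar 32 bit SP code.

```c
#include <assert.h>

/* Floating point operations per loop pass from an instruction mix.
   lanes: elements per instruction (1 scalar, 2 for .2d, 4 for .4s).
   adds/muls: plain add or subtract and multiply instructions.
   fused: fused multiply-add/subtract instructions, 2 operations each. */
int fp_ops_per_pass(int lanes, int adds, int muls, int fused)
{
    assert(lanes > 0 && adds >= 0 && muls >= 0 && fused >= 0);
    return lanes * (adds + muls + 2 * fused);
}
```

The same instruction count therefore yields twice the work per pass in single precision as in double precision, consistent with the SP NEON and DP maximum GFLOPS above being in roughly that proportion.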
                    2 Ops/Word                32 Ops/Word
KB               12.8     128   12800     12.8     128   12800

SP    4B/64 1T  76406   97075   99969    66015   95363   99951
      3B/64 1T  76406   97075   99969    66015   95363   99951
      4B/32 1T  76406   97075   99969    66015   95363   99951
DP    4B/64 1T  76384   97072   99969    66065   95370   99951
      3B/64 1T  76384   97072   99969    66065   95370   99951
      4B/32 1T  76384   97072   99969    66065   95370   99951
NEON Bit SP
      4B/64 1T  76406   97075   99969    66015   95363   99951
      3B/64 1T  76406   97075   99969    66015   95363   99951
      4B/32 1T  76406   97075   99969  66014-X   95363   99951
Following is an example of full output. The strange test names were carried forward from a 2014 CUDA benchmark, via Windows and Linux Intel CPU versions. Details are in the following GigaFLOPS Benchmarks report, covering MP-MFLOPS, QPAR and OpenMP. This showed nearly 100 GFLOPS from a Core i7 CPU and 400 GFLOPS from a GeForce GTX 650 graphics card, via CUDA. See GigaFLOPS Benchmarks.htm.
The detail is followed by MFLOPS results on the Pi 3B+ and Pi 4B. The direct conversions of the code from large systems led to excessive memory demands for Raspberry Pi systems, with too many tests dependent on RAM speed, and low MP performance gains. There were glimpses of the usual performance gains and a maximum of over 20 SP GFLOPS on a 64 bit Pi 4B.
The Pi 4B gcc 9/6 performance ratios indicate no real advantage for either compilation, except that the gcc 9 results include 24.7 SP GFLOPS.
Gentoo 64b Pi 4B gcc 9

OpenMP MFLOPS64g9 Thu Sep 26 16:51:07 2019

  Test           4 Byte  Ops/   Repeat    Seconds   MFLOPS      First   All
                  Words  Word   Passes                        Results  Same

Data in & out    100000     2     2500   0.124228     4025   0.929538   Yes
Data in & out   1000000     2      250   0.842066      594   0.992550   Yes
Data in & out  10000000     2       25   0.873622      572   0.999250   Yes
Data in & out    100000     8     2500   0.147889    13524   0.957117   Yes
Data in & out   1000000     8      250   0.904478     2211   0.995518   Yes
Data in & out  10000000     8       25   0.951405     2102   0.999549   Yes
Data in & out    100000    32     2500   0.324246    24673   0.890215   Yes
Data in & out   1000000    32      250   1.097993     7286   0.988088   Yes
Data in & out  10000000    32       25   1.045087     7655   0.998796   Yes

                                                --------- gcc 9 ---------
Mbytes/      Pi 3B+        Pi 4B        Pi 4B       Pi 3B+        Pi 4B
Ops/Word        64b          64b          32b          64b          64b
            All    1T    All    1T    All    1T    All    1T    All    1T

0.4/2      2674   755   5386  2780   4716  2850   2341   795   4025  2236
4/2         411   404    563   557    556   429    381   362    594   403
40/2        419   408    545   588    544   632    401   387    572   493
0.4/8      7029  1886  15401  5555   7981  5191   6051  1906  13524  5373
4/8        1656  1495   2223  2116   2389  2082   1491  1352   2211  1948
40/8       1725  1507   2361  2310   2199  2003   1598  1418   2102  2308
0.4/32     6648  1699  20429  5647   8147  5449  12002  3185  24673  6786
4/32       5977  1616   8082  5445   7951  5385   5641  2809   7286  6385
40/32      6027  1616   8470  5479   8030  5379   6142  2809   7655  6415

            ----------- Pi 4B ---------   -------- Pi 4B gcc 9 -------
                4b/3b        64/32b          4b/3b       gcc 9/6
            All    1T    All    1T    All    1T    All    1T

0.4/2      2.01  3.68   1.14  0.98   1.72  2.81   0.75  0.80
4/2        1.37  1.38   1.01  1.30   1.56  1.11   1.06  0.72
40/2       1.30  1.44   1.00  0.93   1.43  1.27   1.05  0.84
0.4/8      2.19  2.95   1.93  1.07   2.24  2.82   0.88  0.97
4/8        1.34  1.42   0.93  1.02   1.48  1.44   0.99  0.92
40/8       1.37  1.53   1.07  1.15   1.32  1.63   0.89  1.00
0.4/32     3.07  3.32   2.51  1.04   2.06  2.13   1.21  1.20
4/32       1.35  3.37   1.02  1.01   1.29  2.27   0.90  1.17
40/32      1.41  3.39   1.05  1.02   1.25  2.28   0.90  1.17
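The MFLOPS column follows from the other columns as words × operations per word × repeat passes ÷ seconds ÷ 1,000,000; for the first row, 100000 × 2 × 2500 ÷ 0.124228 gives about 4025. The following C sketch shows that calculation, alongside a 2 operations per word loop shared among threads by an OpenMP parallel for directive. It is an illustration under stated assumptions, not the benchmark source: the function names are hypothetical, and the 2 operations per word test is assumed to have the form (x + a) × b. Compiled with -fopenmp the loop iterations are divided among threads; without it, gcc ignores the pragma and the loop runs on one thread.

```c
#include <assert.h>
#include <stddef.h>

/* MFLOPS from the logged columns: words * ops/word * passes / seconds / 1e6. */
double mflops(double words, double ops_per_word, double passes, double seconds)
{
    assert(seconds > 0.0);
    return words * ops_per_word * passes / seconds / 1.0e6;
}

/* Assumed 2 operations per word: one add and one multiply, (x + a) * b.
   Iterations are independent, so OpenMP can share them among threads. */
void ops2_openmp(float *x, long n, float a, float b)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        x[i] = (x[i] + a) * b;
}
```

Likewise, `mflops(100000, 32, 2500, 0.324246)` gives about 24673, the 24.7 SP GFLOPS obtained using gcc 9.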
Memory Reading Speed Test OpenMP 64 Bit gcc 9 by Roy Longbottom

Start of test Thu Sep 26 22:08:22 2019

  Memory    x[m]=x[m]+s*y[m] Int+      x[m]=x[m]+y[m]             x[m]=y[m]
  KBytes    Dble    Sngl   Int32    Dble    Sngl   Int32    Dble    Sngl   Int32
    Used    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S

       4    7616    8480    8749    7548    8520    8530   35856   18594   18601
       8    8195    8660    8876    8147    5740    8365   37153   18878   18864
      16    7992    7684    8189    8064    8139    8023   35774   18896   18898
      32    8975    8535    8024    9048    8536    8512   37465   18392   19024
      64    8622    7997    8057    8511    7953    7994   19618   16857   16701
     128   11940   11637   11554   12101   11659   11498   13815   13417   13964
     256   17008   17339   16359   17104   17396   17038   11877   12344   12376
     512   17740   15986   18607   17522   18547   15612   12575   13616   13495
    1024    7011   10208   10016   11310    5287   11413    7060    6279   10045
    2048    7024    4201    7006    7017    6943    3225    2822    3386    3391
    4096    3854    7002    7126    6912    7074    3985    2199    3127    3132
    8192    2632    6950    7151    5291    2796    6813    2546    3091    2403
   16384    7350    7073    3537    7583    5327    3200    2609    3053    1907
   32768    7514    7616    7725    7807    2344    2936    2702    2559    3042
   65536    7065    2937    7571    4306    7086    2975    2127    3017    2677
  131072    1772    1779    2562    8092    2583    2800    2035    1866    2869

Memory Reading Speed Test notOpenMP 64 Bit gcc 9 by Roy Longbottom

       4   12991   21391   23815   13044   22904   23856   11216    9060    9062
       8   13380   21857   24416   13414   23420   24400   11630    9313    9312
      16   13534   22119   24711   13550   23683   24718   11797    9447    9447
      32   11981   19879   21566   12100   21243   21572    9552    8928    8924
      64   11695   19992   20989   12044   21020   20966    9356    8613    8602
     128   11824   20347   21045   12116   21217   21067    8132    8149    8178
     256   11705   20247   21090   12041   21382   21013    8081    8182    5919
     512   11515   20242   21155   12059   21089   20938    8093    8127    7376
    1024    4504    8674    8151    4658    8682    8680    3894    3739    3887
    2048    1868    3231    3636    1868    3581    3491    2639    2871    2896
    4096    1921    2994    3748    1925    3781    3443    2589    2634    2636
    8192    1836    3719    3695    1921    3624    3791    2603    2596    2595
   16384    1951    3724    3002    1977    3838    3249    2584    2572    2384
   32768    1710    3431    3427    2008    3186    3449    2545    2531    2529
   65536    2030    3034    2135    2047    3035    2394    2550    2535    2546
  131072    2029    2023    2024    1873    2059    1652    2378    2466    2392
The tables below show speeds on the systems considered, along with average Pi 4B performance gains at 64 bits, which were somewhat limited in this case.
Gentoo Pi 4B 64 Bits

MP-Integer-Test 64 Bit v1.0 Fri Sep 6 16:33:36 2019

Benchmark 1, 2, 4, 8, 16 and 32 Threads

                      MB/second
                    KB      KB      MB       Same    All
 Secs  Thrds        16     160      16   Sumcheck  Tests

  4.3      1      7771    7352    3895   00000000    Yes
  3.3      2     15467   14218    3714   FFFFFFFF    Yes
  3.0      4     28715   26652    3345   5A5A5A5A    Yes
  3.0      8     30292   26310    3334   AAAAAAAA    Yes
  3.0     16     29466   28503    3337   CCCCCCCC    Yes
  3.0     32     29351   30358    3390   0F0F0F0F    Yes

            Pi 4B 32 bit MB/sec        Pi 3B+ 64 bit MB/sec
              KB      KB      MB         KB      KB      MB
Threads       16     160      16         16     160      16

      1     5964    5756    3931       4823    3884    1209
      2    11787   11430    3748       9613    7709    1908
      4    23214   22060    3456      17737   15137    1779
      8    22197   22171    3472      17651   18692    1767
     16    22671   23299    3256      18255   18793    1757
     32    21379   21881    3346      18246   18674    1748

            Pi 4B 64b/32b              64b Pi 4B/3B+
Average
Gain        1.31    1.25    0.99       1.63    1.67    2.13
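The Average Gain figures appear consistent with a plain arithmetic mean of the per-thread-count speed ratios: for the 16 KB Pi 4B 64b/32b column, the six ratios 7771/5964 to 29351/21379 average 1.31, and the corresponding Pi 4B/3B+ ratios average 1.63. A minimal C sketch of that calculation, assuming a mean of ratios rather than a ratio of means (the function name `mean_ratio` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of num[k] / den[k] over n pairs of results. */
double mean_ratio(const double *num, const double *den, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t k = 0; k < n; k++) {
        assert(den[k] > 0.0);
        sum += num[k] / den[k];
    }
    return sum / (double)n;
}
```

A mean of ratios gives each thread count equal weight, whereas a ratio of summed speeds would be dominated by the faster multithreaded results.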
These programs were compiled using a later compiler than those used for MP-MFLOPS, which at least resulted in similar speeds from the 32 bit and 64 bit versions. Typical Pi 4B/3B+ performance improvements were indicated.
Gentoo Pi 4B 64 Bits

MP-Threaded-MFLOPS 64 Bit v1.0 Fri Sep 6 16:30:12 2019

Benchmark 1, 2, 4 and 8 Threads

                         MFLOPS                  Numeric Results
             Ops/      KB      KB      MB       KB      KB      MB
 Secs  Thrd  Word    12.8     128    12.8     12.8     128    12.8

  1.7    T1     2    2819    2874     504    40392   76406   99700
  3.2    T2     2    5592    5702     511    40392   76406   99700
  4.6    T4     2    9223    7520     519    40392   76406   99700
  6.0    T8     2    9520   10471     545    40392   76406   99700
  8.2    T1     8    5381    5595    2050    54764   85092   99820
  9.8    T2     8   11039   10883    2173    54764   85092   99820
 11.3    T4     8   19087   21040    2044    54764   85092   99820
 12.9    T8     8   19747   21107    2016    54764   85092   99820
 17.5    T1    32    6693    6753    6377    35206   66015   99520
 20.2    T2    32   13491   13464    8710    35206   66015   99520
 22.2    T4    32   25732   26704    9160    35206   66015   99520
 24.1    T8    32   25708   25770    8927    35206   66015   99520

End of test Fri Sep 6 16:30:37 2019

                  Pi 4B 32 bit               Pi 3B+ 64 bit
Threads        KB      KB      MB        KB      KB      MB
   Ops/wd    12.8     128    12.8      12.8     128    12.8

T1      2    2641    2607     646       838     826     373
T2      2    5089    5116     618      1659    1650     380
T4      2    8282    8522     683      2584    3296     384
T8      2    8756    9847     686      3013    3056     391
T1      8    5543    5428    2597      1981    1972    1354
T2      8   10754   10603    2711      3936    3923    1518
T4      8   18716   20823    2844      7482    7396    1531
T8      8   19859   21684    2555      7399    7705    1534
T1     32    5309    5274    5265      2820    2809    2462
T2     32   10557   10509    9991      5636    5583    4754
T4     32   20416   20919   11340     10640   10882    6020
T8     32   20072   19787    9330     10641   10926    6159

Average Pi 4B Performance Gains

Ops/Word       Pi 4B 64b/32b             64b Pi 4B/3B+
       2     1.09    1.04    0.79      3.37    3.16    1.36
       8     1.00    1.01    0.77      2.69    2.80    1.40
      32     1.27    1.29    0.96      2.40    2.41    1.85
Gentoo Pi 4B 64 Bits

MP-Threaded-MFLOPS 64 Bit v1.0 Fri Sep 6 16:31:24 2019

Double Precision Benchmark 1, 2, 4 and 8 Threads

                         MFLOPS                  Numeric Results
             Ops/      KB      KB      MB       KB      KB      MB
 Secs  Thrd  Word    12.8     128    12.8     12.8     128    12.8

  3.2    T1     2    1398    1462     285    40395   76384   99700
  6.2    T2     2    2799    2807     256    40395   76384   99700
  8.9    T4     2    5024    4589     257    40395   76384   99700
 11.5    T8     2    5089    5545     280    40395   76384   99700
 15.7    T1     8    2668    2790    1103    54805   85108   99820
 18.8    T2     8    5670    5545    1158    54805   85108   99820
 21.7    T4     8   10259   10011    1068    54805   85108   99820
 24.7    T8     8   10239   10824    1036    54805   85108   99820
 34.1    T1    32    3317    3390    3195    35159   66065   99521
 39.2    T2    32    6791    6754    4753    35159   66065   99521
 43.1    T4    32   12940   13200    4497    35159   66065   99521
 46.9    T8    32   13200   13049    4557    35159   66065   99521

End of test Fri Sep 6 16:32:11 2019

                  Pi 4B 32 bit               Pi 3B+ 64 bit
Threads        KB      KB      MB        KB      KB      MB
   Ops/wd    12.8     128    12.8      12.8     128    12.8

T1      2     993     998     329       412     411     193
T2      2    1971    1995     309       828     824     194
T4      2    3633    3937     340      1543    1514     197
T8      2    3635    3796     339      1525    1551     196
T1      8    2378    2445    1288       980     978     696
T2      8    4770    4860    1282      1975    1964     782
T4      8    9281    9556    1210      3688    3688     781
T8      8    9119    9448    1245      3726    3689     787
T1     32    2697    2726    2708      1402    1403    1231
T2     32    5397    5446    5163      2808    2808    2399
T4     32   10689   10806    5146      5379    5413    3195
T8     32   10716   10494    4497      5450    5485    3150

Average Pi 4B Performance Gains

Ops/Word       Pi 4B 64b/32b             64b Pi 4B/3B+
       2     1.40    1.37    0.82      3.34    3.39    1.38
       8     1.13    1.12    0.87      2.78    2.83    1.44
      32     1.23    1.24    1.00      2.40    2.41    1.86
Initially, two versions of the HPL tests were run, one accessing precompiled Basic Linear Algebra Subprograms and the other using ATLAS alternatives, which had to be built. The whole benchmark suite was produced according to these instructions.
The ATLAS version was installed, as the older benchmark would not run on the Pi 4. One issue is the time required for the build, apparently due to the numerous tuning tests. Time taken was 14 hours using a Pi 3B+, then 8 hours on a Pi 4. Later, 64 bit ATLAS was built on the Pi 3B+, via Gentoo, taking 26 hours, that included extended periods swapping data with the rather slow main drive.
The procedure specified above was used, successfully producing a working package. Only one change was required, to line 95 of Make.rpi:
Following the introduction of 64 bit Gentoo for the Pi 4B, ATLAS was again created, taking more than 10 hours. As indicated in the above links, the HPL benchmark can be a useful stress test, due to the long running time with heavy processing. It can lead to CPU MHz being throttled on the Pi 4B, producing slow GFLOPS speeds. The tests reported here were run using a Pi 4B with a cooling fan, with CPU MHz monitored to help to indicate that the processor was running at full speed.
The benchmark was run on various Raspberry Pi models, using the same parameters. An example of the main output produced is shown below. Key areas are array size parameter N, running time, GFLOPS speed rating and sumcheck (0.0010188 in this case), including whether acceptable (PASSED).
pi@raspberrypi:~/hpl-2.2/bin/rpi $ mpiexec -f nodes-1pi ./xhpl
================================================================================
HPLinpack 2.2  --  High-Performance Linpack benchmark  --   February 24, 2016
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   20000
NB     :     128
PMAP   : Row-major process mapping
P      :       2
Q      :       2
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   128     2     2             494.46              1.079e+01
HPL_pdgesv() start time Fri Oct 11 22:34:37 2019

HPL_pdgesv() end time   Fri Oct 11 22:42:52 2019

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0010188 ...... PASSED
================================================================================
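Each PASSED verdict comes from the scaled residual check listed in the output. To show what that expression measures, the following C function evaluates it for a small dense system stored in row-major order, using infinity norms (largest absolute element for vectors, largest absolute row sum for the matrix) and eps = DBL_EPSILON / 2, which equals the 1.110223e-16 reported above. It is a hypothetical illustration, not code from HPL, which evaluates the same quantity for a matrix distributed over P × Q processes.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Infinity norm of a vector: largest absolute element. */
static double vec_norm_inf(const double *v, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++) {
        double a = v[i] < 0.0 ? -v[i] : v[i];
        if (a > m) m = a;
    }
    return m;
}

/* Infinity norm of an n x n row-major matrix: largest absolute row sum. */
static double mat_norm_inf(const double *A, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++) {
            double a = A[i * n + j];
            s += a < 0.0 ? -a : a;
        }
        if (s > m) m = s;
    }
    return m;
}

/* ||Ax-b||_oo / (eps * (||x||_oo * ||A||_oo + ||b||_oo) * N),
   with eps = DBL_EPSILON / 2. The check passes if the result is < 16.0. */
double scaled_residual(const double *A, const double *x, const double *b,
                       size_t n)
{
    double r_max = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double r = -b[i];                      /* row i of Ax - b */
        for (size_t j = 0; j < n; j++)
            r += A[i * n + j] * x[j];
        if (r < 0.0) r = -r;
        if (r > r_max) r_max = r;
    }
    double eps = DBL_EPSILON / 2.0;
    return r_max / (eps * (vec_norm_inf(x, n) * mat_norm_inf(A, n)
                           + vec_norm_inf(b, n)) * (double)n);
}
```

Because the residual is scaled by eps and the problem size, small run-to-run differences in rounding (for example from a different order of operations) change the reported value without affecting the PASSED verdict, which only requires a result below 16.0.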
The benchmark also produces a sumcheck (the scaled residual). In the case of the ATLAS implementation, these are not consistent for the same problem size, although all those shown here were indicated as PASSED (within specified tolerances). The anomaly could be produced by different CPU models or alternative compilations but, least understandably, the end of the detailed results below shows the sumcheck varying on repeating the program on the same system.
Comparing Pi 4B 32 bit and 64 bit maximum GFLOPS speeds, the 32 bit version appears to be slightly faster (or the same within reasonable tolerances). It is not clear (to me) whether the compiled code fully exploits the difference in technology, or whether external compile options should be included for the different packages involved.
Anyway, around 10 double precision GFLOPS was the maximum produced by other benchmarks, reported above.
        ------- Time -------    ---- GFLOPS ----    ------------ Sumcheck ------------
           4B      4B    3B+     4B     4B    3B+          4B          4B          3B+
    N     64b     32b    64b    64b    32b    64b         64b         32b          64b

 4000    5.51    5.20  14.53   7.75   8.20   2.94   0.0022808   0.0023975    0.0025857
 8000   38.22   36.70 101.59   8.93   9.30   3.36   0.0017216   0.0016746    0.0017518
16000  269.26  263.00         10.14  10.40          0.0012577   0.0011258
20000  513.67  494.30         10.38  10.80          0.0009637   0.0010188

GFLOPS Comparisons

               4B      64b
    N     64b/32b   4B/3B+

 4000        0.95     2.64
 8000        0.96     2.66
16000        0.98
20000        0.96

Example Logged Results

                                                     Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   128     2     2             516.71              1.032e+01
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0008697 ...... PASSED
================================================================================

First Run

WR11C2R4       20000   128     2     2             656.89              8.120e+00
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0009470 ...... PASSED
================================================================================
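The GFLOPS ratings can be reproduced from N and the time, assuming the operation count used by HPL for solving the system, (2/3)N³ + (3/2)N². A short C check follows (the function name `hpl_gflops` is hypothetical); it recovers, for example, 7.75 for N = 4000 in 5.51 seconds, 3.36 for the Pi 3B+ at N = 8000, and 10.79 for the 494.46 second run, logged as 1.079e+01.

```c
#include <assert.h>

/* HPL style rating: ((2/3)N^3 + (3/2)N^2) floating point operations
   divided by the wall time, in units of 10^9 per second. */
double hpl_gflops(double n, double seconds)
{
    assert(n > 0.0 && seconds > 0.0);
    return ((2.0 / 3.0) * n * n * n + 1.5 * n * n) / seconds / 1.0e9;
}
```

Since the N³ term dominates, doubling N multiplies the work by about eight, which is why the N = 8000 runs took roughly seven times as long as those at N = 4000, with GFLOPS improving slightly at the larger size.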
In order to test a USB drive, it must be mounted: plug it in, then right click and select Mount Volume, or double click to open it. Run the df command to find the path, which is needed as a run time parameter.
Following is an example log file and the command used to run the program to test a USB 3 stick. With no MB parameter, default large file sizes are 8 and 16 MB.
############################## Pi 4B USB 3 ###############################

Run command ./DriveSpeed64v2g9 MB 512 FilePath /run/media/demouser/PATRIOT

##########################################################################

DriveSpeed RasPi 64 Bit 2.0 Fri Sep 13 22:25:40 2019

Selected File Path: /run/media/demouser/PATRIOT/
Total MB  120832, Free MB  119778, Used MB    1054

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

 512    30.72    31.11    34.01   287.24   295.04   311.90
1024    34.66    36.11    35.45   298.87   302.38   300.26

Cached
   8    42.03    39.58    38.85  1167.71  1029.35  1061.56

Random         Read                       Write
From MB        4        8       16        4        8       16
msecs      0.004    0.007    0.310     9.65    10.42     9.71

200 Files      Write                      Read                  Delete
File KB        4        8       16        4        8       16     secs
MB/sec      0.03     0.07     0.13   268.10   427.95   657.48
ms/file   122.73   122.28   122.22     0.02     0.02     0.02    2.557
For non-cached tests, the standard version of this benchmark opens files with the O_DIRECT option, specifying direct I/O (no caching). The latest minor variant of this appears to work as expected on 32 bit Raspbian, on both main and USB drives. The 64 bit compilation failed to write to the main SD drive and failed to read from USB flash drives. Omitting O_DIRECT for reading appeared to correct the latter (see above). To check this, and to enable main drive measurements, separate large file write and read-only programs without direct I/O were produced, to follow write/reboot/read procedures. These were also needed to measure throughput when simultaneously writing or reading two USB 3 drives.
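Opening a file with O_DIRECT is Linux specific, and the open(2) manual notes that it may impose alignment restrictions on the buffer address, transfer length and file offset, and that open() fails with EINVAL on file systems that do not support the flag. One possible, unconfirmed, cause of failures like those described above is a buffer or transfer size that does not meet these restrictions. The sketch below requests direct I/O, falls back to normal buffered I/O when the file system refuses it, and uses a 4096 byte aligned buffer; the alignment value and the function names are assumptions for illustration, not DriveSpeed's code.

```c
#define _GNU_SOURCE                 /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0                  /* not available: buffered I/O only */
#endif

#define IO_ALIGN 4096               /* assumed alignment for buffer and size */

/* Try to open with O_DIRECT; if the file system refuses (EINVAL), reopen
   without it. *direct is set to 1 only if direct I/O was obtained. */
int open_maybe_direct(const char *path, int flags, int *direct)
{
    int fd = open(path, flags | O_DIRECT, 0644);
    *direct = (fd >= 0 && O_DIRECT != 0);
    if (fd < 0 && errno == EINVAL) {
        fd = open(path, flags, 0644);
        *direct = 0;
    }
    return fd;
}

/* Write nbytes of a known pattern, then read them back, using an aligned
   buffer as O_DIRECT may require. Returns 0 if the data read matches. */
int write_read_check(const char *path, size_t nbytes, int *direct)
{
    void *mem;
    int fd, result = -1;

    assert(nbytes > 0 && nbytes % IO_ALIGN == 0);
    if (posix_memalign(&mem, IO_ALIGN, nbytes) != 0)
        return -1;
    unsigned char *buf = mem;

    memset(buf, 0x5A, nbytes);
    fd = open_maybe_direct(path, O_WRONLY | O_CREAT | O_TRUNC, direct);
    if (fd < 0)
        goto done;
    if (write(fd, buf, nbytes) != (ssize_t)nbytes) {
        close(fd);
        goto done;
    }
    if (close(fd) != 0)
        goto done;

    memset(buf, 0, nbytes);         /* clear before reading back */
    fd = open_maybe_direct(path, O_RDONLY, direct);
    if (fd < 0)
        goto done;
    if (read(fd, buf, nbytes) == (ssize_t)nbytes) {
        result = 0;
        for (size_t i = 0; i < nbytes; i++)
            if (buf[i] != 0x5A) {
                result = -1;
                break;
            }
    }
    close(fd);
done:
    free(mem);
    return result;
}
```

The fallback mirrors the approach of omitting O_DIRECT when it cannot be used, with the difference that the reported `direct` flag makes it visible whether a measurement was cached or not.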
Following are 64 bit Pi 4B SD main drive results from the separate write and read tests, followed by full results from a Pi 4B with 32 bit Raspbian, using an SD card of the same brand. Note the similarity in writing and reading speeds of large files.
################# Main SD Drive From Write/Read Tests Below #################

        Write1   Write2   Write3    Read1    Read2    Read3
Write    18.99    19.34    19.47  1337.09  1164.91  1325.96   - cached
Read       N/A      N/A      N/A    45.80    45.88    45.89   - not cached

============================== 32 Bit Results ==============================

DriveSpeed RasPi 1.1 Mon Apr 29 10:20:57 2019

Current Directory Path: /home/pi/Raspberry_Pi_Benchmarks/DriveSpeed/drive1
Total MB   14845, Free MB    8198, Used MB    6646

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    16.41    11.21    12.27    39.81    40.10    40.39
  16    11.79    21.10    34.05    40.18    40.19    40.33

Cached
   8   137.47   156.43   285.59   580.73   598.66   587.97

Random         Read                       Write
From MB        4        8       16        4        8       16
msecs      0.371    0.371    0.363     1.28     1.53     1.30

200 File       Write                      Read                  Delete
File KB        4        8       16        4        8       16     secs
MB/sec      3.49     6.41     8.26     7.67    11.68    17.51
ms/file     1.17     1.28     1.98     0.53     0.70     0.94    0.014
Following is a summary of results, indicating USB 3 large file reading speed improvements of between 6.7 and 8.1 times, but disappointing writing performance. The slower P speeds might be affected by the mysteries of updating file allocation tables, which also influence random access and the handling of lots of small files, including file delete times. USB 3 provided little or no performance gain for the latter. Cached reading reflects RAM speed, and is the only area showing a clear difference in performance between the Pi 3B+ and Pi 4B.
MB/second            16 MB USB 2, 1024 MB USB 3

System  Drive           Write1  Write2  Write3   Read1   Read2   Read3

Pi 3B+  USB 2 P           11.5    11.4    11.5    36.6    37.7    37.3
Pi 3B+  USB 2 R           15.9    16.4    13.9    37.1    40.1    39.8
Pi 4B   USB 2 P           12.6    12.6    12.6    37.0    37.3    37.2
Pi 4B   USB 2 R           22.6    22.9    22.9    36.5    36.3    36.5
Pi 4B   USB 3 P           34.7    36.1    35.5   298.9   302.4   300.3
Pi 4B   USB 3 R           48.9    44.6    53.4   249.4   248.8   246.2

Compare MB/second
Pi 4B P USB 3/2           2.75    2.88    2.81    8.07    8.11    8.07
Pi 4B R USB 3/2           2.17    1.94    2.33    6.83    6.85    6.74

Cached MB/second        Write1  Write2  Write3   Read1   Read2   Read3

Pi 3B+  USB 2 P           13.6    14.2    14.4   633.4   544.0   464.3
Pi 3B+  USB 2 R           13.7    14.4    19.4   623.5   661.4   557.6
Pi 4B   USB 2 P           15.0    14.7    14.8  1204.0  1047.3  1066.3
Pi 4B   USB 2 R           20.8    21.2    13.9   930.2   933.6  1230.3
Pi 4B   USB 3 P           42.0    39.6    38.9  1167.7  1029.4  1061.6
Pi 4B   USB 3 R           21.1    15.9    36.2  1103.6   944.9   981.0

Compare
Pi 4B P USB 3/2           2.80    2.70    2.63    0.97    0.98    1.00
Pi 4B R USB 3/2           1.01    0.75    2.60    1.19    1.01    0.80

Random milliseconds       Read                    Write

Pi 3B+  USB 2 P          0.013   0.013   0.254   11.76   10.18    9.80
Pi 3B+  USB 2 R          0.017   0.008   0.032    1.09    1.39   11.72
Pi 4B   USB 2 P          0.006   0.007   0.215    9.56    8.54    8.75
Pi 4B   USB 2 R          0.009   0.005   0.016    1.35    2.12    1.34
Pi 4B   USB 3 P          0.004   0.007   0.310    9.65   10.42    9.71
Pi 4B   USB 3 R          0.004   0.004   0.008    1.75    0.85    0.92

Compare
Pi 4B P USB 3/2           1.50    1.00    0.69    0.99    0.82    0.90
Pi 4B R USB 3/2           2.25    1.25    2.00    0.77    2.49    1.46

200 Small Files milliseconds
                         Write                    Read                 Delete

Pi 3B+  USB 2 P          134.2   128.6   129.6    0.08    0.12    0.07    3.36
Pi 3B+  USB 2 R          105.5   104.7   107.6    0.05    0.05    0.07    0.26
Pi 4B   USB 2 P          125.8   125.5   125.8    0.02    0.02    0.02    3.12
Pi 4B   USB 2 R          104.1   104.0   104.0    0.02    0.02    0.03    0.14
Pi 4B   USB 3 P          122.7   122.3   122.2    0.02    0.02    0.02    2.56
Pi 4B   USB 3 R          105.4   104.0   104.3    0.02    0.02    0.03    0.15

Compare
Pi 4B P USB 3/2           1.03    1.03    1.03    1.00    1.00    1.00    1.22
Pi 4B R USB 3/2           0.99    1.00    1.00    1.00    1.00    1.00    0.95
Run Commands ./DriveSpeed264WR MB 1024 and ./DriveSpeed264Rd MB 1024

Current Directory Path: /home/demouser/RPi3-64-Bit-Benchmarks/IOtests/writeread
Total MB 28225, Free MB 18761, Used MB 9464

1024 MB MBytes/Second
         Write1  Write2  Write3    Read1    Read2    Read3
Write     18.99   19.34   19.47  1337.09  1164.91  1325.96
Read        N/A     N/A     N/A    45.80    45.88    45.89

vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----
 r  b   swpd   free   buff   cache   si   so     bi     bo   in   cs us sy id wa st
 0  1      0 673848  60668 2792716    0    0  45056      0  767 1181  0  2 75 23  0
 0  1      0 630228  60668 2835544    0    0  44544      0  789 1199  0  2 74 23  0
 0  1      0 585204  60668 2880268    0    0  45056      0  691 1041  0  3 75 23  0
Run Commands ./DriveSpeed264WR MB 1024 FilePath /run/media/demouser/PATRIOT and ./DriveSpeed264Rd MB 1024 FilePath /run/media/demouser/PATRIOT

Selected File Path: /run/media/demouser/PATRIOT/
Total MB 120832, Free MB 119752, Used MB 1080

1024 MB MBytes/Second
         Write1  Write2  Write3    Read1    Read2    Read3
Write     58.45   23.10   22.91  1368.04  1190.71  1354.84
Read        N/A     N/A     N/A   306.18   294.93   302.91

vmstat
procs -----------memory--------- ---swap-- -----io---- -system-- ------cpu----
 r  b   swpd   free   buff   cache   si   so     bi     bo   in   cs us sy id wa st
 1  0    256 811672  20920 2696504    0    0 305664      0 3898 6182  1 15 73 11  0
 0  1    256 510852  20920 2996188    0    0 303616      0 4304 5936  1 16 72 12  0
 1  0    256 239400  20920 3267636    0    0 307184      0 4512 6177  1 17 71 11  0
Selected File Path: /run/media/demouser/REMIX_OS/
Total MB 9017, Free MB 7485, Used MB 1532

1024 MB MBytes/Second
         Write1  Write2  Write3    Read1    Read2    Read3
Write     46.43   28.81   36.57  1265.07  1103.23  1236.02
Read        N/A     N/A     N/A   172.71   172.14   176.49

vmstat
procs -----------memory--------- ---swap-- -----io---- -system-- ------cpu----
 r  b   swpd   free   buff   cache   si   so     bi     bo   in   cs us sy id wa st
 0  1    256 111512    912 3417624    0    0 175189      0 4315 5929  1 12 71 17  0
 0  1    256 169756    992 3358840    0    0 169043      0 4064 5515  1 11 71 17  0
 0  1    256 177444   1068 3351176    0    0 155724      0 4088 6023  1 12 70 16  0
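The vmstat samples give an independent check on the read speeds. According to the vmstat manual page, the bi column counts blocks received from a block device per second, and Linux blocks are 1024 bytes, so bi divided by 1024 gives MiB/second. The Python sketch below (function names hypothetical) applies this to the PATRIOT rows above: about 305,000 blocks/second corresponds to roughly 300 MiB/second, in line with the benchmark's Read figures. Whether the benchmark's MB means 10^6 or 2^20 bytes is not stated, so the agreement is approximate.

```python
# Column order as printed in the vmstat header line.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# vmstat rows recorded while reading the PATRIOT drive over USB 3
PATRIOT_ROWS = [
    "1 0 256 811672 20920 2696504 0 0 305664 0 3898 6182 1 15 73 11 0",
    "0 1 256 510852 20920 2996188 0 0 303616 0 4304 5936 1 16 72 12 0",
    "1 0 256 239400 20920 3267636 0 0 307184 0 4512 6177 1 17 71 11 0",
]


def parse_vmstat_row(line: str) -> dict[str, int]:
    """Map one vmstat data row onto its column names."""
    values = [int(field) for field in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError(f"expected {len(VMSTAT_COLUMNS)} columns, got {len(values)}")
    return dict(zip(VMSTAT_COLUMNS, values))


def read_mib_per_sec(row: dict[str, int]) -> float:
    """bi is 1024-byte blocks read per second, so dividing by 1024 gives MiB/second."""
    return row["bi"] / 1024


if __name__ == "__main__":
    for line in PATRIOT_ROWS:
        row = parse_vmstat_row(line)
        print(f"{read_mib_per_sec(row):6.1f} MiB/s  iowait {row['wa']}%")
```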
At the end is a bad example, where one drive appears to be running at USB 2 speed.
Run Commands
./DriveSpeed264WR MB 512 FilePath /run/media/demouser/PATRIOT and
./DriveSpeed264WR MB 512 FilePath /run/media/demouser/REMIX_OS and
./DriveSpeed264Rd MB 512 FilePath /run/media/demouser/PATRIOT Log 1 and
./DriveSpeed264Rd MB 512 FilePath /run/media/demouser/REMIX_OS Log 2

Write/Read Thu Sep 19 16:07:48 2019 /run/media/demouser/REMIX_OS/
Write/Read Thu Sep 19 16:07:46 2019 /run/media/demouser/PATRIOT/

512 MB MBytes/Second
     Write1  Write2  Write3    Read1    Read2    Read3
R     28.72   33.89   44.69  1302.19  1131.65  1374.24
P     11.93    8.86    6.21  1232.47  1072.38  1213.36

Sep 23 17:11:21 2019 /run/media/demouser/PATRIOT/
Sep 23 17:11:20 2019 /run/media/demouser/REMIX_OS/

512 MB MBytes/Second
     Write1  Write2  Write3    Read1    Read2    Read3  Seconds
P       N/A     N/A     N/A   159.78   187.44   294.23  7.7
R       N/A     N/A     N/A   221.83   232.10   230.94  6.7+2 delayed start

vmstat
procs -----------memory--------- ---swap-- -----io---- -system-- ------cpu----
 r  b swpd    free   buff    cache si so     bi bo   in   cs us sy id wa st
 0  0    0 3160720  74616   296092  0  0      0  0 2031 3601  4  2 94  0  0
 0  1    0 3112052  74616   342188  0  0  45552  0 1512 2257  1  3 93  4  0
 0  1    0 2908004  74616   547600  0  0 206336  0 4684 7169  4 14 67 15  0
 2  0    0 2531960  74616   919400  0  0 369136  0 5495 8033  4 24 47 25  0
 2  0    0 2149064  74616  1303288  0  0 382960  0 5168 7007  1 21 52 26  0
 1  1    0 1771492  74616  1681348  0  0 385024  0 5969 8255  1 23 49 26  0
 1  1    0 1383524  74616  2068788  0  0 386016  0 5621 7926  1 21 49 29  0
 0  2    0  999100  74616  2453280  0  0 383488  0 4602 6895  1 19 54 26  0
 0  1    0  628988  74616  2824188  0  0 368640  0 5405 8153  2 20 56 22  0
 1  0    0  310748  74624  3142732  0  0 317424 20 4622 6551  1 17 72 10  0
 1  0    0  223052  73680  3231812  0  0 268288  0 2815 5012  1 18 72 10  0
 0  0    0  223824  73680  3231280  0  0  32768  0 1044 2009  1  3 95  1  0
 0  0    0  223824  73680  3231280  0  0      0  0  393  619  0  0 99  0  0

===============================================================================
Bad Example
     Write1  Write2  Write3    Read1    Read2    Read3
P       N/A     N/A     N/A    36.37    37.72    37.48
R       N/A     N/A     N/A   248.18   248.22   223.53
An example of a LanSpeed64 log file is provided below, preceded by examples of the required mount and run commands.
For further details of the required procedures, see the LAN/WiFi section of Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and stress tests.htm.
The 64 bit results are followed by details from running the benchmark on a 32 bit system, which show the same levels of performance, within the usual variability.
Commands
sudo mount -t cifs -o dir_mode=0777,file_mode=0777 //192.168.1.68/d /media/public
./LanSpeed64 FilePath /media/public/test

Log File
LanSpeed RasPi 64 Bit 1.0 Thu Sep 12 22:06:06 2019

Selected File Path: /media/public/test/
Total MB 266240, Free MB 70991, Used MB 195249

                       MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3
   8    66.13    92.09    92.76    96.36    96.85    97.30
  16    80.79    93.59    94.61   103.99   104.34   104.57

Random        ---------- Read ---------   --------- Write ---------
From MB           4        8       16          4        8       16
msecs         0.004    0.009    0.435       0.95     0.92     0.93

200 Files     --------- Write ---------   ---------- Read ---------  Delete
File KB           4        8       16          4        8       16    secs
MB/sec         1.37     2.45     4.77       1.37     2.49     4.92
ms/file        2.99     3.35     3.43       2.98     3.29     3.33   0.467

== ************************ 32 Bit Pi 4B ************==************

                       MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3
   8    67.82    12.97    90.19    99.84    93.49    96.83
  16    92.25    92.66    92.96    103.9   105.28    91.17

Random        ---------- Read ---------   --------- Write ---------
From MB           4        8       16          4        8       16
msecs         0.007     0.01     0.04       1.01     0.85     0.91

200 Files     --------- Write ---------   ---------- Read ---------  Delete
File KB           4        8       16          4        8       16    secs
MB/sec         1.47      2.8     5.14       2.47     4.71     8.61
ms/file        2.78     2.92     3.19       1.66     1.74     1.90   0.256
I changed the hub settings to provide separate 2.4 and 5 GHz hub address selections, with 72 and 180 Mbits/second being indicated, respectively. Numbers of this sort were confirmed on my smartphone, but they varied. The 64 bit version would not connect to the network at 5 GHz, unlike the 32 bit program, which, for example, obtained 15 MB/second writing and 8 MB/second reading. These differences could, I suppose, be due to program, software and/or hub incompatibility.
Random access times appeared to be quite similar on all WiFi tests, with faster but variable comparative times via LAN. There were similar relationships on dealing with numerous small files.
Some results from running the 32 bit benchmark on a Pi 4B are provided. Performance there was also erratic; these speeds represent best case measurements, with large files read somewhat faster than at 64 bits.
Large Files MB/second
System        MB  Write1  Write2  Write3   Read1   Read2   Read3
PC WiFi       16    4.08    4.16    4.11    2.34    1.68    1.30
PC LAN        16  106.11  106.11  105.89   50.67   33.86   25.47
LAN 3B+       16   28.63   29.03   28.96   22.18   32.28   32.61
3B+ WiFi      16   11.15   11.00   10.76    4.01    3.89    3.09
4B WiFi1      16    6.43    6.39    6.47    4.33    4.13    4.86
4B WiFi2      16   13.26   13.34   13.25    3.69    4.22    4.00
4B LAN        16   80.79   93.59   94.61  103.99  104.34  104.57
4B LAN       128   96.58   96.67   95.74  106.41  107.24  107.82
32 Bit
4B WiFi1      16    6.70    6.82    6.76    7.19    6.53    7.22
4B WiFi2      16   11.50   13.93   14.13    9.91    8.88    9.92

Random milliseconds
System        -------- Read --------   ------ Write ------
PC WiFi       1.711   1.972   2.015    2.26   2.28   2.25
PC LAN        0.606   0.590   0.532    0.47   0.48   0.47
LAN 3B+       0.030   0.816   0.484    1.19   1.16   1.16
3B+ WiFi      3.052   3.167   3.475    3.60   3.39   3.45
4B WiFi1      3.286   3.549   3.627    4.02   3.45   3.72
4B WiFi2      2.786   2.822   2.944    3.20   2.94   2.92
4B LAN        0.004   0.009   0.435    0.95   0.92   0.93
32 Bit
4B WiFi1      2.691   2.875   3.048    3.13   2.93   2.84
4B WiFi2      Similar

200 Small Files milliseconds per file
System        ------- Write -------   ------- Read --------   Delete
PC WiFi       10.09   12.42   13.81    5.50    6.11    8.06    1.507
PC LAN         4.05    4.59    4.53    2.38    2.23    2.64    0.661
LAN 3B+        3.72    4.36    4.45    3.33    3.40    3.60    0.378
3B+ WiFi      12.61   13.53   14.97   13.17   14.06   15.88    2.534
4B WiFi1      15.08   16.53   22.83   12.96   14.23   17.29    2.509
4B WiFi2      11.38   12.85   12.82   10.64   11.83   14.15    2.083
4B LAN         2.99    3.35    3.43    2.98    3.29    3.33    0.467
32 Bit
4B WiFi1      12.14   18.59   15.70   11.10   22.20   12.99    2.153
4B WiFi2      30.85   17.83   18.10   16.62   14.93   16.01    3.361
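Converting to bits helps relate these MB/second figures to link speeds. Assuming MB means 10^6 bytes (the benchmark does not say), the best Pi 4B LAN read of 107.82 MB/second is about 863 Mbits/second, or 86% of the nominal 1000 Mbits/second of gigabit Ethernet. The helper names below are illustrative, and the 2^20 alternative is shown because the MB definition is an assumption.

```python
GIGABIT_ETHERNET_MBIT = 1000.0  # nominal line rate, Mbits/second


def mbytes_to_mbits(mb_per_sec: float, bytes_per_mb: int = 1_000_000) -> float:
    """Convert MB/second to Mbits/second (8 bits per byte).

    bytes_per_mb is an assumption: 10**6 by default, 2**20 as an alternative.
    """
    return mb_per_sec * bytes_per_mb * 8 / 1_000_000


def percent_of_link(mb_per_sec: float, link_mbit: float = GIGABIT_ETHERNET_MBIT) -> float:
    """Measured throughput as a percentage of a nominal link rate."""
    return 100.0 * mbytes_to_mbits(mb_per_sec) / link_mbit


if __name__ == "__main__":
    best_lan_read = 107.82  # Pi 4B, 128 MB file, Read3
    print(f"{mbytes_to_mbits(best_lan_read):.1f} Mbit/s")
    print(f"{percent_of_link(best_lan_read):.1f}% of gigabit Ethernet")
    print(f"{mbytes_to_mbits(best_lan_read, 2**20):.1f} Mbit/s if MB = 2**20 bytes")
```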
############################# Pi 3B+ #############################
Whetstone Benchmark Java Version, Sep 20 2019, 11:06:12

                                                        1 Pass
Test                  Result       MFLOPS     MOPS   millisecs
N1 floating point  -1.124750137    310.88               0.0618
N2 floating point  -1.131330490    289.41               0.4644
N3 if then else     1.000000000              241.15     0.4292
N4 fixed point     12.000000000              706.28     0.4460
N5 sin,cos etc.     0.499110132               23.31     3.5700
N6 floating point   0.999999821    130.04               4.1480
N7 assignments      3.000000000               89.19     2.0720
N8 exp,sqrt etc.    0.825148463               21.92     1.6970

MWIPS                              775.89              12.8884

Operating System Linux, Arch. aarch64, Version 4.19.67
Java Vendor IcedTea, Version 1.8.0_222

############################# Pi 4B ##############################
Whetstone Benchmark Java Version, Sep 12 2019, 20:15:35

                                                        1 Pass
Test                  Result       MFLOPS     MOPS   millisecs  Gains
N1 floating point  -1.124750137    488.80               0.0393   1.57
N2 floating point  -1.131330490    475.92               0.2824   1.64
N3 if then else     1.000000000              344.31     0.3006   1.43
N4 fixed point     12.000000000             1571.86     0.2004   2.23
N5 sin,cos etc.     0.499110132               43.55     1.9104   1.87
N6 floating point   0.999999821    264.15               2.0420   2.03
N7 assignments      3.000000000              264.00     0.7000   2.96
N8 exp,sqrt etc.    0.825148463               25.80     1.4420   1.18

MWIPS                             1445.70               6.9171   1.86

Operating System Linux, Arch. aarch64, Version 4.19.67
Java Vendor IcedTea, Version 1.8.0_222

######################### Pi 4B 32 Bit ###########################
Whetstone Benchmark OpenJDK11 Java Version, May 15 2019, 18:48:20

                                                        1 Pass
Test                  Result       MFLOPS     MOPS   millisecs
N1 floating point  -1.124750137    524.02               0.0366
N2 floating point  -1.131330490    494.12               0.2720
N3 if then else     1.000000000              289.92     0.3570
N4 fixed point     12.000000000             1092.99     0.2882
N5 sin,cos etc.     0.499110132               59.86     1.3900
N6 floating point   0.999999821    345.95               1.5592
N7 assignments      3.000000000              331.54     0.5574
N8 exp,sqrt etc.    0.825148463               25.41     1.4640

MWIPS                             1687.92               5.9244

Operating System Linux, Arch. arm, Version 4.19.37-v7l+
Java Vendor BellSoft, Version 11.0.2-BellSoft
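The Whetstone logs hang together arithmetically: the eight per-test times sum to the total milliseconds on the MWIPS line, and 10,000 divided by that total gives the MWIPS rating (10,000 / 12.8884 = 775.89). This relationship is inferred from the logged figures rather than taken from the benchmark source. A short Python sketch, with illustrative names, reproduces the Pi 3B+ and Pi 4B ratings and the 1.86 MWIPS gain:

```python
# Per-test running times (milliseconds, 1 pass), N1 to N8, from the Java logs.
PI3BPLUS_MS = [0.0618, 0.4644, 0.4292, 0.4460, 3.5700, 4.1480, 2.0720, 1.6970]
PI4B_MS = [0.0393, 0.2824, 0.3006, 0.2004, 1.9104, 2.0420, 0.7000, 1.4420]


def mwips_from_ms(test_ms: list[float]) -> float:
    """MWIPS for one pass, taken here as 10,000 / total milliseconds.

    Inferred from the logged numbers, not from the benchmark's source code.
    """
    return 10_000.0 / sum(test_ms)


if __name__ == "__main__":
    pi3 = mwips_from_ms(PI3BPLUS_MS)
    pi4 = mwips_from_ms(PI4B_MS)
    print(f"Pi 3B+ {pi3:.2f}  Pi 4B {pi4:.2f}  gain {pi4 / pi3:.2f}")
```

The Pi 4B result comes out at 1445.69 rather than 1445.70, presumably because the per-test times are printed rounded.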
The Pi 4B performance gains shown below range from 2.10 to 3.42 times.
At the end are 32 bit results from a Pi 4B test, using alternative Java software, with similar results.
############################# Pi 3B+ #############################
Java Drawing Benchmark, Sep 20 2019, 11:08:33
Produced by javac 1.7.0_02

Test                                  Frames      FPS
Display PNG Bitmap Twice Pass 1          335    33.46
Display PNG Bitmap Twice Pass 2          546    54.53
Plus 2 SweepGradient Circles             502    50.08
Plus 200 Random Small Circles            366    36.59
Plus 320 Long Lines                      134    13.30
Plus 4000 Random Small Circles            46     4.59

Total Elapsed Time  60.2 seconds

Operating System Linux, Arch. aarch64, Version 4.19.67
Java Vendor IcedTea, Version 1.8.0_222

############################# Pi 4B ##############################
Java Drawing Benchmark, Sep 12 2019, 20:18:28
Produced by javac 1.7.0_02

Test                                  Frames      FPS  Gains
Display PNG Bitmap Twice Pass 1         1146   114.52   3.42
Display PNG Bitmap Twice Pass 2         1318   131.79   2.42
Plus 2 SweepGradient Circles            1237   123.66   2.47
Plus 200 Random Small Circles            972    97.13   2.65
Plus 320 Long Lines                      415    41.48   3.12
Plus 4000 Random Small Circles            97     9.65   2.10

Total Elapsed Time  60.1 seconds

Operating System Linux, Arch. aarch64, Version 4.19.67
Java Vendor IcedTea, Version 1.8.0_222

######################### Pi 4B 32 Bit ###########################
Java Drawing Benchmark, May 15 2019, 18:55:41
Produced by OpenJDK 11 javac

Test                                  Frames      FPS
Display PNG Bitmap Twice Pass 1          877    87.65
Display PNG Bitmap Twice Pass 2         1042   104.18
Plus 2 SweepGradient Circles            1015   101.47
Plus 200 Random Small Circles            779    77.85
Plus 320 Long Lines                      336    33.52
Plus 4000 Random Small Circles            83     8.25

Total Elapsed Time  60.1 seconds

Operating System Linux, Arch. arm, Version 4.19.37-v7l+
Java Vendor BellSoft, Version 11.0.2-BellSoft
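Each Gains figure is the Pi 4B frames per second divided by the Pi 3B+ figure for the same test. The sketch below recomputes the Gains column, and the 2.10 to 3.42 range quoted above, from the two FPS columns; the list and function names are illustrative.

```python
# FPS per test, in the order the benchmark runs them
PI3BPLUS_FPS = [33.46, 54.53, 50.08, 36.59, 13.30, 4.59]
PI4B_FPS = [114.52, 131.79, 123.66, 97.13, 41.48, 9.65]


def gains(new: list[float], old: list[float]) -> list[float]:
    """Per-test speed ratio new/old, rounded to 2 places."""
    return [round(n / o, 2) for n, o in zip(new, old)]


if __name__ == "__main__":
    g = gains(PI4B_FPS, PI3BPLUS_FPS)
    print(g)
    print(f"range {min(g):.2f} to {max(g):.2f}")
```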
The benchmark measures graphics speed in terms of Frames Per Second (FPS) via six tests, ranging from simple to more complex. The first four tests portray moving up and down a tunnel, including various independently moving objects, with and without texturing. The last two tests represent a real application for designing kitchens. The first is in wireframe format, drawn with 23,000 straight lines. The second has colours and textures applied to the surfaces.
Pi 4B average performance gains are included below. The best, at 2.1 times, were with textured objects, and the worst, at around 1.5 times, with the slow kitchen displays.
Dual Monitors - The benchmark was also run with two 1920x1080 monitors connected. With the mirror option selected, both monitors showed identical displays. Without it, the normal display, from where the program is executed, appeared on one monitor and the OpenGL images on the other. This was fine when the usual display dimensions, as shown below, were specified. With no parameters, the full screen image was assumed to be 3840x1080, and this was displayed horizontally squashed into 1920 pixels. FPS measurements for the latter are shown below. On running the 32 bit version via Raspbian, the default display was 3840x1080, across both monitors, but appeared on only one monitor when dimensions of 1920x1080 or less were specified. There was no mirror option. See performance below.
In order to demonstrate maximum speeds, VSYNC (vblank) has to be switched off. The command to do this is shown in the following script, which is used to run a series of tests.
export vblank_mode=0
./videogl64g9 Width 160, Height 120, NoEnd
./videogl64g9 Width 320, Height 240, NoHeading, NoEnd
./videogl64g9 Width 640, Height 480, NoHeading, NoEnd
./videogl64g9 Width 1024, Height 768, NoHeading, NoEnd
./videogl64g9 NoHeading
Pi 4B 32 bit results are also provided. These were slower than the 64 bit speeds, with the 64 bit version 13% to 19% faster on the object tests and around 40% faster on the kitchen tests.
############################# Pi 3B+ #############################
GLUT OpenGL Benchmark 64 Bit Version 1, Fri Sep 20 11:15:47 2019

Running Time Approximately 5 Seconds Each Test

Window Size    Coloured Objects   Textured Objects   WireFrm  Texture
   Pixels        Few      All       Few      All     Kitchen  Kitchen
Wide  High       FPS      FPS       FPS      FPS       FPS      FPS

 160   120     389.6    227.2     122.6     75.3      30.0     21.5
 320   240     328.1    201.7     113.8     73.3      30.2     21.3
 640   480     203.3    144.7      87.8     62.0      30.2     21.0
1024   768     107.1     94.5      60.3     51.1      28.9     20.0
1920  1080      45.3     47.5      36.9     33.1      28.7     20.0

############################## Pi 4B #############################
 160   120     767.4    420.3     258.3    154.3      45.7     31.7
 320   240     682.9    388.8     245.0    148.3      45.1     30.8
 640   480     367.1    262.6     217.9    140.1      46.2     30.9
1024   768     150.8    148.8     128.6    117.3      45.3     30.4
1920  1080      71.9     73.9      64.0     61.6      43.3     27.9

Pi 4B Gains     1.77     1.74      2.12     2.10      1.52     1.46

Dual Monitor - mirrored displays
1920  1080      65.0     66.3      61.6     58.2      42.7     27.5
Dual Monitor - not mirrored, squashed image on one monitor
3840  1080      60.9     59.6      57.2     54.8      40.8     26.8
Dual Monitor 32 bit, two monitors
3840  1080      26.9     26.6      26.1     25.1      25.5     15.9

************************ Pi 4B 32 Bit ************************
GLUT OpenGL Benchmark 32 Bit Version 1, Fri Oct 11 19:12:24 2019

Window Size    Coloured Objects   Textured Objects   WireFrm  Texture
   Pixels        Few      All       Few      All     Kitchen  Kitchen

 320   240     663.3    365.9     218.6    126.3      33.1     23.5
 640   480     318.7    259.7     192.4    116.8      32.2     22.1
1024   768     138.9    134.1     112.7    102.7      31.9     21.4
1920  1080      57.5     56.1      53.3     50.0      29.3     19.5

Avg 64b/32b     1.13     1.13      1.15     1.19      1.42     1.39
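The Pi 4B Gains and Avg 64b/32b rows appear to be the average of the per-window-size speed ratios, rather than a ratio of average FPS: averaging the five Coloured Objects (Few) ratios reproduces 1.77, whereas dividing the average FPS values gives about 1.90. The Python sketch below checks this for one column; the list names are illustrative, and the 32 bit run has no 160x120 result, so that 64 bit entry is skipped.

```python
from statistics import fmean

# "Coloured Objects, Few" FPS by window size
PI3BPLUS = [389.6, 328.1, 203.3, 107.1, 45.3]   # 160x120 to 1920x1080
PI4B_64 = [767.4, 682.9, 367.1, 150.8, 71.9]    # 160x120 to 1920x1080
PI4B_32 = [663.3, 318.7, 138.9, 57.5]           # 320x240 to 1920x1080


def mean_ratio(new: list[float], old: list[float]) -> float:
    """Average of the per-window-size speed ratios new/old."""
    return fmean(n / o for n, o in zip(new, old))


if __name__ == "__main__":
    print(f"Pi 4B gain      {mean_ratio(PI4B_64, PI3BPLUS):.2f}")
    print(f"64b/32b         {mean_ratio(PI4B_64[1:], PI4B_32):.2f}")
    # dividing average FPS instead weights the fast, small windows heavily
    print(f"ratio of means  {fmean(PI4B_64) / fmean(PI3BPLUS):.2f}")
```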
Passes and Seconds - The number of passes and the sampling interval in seconds determine the running time. If the stress test also has sampling periods, it is normally not possible to synchronise them, but approximate periods can be matched.
CPU MHz - This can vary faster than any sampling interval measured in seconds, but the general trend can be useful. Tests that measure speed over sampling periods provide a better indication.
Core Voltage - This appears to vary a little, for unknown reasons.
CPU Temperature - Assuming that it is correct, this is the most useful measurement, as it changes slowly.
PMIC Temperature - No issues so far with Power Management Integrated Circuit temperatures.
###################################################
Parameters - upper or lower case
./RPiHeatMHzVolts2 passes 33 secs 20 log 12
or
./RPiHeatMHzVolts2 P 33 S 20 L 12
For 33 samples at 20 second intervals, log file RPiHeatMHz12.txt
To cover 10 minute test
###################################################

Temperature and CPU MHz Measurement
Start at Mon Oct 28 20:49:52 2019
Using 33 samples at 20 second intervals

Seconds
   0.0 ARM MHz=1500, core volt=0.8490V, CPU temp=61.0'C, pmic temp=55.2'C
  20.0 ARM MHz=1500, core volt=0.8437V, CPU temp=73.0'C, pmic temp=62.8'C
  40.3 ARM MHz=1500, core volt=0.8437V, CPU temp=77.0'C, pmic temp=66.5'C
  60.5 ARM MHz=1500, core volt=0.8437V, CPU temp=79.0'C, pmic temp=69.4'C
  80.7 ARM MHz=1500, core volt=0.8437V, CPU temp=80.0'C, pmic temp=70.3'C
 101.0 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=70.3'C
 121.2 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C
 141.4 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C
 161.7 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C
 181.9 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C
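Because the CPU temperature trend and MHz samples are the most useful fields, it can help to summarise a monitoring log automatically. The following is a minimal Python sketch, assuming only the sample format shown above (including space-padded values such as "ARM MHz= 750"); the regular expression, function names and the 1500 MHz full-speed constant are illustrative, not part of RPiHeatMHzVolts2. It works whether samples are one per line or run together.

```python
import re
from statistics import fmean

# One RPiHeatMHzVolts2 sample, e.g.
# "20.0 ARM MHz=1500, core volt=0.8437V, CPU temp=73.0'C, pmic temp=62.8'C"
SAMPLE = re.compile(
    r"(?P<secs>\d+\.\d+)\s+ARM MHz=\s*(?P<mhz>\d+),\s*"
    r"core volt=(?P<volt>[\d.]+)V,\s*"
    r"CPU temp=(?P<cpu>[\d.]+)'C,\s*"
    r"pmic temp=(?P<pmic>[\d.]+)'C"
)

FULL_SPEED_MHZ = 1500  # assumed stock Pi 4B clock, as used in this report


def parse_samples(text: str) -> list[dict[str, float]]:
    """Extract every sample as a dict of floats."""
    return [{k: float(v) for k, v in m.groupdict().items()}
            for m in SAMPLE.finditer(text)]


def summarise(samples: list[dict[str, float]]) -> dict[str, float]:
    mhz = [s["mhz"] for s in samples]
    cpu = [s["cpu"] for s in samples]
    return {
        "mhz_avg": fmean(mhz),
        "mhz_min": min(mhz),
        "cpu_avg": fmean(cpu),
        "cpu_max": max(cpu),
        # share of samples below full speed (under load, a sign of throttling)
        "below_full_pct": 100.0 * sum(m < FULL_SPEED_MHZ for m in mhz) / len(mhz),
    }


LOG = """
  0.0 ARM MHz=1500, core volt=0.8490V, CPU temp=61.0'C, pmic temp=55.2'C
 20.0 ARM MHz=1500, core volt=0.8437V, CPU temp=73.0'C, pmic temp=62.8'C
 40.3 ARM MHz=1500, core volt=0.8437V, CPU temp=77.0'C, pmic temp=66.5'C
 60.5 ARM MHz=1500, core volt=0.8437V, CPU temp=79.0'C, pmic temp=69.4'C
 80.7 ARM MHz=1500, core volt=0.8437V, CPU temp=80.0'C, pmic temp=70.3'C
101.0 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=70.3'C
121.2 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C
141.4 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C
161.7 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C
181.9 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C
"""

if __name__ == "__main__":
    print(summarise(parse_samples(LOG)))
```

For the ten samples above this gives an average of 1400 MHz and 77.6°C, with 20% of samples below 1500 MHz.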
Following the HPL results here are some for my integer and floating point stress tests. Although further comparative tests are needed to be conclusive, the 64 bit floating point versions do seem to be faster than the 32 bit varieties and subject to smaller temperature increases.
The earlier HPL benchmark results quoted speeds of 8.1 GFLOPS on a cold start and 10.8 GFLOPS later, with a cooling fan in operation for both. The first results below were run without a fan, at a room temperature of around 21°C, producing 7.6 GFLOPS on a cold start. Here, the average CPU frequency came out at 1056 MHz, with an average temperature of 80.3°C.
The second results followed a warm reboot to use a different version of Gentoo with HPL installed, obtaining 5.54 GFLOPS, with severe CPU frequency throttling down to 600 MHz and temperatures up to 86°C. Averages were 790 MHz and 84.1°C.
Shortly afterwards, with the fan in place, the Pi ran at 1500 MHz continuously, achieving 10.4 GFLOPS, with a maximum temperature of 64°C.
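HPL derives its Gflops figure from the problem size N and the solve time, using the operation count 2/3 N³ + 3/2 N². The Python sketch below (function name hypothetical) reproduces the two N = 20000 results in the log that follows: 7.589 GFLOPS in 702.81 seconds and 5.538 GFLOPS in 963.16 seconds. It also shows why, at a fixed N, GFLOPS is simply inversely proportional to run time, so the throttled run's 37% longer time accounts for its lower score.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """HPL-style rate: (2/3 N^3 + 3/2 N^2) floating point operations / time."""
    operations = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return operations / seconds / 1e9


if __name__ == "__main__":
    for seconds in (702.81, 963.16):
        print(f"N=20000, {seconds:7.2f} s -> {hpl_gflops(20000, seconds):.3f} GFLOPS")
```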
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   128     2     2             702.81              7.589e+00
HPL_pdgesv() start time Sat Aug 24 10:42:58 2019
HPL_pdgesv() end time Sat Aug 24 10:54:41 2019
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0008453 ...... PASSED
================================================================================

Example 2 - Note different sumchecks again

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   128     2     2             963.16              5.538e+00
HPL_pdgesv() start time Tue Oct 29 11:51:10 2019
HPL_pdgesv() end time Tue Oct 29 12:07:13 2019
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0009005 ...... PASSED
================================================================================

Temperature and CPU MHz Measurement
Start at Tue Oct 29 11:50:27 2019
Using 40 samples at 30 second intervals

Seconds
   0.0 ARM MHz=1500, core volt=0.8542V, CPU temp=63.0'C, pmic temp=58.0'C
  30.0 ARM MHz=1500, core volt=0.8542V, CPU temp=79.0'C, pmic temp=69.4'C
  60.3 ARM MHz=1000, core volt=0.8542V, CPU temp=83.0'C, pmic temp=72.2'C
  91.6 ARM MHz=1000, core volt=0.8490V, CPU temp=85.0'C, pmic temp=74.1'C
 122.2 ARM MHz=1000, core volt=0.8490V, CPU temp=84.0'C, pmic temp=74.1'C
 152.7 ARM MHz= 750, core volt=0.8490V, CPU temp=83.0'C, pmic temp=74.1'C
 183.2 ARM MHz=1000, core volt=0.8490V, CPU temp=84.0'C, pmic temp=76.0'C
 213.8 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
 244.3 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 274.7 ARM MHz= 600, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.9'C
 305.2 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
 335.6 ARM MHz=1000, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
 366.1 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
 396.6 ARM MHz= 600, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 427.2 ARM MHz= 750, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.9'C
 457.5 ARM MHz= 600, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 488.0 ARM MHz= 600, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 518.6 ARM MHz= 750, core volt=0.8490V, CPU temp=84.0'C, pmic temp=76.9'C
 549.0 ARM MHz= 600, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.9'C
 579.6 ARM MHz= 750, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.0'C
 610.1 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
 640.6 ARM MHz= 750, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.9'C
 671.1 ARM MHz= 750, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.9'C
 701.6 ARM MHz= 600, core volt=0.8490V, CPU temp=86.0'C, pmic temp=76.0'C
 732.0 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 762.4 ARM MHz= 600, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 792.9 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
 823.4 ARM MHz= 750, core volt=0.8490V, CPU temp=84.0'C, pmic temp=76.9'C
 853.9 ARM MHz= 600, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 884.4 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 914.9 ARM MHz= 600, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 945.3 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.9'C
 975.8 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
1006.3 ARM MHz= 750, core volt=0.8490V, CPU temp=84.0'C, pmic temp=76.0'C
1036.7 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=76.0'C
1067.0 ARM MHz= 750, core volt=0.8490V, CPU temp=85.0'C, pmic temp=74.1'C

Averages 790 84.1 75.5
In this case, a summary of separate tests for L1 cache, L2 cache and RAM is given. During the 10 minute sessions, the cache tests ran mainly at 1000 MHz, while those using RAM ran at the full 1500 MHz. No temperatures above 84°C were recorded.
Examining the full detail of the first test indicated that average CPU MHz and measured MB/second were around 75% of the maximum.
                 KB       KB      MB              Same All
 Secs  Thrds     16      160      16  Sumcheck    Tests
  3.0      4  28715    26652    3345  5A5A5A5A      Yes
  3.0      8  30292    26310    3334  AAAAAAAA      Yes

./RPiHeatMHzVolts2 passes 66 secs 10 log 34 - used for all 10 minute stress tests

==== Stress Test Parameters - upper or lower case, only first letter counts ====
Threads 1, 2, 4, 8, 16, 32   KB between 12 and 15624   Log < 100   Minutes any > 0

./MP-IntStress64 KB 16 Threads 8 Mins 10 Log 34

Seconds                                                                        MB/sec
  0.0 ARM MHz=1500, core volt=0.8455V, CPU temp=62.0'C, pmic temp=57.1'C
 10.0 ARM MHz=1500, core volt=0.8455V, CPU temp=69.0'C, pmic temp=62.8'C  28695
 20.2 ARM MHz=1500, core volt=0.8402V, CPU temp=73.0'C, pmic temp=64.6'C  28729
152.5 ARM MHz=1000, core volt=0.8402V, CPU temp=82.0'C, pmic temp=72.2'C  21523
305.5 ARM MHz=1000, core volt=0.8402V, CPU temp=83.0'C, pmic temp=74.1'C  20026
448.2 ARM MHz=1000, core volt=0.8402V, CPU temp=83.0'C, pmic temp=74.1'C  19611
601.1 ARM MHz=1000, core volt=0.8402V, CPU temp=83.0'C, pmic temp=74.1'C  19199
%Min/Max 66.9

./MP-IntStress64 KB 160 Threads 8 Mins 10 Log 34

Seconds                                                                        MB/sec
  0.0 ARM MHz=1500, core volt=0.8402V, CPU temp=64.0'C, pmic temp=57.1'C
 10.0 ARM MHz=1500, core volt=0.8402V, CPU temp=71.0'C, pmic temp=62.8'C  26323
 20.2 ARM MHz=1500, core volt=0.8402V, CPU temp=75.0'C, pmic temp=66.5'C  26140
152.9 ARM MHz=1000, core volt=0.8402V, CPU temp=82.0'C, pmic temp=74.1'C  18016
306.5 ARM MHz=1000, core volt=0.8402V, CPU temp=83.0'C, pmic temp=74.1'C  17306
449.8 ARM MHz=1000, core volt=0.8402V, CPU temp=84.0'C, pmic temp=74.1'C  17248
603.3 ARM MHz= 750, core volt=0.8402V, CPU temp=84.0'C, pmic temp=74.1'C  16832
%Min/Max 63.9

./MP-IntStress64 KB 16000 Threads 8 Mins 10 Log 34

Seconds                                                                        MB/sec
  0.0 ARM MHz=1500, core volt=0.8402V, CPU temp=66.0'C, pmic temp=60.9'C
 10.0 ARM MHz=1500, core volt=0.8402V, CPU temp=71.0'C, pmic temp=62.8'C   3372
 20.3 ARM MHz=1500, core volt=0.8402V, CPU temp=72.0'C, pmic temp=62.8'C   3369
155.2 ARM MHz=1500, core volt=0.8402V, CPU temp=76.0'C, pmic temp=68.4'C   3365
309.8 ARM MHz=1500, core volt=0.8402V, CPU temp=79.0'C, pmic temp=69.4'C   3367
454.4 ARM MHz=1500, core volt=0.8402V, CPU temp=78.0'C, pmic temp=70.3'C   3367
599.7 ARM MHz=1500, core volt=0.8402V, CPU temp=78.0'C, pmic temp=70.3'C   3368
%Min/Max 99.8
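The %Min/Max line at the end of each stress test log appears to be the lowest sampled speed as a percentage of the highest, which makes it a compact measure of throttling. The Python sketch below (names illustrative) reproduces 63.9 for the KB 160 run and 99.8 for the RAM-bound KB 16000 run from the MB/sec samples shown. The KB 16 samples shown give 66.8 against the printed 66.9, presumably because only selected samples are reproduced here.

```python
def pct_min_max(speeds: list[float]) -> float:
    """Lowest sampled speed as a percentage of the highest, to 1 decimal place."""
    if not speeds:
        raise ValueError("no speed samples")
    return round(100.0 * min(speeds) / max(speeds), 1)


# MB/sec samples from the integer stress test logs above
KB160_MB_PER_SEC = [26323, 26140, 18016, 17306, 17248, 16832]
KB16000_MB_PER_SEC = [3372, 3369, 3365, 3367, 3367, 3368]

if __name__ == "__main__":
    print(pct_min_max(KB160_MB_PER_SEC))    # cache-based test, CPU throttled
    print(pct_min_max(KB16000_MB_PER_SEC))  # RAM-bound test, stayed at 1500 MHz
```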
After writing the above, the 32 bit stress test was repeated, with results shown below. Although not conclusive from a single run, they indicate a more severe impact than in the 64 bit run, with a CPU speed sample dropping to 600 MHz, higher temperatures and a larger performance degradation.
                  Ops/     KB      KB     MB      KB      KB      MB
 Secs  Thrd       Word   12.8     128   12.8    12.8     128    12.8
  4.6    T4          2   9223    7520    519   40392   76406   99700
  6.0    T8          2   9520   10471    545   40392   76406   99700
 11.3    T4          8  19087   21040   2044   54764   85092   99820
 12.9    T8          8  19747   21107   2016   54764   85092   99820
 22.2    T4         32  25732   26704   9160   35206   66015   99520
 24.1    T8         32  25708   25770   8927   35206   66015   99520

==== Stress Test Parameters - upper or lower case, only first letter counts ====
Threads 1,2,4,8,16,32,64  KB 12 to 15624  Ops/Wordd 2,8,32  Log<100  Minutes any>0

./MP-FPUStress64 KB 1280 T 8 Ops 8 Mins 10 Log 33

Seconds                                                                        MFLOPS
  0.0 ARM MHz=1500, core volt=0.8437V, CPU temp=64.0'C, pmic temp=59.0'C
 10.0 ARM MHz=1500, core volt=0.8437V, CPU temp=71.0'C, pmic temp=62.8'C  17309
 20.2 ARM MHz=1500, core volt=0.8437V, CPU temp=75.0'C, pmic temp=66.5'C  18018
101.9 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C  14224
204.2 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C  12806
306.8 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=73.1'C  12447
409.4 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=73.1'C  11870
501.6 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C  12191
604.1 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C  12169
%Min/Max 65.9

./MP-FPUStress64 KB 1280 T 8 Ops 32 Mins 10 Log 33

Seconds                                                                        MFLOPS
  0.0 ARM MHz=1500, core volt=0.8437V, CPU temp=65.0'C, pmic temp=59.0'C
 10.0 ARM MHz=1500, core volt=0.8437V, CPU temp=72.0'C, pmic temp=65.6'C  22634
 20.2 ARM MHz=1500, core volt=0.8437V, CPU temp=76.0'C, pmic temp=67.5'C  22992
101.9 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C  18629
204.0 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=74.1'C  16674
306.3 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C  16448
408.6 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C  16158
500.7 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C  16081
603.0 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C  15553
%Min/Max 67.6

======================================================================================
32 Bit Version

./MP-FPUStress KB 1280 T 8 Ops 32 Mins 10 Log 73

Seconds                                                                        MFLOPS
  0.0 ARM MHz=1500, core volt=0.8560V, CPU temp=56.0'C, pmic temp=50.5'C
 10.0 ARM MHz=1500, core volt=0.8560V, CPU temp=70.0'C, pmic temp=60.9'C  20233
 20.7 ARM MHz=1500, core volt=0.8560V, CPU temp=74.0'C, pmic temp=64.6'C  20221
106.4 ARM MHz=1000, core volt=0.8560V, CPU temp=83.0'C, pmic temp=70.3'C  14173
204.3 ARM MHz=1000, core volt=0.8455V, CPU temp=84.0'C, pmic temp=73.1'C  13115
302.2 ARM MHz=1000, core volt=0.8455V, CPU temp=85.0'C, pmic temp=74.1'C  12650
400.2 ARM MHz= 750, core volt=0.8455V, CPU temp=85.0'C, pmic temp=74.1'C  11957
508.8 ARM MHz=1000, core volt=0.8455V, CPU temp=85.0'C, pmic temp=74.1'C  11485
585.1 ARM MHz= 600, core volt=0.8455V, CPU temp=84.0'C, pmic temp=74.1'C  11454
606.9 ARM MHz=1000, core volt=0.8455V, CPU temp=84.0'C, pmic temp=74.1'C  11242
%Min/Max 55.6
The 32 bit version was also rerun, producing results similar to those at 64 bits.
                  Ops/     KB      KB     MB      KB      KB      MB
 Secs  Thrd       Word   12.8     128   12.8    12.8     128    12.8
  8.9    T4          2   5024    4589    257   40395   76384   99700
 11.5    T8          2   5089    5545    280   40395   76384   99700
 21.7    T4          8  10259   10011   1068   54805   85108   99820
 24.7    T8          8  10239   10824   1036   54805   85108   99820
 43.1    T4         32  12940   13200   4497   35159   66065   99521
 46.9    T8         32  13200   13049   4557   35159   66065   99521

==== Stress Test Parameters - upper or lower case, only first letter counts ====
Threads 1,2,4,8,16,32,64  KB 12 to 15624  Ops/Wordd 2,8,32  Log<100  Minutes any>0

./MP-FPUStress64DP KB 1280 T 8 Ops 32 Mins 10 Log 31

Seconds                                                                        MFLOPS
  0.0 ARM MHz=1500, core volt=0.8437V, CPU temp=63.0'C, pmic temp=57.1'C
 10.0 ARM MHz=1500, core volt=0.8437V, CPU temp=71.0'C, pmic temp=62.8'C  12718
 20.2 ARM MHz=1500, core volt=0.8437V, CPU temp=74.0'C, pmic temp=66.5'C  12755
 30.5 ARM MHz=1500, core volt=0.8437V, CPU temp=77.0'C, pmic temp=68.4'C  12750
 40.7 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=70.3'C  12755
 50.9 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=70.3'C  12183
 61.2 ARM MHz=1500, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C  11358
 71.4 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C  10922
 81.6 ARM MHz=1000, core volt=0.8437V, CPU temp=80.0'C, pmic temp=72.2'C  10333
 91.8 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C   9948
102.0 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C   9692
112.3 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C   9466
122.6 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C   9217
132.8 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=74.1'C   9181
143.0 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C   9145
153.2 ARM MHz=1000, core volt=0.8437V, CPU temp=80.0'C, pmic temp=72.2'C   9043
163.4 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C   8921
173.6 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C   9838
183.9 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C   8755
194.1 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C   8737
204.4 ARM MHz=1000, core volt=0.8437V, CPU temp=81.0'C, pmic temp=72.2'C   8721
214.7 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C   8721
224.9 ARM MHz=1500, core volt=0.8437V, CPU temp=83.0'C, pmic temp=73.1'C   8670
235.1 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=73.1'C   8619
245.4 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C   8592
255.6 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=72.2'C   8592
265.9 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8540
276.2 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=73.1'C   8488
286.4 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C   8547
296.7 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8510
307.0 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8473
317.2 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8507
327.5 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8541
337.7 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8544
347.9 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C   8464
358.2 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8531
368.4 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8495
378.7 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8460
388.9 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8514
399.2 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8484
409.4 ARM MHz=1000, core volt=0.8437V, CPU temp=82.0'C, pmic temp=74.1'C   8454
419.6 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8459
429.8 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8489
440.1 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8472
450.3 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8428
460.6 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8384
470.9 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8384
481.2 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8387
491.4 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8391
501.7 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8244
511.9 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8346
522.1 ARM MHz= 750, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8272
532.4 ARM MHz=1000, core volt=0.8437V, CPU temp=83.0'C, pmic temp=74.1'C   8272
542.6 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8329
552.8 ARM MHz= 750, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8239
563.1 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8183
573.3 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8129
583.6 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8343
593.9 ARM MHz=1000, core volt=0.8437V, CPU temp=84.0'C, pmic temp=74.1'C   8266
604.1 ARM MHz=1000, core volt=0.8437V, CPU temp=85.0'C, pmic temp=74.1'C   8190
%Min/Max 63.7
The tests were run twice, without and then with a cooling fan in place, with the results shown below. In this case, the no-fan tests were not much slower, averaging 77% to 80% of the fan-cooled speeds for OpenGL FPS, CPU MHz and total Livermore Loops MFLOPS.
These results were produced with all programs compiled by gcc 9, and the tests were not run on a hot day. Compared with performance using the 32 bit versions, detailed in the 32 Bit Report, Raspberry Pi 4B Stress Tests Including High Performance Linpack.htm, the 64 bit results were far better, but the 32 bit results were produced by an older compiler and run on a hot day. The tests were therefore repeated using 32 bit programs produced by the later gcc 8 compiler.
As before, the 64 bit gcc 9 Livermore Loops and OpenGL single core benchmarks were faster than the new 32 bit versions, in this case by 14% and 40% respectively. When running the stress test, both versions had similar average CPU MHz, CPU temperature and PMIC temperature, with the 64 bit FPS and MFLOPS maintaining their performance advantage, at ratios similar to those obtained from the single core tests.
run.sh
lxterminal -e ./RPiHeatMHzVolts2 Passes 35 Seconds 30 Log 20 &
lxterminal -e ./liverloopsPi64Rg9 Seconds 12 Log 21 &
lxterminal -e ./liverloopsPi64Rg9 Seconds 12 Log 22 &
lxterminal -e ./liverloopsPi64Rg9 Seconds 12 Log 23

runogl.sh
export vblank_mode=0 & ./videogl64g9 Test 6 Mins 16 Log 20

                No Fan                     With Fan
Seconds   MHz  CPU C  PMIC C  FPS     MHz  CPU C  PMIC C  FPS
      0  1500     57      51         1500     37      32
     30  1500     75      63   27    1500     53      44   27
     60  1500     76      68   29    1500     53      44   28
     90  1500     81      72   25    1500     58      50   27
    120  1500     81      70   23    1500     55      48   26
    150  1000     82      74   23    1500     57      49   29
    180  1000     80      72   22    1500     54      47   27
    210  1000     81      72   24    1500     55      46   29
    240  1500     80      72   26    1500     54      44   28
    270  1500     81      72   27    1500     55      47   28
    300  1000     82      72   22    1500     56      48   29
    330  1500     82      72   24    1500     56      50   29
    360  1000     82      72   24    1500     56      49   28
    390  1000     82      72   22    1500     58      50   26
    420  1000     83      72   22    1500     57      50   26
    450  1000     82      74   19    1500     56      50   30
    480  1000     82      74   21    1500     56      48   28
    510  1000     82      72   22    1500     54      46   29
    540  1000     81      72   22    1500     55      47   30
    570  1500     81      72   24    1500     55      47   30
    600  1000     82      74   24    1500     57      49   30
    630  1500     81      72   23    1500     58      51   29
    660  1000     82      72   23    1500     57      50   29
    690  1000     83      73   22    1500     59      51   28
    720  1000     83      72   21    1500     57      51   28
    750  1000     82      74   21    1500     57      50   29
    780  1000     84      74   19    1500     54      47   29
    810  1000     82      72   19    1500     56      48   29
    840  1000     82      72   20    1500     54      46   29
    870  1000     82      72   20    1500     53      46   30
    900  1000     82      72   23    1500     49      42   31
Average  1161     81      71   23    1500     55      47   29
Minimum  1000     57      51   19    1500     37      32   26
Maximum  1500     84      74   29    1500     59      51   31

% Hot/Cold
Average    77     68      66   80
Minimum    67     65      61   73
Maximum   100     70      69   94

                No Fan                      With Fan
MFLOPS   Average  Geomean  Harmean    Average  Geomean  Harmean
      1      684      562      453        898      732      590
      2      716      574      451        887      712      571
      3      716      566      438        895      724      582

Total %Hot/Cold MFLOPS
              79       78       77
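The percentages in the table can be cross-checked from its columns. The Python sketch below is illustrative only: the column values are transcribed from the table above and the variable and function names are hypothetical. It recomputes the no-fan averages as a percentage of the fan-cooled ones. MHz and FPS come out at 77.4% and 79.9%, and the Loops totals at 79.0%, 78.5% and 77.0%, which agree with the printed 79, 78 and 77 to within rounding, since the per-copy MFLOPS figures are themselves rounded.

```python
# Recompute no-fan / with-fan percentages from the table above.
# Samples are every 30 seconds from 0 to 900; FPS has no reading at 0 s.

no_fan_mhz = [1500, 1500, 1500, 1500, 1500, 1000, 1000, 1000, 1500, 1500,
              1000, 1500, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1500,
              1000, 1500, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000,
              1000]
with_fan_mhz = [1500] * 31

no_fan_fps = [27, 29, 25, 23, 23, 22, 24, 26, 27, 22,
              24, 24, 22, 22, 19, 21, 22, 22, 24, 24,
              23, 23, 22, 21, 21, 19, 19, 20, 20, 23]
with_fan_fps = [27, 28, 27, 26, 29, 27, 29, 28, 28, 29,
                29, 28, 26, 26, 30, 28, 29, 30, 30, 30,
                29, 29, 28, 28, 29, 29, 29, 29, 30, 31]

# Livermore Loops MFLOPS for the three program copies (rows 1 to 3).
no_fan_loops = {"Average": [684, 716, 716],
                "Geomean": [562, 574, 566],
                "Harmean": [453, 451, 438]}
with_fan_loops = {"Average": [898, 887, 895],
                  "Geomean": [732, 712, 724],
                  "Harmean": [590, 571, 582]}


def mean(values):
    return sum(values) / len(values)


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


mhz_pct = percent(mean(no_fan_mhz), mean(with_fan_mhz))
fps_pct = percent(mean(no_fan_fps), mean(with_fan_fps))
# Totals over the three copies, as in the "Total %Hot/Cold MFLOPS" row.
loops_pct = {name: percent(sum(no_fan_loops[name]), sum(with_fan_loops[name]))
             for name in no_fan_loops}

print(f"CPU MHz    {mhz_pct}%")
print(f"OpenGL FPS {fps_pct}%")
for name, pct in loops_pct.items():
    print(f"Loops {name} {pct}%")
```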
Patterns

No. Hex       No. Hex       No. Hex       No. Hex       No. Hex       No. Hex       No. Hex
  1 0          25 800000     49 3          73 FF         97 FFFFDFFF  121 FFFFEAAA  145 FFFFF0F0
  2 1          26 1000000    50 33         74 FF00FF     98 FFFFBFFF  122 FFFFAAAA  146 FFF0F0F0
  3 2          27 2000000    51 333        75 1FF        99 FFFF7FFF  123 FFFEAAAA  147 F0F0F0F0
  4 4          28 4000000    52 3333       76 3FF       100 FFFEFFFF  124 FFFAAAAA  148 FFFFFFE0
  5 8          29 8000000    53 33333      77 7FF       101 FFFDFFFF  125 FFEAAAAA  149 FFFF83E0
  6 10         30 10000000   54 333333     78 FFF       102 FFFBFFFF  126 FFAAAAAA  150 FE0F83E0
  7 20         31 20000000   55 3333333    79 1FFF      103 FFF7FFFF  127 FEAAAAAA  151 FFFFFFC0
  8 40         32 40000000   56 33333333   80 3FFF      104 FFEFFFFF  128 FAAAAAAA  152 FFFC0FC0
  9 80         33 1          57 7          81 7FFF      105 FFDFFFFF  129 EAAAAAAA  153 FFFFFF80
 10 100        34 5          58 1C7        82 FFFF      106 FFBFFFFF  130 AAAAAAAA  154 FFE03F80
 11 200        35 15         59 71C7       83 FFFFFFFF  107 FF7FFFFF  131 FFFFFFFC  155 FFFFFF00
 12 400        36 55         60 1C71C7     84 FFFFFFFE  108 FEFFFFFF  132 FFFFFFCC  156 FF00FF00
 13 800        37 155        61 71C71C7    85 FFFFFFFD  109 FDFFFFFF  133 FFFFFCCC  157 FFFFFE00
 14 1000       38 555        62 F          86 FFFFFFFB  110 FBFFFFFF  134 FFFFCCCC  158 FFFFFC00
 15 2000       39 1555       63 F0F        87 FFFFFFF7  111 F7FFFFFF  135 FFFCCCCC  159 FFFFF800
 16 4000       40 5555       64 F0F0F      88 FFFFFFEF  112 EFFFFFFF  136 FFCCCCCC  160 FFFFF000
 17 8000       41 15555      65 F0F0F0F    89 FFFFFFDF  113 DFFFFFFF  137 FCCCCCCC  161 FFFFE000
 18 10000      42 55555      66 1F         90 FFFFFFBF  114 BFFFFFFF  138 CCCCCCCC  162 FFFFC000
 19 20000      43 155555     67 7C1F       91 FFFFFF7F  115 FFFFFFFE  139 FFFFFFF8  163 FFFF8000
 20 40000      44 555555     68 1F07C1F    92 FFFFFEFF  116 FFFFFFFA  140 FFFFFE38  164 FFFF0000
 21 80000      45 1555555    69 3F         93 FFFFFDFF  117 FFFFFFEA  141 FFFF8E38
 22 100000     46 5555555    70 3F03F      94 FFFFFBFF  118 FFFFFFAA  142 FFE38E38
 23 200000     47 15555555   71 7F         95 FFFFF7FF  119 FFFFFEAA  143 F8E38E38
 24 400000     48 55555555   72 1FC07F     96 FFFFEFFF  120 FFFFFAAA  144 FFFFFFF0

Sequences - First 16

No. File      No. File      No. File      No. File
  1 0 1 2 3     5 0 2 1 3     9 0 3 1 2    13 0 1 2 3
  2 1 2 3 0     6 1 3 2 0    10 1 0 3 2    14 1 2 3 0
  3 2 3 0 1     7 2 0 1 3    11 2 1 0 3    15 2 3 0 1
  4 3 0 2 1     8 3 1 2 0    12 3 2 1 0    16 3 0 2 1

###########################################################################

Run Time Parameters - Upper or Lower Case                                Default
R or Repeats            Data size, multiplier of 10.25 MB, more or less  16
P or Patterns           Number of patterns for smaller files < 164       164
M or Minutes            Large file reading time                          2
L or Log                Log file name extension 0 to 99                  0
S or Seconds            Time to read each block, last section            1
F or FilePath           For other than SD card or SD card directory
C or CacheData          Omit O_DIRECT on opening files to allow caching  No
O or OutputPatterns     Log patterns and file sequences used as above    No
D or DontRunReadTests   Or only run write tests                          No

Format ./burnindrive2 Repeats 16, Minutes 2, Log 0, Seconds 1
    or ./burnindrive2 R 16, M 2, L 0, S 1

###########################################################################

Examples of Results

Main SD Card Default Parameters

File 1 164.00 MB written in 14.66 seconds - 11.2 MB/second
To
File 4 164.00 MB written in 12.15 seconds - 13.5 MB/second

Read passes 1 x 4 Files x 164.00 MB in 0.33 minutes - 33.1 MB/second
To
Read passes 7 x 4 Files x 164.00 MB in 2.28 minutes - 33.6 MB/second

Passes in 1 second(s) for each of 164 blocks of 64KB: - 164 measurements

  580  580  580  580  580  580  580  580  580  580  580
  580  580  580  580  580  580  580  580  580  580  580

95120 read passes of 64KB blocks in 2.76 minutes - 36.8 MB/second
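Several families in the Patterns table follow simple bit rules. The Python sketch below is illustrative only (it is not burnindrive2's own code, and the function names are hypothetical): it regenerates the walking-ones patterns 2 to 32 and the walking-zeros patterns 84 to 114, then checks the default size arithmetic. Taking 1 MB as 1024 KB, 164 blocks of 64 KB come to exactly 10.25 MB, the Repeats multiplier, and the example's 95120 read passes over 164 blocks give the 580 passes per block listed.

```python
# Reproduce two families from the burnindrive2 Patterns table above.
# Illustrative only: this is not the program's own generation code.

MASK32 = 0xFFFFFFFF


def walking_ones():
    """Patterns 2 to 32: a single set bit moving from bit 0 to bit 30."""
    return [1 << bit for bit in range(31)]


def walking_zeros():
    """Patterns 84 to 114: a single clear bit moving from bit 0 to bit 30."""
    return [~(1 << bit) & MASK32 for bit in range(31)]


def as_hex(values):
    return [format(v, "X") for v in values]


ones = as_hex(walking_ones())
zeros = as_hex(walking_zeros())
print(ones[:4], ones[-1])    # patterns 2 to 5, then pattern 32
print(zeros[:4], zeros[-1])  # patterns 84 to 87, then pattern 114

# Default size arithmetic, taking 1 MB as 1024 KB.
BLOCK_KB = 64
BLOCKS = 164
unit_mb = BLOCKS * BLOCK_KB / 1024     # 10.25, the Repeats multiplier
file_mb = 16 * unit_mb                 # default Repeats 16 gives 164 MB files
passes_per_block = 95120 / BLOCKS      # example result: 580 per block
print(unit_mb, file_mb, passes_per_block)
```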
A snapshot of vmstat system performance is also provided. The bo (writing) and bi (reading) KB/second speeds are essentially the same as the sum of those reported by the programs handling the main SD and USB drives. LAN transfers are not included in the vmstat figures.
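The reading comparison above can be checked with simple arithmetic. This Python sketch (the names are illustrative) adds the final-section read speeds of the main SD and two USB drives, taken from the drive results further down, and compares the total with the two vmstat bi samples from that phase. It assumes 1 MB = 1000 KB purely for this rough comparison; on that basis vmstat reads about 1.5% to 2.6% higher than the programs' total.

```python
# Compare vmstat "bi" with the sum of burnindrive read speeds for the
# drives vmstat can see (LAN transfers are excluded, as noted above).

drive_read_mb_s = {          # final-section 64 KB read speeds, MB/second
    "Main SD": 32.0,
    "USB 3 Drive 1": 71.6,
    "USB 3 Drive 2": 27.4,
}
vmstat_bi = {710: 132992, 720: 134400}   # seconds -> bi, KB/second

total_mb_s = round(sum(drive_read_mb_s.values()), 1)

ratios = {}
for secs, bi in vmstat_bi.items():
    # Assumption: 1 MB taken as 1000 KB for this comparison.
    bi_mb_s = bi / 1000
    ratios[secs] = round(100.0 * bi_mb_s / total_mb_s, 1)
    print(f"{secs} s: vmstat {bi_mb_s:.1f} MB/s, programs {total_mb_s:.1f} MB/s, "
          f"{ratios[secs]}%")
```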
Total CPU utilisation (us + sy) is shown to be nearly 90% at the start of writing and closer to 75% when reading. These are averages across the four cores, so 75% represents the equivalent of at least three cores running at 100%. The next page shows variations in performance over time.
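To show how the us + sy figures translate into busy cores, the Python sketch below parses the six vmstat rows listed further down (transcribed by hand; FIELDS and parse_vmstat are hypothetical names) and scales the percentage by the Pi 4B's four cores. It relies on vmstat reporting CPU time as an average over all cores.

```python
# Parse the vmstat rows from the report and convert us + sy into an
# equivalent number of fully busy cores on the four-core Pi 4B.

VMSTAT_ROWS = """\
10 5 3 0 3059800 94956 346060 0 0 14 63204 17819 19051 51 38 2 9 0
20 3 2 0 3058696 95248 346704 0 0 14265 60713 17613 18789 51 33 4 12 0
60 4 2 0 3061196 95668 343572 0 0 93479 7577 24239 24987 54 19 4 23 0
70 4 3 0 3050632 95684 353600 0 0 112115 24 24496 25316 54 20 12 14 0
710 3 3 0 3058696 96532 349460 0 0 132992 16 18936 22387 53 22 3 22 0
720 5 1 0 3058728 96548 349452 0 0 134400 13 20635 23842 54 23 1 23 0
"""

# "secs" is the sample time added in the report; the rest are vmstat columns.
FIELDS = ["secs", "r", "b", "swpd", "free", "buff", "cache", "si", "so",
          "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]
CORES = 4


def parse_vmstat(text):
    """Return one dict per row, keyed by the column names above."""
    rows = []
    for line in text.splitlines():
        values = [int(v) for v in line.split()]
        rows.append(dict(zip(FIELDS, values)))
    return rows


busy_pct = []
for row in parse_vmstat(VMSTAT_ROWS):
    busy = row["us"] + row["sy"]      # user + system, % of all cores
    busy_pct.append(busy)
    print(f"{row['secs']:>4} s  us+sy {busy}%  = {busy * CORES / 100:.2f} busy cores")
```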
############################### Script File ###############################

lxterminal -e ./RPiHeatMHzVolts2 Passes 35 Seconds 30 Log 20 &
lxterminal -e ./burnindrive264g9 Seconds 4 Minutes 1 Log 21 &
lxterminal -e ./burnindrive264g9 Seconds 4 Minutes 1 FilePath /run/media/demouser/PATRIOT Log 22 &
lxterminal -e ./burnindrive264g9 Seconds 4 Minutes 1 FilePath /run/media/demouser/REMIXOSSYS Log 23 &
lxterminal -e ./burnindrive264g9 Seconds 4 Minutes 1 FilePath /media/public/test Log 24 &
lxterminal -e ./MP-FPUStress64 KB 256 T 2 Ops 32 Mins 12 Log 33
vmstat 10 96 > vmstat.txt

############################################################################

Main SD Drive    Tue Nov 5 15:47:03 2019   End of test Tue Nov 5 16:00:06 2019
Write 164 MB x files 4        53.6 seconds = 12.2 MB/second (BI 12.7)
Read 164 MB x files 3 x 4     67.2 seconds = 29.3 MB/second (BI 33.6)
Read 329480 x 64 KB          659.4 seconds = 32.0 MB/second (BI 36.8)
============================================================
USB 3 Drive 1    Tue Nov 5 15:47:03 2019   End of test Tue Nov 5 15:59:31 2019
Write 164 MB x files 4        17.5 seconds = 37.5 MB/second (BI 68.3)
Read 164 MB x files 6 x 4     72.0 seconds = 54.7 MB/second (BI 75.0)
Read 735800 x 64 KB          657.6 seconds = 71.6 MB/second (BI 66.5)
============================================================
USB 3 Drive 2    Tue Nov 5 15:47:03 2019   End of test Tue Nov 5 15:59:57 2019
Write 164 MB x files 4        37.4 seconds = 17.5 MB/second (BI 23.8)
Read 164 MB x files 3 x 4     75.6 seconds = 26.0 MB/second (BI 28.5)
Read 282740 x 64 KB          660.0 seconds = 27.4 MB/second (BI 29.8)
============================================================
1 Gbps LAN       Tue Nov 5 15:47:03 2019   End of test Tue Nov 5 15:59:35 2019
Write 164 MB x files 4        18.1 seconds = 36.2 MB/second (BI 55.7)
Read 164 MB x files 3 x 4     74.4 seconds = 26.4 MB/second (BI 34.0)
Read 303920 x 64 KB          659.4 seconds = 29.5 MB/second (BI 45.3)
============================================================
MP-Threaded-MFLOPS 64 Bit v1.1   Tue Nov 5 15:47:03 2019   End of test Tue Nov 5 15:59:13 2019
2 core GFLOPS 10.9 to 7.4 with CPU throttling. See RPiHeatMHzVolts2 results where detail is included
============================================================

From vmstat 10 second sampling

 Secs procs ---------memory---------- ---swap-- -----io---- --system-- ------cpu-----
        r  b   swpd    free   buff  cache   si   so     bi     bo     in     cs us sy id wa st
   10  5  3      0 3059800  94956 346060    0    0     14  63204  17819  19051 51 38  2  9  0
   20  3  2      0 3058696  95248 346704    0    0  14265  60713  17613  18789 51 33  4 12  0
   60  4  2      0 3061196  95668 343572    0    0  93479   7577  24239  24987 54 19  4 23  0
   70  4  3      0 3050632  95684 353600    0    0 112115     24  24496  25316 54 20 12 14  0
  710  3  3      0 3058696  96532 349460    0    0 132992     16  18936  22387 53 22  3 22  0
  720  5  1      0 3058728  96548 349452    0    0 134400     13  20635  23842 54 23  1 23  0
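The MB/second figures in the drive results can be recomputed from the sizes and times shown. In this Python sketch (illustrative; the tuple layout is a hypothetical convenience), writes and large-file reads use 164 MB files, four per drive. For the 64 KB block reads, the printed speeds are reproduced if each read is counted as 64 KB and 1 MB as 1000 KB; that is inferred from the numbers rather than documented. All recomputed values agree with the report to within 0.1 MB/second, the largest difference (LAN large-file reads, 26.45 against 26.4) coming from rounding of the printed times.

```python
# Recompute the drive MB/second figures from the totals in the results.
# Inference, not documented: the 64 KB block-read speeds match when
# MB/second = reads * 64 / 1000 / seconds.

FILE_MB = 164   # size of each test file
FILES = 4       # files per drive

# name: (write s, large-file read passes, read s, 64 KB reads, block-read s)
results = {
    "Main SD":       (53.6, 3, 67.2, 329480, 659.4),
    "USB 3 Drive 1": (17.5, 6, 72.0, 735800, 657.6),
    "USB 3 Drive 2": (37.4, 3, 75.6, 282740, 660.0),
    "1 Gbps LAN":    (18.1, 3, 74.4, 303920, 659.4),
}

speeds = {}
for name, (write_s, passes, read_s, blocks, block_s) in results.items():
    write = FILE_MB * FILES / write_s
    read = FILE_MB * FILES * passes / read_s
    block_read = blocks * 64 / 1000 / block_s
    speeds[name] = (write, read, block_read)
    print(f"{name:<14} write {write:5.1f}  read {read:5.1f}  "
          f"64 KB reads {block_read:5.1f} MB/second")
```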
Speeds and Temperature - These tests were run without an active cooling fan, resulting in some CPU throttling, with the clock speed reduced to 1000 MHz some of the time once the temperature reached 80°C. The MP-Threaded-MFLOPS dual core performance measurements have been added to the environmental details, mainly to show the effects of throttling.
The final burnindrive results record the number of read passes in each 4 second period, in a table of 14 lines of 11 recordings and one of 10 (164 in all), covering approximately 11 minutes. The average burnindrive result for each line is provided below; these are not exactly synchronised, but give an indication of changes in throughput over time. Total passes and percentage degradation are also shown, the latter being less severe than the CPU speed reductions.
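The per-line averaging described above can be sketched as follows. This is an illustration under stated assumptions: the pass counts are synthetic (the measured values are not reproduced here), and degradation is taken to mean the fall from the first line's average to the last line's, which the report does not define. The sketch also confirms that 164 readings at 4 seconds cover about 11 minutes.

```python
# Sketch of burnindrive per-line averaging over 164 readings.
# Assumption: degradation = drop from first to last line average, in percent.

PER_LINE = 11
SECONDS_PER_READING = 4


def line_averages(passes, per_line=PER_LINE):
    """Average each printed line of per-interval pass counts."""
    lines = [passes[i:i + per_line] for i in range(0, len(passes), per_line)]
    return [sum(line) / len(line) for line in lines]


def degradation_pct(averages):
    """Hypothetical definition: first-to-last line fall as a percentage."""
    return 100.0 * (averages[0] - averages[-1]) / averages[0]


# Synthetic, illustrative counts (not measured data): 164 readings.
passes = [100] * 11 + [90] * 143 + [80] * 10
averages = line_averages(passes)
minutes = len(passes) * SECONDS_PER_READING / 60

print(len(averages), "lines; last line has",
      len(passes) - PER_LINE * (len(averages) - 1), "readings")
print(f"about {minutes:.1f} minutes, total passes {sum(passes)}")
print(f"degradation {degradation_pct(averages):.1f}%")
```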
Temperature and CPU MHz Measurement + MP-FPUStress64 2 Core MFLOPS
Start at Tue Nov 5 15:47:03 2019
Using 25 samples at 30 second intervals

Seconds                                                                   MFLOPS
   0.0 ARM MHz=1500, core volt=0.8560V, CPU temp=66.0'C, pmic temp=59.0'C
  30.0 ARM MHz=1500, core volt=0.8560V, CPU temp=75.0'C, pmic temp=65.6'C  10890
  60.2 ARM MHz=1500, core volt=0.8560V, CPU temp=78.0'C, pmic temp=68.4'C  10551
  90.4 ARM MHz=1500, core volt=0.8560V, CPU temp=80.0'C, pmic temp=70.3'C  10549
 120.6 ARM MHz=1500, core volt=0.8560V, CPU temp=81.0'C, pmic temp=70.3'C  10452
 150.8 ARM MHz=1500, core volt=0.8560V, CPU temp=81.0'C, pmic temp=70.3'C   9862
 181.1 ARM MHz=1000, core volt=0.8560V, CPU temp=81.0'C, pmic temp=70.3'C   9482
 211.4 ARM MHz=1500, core volt=0.8560V, CPU temp=82.0'C, pmic temp=72.2'C   9137
 241.6 ARM MHz=1500, core volt=0.8507V, CPU temp=81.0'C, pmic temp=72.2'C   9132
 271.9 ARM MHz=1000, core volt=0.8507V, CPU temp=82.0'C, pmic temp=70.3'C   9122
 302.2 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   9389
 332.4 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   8550
 362.7 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   9043
 392.9 ARM MHz=1500, core volt=0.8455V, CPU temp=81.0'C, pmic temp=72.2'C   8045
 423.3 ARM MHz=1000, core volt=0.8455V, CPU temp=81.0'C, pmic temp=72.2'C   8174
 453.6 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   8444
 483.9 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   8335
 514.3 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   7951
 544.6 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   8125
 574.8 ARM MHz=1500, core volt=0.8455V, CPU temp=83.0'C, pmic temp=72.2'C   8078
 605.1 ARM MHz=1000, core volt=0.8455V, CPU temp=81.0'C, pmic temp=72.2'C   8280
 635.4 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   7845
 665.7 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   7761
 696.0 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=73.1'C   8341
 726.2 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C   7407
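The samples above follow a regular text layout, so they can be summarised in a few lines. A minimal Python sketch, assuming the line format shown (the regular expression and variable names are illustrative, not part of RPiHeatMHzVolts2), counts the throttled samples and expresses the lowest MFLOPS reading as a percentage of the highest. That percentage is assumed to be what the %Min/Max figure at the end of the earlier log means.

```python
import re

# RPiHeatMHzVolts2 samples transcribed from the log above: seconds, clock,
# voltage and temperatures, then MP-FPUStress64 MFLOPS where measured.
LOG = """\
0.0 ARM MHz=1500, core volt=0.8560V, CPU temp=66.0'C, pmic temp=59.0'C
30.0 ARM MHz=1500, core volt=0.8560V, CPU temp=75.0'C, pmic temp=65.6'C 10890
60.2 ARM MHz=1500, core volt=0.8560V, CPU temp=78.0'C, pmic temp=68.4'C 10551
90.4 ARM MHz=1500, core volt=0.8560V, CPU temp=80.0'C, pmic temp=70.3'C 10549
120.6 ARM MHz=1500, core volt=0.8560V, CPU temp=81.0'C, pmic temp=70.3'C 10452
150.8 ARM MHz=1500, core volt=0.8560V, CPU temp=81.0'C, pmic temp=70.3'C 9862
181.1 ARM MHz=1000, core volt=0.8560V, CPU temp=81.0'C, pmic temp=70.3'C 9482
211.4 ARM MHz=1500, core volt=0.8560V, CPU temp=82.0'C, pmic temp=72.2'C 9137
241.6 ARM MHz=1500, core volt=0.8507V, CPU temp=81.0'C, pmic temp=72.2'C 9132
271.9 ARM MHz=1000, core volt=0.8507V, CPU temp=82.0'C, pmic temp=70.3'C 9122
302.2 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 9389
332.4 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 8550
362.7 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 9043
392.9 ARM MHz=1500, core volt=0.8455V, CPU temp=81.0'C, pmic temp=72.2'C 8045
423.3 ARM MHz=1000, core volt=0.8455V, CPU temp=81.0'C, pmic temp=72.2'C 8174
453.6 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 8444
483.9 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 8335
514.3 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 7951
544.6 ARM MHz=1500, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 8125
574.8 ARM MHz=1500, core volt=0.8455V, CPU temp=83.0'C, pmic temp=72.2'C 8078
605.1 ARM MHz=1000, core volt=0.8455V, CPU temp=81.0'C, pmic temp=72.2'C 8280
635.4 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 7845
665.7 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 7761
696.0 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=73.1'C 8341
726.2 ARM MHz=1000, core volt=0.8455V, CPU temp=82.0'C, pmic temp=72.2'C 7407
"""

# "MHz=\s*" also accepts the padded "MHz= 750" form seen in the earlier log.
SAMPLE = re.compile(
    r"^\s*(?P<secs>[\d.]+)\s+ARM MHz=\s*(?P<mhz>\d+), "
    r"core volt=(?P<volt>[\d.]+)V, CPU temp=(?P<cpu>[\d.]+)'C, "
    r"pmic temp=(?P<pmic>[\d.]+)'C(?:\s+(?P<mflops>\d+))?\s*$"
)


def parse_log(text):
    """Return one dict per sample line; 'mflops' is None when absent."""
    return [m.groupdict() for m in map(SAMPLE.match, text.splitlines()) if m]


samples = parse_log(LOG)
mflops = [int(s["mflops"]) for s in samples if s["mflops"]]
throttled = sum(1 for s in samples if int(s["mhz"]) < 1500)
max_cpu_c = max(float(s["cpu"]) for s in samples)

# Assumption: %Min/Max is the lowest MFLOPS as a percentage of the highest.
min_max_pct = round(100.0 * min(mflops) / max(mflops), 1)

print(f"{len(samples)} samples, {throttled} below 1500 MHz, max CPU {max_cpu_c}'C")
print(f"MFLOPS {max(mflops)} to {min(mflops)}, %Min/Max {min_max_pct}")
```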