Raspberry Pi 3B+ 32 Bit and 64 Bit Benchmarks and Stress Tests

Roy Longbottom


Contents


Summary                         Whetstone Benchmark                 Dhrystone 2 Benchmark
Linpack Benchmark               Livermore Loops Benchmark           Memory Speed Benchmark
NEON Float & Integer Benchmark  Bus Speed Benchmark                 FFT Benchmarks
MultiThreading Benchmarks       MP-MFLOPS Benchmarks                MP-Whetstone Benchmark
MP-Dhrystone Benchmark          MP-linpack Benchmark                MP-BusSpd Benchmark
MP-RandMem Benchmark            OpenMP-MemSpeed Benchmark           OpenMP-MFLOPS Benchmark
Java Benchmarks                 Java Whetstone Benchmark            JavaDraw Benchmark
OpenGL GLUT Benchmark           I/O Benchmarks                      DriveSpeed Benchmark
LAN Benchmark                   WiFi Benchmark
Stress Tests                    MHz, Temperature & Voltage Monitor  Integer Stress Tests
Floating Point Stress Tests     Livermore Loops Stress Tests        OpenGL Stress Tests
OpenGL + CPU Stress Tests       FLIRC Case Stress Tests
Assembly Code                   System ID



Summary

Previously, I have run my 32 bit and 64 bit benchmarks on the appropriate range of Raspberry Pi computers, up to model 3B, and Operating Systems. Details of the benchmarks and results are available in Raspberry Pi 32 Bit and 64 Bit Benchmarks and Stress Tests.htm.

This report contains brief reminders of the benchmarks, with 32 bit and 64 bit results on a Raspberry Pi 3B+. On obtaining the computer, the original Operating Systems failed to boot. Raspbian required an update, and a new version of a 64 bit system was needed. In this case, a working Gentoo became available.

Existing benchmarks were used, to provide comparisons between the old 3B model and the new 3B+ and 64 bit versus 32 bit operation. The new version of Gentoo was used on both hardware platforms to provide compatible results. The benchmarks and results are summarised as follows.

Single Core CPU Tests - comprising Whetstone, Dhrystone, Linpack and Livermore Loops Classic Benchmarks. Performance improvements of the 3B+, compared with the model 3B, were effectively proportional to the increased CPU MHz, as expected. Average 64/32 bit gains were between 1.32 and 2.49 times, and are possibly dependent on compiler version.

Memory Benchmarks - measuring Floating point and integer performance using data from caches and RAM. Expected 3B+ gains were generally demonstrated with cached data, but could be slightly slower from RAM. In the numerous measurements, 32 bit compilations were sometimes faster than the 64 bit versions.

Multithreading Benchmarks - Most of the multithreading benchmarks execute the same calculations using 1, 2, 4 and 8 threads. These include some of the Classic Benchmarks. The most efficient was Whetstone, where each thread runs a nearly independent copy, and four copies could be completed in the same time as one. Next was Dhrystone, with some shared variables, providing 4 core gains of less than 2.5 times. Then came Linpack, demonstrating that the original code was completely unsuitable for multithreading, due to frequent thread starts and stops.

MP cache/RAM benchmarks are included, with serial and random access, where the threads share data but start at different points. Here, serial reading provided adequate MP gains from cached data and some from RAM, but there was no gain with reading and writing, and random access from RAM was slow. There is also another benchmark, compiled using OpenMP, that provided totally confusing results.

Finally, a number of benchmarks attempt to measure maximum MFLOPS floating point speed, using the same series of calculations, with variants covering single and double precision (SP and DP), vector intrinsic functions and OpenMP. Performance is shown to radically improve with new compiler versions. Best 64 bit four core scores were 11.56 GFLOPS SP and 4.49 DP. Assembly code produced by the compilers is included later.

Java and OpenGL Benchmarks - A Java Whetstone benchmark is provided and one using JavaDraw functions. The latter has six tests with increasing demands, where the 3B+ was slightly faster than the 3B and 32 bit speeds faster than 64 bits, performance being dependent on Java and driver versions. The OpenGL benchmark has parameters to run at different window sizes, drawing coloured and textured objects, including an effective real design application. Results show similar performance using 3B and 3B+ running at both 32 and 64 bits, for the simpler graphics tests, with the 3B+ becoming faster with increasing complexity. The OpenGL program can also be used as a stress test.

Drive, LAN and WiFi Benchmarks - Variations of the same program are provided to benchmark internal and USB drives or LAN and WiFi connections, measuring performance using large files, small files and random access. The 64 bit drive version failed to run as expected (a direct I/O caching issue). Both 32 bit and 64 bit LAN/WiFi versions ran successfully on 3B and 3B+, communicating with Windows 7, Windows 10 and Linux Ubuntu. However, there were numerous complications and peculiar results that warrant further investigation. At least, LAN and WiFi large file data transfer speeds could be significantly faster via the 3B+.



Stress Tests - There are three main stress testing programs, that have parameters to select specific tests and running time, including one using integer calculations and another floating point, each with options to select cache or RAM sized data. The third is the OpenGL benchmark. To assist in resolving an earlier issue, I modified my CPU MHz/temperature monitor to include core voltage. Most sessions were run for 15 minutes.

The first tests used four copies of the processor programs, at both 32 and 64 bits, on the Pi 3B+ in a plastic case, with no CPU heatsink. These demonstrated the thermal characteristics, with the CPU MHz reducing from 1400 to 1200 (within a minute) at 70°C, with core voltage falling at the same time. Later, thermal throttling kicked in on reaching 80°C, with MHz down to below 1000. Surprisingly, the floating point test performance reductions were less than those from the integer tests.

Running the OpenGL stress test (on a hot day) led to performance reductions on reaching 70°C, after four minutes, but not much further. It was noted that more than one CPU core was being used and this affected later tests. The OpenGL test was one of those that ran slowly when an extended power cable was used, with reductions in MHz to 600, voltage to 1.2 and Frames Per Second from 20 to 8 (at a low 52.6°C).

The next tests comprised three CPU tests plus OpenGL, on the 3B+ at 64 bits. Using both the integer and floating point programs, temperatures exceeded 80°C for at least 10 minutes out of 15, worst OpenGL performance being with the latter program, again reducing to 8 FPS (at 1034 MHz, and 82.2°C).

FLIRC Case - The CPU and OpenGL tests were repeated with the Pi 3B+ board in an aluminium FLIRC case, which acts as an efficient heat sink. During two consecutive 15 minute tests, 1200 MHz was recorded occasionally, with a minimum of 17 FPS and CPU performance near the maximum possible.

Assembly Code - In order to identify reasons for significant performance differences, disassembled code was produced, with that for critical processing loops provided. Main differences were between the use of scalar and vector instructions.

System ID - Details of the system used are identified using original standard functions. These show differences between 64 bit and 32 bit OS operation and between Linux versions, but are exactly the same for the Pi 3B and 3B+ processors.



Whetstone Benchmark - whetstonePiA7, whetstonePi64

The Whetstone Benchmark was the first general purpose benchmark that set industry standards of performance, particularly for minicomputers, and was introduced in 1972. The benchmark produced speed ratings in terms of Thousands of Whetstone Instructions Per Second (KWIPS). In 1978, self timing versions (by yours truly) produced speed ratings, for each of the eight test procedures, in MOPS (Millions of Operations Per Second) or MFLOPS (Millions of Floating Point Operations Per Second), with an overall rating in MWIPS. On the latest systems, the latter is mainly dependent on floating point speed, particularly that of the COS and EXP tests.

See Whetstone Benchmark History and Results whetstone.htm and Whetstone Benchmark Detailed Results On PCs whetstone results.htm.

As for most benchmarks that completely depend on CPU speed, measured performance comparisons, at a given word size (32 or 64 bits), are essentially proportional to clock MHz, in this case 1400/1200. Performance at 64 bits is effectively the same as the 32 bit results, except for faster speeds in the tests that use functions such as COS and EXP. In turn, this leads to a superior 64 bit MWIPS rating.
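The proportionality claim can be checked with simple arithmetic. The following is a minimal sketch, using MWIPS figures copied from the results table in this section; the function and variable names are only illustrative and are not part of the benchmark.

```python
# Illustrative check: do measured 3B+/3B gains follow the CPU clock ratio?
CLOCK_3B_MHZ = 1200
CLOCK_3B_PLUS_MHZ = 1400


def speed_ratio(new_score, old_score):
    """Return the relative gain of new_score over old_score."""
    return new_score / old_score


# Expected gain if performance is purely proportional to MHz
clock_ratio = speed_ratio(CLOCK_3B_PLUS_MHZ, CLOCK_3B_MHZ)

# 32 bit MWIPS ratings, 3B+ versus 3B (values from the results table)
mwips_ratio_32 = speed_ratio(829.9, 711.6)

# 3B+ MWIPS at 64 bits versus 32 bits (values from the results table)
mwips_64_over_32 = speed_ratio(1124.9, 829.9)

print(f"clock ratio       {clock_ratio:.2f}")
print(f"32 bit MWIPS gain {mwips_ratio_32:.2f}")
print(f"64/32 bit MWIPS   {mwips_64_over_32:.2f}")
```

The first two figures agree to two decimal places, while the third reflects the faster 64 bit function tests rather than any change in clock speed.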

As noted with my Android benchmarks, performance using Java can vary considerably with the version of run time software included. For comparable 3B and 3B+ speeds at 64 bits, running both under the later Gentoo upgrade was required. For the same reason, it is not appropriate to compare Java results between 32 bit and 64 bit systems.


 System          MHz  MWIPS  ------MFLOPS-------   ------------MOPS---------------
                               1      2      3     COS   EXP  FIXPT      IF  EQUAL
 32 Bit

 RPi 3   v8-A53 1200  711.6  336.5  329.7  256.9  12.2   8.8 1498.5  1796.7 1198.7
 RPi 3B+ v8-A53 1400  829.9  392.7  384.6  299.8  14.2  10.2 1748.1  2095.8 1398.5

 Ratio          1.17   1.17   1.17   1.17   1.17  1.16  1.16   1.17    1.17   1.17

 Java

 Both Java 1.8.0_65

 RPi 3   v8-A53 1200  183.4  184.1  179.6   91.1  5.94  1.19  460.5    88.6  276.6
 RPi 3B+ v8-A53 1400  211.8  214.2  207.6  105.8  6.92  1.37  535.5   103.1  321.3  

 Ratio          1.17   1.15   1.16   1.16   1.16  1.16  1.15   1.16    1.16   1.16

 64 Bit

 RPi 3   v8-A53 1200  969.9  330.1  346.7  282.8  19.5  11.2 1459.9  ###### 1171.9
 RPi 3B+ v8-A53 1400 1124.9  383.2  402.7  327.8  22.6  13.0 1699.5  ###### 1358.0

 Ratio          1.17   1.16   1.16   1.16   1.16  1.16  1.16   1.16           1.16
 3B+ 64/32 bit         1.36   0.98   1.05   1.09  1.59  1.27   0.98           0.97     

 Java

 Java 1.8.0_121, Linux 4.10.0

 RPi 3   v8-A53 1200  783.0  335.4  296.3  207.0  19.0  18.1  667.1   160.8   88.3

 Both Java 1.8.0_161, Linux 4.14.31

 RPi 3   v8-A53 1200  667.9  268.9  249.6  112.2  20.0  18.8  608.8   207.8   76.7
 RPi 3B+ v8-A53 1400  774.6  311.5  289.9  130.2  23.2  21.8  706.6   241.0   89.1

 Ratio          1.17   1.16   1.16   1.16   1.16  1.16  1.16   1.16    1.16   1.16

 ###### compiler optimiser produces 1 pass, this test does not affect MWIPS much 
  





Dhrystone 2 Benchmark - dhrystonePiA7, dhrystonePi64

The Dhrystone "C" benchmark provides a measure of integer performance (no floating point instructions). It became the key standard benchmark from 1984, with the growth of Unix systems. The first version was produced by Reinhold P. Weicker in ADA and translated to "C" by Rick Richardson. Two versions are available - Dhrystone versions 1.1 and 2.1. The second version, used here, was produced to avoid over-optimisation problems encountered with version 1, but some is still possible. Speed was originally measured in Dhrystones per second. This was later changed to VAX MIPS by dividing Dhrystones per second by 1757, the DEC VAX 11/780 result, the latter being regarded as the first 1 MIPS minicomputer.
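The conversion described above is a single division. The following is a minimal sketch of it, together with the MIPS/MHz figure used in the results table; the function names are hypothetical and the code is not taken from the benchmark source.

```python
# Reference score quoted above: Dhrystones per second on the DEC VAX 11/780
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def vax_mips(dhrystones_per_second):
    """Convert a raw Dhrystones per second score to a VAX MIPS rating."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SEC


def mips_per_mhz(mips, cpu_mhz):
    """Normalise a VAX MIPS rating by CPU clock speed."""
    return mips / cpu_mhz


# By definition, the VAX 11/780 itself rates as 1 MIPS
print(vax_mips(1757))

# 3B+ ratings from the results table, normalised to 1400 MHz
print(f"32 bit {mips_per_mhz(2881, 1400):.2f} MIPS/MHz")
print(f"64 bit {mips_per_mhz(4021, 1400):.2f} MIPS/MHz")
```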

See Dhrystone Results On PCs and Later Devices Dhrystone Results.htm

Here, 3B+/3B speeds are again proportional to CPU MHz. 64 bit speeds are indicated as being 40% faster than 32 bit, maybe due to more efficient instructions or the availability of additional registers. The latter might, in turn, allow more over-optimisation.


  System            MHz   VAX MIPS  MIPS/MHz

  32 Bit

  RPi 3   v8-A53   1200     2469     2.06
  RPi 3B+ v8-A53   1400     2881     2.06

  Ratio            1.17     1.17

  64 Bit

  RPi 3   v8-A53   1200     3475     2.90
  RPi 3B+ v8-A53   1400     4021     2.87

  Ratio            1.17     1.16
  3B+ 64/32 bit             1.40
  




Linpack Benchmark - linpackPiA7, linpackPiA7SP, linpackPi64, linpackPiSP64
Plus Linpack NEON - linpackPiNEONi, linpackPiNEONi64

The Linpack Benchmark was produced from the "LINPACK" package of linear algebra routines. It became the primary benchmark for scientific applications, particularly under Unix, from the mid 1980's, with a slant towards supercomputer performance. The original double precision C version, used here, operates on 100x100 matrices. Performance is governed by an inner loop in function daxpy() with a linked triad dy[i] = dy[i] + da * dx[i], and is measured in Millions of Floating Point Operations Per Second (MFLOPS).
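The linked triad can be illustrated as follows. This is a minimal sketch in Python rather than the benchmark's C code, assuming unit stride and counting one multiply and one add per element; the helper daxpy_flops is hypothetical and only shows how an operation count relates to the loop length.

```python
def daxpy(n, da, dx, dy):
    """Linked triad: dy[i] = dy[i] + da * dx[i] for the first n elements."""
    if n <= 0 or da == 0.0:
        return  # nothing to do
    for i in range(n):
        dy[i] = dy[i] + da * dx[i]


def daxpy_flops(n):
    """Floating point operations in one call: a multiply and an add per element."""
    return 2 * n


dx = [1.0, 1.0, 1.0]
dy = [1.0, 2.0, 3.0]
daxpy(3, 2.0, dx, dy)
print(dy, daxpy_flops(3))
```

With 100x100 matrices the loop is short, so the overhead of calling the function is a noticeable part of the measured time.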

Programming procedures and displayed output are the same as the original version for PCs (My accepted conversion at Netlib - 1996), where the bloated detail was needed due to using a low resolution timer. As for the original Fortran version, two sets of results are produced, with different memory alignment, and the lowest MFLOPS selected as the speed rating. This can lead to variation over multiple runs.

The benchmark produces a set of numeric results of calculations that demonstrate accuracy and consistency. These can vary, usually only slightly, as a result of the compiler generating different scalar or vector instructions. The source code includes the option of changing values used for comparison purposes, to suit particular situations. In this case, some benchmark programs have not been modified and result in an error message (see below). The range of results encountered is also shown.

See also Linpack Benchmark Results On PCs and Later Devices linpack results.htm

Besides the versions compiled from standard C code, a new version is included that uses NEON Intrinsic Functions for the daxpy function. This produces a significant performance gain with 32 bit compilation. At 64 bits, the vector instructions generated from the standard code provide similar speed gains.

Note the 64 bit performance gains in the table, which are up to nearly 2.5 times. Model 3B+/3B performance ratios are again mainly proportional to those for CPU MHz. MFLOPS per MHz ratios are also shown, now better than for the Whetstone benchmark, at up to 0.43 single precision and 0.29 double precision for 64 bit programs.

         
                           ------ MFLOPS ----    --- MFLOPS/MHz -- 
 System            MHz     DP     SP  NEON SP    DP    SP  NEON SP

 32 Bit

 RPi 3   v8-A53   1200    180    194    486    0.15   0.16   0.41
 RPi 3B+ v8-A53   1400    210    226    562    0.15   0.16   0.40

 Ratio            1.17   1.17   1.16   1.16


 64 Bit

 RPi 3   v8-A53   1200    343    484    521   0.29   0.40   0.43
 RPi 3B+ v8-A53   1400    397    563    605   0.29   0.40   0.43

 Ratio            1.17   1.20   1.17   1.17
 3B+ 64/32 bit           1.89   2.49   1.08


 Error Message Example

 Variable norm. resid Non-standard result was              1.9 instead of              1.7
 Variable resid       Non-standard result was   8.46778499e-14 instead of   7.41628980e-14
 Variable x[0]-1      Non-standard result was  -1.11799459e-13 instead of  -1.49880108e-14
 Variable x[n-1]-1    Non-standard result was  -9.60342916e-14 instead of  -1.89848137e-14
 
 Results of Calculations

         norm resid       resid            x[0]-1          x[n-1]-1
          
 DP Pi      1.7     7.41628980e-14   -1.49880108e-14   -1.89848137e-14
 DP Pi 2-3  1.9     8.46778499E-14   -1.11799459E-13   -9.60342916E-14
 DP Pi 64   1.9     8.46778499e-14   -1.11799459e-13   -9.60342916e-14

 SP Pi      1.6     3.80277634e-05   -1.38282776e-05   -7.51018524e-06 
 SP Pi NEON 2.2     5.16722466e-05   -2.38418579e-07   -5.06639481e-06
 SP Pi 2-3  2.0     4.69621336E-05   -1.31130219E-05   -1.30534172E-05
 SP Pi 64   2.0     4.69621336e-05   -1.31130219e-05   -1.30534172e-05
    




Livermore Loops Benchmark - liverloopsPiA7, liverloopsPi64

This original main benchmark for supercomputers was first introduced in 1970, initially comprising 14 kernels of numerical application, written in Fortran. This was increased to 24 kernels in the 1980s. Performance measurements are in terms of Millions of Floating Point Operations Per Second or MFLOPS. The kernels are executed three times with different double precision data array sizes. Following are overall MFLOPS results for various systems, geometric mean being the official average performance.
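The summary statistics can be sketched as follows, using Python's statistics module and the 24 values from the 32 bit Raspberry Pi 3 listing later in this section. This is an illustration only: the program's own summary appears to cover all three passes, so these figures need not reproduce the summary table exactly. The function name summarise is hypothetical.

```python
from statistics import geometric_mean, harmonic_mean, mean


def summarise(mflops):
    """Return the five summary figures for a list of per-kernel MFLOPS."""
    return {
        "maximum": max(mflops),
        "average": mean(mflops),
        "geomean": geometric_mean(mflops),  # the official average
        "harmean": harmonic_mean(mflops),
        "minimum": min(mflops),
    }


# 32 bit DP MFLOPS, 24 loops, Raspberry Pi 3 at 1200 MHz (from the listing)
pi3_32bit = [
    192.9, 228.0, 398.4, 337.4, 124.6, 167.5, 359.7, 384.3,
    347.7, 171.6, 132.5, 74.7, 83.9, 109.1, 225.4, 221.2,
    307.9, 288.6, 202.2, 211.9, 114.7, 56.9, 300.2, 170.1,
]

summary = summarise(pi3_32bit)
for name, value in summary.items():
    print(f"{name:8s} {value:7.1f}")
```

The geometric mean never exceeds the arithmetic mean, so it gives less weight to a few very fast kernels.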

The speed of the original Raspberry Pi could be rated as 4.5 times faster than the Cray 1 supercomputer - see my quote on Cost and Physical Differences. Now, one core of the Raspberry Pi 3B+ produces performance equivalent to 24 Cray 1 computers.

Some of the program's 3 x 24 kernel measurements produce inconsistent speeds, particularly for the minimum value, but CPU MHz ratios still broadly apply to the performance summaries. The 64 bit official average MFLOPS rating is shown as being 32% faster than at 32 bits, with double precision MFLOPS/MHz at 0.20. For maximum speed, the latter is 0.51.

See also livermore loops results.htm for results on PCs and later devices.
  
 32 Bit Summary
                          -------------- DP MFLOPS -------------- Per MHz
 System            MHz    Maximum Average Geomean Harmean Minimum Geomean

 RPi 3   v8-A53   1200     398.4   210.6   185.9   160.2    56.5    0.15
 RPi 3B+ v8-A53   1400     462.5   243.8   215.2   185.7    65.6    0.15

 Ratio            1.17      1.16    1.16    1.16    1.16    1.16

 64 Bit Summary

 RPi 3   v8-A53   1200     633.1   275.8   245.2   215.5    81.3    0.20
 RPi 3B+ v8-A53   1400     720.6   320.2   285.6   251.9    94.4    0.20

 Ratio            1.17      1.18    1.16    1.15    1.14    1.04 
 3B+ 64/32 bit              1.59    1.31    1.32    1.35    1.44
          

 32 Bit DP MFLOPS 24 Loops

 Raspberry Pi 3 1200 MHz
 192.9  228.0  398.4  337.4  124.6  167.5  359.7  384.3  347.7  171.6  132.5   74.7
  83.9  109.1  225.4  221.2  307.9  288.6  202.2  211.9  114.7   56.9  300.2  170.1

 Raspberry Pi 3B+ 1400 MHz
 223.8  264.4  462.5  392.9  146.0  159.4  416.0  446.3  406.7  199.1  153.8   86.7
  99.5  126.7  261.8  256.8  357.4  333.6  234.7  239.5  132.9   66.0  345.4  197.5

 Ratios 0.95 to 1.19, average 1.15 (Normal variations)

 64 Bit DP MFLOPS 24 Loops

 Raspberry Pi 3 1200 MHz
 463.4  256.0  465.9  455.0  194.9  181.3  633.1  410.3  417.9  196.2  146.2  211.4
 104.5  139.5  250.8  222.1  379.5  447.1  286.4  238.0  239.3   82.0  312.6  179.9

 Raspberry Pi 3B+ 1400 MHz
  538.9  297.5  539.8  528.6  225.5  208.6  720.6  477.9  475.9  252.1  169.7  245.2
  127.2  159.7  290.9  258.2  441.1  509.4  332.9  279.9  302.9   95.5  337.4  208.9

 Ratios         1.03 to 1.31, average 1.16 (Normal variations)
 3B+ 64/32 bit  1.00 to 2.83, average 1.40


  




Memory Speed Benchmark - memspeedPiA7, memSpdPi64, memSpdPiNEON

The MemSpeed benchmark measures data reading speeds in MegaBytes per second, while carrying out calculations on arrays of cache and RAM data, normally sized 2 x 4 KB to 2 x 4 MB. Calculations are as shown in the result headings. For the first two double precision tests, speed in Million Floating Point Operations Per Second (MFLOPS) can be calculated by dividing MB/second by 8 and 16. For single precision, divide by 4 and 8. There is also a version that instructs the compiler to use NEON code. The 32 bit version results are provided below, but the particular compile options used were not acceptable using a 64 bit compiler.
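The divisors quoted above can be applied as follows. This is a minimal sketch, where the test labels "triad" (for x[m]=x[m]+s*y[m]) and "add" (for x[m]=x[m]+y[m]) are names assumed here for illustration, and the sample MB/second values are taken from the 32 bit Raspberry Pi 3 results in this section.

```python
# MB/second divisors for each test and precision, as given in the text
DIVISOR = {
    ("triad", "double"): 8,
    ("add", "double"): 16,
    ("triad", "single"): 4,
    ("add", "single"): 8,
}


def mflops(mb_per_second, test, precision):
    """Convert a MemSpeed MB/second result to MFLOPS."""
    return mb_per_second / DIVISOR[(test, precision)]


# Fastest 32 bit Raspberry Pi 3 results for the first calculation (16 KB row)
dp = mflops(1621, "triad", "double")
sp = mflops(1814, "triad", "single")

print(f"DP {dp:.1f} MFLOPS, {dp / 1200:.2f} per MHz")
print(f"SP {sp:.1f} MFLOPS, {sp / 1200:.2f} per MHz")
```

These agree, after rounding, with the Max MFLOPS and Per MHz lines shown under that table.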

In this case, relative 3B/3B+ speed ratios were calculated as separate averages for tests that use L1 cache, L2 cache and RAM. The cache based measurements were, as usual, equivalent to those derived from CPU MHz, but indicate that RAM could be slightly slower.

As would be expected, the use of NEON instructions provided a performance gain using single precision floating point (32 bit system only). The 64/32 bit ratios are provided below for the normal MemSpeed benchmarks, indicating that the highest 64 bit gains were for double precision calculations. The 64 bit single precision MFLOPS/MHz ratios were similar to those derived from the NEON benchmark, with 64 bit floating point calculations benefiting from the use of vector instructions. See details of Assembly Code.

The first two calculations are essentially the same as those in the daxpy function, on which Linpack benchmark performance depends, but here speed is not reduced by frequent function calls. This increases 64 bit MFLOPS/MHz to 0.43 double precision and 0.52 single precision.


 ############################ RPi 3 32 Bit ###############################

             Memory Reading Speed Test vfpv4 32 Bit Version 1

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

              Raspberry Pi 3 CPU 1200 MHz, SDRAM 900 MHz

       8    1619   1812   3448   2375   2237   3793   2698   3121   3147
      16    1621   1814   3459   2379   2240   3793   2710   3136   3162
      32    1577   1743   3243   2277   2132   3138   2702   3123   3131
      64    1537   1690   3126   2196   2047   3362   2566   2890   2917
     128    1570   1714   3257   2243   2076   3502   2624   2993   3027
     256    1573   1720   3285   2261   2084   3522   2652   3071   2930
     512    1453   1598   2785   2055   1906   2081   2430   2783   2815
    1024     918   1097   1327   1204   1185   1355   1606   1261   1263
    2048     891   1032   1224   1133   1113   1191    882    811    817
    4096     885   1023   1223   1127   1104   1201    787    756    755
    8192     876   1019   1225   1118    954   1203    876    871    873

 Max MFLOPS  203    454
    Per MHz 0.17   0.38


 ########################### RPi 3B+ 32 Bit ##############################

              Raspberry Pi 3B+ CPU 1400 MHz, SDRAM ?                      Avg. Gain

       8    1899   2125   4041   2783   2624   4448   3164   3693   3693  1.17 L1
      16    1901   2128   4058   2791   2628   4462   3177   3703   3707
      32    1852   2049   3817   2686   2508   4161   3186   3715   3711
      64    1796   1959   3574   2542   2367   3855   2945   3347   3347  1.16 L2
     128    1826   1989   3741   2600   2408   4031   3042   3506   3508
     256    1833   1995   3771   2617   2414   4068   2860   3616   3617
     512    1517   1618   2587   2039   1911   2687   2459   2825   2832
    1024     968   1098   1221   1172   1140   1211   1455   1144   1137  0.98 RAM 
    2048     911    980   1060   1038   1026   1062   1013    941    935
    4096     913    993   1064   1047   1038    948    992    902    903
    8192     926   1013   1077   1074   1065   1085    782    784    783

 Max MFLOPS  238    532
    Per MHz 0.17   0.38



 
 ######################### RPi 3 NEON 32 Bit ##############################

     Memory Reading Speed Test NEON 32 Bit Version 1 by Roy Longbottom

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

              Raspberry Pi 3 CPU 1200 MHz, SDRAM 900 MHz

       8    1627   2387   3467   2387   3181   3812   2713   3164   3149
      16    1621   2377   3457   2377   3169   3805   2713   3164   3165
      32    1577   2273   3238   2280   2985   3535   2647   3103   3105
      64    1526   2150   3018   2157   2793   3256   2568   2921   2921
     128    1554   2217   3190   2216   2925   3436   2631   3028   3029
     256    1561   2228   3225   2221   2948   3471   2654   3077   3077
     512    1434   2010   2742   1978   2534   2313   2468   2840   2840
    1024     950   1227   1324   1182   1306   1339   1581   1298   1298
    2048     935   1136   1215   1128   1212   1214    915    880    885
    4096     913   1121   1180   1131   1213   1212    825    844    842
    8192     926   1134   1212   1126    936   1199    792    774    790

 Max MFLOPS  203    594
    Per MHz 0.17   0.50


 ######################### RPi 3B+ NEON 32 Bit ###########################

              Raspberry Pi 3B+ CPU 1400 MHz, SDRAM ?                      Avg. Gain

       8    1890   2774   4027   2773   3694   4427   3130   3674   3676  1.16 L1
      16    1885   2778   4037   2758   3702   4439   3155   3693   3691
      32    1813   2581   3646   2590   3366   3951   3130   3586   3591
      64    1808   2565   3653   2575   3370   3943   2987   3363   3366  1.17 L2
     128    1790   2534   3606   2536   3334   3893   3040   3485   3485
     256    1796   2538   3638   2544   3360   3914   3079   3572   3569
     512    1654   2273   3163   2301   2945   3333   3010   3435   3447
    1024     959   1166   1185   1165   1209   1213   1438   1141   1130  0.97 RAM
    2048     918   1059   1080   1061   1088   1081   1073    890    889
    4096     922   1076   1082   1069   1069   1084   1015    867    871
    8192     929   1089   1091   1083   1102   1081    786    774    774

 Max MFLOPS  236    695
    Per MHz 0.17   0.50


 ######################### RPi 3 Gentoo 64 Bit ##########################

                 Memory Reading Speed Test armv8 64 Bit

              Raspberry Pi 3B CPU 1200 MHz, SDRAM 900 MHz

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       8    4161   2506   3749   5347   3393   4166   4641   3730   3731
      16    4032   2506   3758   5357   3419   4162   4674   3753   3750
      32    4016   2486   3721   5311   3390   4137   4673   3714   3727
      64    3372   2342   3361   4232   3123   3685   4244   3522   3499
     128    3352   2393   3454   4266   3189   3789   4359   3564   3563
     256    3227   2398   3463   4266   3224   3769   4246   3525   3525
     512     633   2010   2885   3603   2674   2457   3733   3081   3084
    1024     560    889   1217   1192   1202   1011    857   1094   1095
    2048     565    880   1145   1131    991   1156    844    885    788
    4096     514   1092    987   1127   1134   1159    873    944    951
    8192     531    887   1150   1139   1038   1162    782    799    704

 Max MFLOPS  520    627
    Per MHz 0.43   0.52


 ######################## RPi 3B+ Gentoo 64 Bit ##########################

              Raspberry Pi 3B+ CPU 1400 MHz, SDRAM ?                      Avg. Gain

       8    4822   2888   4346   6190   3955   4830   5372   4324   4325  1.16 L1
      16    4684   2904   4337   6197   3955   4833   5389   4340   4343
      32    4471   2898   4345   6172   3951   4824   5438   4323   4347
      64    3814   2630   3721   4671   3467   4052   5272   4238   4208  1.18 L2
     128    3866   2727   3905   4797   3601   4257   4935   4102   4103
     256    3891   2765   3975   4877   3700   4296   4901   4096   4102
     512     671   2305   3252   3791   3003   3530   3638   3721   3718
    1024     694   1263   1324   1320   1317   1277   1192   1482   1477  1.18 RAM
    2048     645   1213   1255   1245   1132   1269    840    921    925
    4096     617   1204   1122   1230   1238   1120    968    990    990
    8192     658   1210   1256   1224   1101   1271   1011   1082   1084

 Max MFLOPS  602    726
    Per MHz 0.43   0.52


 ###################### RPi 3B+ Gentoo NEON 64 Bit ########################

  Compile options to use NEON instructions are not available at 64 bit working.


 ##################### Compare 64 bit / 32 bit Pi 3 #######################

                        Memory Reading Speed Test

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       8    2.57   1.38   1.09   2.25   1.52   1.10   1.72   1.20   1.19
     256    2.05   1.39   1.05   1.89   1.55   1.07   1.60   1.15   1.20
    8192    0.61   0.87   0.94   1.02   1.09   0.97   0.89   0.92   0.81

 #################### Compare 64 bit / 32 bit Pi 3B+ ######################

       8    2.54   1.36   1.08   2.22   1.51   1.09   1.70   1.17   1.17
     256    2.12   1.39   1.05   1.86   1.53   1.06   1.71   1.13   1.13
    8192    0.71   1.19   1.17   1.14   1.03   1.17   1.29   1.38   1.38




NEON Float & Integer Benchmark - NeonSpeed, NeonSpeedPi64

This was my first benchmark produced to measure speed using NEON instructions on ARM v7 CPUs using Android. It executes some of the code used in Memory Speed Benchmark, with additional tests recoded using NEON intrinsic functions. In this case there are no double precision calculations.

Pi 3B+ CPU/Cache performance gains are again proportional to MHz, with some worse via RAM. Single precision MFLOPS per MHz increased up to 0.92 through using NEON intrinsic functions. These were compiled as different vector instructions, including the use of the Fused Multiply Accumulate variety. See details of Assembly Code.

No significant 64 bit performance gains were indicated, using these test functions, as similar instructions were generated. Some of the 32 bit functions were also somewhat faster.

 
 ##################### RPi 3 32 Bit #########################

               NEON Speed Test V 1.0    

           Raspberry Pi 3 CPU 1200 MHz

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   2720   4001   3459   4225   4474   4750
      32   2598   3706   3268   3879   4091   4320
      64   2453   3389   3069   3526   3675   3859
     128   2503   3466   3178   3598   3718   3918
     256   2530   3516   3230   3649   3779   3950
     512   2221   2923   2718   2964   3104   3217
    1024   1262   1326   1317   1316   1324   1316
    4096   1170   1213   1204   1213   1210   1195
   16384   1177   1229   1218   1147   1222   1215
   65536   1181   1226   1221    916   1208   1218

Max MFLOPS  680   1000
  Per MHz  0.57   0.84 

 ##################### RPi 3B+ 32 Bit #######################

           Raspberry Pi 3B+ CPU 1400 MHz           Avg Gain

      16   3188   4690   4055   4953   5243   5570  1.17 L1
      32   3143   4578   3990   4811   5120   5431
      64   2927   4089   3693   4253   4446   4674  1.16 L2
     128   2864   3912   3588   4060   4172   4478
     256   2905   3953   3632   4119   4213   4524
     512   2255   2835   2661   2873   2922   3035
    1024   1234   1264   1263   1265   1248   1232  0.93 RAM
    4096   1099   1114   1110   1106   1091   1088
   16384   1116   1128   1116   1117   1102   1092
   65536   1113   1132   1122    837   1107   1090

Max MFLOPS  797   1173
  Per MHz  0.57   0.84 


 ################### RPi 3 Gentoo 64 Bit ####################

       NEON Speed Test armv8 64 Bit V 1.0 

           Raspberry Pi 3 CPU 1200 MHz

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   2350   4419   3415   4176   4686   4843
      32   2247   3991   3216   3831   4258   4348
      64   2161   3631   3038   3559   3886   3882
     128   2212   3744   3148   3648   3980   3966
     256   2230   3766   3171   3677   4009   3962
     512   1931   2736   2685   2663   3267   3322
    1024   1116   1116   1223   1135   1156   1213
    4096   1065   1075   1146   1040   1117   1162
   16384   1065   1072   1149    978   1106   1078
   65536   1007   1150   1076    824   1103   1137

Max MFLOPS  588   1105
  Per MHz  0.49   0.92

 
 ################## RPi 3B+ Gentoo 64 Bit ###################

           Raspberry Pi 3B+ CPU 1400 MHz           Avg Gain

      16   2724   5109   3961   4841   5446   5607  1.16 L1
      32   2612   4645   3726   4450   4968   5036
      64   2523   4247   3540   4150   4521   4519  1.16 L2
     128   2583   4363   3666   4253   4616   4635
     256   2576   4314   3674   4254   4591   4631
     512   1852   2871   2608   2466   2916   2698
    1024   1222   1207   1305   1179   1280   1216  1.08 RAM
    4096   1157   1144   1214   1109   1181   1160
   16384   1175   1245   1244   1134   1191   1180
   65536   1143   1258   1185    909   1144   1260

Max MFLOPS  681   1277
  Per MHz  0.49   0.91


 ############### Compare 64 bit / 32 bit Pi 3 #################

  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   0.86   1.10   0.99   0.99   1.05   1.02
     256   0.88   1.07   0.98   1.01   1.06   1.00
   65536   0.85   0.94   0.88   0.90   0.91   0.93

 ############## Compare 64 bit / 32 bit Pi 3B+ ################

      16   0.85   1.09   0.98   0.98   1.04   1.01
     256   0.89   1.09   1.01   1.03   1.09   1.02
   65536   1.03   1.11   1.06   1.09   1.03   1.16




Bus Speed Benchmark - busspeedPiA7, busSpdPi64

This benchmark is designed to identify reading data in bursts over buses and possible maximum data transfer speed from RAM (using 1 core - see MP version). The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read.
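The addressing pattern described above can be sketched as follows. This is an illustration of the sequence of increments and of which words are touched on each pass, assuming 4 byte words and a halving step down to 1 (read all); it does not reproduce the benchmark's timing or its MB/second calculation, and the function names are hypothetical.

```python
WORD_BYTES = 4  # each read is one 4 byte word


def increments(start_words=32):
    """Yield the address increments in words: 32, 16, 8, 4, 2, 1."""
    inc = start_words
    while inc >= 1:
        yield inc
        inc //= 2


def words_read(total_words, inc):
    """Return the word indices touched when stepping through the array by inc."""
    return range(0, total_words, inc)


# Example: a 16 KB array holds 4096 words
total_words = 16 * 1024 // WORD_BYTES
for inc in increments():
    touched = len(words_read(total_words, inc))
    print(f"Inc{inc:<2d} step {inc * WORD_BYTES:3d} bytes, {touched:4d} words read")
```

The six steps correspond to the Inc32 to Inc2 and Read All columns in the results tables.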

Model 3B+ speed gains are provided for reading all data. As usual, those using cached data are similar to the increase in MHz, with the RAM speed ratio much lower, but showing no loss. The 64 bit compiler produced unexpectedly slower speeds on reading all data from L1 cache, compared with an addressing increment of 2 words. This leads to an indication that the 32 bit program is faster, using this test. As indicated for the MP-BusSpd Benchmark, the 64 bit compiler did not identify that vector SIMD instructions could be used.

 ##################### RPi 3 32 Bit #########################

               BusSpeed vfpv4 32b V1

           Raspberry Pi 3 CPU 1200 MHz

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3335   3741   4075   4371   4388   4413
      32   1964   2229   2787   4271   4308   4311
      64    612    615   1121   1932   2880   3546
     128    570    573   1034   1803   2756   3467
     256    541    544    995   1758   2737   3457
     512    382    408    794   1360   2269   3105
    1024    128    136    256    533   1025   1945
    4096    109    125    245    482    961   1585
   16384    120    125    241    477    964   1744
   65536    120    125    243    477    947   1881

 ##################### RPi 3B+ 32 Bit #######################

           Raspberry Pi 3B+ CPU 1400 MHz             Gain
                                                   Read All

      16   3751   4125   4755   4965   5083   5104  1.16 L1
      32   1983   2177   2819   4258   4681   4958
      64    719    728   1333   2298   3428   4165  1.17 L2    
     128    664    666   1201   2130   3285   4084
     256    625    635   1163   2055   3197   4032
     512    329    360    702   1309   2297   3342
    1024    128    143    279    548   1061   2128  1.00 RAM
    4096    115    131    256    498    978   1694
   16384    124    130    254    489    994   1620
   65536    126    129    253    492   1003   1728


 ################### RPi 3 Gentoo 64 Bit ####################

           Raspberry Pi 3 CPU 1200 MHz

            BusSpeed armv8 64 Bit Mon

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3312   3684   4007   4341   4390   3341
      32   2019   2158   2687   4172   4235   3294
      64    577    595   1124   1861   2836   3062
     128    546    556   1040   1754   2733   3062
     256    516    530   1000   1696   2692   3094
     512    341    272    708   1264   2099   2626
    1024     77    126    251    488    847   1860
    4096     85    115    222    446    908   1685
   16384     99    115    231    393    902   1704
   65536     98    115    229    443    810   1700

 ################## RPi 3B+ Gentoo 64 Bit ###################

           Raspberry Pi 3B+ CPU 1400 MHz             Gain
                                                   Read All

      16   3823   4251   4638   4945   5045   3854  1.15 L1
      32   1543   1677   2423   3331   4152   3680
      64    672    694   1306   2169   3300   3577  1.17 L2
     128    635    648   1211   2055   3202   3604
     256    600    615   1163   1971   3152   3612
     512    328    278    695   1272   2256   2978
    1024     94    140    281    543    960   2075  1.12 RAM
    4096     99    128    259    448   1016   1931
   16384    125    129    258    500    898   1863
   65536    125    114    257    500   1015   1898

 ############### Compare 64 bit / 32 bit Pi 3 #################

  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   0.99   0.98   0.98   0.99   1.00   0.76
     256   0.95   0.97   1.01   0.96   0.98   0.89
   65536   0.82   0.92   0.94   0.93   0.86   0.90

 ############## Compare 64 bit / 32 bit Pi 3B+ ################

      16   1.02   1.03   0.98   1.00   0.99   0.76
     256   0.96   0.97   1.00   0.96   0.99   0.90
   65536   0.99   0.88   1.02   1.02   1.01   1.10



FFT Benchmarks - fft1-RPi2, fft3c-Rpi2, fft1-RPi64, FFT3c-RPi64

There are two benchmarks, FFT1, the original, and FFT3c, optimised, with 32 bit and 64 bit versions, when appropriate. Performance is measured in milliseconds, for FFTs sized 1K to 1024K, with three measurements using both single and double precision floating point data, plus some sumchecks for the largest ones.

The second of the three measurements is provided below. Note that three of the smaller FFT tests can be executed in less than a millisecond, where the CPU MHz scaling governor can produce a lower frequency (64 bit system), leading to increased running time, until the high MHz kicks in (see example below). For full speed, the scaling governor setting should be performance (sudo su, then echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor).

Much of the data is accessed on a skipped sequential basis, where only part of data transferred in bursts, over buses, is likely to be used. The 3C version was optimised to use more of the burst data, producing much improved performance.

Data transfer covers caches and RAM. RPi 3B+ gains are provided for each FFT size, indicating where RAM transfers apply, when the performance ratio is less than that derived from CPU MHz. Comparisons of 64/32 bit performance are also shown, with some good and some bad. Note that the processing activity is unlikely to produce absolutely consistent speeds, particularly when data size is near cache capacity or execution time is very low.
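
As a worked example of how the 3B+ gain column can be read, the following Python sketch divides the 3B time by the 3B+ time and compares the result with the CPU MHz ratio. The two timings are taken from the 64K and 128K single precision rows of the FFT V 1 32 Bit results in this section; the function names and the 5% tolerance are assumptions.

```python
MHZ_RATIO = 1400 / 1200  # 3B+ clock over 3B clock, about 1.17


def gain(ms_3b, ms_3b_plus):
    """Speed gain of the 3B+; above 1.0 means the 3B+ took less time."""
    return ms_3b / ms_3b_plus


def below_mhz_ratio(ms_3b, ms_3b_plus, tolerance=0.05):
    """True when the gain falls short of the MHz ratio by more than the tolerance."""
    return gain(ms_3b, ms_3b_plus) < MHZ_RATIO * (1 - tolerance)


if __name__ == "__main__":
    rows = [(64, 44.59, 36.67), (128, 217.33, 239.81)]
    for size_k, ms_3b, ms_3b_plus in rows:
        print(f"{size_k:4d}K gain {gain(ms_3b, ms_3b_plus):.2f}, "
              f"below MHz ratio: {below_mhz_ratio(ms_3b, ms_3b_plus)}")
```

A gain well below the MHz ratio, as for the 128K row here, is the pattern the text associates with RAM transfers.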


 ################### FFT V 1 32 Bit ####################

               RPi3           RPi3B+         Compare
   Size   -------- milliseconds --------     3B+ Gain
       K  Single  Double  Single  Double  Single  Double
 
       1    0.16    0.16    0.14    0.14    1.16    1.15
       2    0.37    0.42    0.34    0.34    1.09    1.24
       4    1.01    1.09    0.88    0.94    1.14    1.16
       8    2.25    2.51    1.96    2.17    1.15    1.15
      16    5.29    5.85    4.56    5.16    1.16    1.13
      32   12.57   22.48   10.48   19.52    1.20    1.15
      64   44.59  110.41   36.67  130.32    1.22    0.85
     128  217.33  269.62  239.81  314.27    0.91    0.86
     256  525.92  615.26  584.42  705.36    0.90    0.87
     512 1199.23 1364.15 1324.23 1534.86    0.91    0.89
    1024 2538.17 2831.33 2740.23 3152.95    0.93    0.90

 ################### FFT V 3C 32 Bit ###################

       1    0.20    0.16    0.17    0.14    1.16    1.19
       2    0.46    0.37    0.38    0.32    1.21    1.17
       4    1.28    0.89    1.07    0.77    1.19    1.15
       8    2.32    2.05    2.13    1.89    1.09    1.08
      16    5.36    5.98    4.57    5.83    1.17    1.03
      32   12.47   15.48   10.77   15.48    1.16    1.00
      64   31.08   36.99   29.05   37.25    1.07    0.99
     128   72.02   84.24   70.05   85.03    1.03    0.99
     256  160.48  193.81  160.68  199.34    1.00    0.97
     512  367.71  426.24  364.53  437.72    1.01    0.97
    1024  799.23  948.54  794.54  974.48    1.01    0.97

  RPi3B+ FFT3C scaling_governor ondemand

       1                    0.40    0.14
       2                    0.93    0.32
       4                    1.97    0.75
       8                    4.64    1.76
      16                    4.47    5.83 


 ################### FFT V 1 64 Bit ####################

               RPi3           RPi3B+         Compare
   Size   -------- milliseconds --------     3B+ Gain
       K  Single  Double  Single  Double  Single  Double
 
       1    0.18    0.18    0.15    0.15    1.18    1.17
       2    0.35    0.39    0.29    0.41    1.21    0.96
       4    0.87    1.64    0.79    0.99    1.10    1.65
       8    2.08    3.18    1.87    2.45    1.12    1.30
      16    4.68    7.18    3.86    5.23    1.21    1.37
      32   10.76   29.77   10.20   23.64    1.05    1.26
      64   39.65  126.03   50.50  105.28    0.79    1.20
     128  174.53  302.94  148.80  262.45    1.17    1.15
     256  408.05  700.83  352.60  603.27    1.16    1.16
     512  956.18 1543.35  836.47 1362.20    1.14    1.13
    1024 2055.48 3278.12 1841.71 2848.54    1.12    1.15

 ################### FFT V 3C 64 Bit ###################

       1    0.18    0.19    0.14    0.18    1.30    1.05
       2    0.41    0.43    0.36    0.37    1.13    1.17
       4    0.80    0.99    0.70    0.86    1.15    1.15
       8    2.10    2.32    2.60    1.95    0.81    1.19
      16    6.22    5.66    4.66    5.05    1.33    1.12
      32   10.38   15.08    9.05   13.20    1.15    1.14
      64   27.59   35.71   24.54   31.38    1.12    1.14
     128   71.14   81.19   56.37   72.85    1.26    1.11
     256  139.33  190.07  124.65  170.73    1.12    1.11
     512  321.96  428.64  295.34  385.23    1.09    1.11
    1024  705.92  938.42  629.95  838.97    1.12    1.12

 ############## Compare 64 bit / 32 bit ################

                        RPi3           RPi3B+
             K     Single  Double  Single  Double

 FFT1    1 to 8      1.05    0.86    1.06    0.90
        16 to 128    1.17    0.83    1.14    1.06
       256 to 1K     1.26    0.88    1.58    1.13

 FFT3C   1 to 8      1.24    0.89    1.17    0.88
        16 to 128    1.05    1.04    1.15    1.17
       256 to 1K     1.14    1.01    1.26    1.16




MultiThreading Benchmarks

Most of the multithreading benchmarks execute the same calculations using 1, 2, 4 and 8 threads. One of them, MP-MFLOPS, is available in two different versions, using standard compiled “C” code for single and double precision arithmetic. A further version uses NEON intrinsic functions. Another variety uses OpenMP procedures for automatic parallelism.

On running my multithreading benchmarks, I noted unusually slow performance from certain tests. The first was the MP-Whetstone Benchmark, which runs independent copies of the program using 1, 2, 4 and 8 threads. With four cores, the running time should not increase much using up to 4 threads, but should be just over twice as long using 8. As shown in the example below, the 4 thread test was too slow, and this was particularly due to the long running COS test.

  MP-Whetstone Benchmark armv8 64 Bit Mon Jun 18 23:09:29 2018

                    Using 1, 2, 4 and 8 Threads

      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp     Fixpt      If  Equal
                 1      2      3  MOPS  MOPS   ## MOPS    MOPS   MOPS

 1T  1112.9  352.1  379.0  319.2  22.0  12.7 1641076.6  2722.5 1328.7
 2T  2250.7  717.5  767.4  656.4  44.5  25.5 2684285.1  5456.3 2652.7
 4T  2899.0 1342.3 1525.3 1048.1  42.6  46.1 1959513.0  4497.3 4319.1
 8T  3433.1 1654.1 1804.6 1106.4  55.2  47.8 2453184.1 10960.3 4994.0

   Overall Seconds   5.14 1T,   5.11 2T,   8.08 4T,  13.66 8T

  ## over optimised but always had little effect on overall MWIPS   

A (not official) 2.5 amp power supply was used, connected via a digital meter that measures current and voltage. During the tests, this reported a constant level of over 5 volts and less than 1 amp. I suspected overheating and ran my RPiHeatMHz program at the same time (see OpenGL Power Cable Tests for core volts), producing the results below. These show that the CPU MHz flipped to 600 MHz at the time of the slow recorded performance. Although the temperature was not excessive, I carried out further tests with the system wrapped in bags of frozen food. The failures still occurred, with recorded temperatures of less than 30°C.

 Temperature and CPU MHz Measurement

 Start at Mon Jun 18 23:09:26 2018

 Using 40 samples at 1 second intervals

 Seconds
    0.0     1400 scaling MHz,   1400 ARM MHz, temp=55.8°C
    1.0     1400 scaling MHz,   1400 ARM MHz, temp=55.8°C
    2.2     1400 scaling MHz,   1400 ARM MHz, temp=55.8°C
    3.3     1400 scaling MHz,   1400 ARM MHz, temp=56.4°C 1T
    4.5     1400 scaling MHz,   1400 ARM MHz, temp=56.9°C
    5.7     1400 scaling MHz,   1400 ARM MHz, temp=56.9°C
    6.9     1400 scaling MHz,   1400 ARM MHz, temp=56.9°C
    8.0     1400 scaling MHz,   1400 ARM MHz, temp=57.5°C 2T
    9.2     1400 scaling MHz,   1400 ARM MHz, temp=58.0°C
   10.4     1400 scaling MHz,   1400 ARM MHz, temp=58.0°C
   11.7     1400 scaling MHz,   1399 ARM MHz, temp=59.1°C
   12.9     1400 scaling MHz,   1400 ARM MHz, temp=59.1°C 4T
   14.1     1400 scaling MHz,    600 ARM MHz, temp=59.1°C
   15.4     1400 scaling MHz,    600 ARM MHz, temp=59.1°C
   16.8     1400 scaling MHz,    600 ARM MHz, temp=58.0°C
   18.3     1400 scaling MHz,   1400 ARM MHz, temp=60.1°C
   19.6     1400 scaling MHz,   1400 ARM MHz, temp=60.7°C
   20.8     1400 scaling MHz,   1400 ARM MHz, temp=61.2°C
   22.0     1400 scaling MHz,   1400 ARM MHz, temp=61.8°C 8T
   23.3     1400 scaling MHz,    600 ARM MHz, temp=60.1°C
   24.9     1400 scaling MHz,    600 ARM MHz, temp=60.1°C
   26.4     1400 scaling MHz,   1400 ARM MHz, temp=60.7°C
   27.6     1400 scaling MHz,   1400 ARM MHz, temp=61.2°C
   To
   38.8     1400 scaling MHz,   1400 ARM MHz, temp=60.1°C
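
The samples where the ARM clock dropped can be picked out of a log like the one above automatically. The following Python sketch assumes the line layout shown in that listing; the function name, the regular expression and the 10 MHz tolerance (to ignore readings such as 1399 MHz) are assumptions, not part of RPiHeatMHz.

```python
import re

# Matches lines such as:
#    14.1     1400 scaling MHz,    600 ARM MHz, temp=59.1°C
SAMPLE = re.compile(
    r"^\s*(?P<secs>\d+\.\d+)\s+(?P<scaling>\d+) scaling MHz,\s+"
    r"(?P<arm>\d+) ARM MHz, temp=(?P<temp>\d+\.\d+)"
)


def throttled_samples(log_text, tolerance_mhz=10):
    """Return (seconds, ARM MHz, temperature) where ARM MHz is below scaling MHz."""
    hits = []
    for line in log_text.splitlines():
        match = SAMPLE.match(line)
        if match is None:
            continue  # headings, blank lines and the "To" marker
        scaling = int(match["scaling"])
        arm = int(match["arm"])
        if scaling - arm > tolerance_mhz:
            hits.append((float(match["secs"]), arm, float(match["temp"])))
    return hits


if __name__ == "__main__":
    example = """
   12.9     1400 scaling MHz,   1400 ARM MHz, temp=59.1°C 4T
   14.1     1400 scaling MHz,    600 ARM MHz, temp=59.1°C
   15.4     1400 scaling MHz,    600 ARM MHz, temp=59.1°C
"""
    for secs, arm, temp in throttled_samples(example):
        print(f"{secs:6.1f} s: ARM clock {arm} MHz at {temp} C")
```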

Next, I tried my official Pi 2 amp power supply. That seemed to be fine when directly connected, but the failures returned when the meter was included, which needs a connection using a longer wire. It also failed when just the wire extension was included. Now all multithreading programs have been run to verify the results, using a directly connected official 2.5 amp power supply. Even with this, four thread performance can be inconsistent, mainly when the running time is not very long, influenced by other system activity and the programs calculating performance based on the last thread to finish. In these cases, four threads carry out the same number of instructions as a single thread, potentially reducing running time to a quarter. The Whetstone benchmark is probably the best one to identify the power drop, with each thread (of up to four) taking around 5 seconds to execute the same functions.




MP-MFLOPS Benchmarks
   MP-MFLOPSPiA7, MP-MFLOPSDP, MP-MFLOPSPi64,
   MP-MFLOPSPi64DP, MP-NeonMFLOPS, MP-NeonMFLOPS64

This uses multiply and add calculations with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Data sizes are limited to three to use L1 cache, L2 cache and RAM at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words or half for double precision). Each thread uses the same calculations but accessing different segments of the data. The program checks for consistent numeric results, primarily to show that all calculations are carried out and can be run.

This benchmark was intended to demonstrate near maximum throughput using single precision floating point calculations. It nearly did on an Intel Core i7 CPU, compiled with gcc under Linux, obtaining 23 out of 32 MFLOPS/MHz with SSE instructions (4 cores, quad word registers, linked multiply and add). The latter arrangement (I believe) also applies to the ARM Cortex-A53 where, with the same efficiency, a Raspberry Pi 3B, at 1200 MHz, would be expected to achieve 27600 MFLOPS and a 3B+ 32200 MFLOPS, at 1400 MHz. For ARM, and probably Intel, as shown below, 20 instructions could be executed at the full speed, with 12 at half speed, nearly corresponding with the 72% (23*100/32) efficiency obtained with Intel.
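
The arithmetic behind these figures can be checked with a short Python sketch. The constant names are assumptions, and the timing model for the 20 full speed and 12 half speed instructions (a half speed instruction counted as two time units) is one reading of the text, not a measured property of either CPU.

```python
CORES = 4         # cores on the CPUs discussed above
SIMD_LANES = 4    # single precision words in a quad word register
OPS_PER_FMA = 2   # linked multiply and add counts as two operations

PEAK_PER_MHZ = CORES * SIMD_LANES * OPS_PER_FMA  # 32 MFLOPS per MHz
MEASURED_PER_MHZ = 23                            # Intel Core i7 result quoted above


def expected_mflops(cpu_mhz, per_mhz=MEASURED_PER_MHZ):
    """MFLOPS expected at a given clock if the same efficiency applied."""
    return per_mhz * cpu_mhz


def mixed_speed_efficiency(full_speed, half_speed):
    """Fraction of peak when some instructions take twice as long (assumed model)."""
    instructions = full_speed + half_speed
    time_units = full_speed + 2 * half_speed
    return instructions / time_units


if __name__ == "__main__":
    print("Peak MFLOPS/MHz:", PEAK_PER_MHZ)
    print("3B  at 1200 MHz:", expected_mflops(1200))
    print("3B+ at 1400 MHz:", expected_mflops(1400))
    print(f"Measured efficiency: {MEASURED_PER_MHZ / PEAK_PER_MHZ:.1%}")
    print(f"20 full, 12 half speed: {mixed_speed_efficiency(20, 12):.1%}")
```

Under that model, 32 instructions take 44 time units, close to the 72% efficiency quoted for Intel.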

Single Precision and Double Precision Raspberry Pi 3B+ MFLOPS results are shown below for existing compiled 32 bit and 64 bit benchmarks and one that uses Single Precision NEON Intrinsic functions, then those from a new compilation using gcc 7. None achieve the levels of performance suggested above. Source code and benchmarks for the new MP-MFLOPS, compiled by gcc 7, are in the files mpmflops64gcc7.zip and mpmflops64gcc7.tar.gz. Speeds of later results from the OpenMP-MFLOPS Benchmark are included in the table.

Performance using one and four threads is shown, along with the gain via the latter. Note that particularly four thread performance can vary significantly, even when using a reliable power supply - See Above.

      Raspberry Pi 3B+ MFLOPS at 32 Operations Per Data Word

                                          NEON     64 bit        OpenMP
                32 bit        64 bit    64 bit      gcc7         64 bit
                                                               gcc6   gcc7
                SP     DP     SP     DP     SP     SP     DP     SP     SP

 1 Thread      813    798   1793   1405   2999   2800   1403   1692   2781
 4 Threads    3189   3109   6981   4398  11563  10608   4492   6469  10007
 4T/1T        3.92   3.90   3.89   3.13   3.86   3.79   3.20   3.82   3.60

Source and Assembly Codes for these benchmark runs are shown below, where explanations of the differences are provided. Next are the detailed results.

3B+ to 3B performance gains are provided following the detailed results. This benchmark tends to be limited by processor speed, producing gains proportional to CPU MHz, but subject to random variations. 64 bit speed gains are also shown (before gcc 7), excluding using RAM, these being greater than 2.1 times single precision and 1.5 times double precision.


 ################# MP-MFLOPS Raspbian RPi 3B 32 Bit #################

    Raspberry Pi 3 CPU 1200 MHz, SDRAM 900 MHz

 MP-MFLOPS Linux/ARM V7A v1.0 Sun Jul 15 14:45:33 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      184     181     172     697     697     691
 2T      367     360     339    1393    1373    1379
 4T      642     714     411    2702    2652    2650
 8T      597     689     429    2635    2623    2590
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ########### RPi 3 V7A2 Double Precision ############

 MP-MFLOPS Double Precision v1.0 Sun Jul 15 14:44:57 2018
 
 1T      182     183     160     684     684     670
 2T      354     361     216    1365    1342    1320
 4T      590     709     215    2695    2695    2544
 8T      609     612     219    2576    2662    2529
 Results x 100000
 1T    76384   97072   99969   66065   95370   99951


 ################# MP-MFLOPS Raspbian RPi 3B+ 32 Bit #################

 MP-MFLOPS Linux/ARM V7A v1.0 Sun Jul 15 14:04:45 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      214     214     199     813     813     805
 2T      420     420     367    1614    1597    1575
 4T      739     837     408    3078    3189    3050
 8T      741     751     427    3207    3117    3058
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ############ RPi 3B+ Double Precision #############

 MP-MFLOPS Double Precision v1.0 Sun Jul 15 14:03:57 2018

 1T      214     213     182     798     783     783
 2T      371     424     220    1595    1561    1558
 4T      662     808     215    3109    3109    2805
 8T      704     813     218    3087    2899    2738
 Results x 100000
 1T    76384   97072   99969   66065   95370   99951

 ################## MP-MFLOPS Gentoo RPi 3B 64 Bit ##################

 MP-MFLOPS armv8 64Bit Tue Jul 17 16:01:39 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      711     679     382    1545    1534    1465
 2T     1406    1293     384    3088    3054    2838
 4T     2468    2554     395    4857    5546    5244
 8T     2305    2182     414    5292    5822    5200
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ############# RPi 3 Double Precision ##############

 MP-MFLOPS armv8 64Bit Double Precision Tue Jul 17 16:02:25 2018

 1T      356     333     177    1211    1188    1075
 2T      696     691     208    2412    2413    1926
 4T     1251    1272     198    4702    4689    2367
 8T      886    1292     196    4514    4267    2715
 Results x 100000
 1T    76384   97072   99969   66065   95370   99951

 ############### MP-MFLOPS Gentoo RPi 3B 64 Bit gcc 7 ################

 MP-MFLOPS armv8 64Bit gcc 7 Sun Jun 10 11:45:16 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      709     666     367    2416    2299    2070
 2T     1415    1379     357    4721    4763    3670
 4T     1663    2362     358    8538    5288    5459
 8T     2400    2066     337    9161    7144    5857
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ############ RPi 3B Double Precision #############

 MP-MFLOPS armv8 64Bit gcc7 DP Sun Jun 10 11:45:45 2018

 1T      353     330     150    1206    1188     997
 2T      701     687     193    2413    2368    1700
 4T     1185    1054     174    4214    3256    2978
 8T     1079    1035     173    3240    3541    2993
 Results x 100000
 1T    76384   97072   99969   66065   95370   99951

 ################## MP-MFLOPS Gentoo RPi 3B+ 64 Bit ##################

 MP-MFLOPS armv8 64Bit Wed Apr 25 10:22:43 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      824     763     380    1793    1795    1704
 2T     1620    1577     404    3584    3588    3324
 4T     2093    2933     387    6981    6843    4189
 8T     2481    2263     409    6851    6391    2944
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ############ RPi 3B+ Double Precision #############

 MP-MFLOPS armv8 64Bit Double Precision Wed Apr 25 10:22:59 2018

 1T      412     394     185    1405    1380    1275
 2T      815     792     200    2769    2801    2402
 4T     1468    1244     186    4398    5274    3335
 8T     1281    1155     195    4160    5157    3256
 Results x 100000
 1T    76384   97072   99969   66065   95370   99951

 ############### MP-MFLOPS Gentoo RPi 3B+ 64 Bit gcc 7 ##############

 MP-MFLOPS armv8 64Bit gcc 7 Mon Jun 4 23:58:11 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      814     759     325    2800    2686    2432
 2T     1606    1519     288    5566    5570    4011
 4T     1814    2822     383   10608   10054    2928
 8T      927     874     307    3533    3924    3075
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ############ RPi 3B+ Double Precision #############

 MP-MFLOPS armv8 64Bit Double Precision Mon Jun 4 23:54:10 2018

 1T      402     383     172    1403    1378    1277
 2T      812     795     183    2769    2806    2400
 4T     1485    1059     166    4492    5284    3255
 8T     1149    1330     171    4420    4961    3221
 Results x 100000
 1T    76384   97072   99969   66065   95370   99951

 ############### MP-NeonMFLOPS Raspbian RPi 3B 32 Bit ###############

 MP-MFLOPS NEON Intrinsics v1.0 Fri Jul 13 18:35:36 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      565     590     424    1760    1723    1679
 2T      962    1150     436    3510    3379    3276
 4T     1820    1811     435    6869    6625    5989
 8T     1645    1990     436    6583    6743    6072
 Results x 100000
 1T    76406   97075   99969   66014   95363   99951

 ############### MP-NeonMFLOPS Raspbian RPi 3B+ 32 Bit ###############

 MP-MFLOPS NEON Intrinsics v1.0 Sun Jul 15 14:06:03 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      694     686     430    2035    1960    1940
 2T     1119    1358     438    4052    3888    3693
 4T     1840    2638     438    6911    7827    6384
 8T     1881    2540     437    7963    7089    6590
 Results x 100000
 1T    76406   97075   99969   66014   95363   99951

 ################ MP-NeonMFLOPS Gentoo RPi 3B 64 Bit ################

 MP-MFLOPS NEON Intrinsics 64 Bit Thu Mar 2 17:03:53 2017

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      695     688     386    2595    2577    2423
 2T     1373    1242     375    5163    5096    4445
 4T     1389    1785     371   10035   10030    6171
 8T     2071    2470     339    9410    9481    6209
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 ################ MP-NeonMFLOPS Gentoo RPi 3B+ 64 Bit ################

 MP-MFLOPS NEON Intrinsics 64 Bit Mon Jun 11 11:59:09 2018

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      788     816     391    2999    2997    2670
 2T     1450    1447     414    5986    5900    5057
 4T     1593    2126     364   11563    9250    6338
 8T     2539    2596     370    9361   11017    5855
 Results x 100000
 1T    76406   97075   99969   66015   95363   99951

 #################### Compare 3B+ / 3B 1 Thread ######################

               2 Ops/Word              32 Ops/Word
 KB            12.8     128   12800    12.8     128   12800

 32 bit SP     1.16    1.18    1.16    1.17    1.17    1.16
 32 bit DP     1.18    1.16    1.14    1.17    1.14    1.17
 64 bit SP     1.16    1.12    0.99    1.16    1.17    1.16
 64 bit DP     1.16    1.16    1.05    1.16    1.16    1.19

 ########################### 64 bit/32 bit ###########################

 Pi3B  SP      2.37    3.44    0.73    2.04    2.27    1.79
 Pi3B  DP      2.14    1.64    0.67    1.72    1.54    0.95
 Pi3B+ SP      2.83    3.50    0.95    2.27    2.15    1.37
 Pi3B+ DP      2.22    1.54    0.87    1.41    1.70    1.19




MP-Whetstone Benchmark - MP-WHETS, MP-WHETSPi64

Multiple threads each run the eight test functions at the same time, but with some dedicated variables. Measured speed is based on the last thread to finish, with Mutex functions, used to avoid the updating conflict by only allowing one thread at a time to access common data. Performance is generally proportional to the number of cores used. There can be some significant differences from the single CPU Whetstone benchmark results on particular tests due to a different compiler being used.
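
The timing arrangement described above, with elapsed time taken from the last thread to finish and a mutex guarding the shared results, can be sketched in Python as follows. This only illustrates the bookkeeping; the names are assumptions, it is not the benchmark's C code, and CPU bound Python threads do not run in parallel as the C threads do.

```python
import threading
import time


def run_threads(n_threads, work):
    """Run `work` in n_threads threads; return (threads finished, elapsed seconds)."""
    lock = threading.Lock()
    shared = {"finished": 0, "last_end": 0.0}

    def worker():
        work()  # each thread runs the same test functions
        end = time.perf_counter()
        with lock:  # only one thread at a time may update the common data
            shared["finished"] += 1
            shared["last_end"] = max(shared["last_end"], end)

    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Elapsed time runs from the common start to the last thread to finish.
    return shared["finished"], shared["last_end"] - start


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        finished, seconds = run_threads(n, lambda: sum(range(200000)))
        print(f"{n}T: {finished} threads finished in {seconds:.3f} seconds")
```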

None of the test functions are suitable for SIMD operation, with the simpler instructions being used, possibly leading to some 32 bit tests being faster than those compiled for 64 bits. The Fixed Point MIPS loops are clearly over optimised but, in any case, the time taken has little influence on the overall MWIPS rating.

For both 32 and 64 bit versions, overall single core MWIPS were 17% faster on the 3B+, proportional to CPU MHz ratios. MP speed improvements can be judged by the overall running times shown, which should be similar for 1, 2 and 4 threads and double with 8 threads.

 ################# MP-Whetstone  Raspbian RPi 3B 32 Bit ################# 

  MP-Whetstone Benchmark Linux/ARM v1.0 Sun Jun 17 20:55:06 2018

      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt     If  Equal
                 1      2      3  MOPS  MOPS    MOPS   MOPS   MOPS

 1T   924.2  335.9  276.8  298.5  18.5  10.4  5817.2 1035.3  719.4
 2T  1864.8  672.5  664.3  594.1  37.3  20.7 11726.4 2386.9 1438.8
 4T  3718.4 1286.3 1303.9 1193.5  74.3  41.5 19961.4 4698.4 2862.7
 8T  3908.9 1639.8 1746.6 1274.2  75.9  43.6 29809.6 6321.5 3002.2

   Overall Seconds   5.02 1T,   4.97 2T,   5.07 4T,  10.08 8T

 ################# MP-Whetstone Raspbian RPi 3B+ 32 Bit ################# 

  MP-Whetstone Benchmark Linux/ARM v1.0 Sun Jun 17 22:56:26 2018

      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt     If  Equal
                 1      2      3  MOPS  MOPS    MOPS   MOPS   MOPS

 1T  1084.2  391.0  384.9  348.6  21.7  12.1  6967.0 1013.1  822.3
 2T  2174.4  778.3  775.7  691.9  43.5  24.2 13762.0 2787.4 1675.0
 4T  4343.8 1540.9 1558.3 1389.5  86.6  48.4 27529.5 5549.8 3338.5
 8T  4548.4 1895.1 1896.0 1504.7  88.0  51.0 39107.6 7287.7 3440.6

   Overall Seconds   5.05 1T,   5.00 2T,   5.06 4T,  10.10 8T

 ################## MP-Whetstone Gentoo RPi 3B 64 Bit ################## 

  MP-Whetstone Benchmark armv8 64 Bit Tue Jun 19 00:00:13 2018

      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp     Fixpt      If  Equal
                 1      2      3  MOPS  MOPS      MOPS    MOPS   MOPS

 1T   979.8  330.4  322.9  281.5  20.0  10.8 1368033.3  2335.5 1177.1
 2T  1986.0  623.9  659.3  564.2  40.0  22.4 2311401.6  4675.1 2355.7
 4T  3914.5 1206.2 1295.8 1122.2  78.3  44.3 3007162.3  9230.2 4636.6
 8T  4039.7 1498.5 1670.6 1170.2  79.5  45.3 1183764.2 12054.6 5082.7

   Overall Seconds   5.04 1T,   5.01 2T,   5.27 4T,  10.22 8T

 ################## MP-Whetstone Gentoo RPi 3B+ 64 Bit ##################

  MP-Whetstone Benchmark armv8 64 Bit Tue Jun 26 12:02:45 2018

      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp    Fixpt       If  Equal
                 1      2      3  MOPS  MOPS     MOPS     MOPS   MOPS

 1T  1151.6  383.0  382.7  327.6  23.2  13.0 1717931.5  2720.5 1364.5
 2T  2311.6  766.5  766.8  657.2  46.5  26.0 3478249.3  5460.9 2738.4
 4T  4579.6 1505.5 1525.7 1304.4  92.0  51.6 4647842.5 10777.1 5448.5
 8T  4788.4 1814.9 1961.4 1381.9  95.0  53.3 5689217.0 13827.3 5810.6

   Overall Seconds   4.96 1T,   4.95 2T,   5.05 4T,  10.07 8T
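
The expected pattern of overall running times (similar for 1, 2 and 4 threads, double with 8) can be expressed as a simple check. In this Python sketch the function names and the 15% slack are assumptions; the measured values come from the 3B+ 64 bit listing above and from the earlier run affected by the power supply problem.

```python
import math


def expected_seconds(threads, single_thread_seconds, cores=4):
    """Each thread does the same work, so time grows only when threads exceed cores."""
    return single_thread_seconds * math.ceil(threads / cores)


def too_slow(measured_seconds, threads, single_thread_seconds, slack=1.15):
    """True when a run takes noticeably longer than the simple model predicts."""
    return measured_seconds > slack * expected_seconds(threads, single_thread_seconds)


if __name__ == "__main__":
    good_run = {1: 4.96, 2: 4.95, 4: 5.05, 8: 10.07}
    slow_run = {1: 5.14, 2: 5.11, 4: 8.08, 8: 13.66}
    for name, run in (("good", good_run), ("slow", slow_run)):
        for threads, seconds in run.items():
            flag = too_slow(seconds, threads, run[1])
            print(f"{name} run {threads}T {seconds:6.2f} s  too slow: {flag}")
```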
   



MP-Dhrystone Benchmark - MP-DHRYPiA7, MP-DHRYPi64

This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread but there are numerous other variables that are shared. The latter can reduce performance gains via multiple threads and, in certain situations, these can be slower than using a single thread (not in this case).

Average performance gains of the 3B+ over the older 3B were, as usual, the same as the CPU MHz ratio. Single thread performance, at 64 bits, was 55% faster than at 32 bits but, on both models, this gain reduced to 10% using four threads.


 ################# MP-Dhrystone Raspbian RPi 3B 32 Bit ################# 

  MP-Dhrystone Benchmark Linux/ARM V7A v1.0 Sun Jun 17 20:36:41 2018

 Threads                        1        2        4        8

 Seconds                     0.78     0.92     1.27     2.52
 Dhrystones per Second    4107750  6949821 10067546 10156278
 VAX MIPS rating             2338     3956     5730     5780

         End of test Sun Jun 17 20:36:48 2018


 ################# MP-Dhrystone Raspbian RPi 3B+ 32 Bit ################# 

  MP-Dhrystone Benchmark Linux/ARM V7A v1.0 Mon Jun 18 10:05:26 2018

 Threads                        1        2        4        8

 Seconds                     0.85     0.96     1.36     2.71
 Dhrystones per Second    4732954  8293353 11799850 11823294
 VAX MIPS rating             2694     4720     6716     6729

         End of test Mon Jun 18 10:05:33 2018


 ################## MP-Dhrystone Gentoo RPi 3B 64 Bit ################## 

  MP-Dhrystone Benchmark armv8 64 Bit Tue Jun 19 00:02:49 2018

 Threads                        1        2        4        8

 Seconds                     0.63     0.79     1.45     2.86
 Dhrystones per Second    6364104 10106501 11050923 11173626
 VAX MIPS rating             3622     5752     6290     6359

         End of test Tue Jun 19 00:02:55 2018
 

 ################## MP-Dhrystone Gentoo RPi 3B+ 64 Bit ##################

  MP-Dhrystone Benchmark armv8 64 Bit Mon Jun 18 23:11:33 2018

 Threads                        1        2        4        8

 Seconds                     0.54     0.74     1.24     2.46
 Dhrystones per Second    7376153 10819564 12921258 13021546
 VAX MIPS rating             4198     6158     7354     7411

         End of test Mon Jun 18 23:11:39 2018
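
The 64 bit over 32 bit percentages quoted for this benchmark can be reproduced from the VAX MIPS ratings in the four listings above. The following Python sketch uses only those figures; the dictionary layout and function names are assumptions.

```python
# VAX MIPS ratings from the listings above, keyed by (model, bits) then threads.
VAX_MIPS = {
    ("3B", 32): {1: 2338, 4: 5730},
    ("3B+", 32): {1: 2694, 4: 6716},
    ("3B", 64): {1: 3622, 4: 6290},
    ("3B+", 64): {1: 4198, 4: 7354},
}


def percent_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


def bits_gain(model, threads):
    """64 bit gain over 32 bit for one model and thread count, in percent."""
    return percent_gain(VAX_MIPS[(model, 64)][threads], VAX_MIPS[(model, 32)][threads])


if __name__ == "__main__":
    for model in ("3B", "3B+"):
        for threads in (1, 4):
            print(f"{model:3s} {threads}T 64/32 bit gain {bits_gain(model, threads):5.1f}%")
```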

   




MP-Linpack Benchmark - linpackNeonMP, linpackNeonMP64

The original Linpack benchmark for Raspberry Pi operates on double precision floating point 100x100 matrices (N = 100). This version uses mainly the same C programming code as the single precision floating point NEON compilation. It is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads. The 0 thread procedures are identical to those in the single core 100 x 100 NEON compilation, using NEON intrinsic functions. The benchmark was produced to demonstrate that the original Linpack 100x100 code could not be converted (by me) to show increased performance using multiple threads. The official line is that users are allowed to use their own linear equation solver for this purpose. These Raspbian tests were carried out under the later Stretch release.

Performance can vary somewhat with this benchmark but reflects the usual 3B+ speed gains, at least on averaging all results. On the same basis, average 64 bit speeds are suggested as being the same as those at 32 bits, but some results indicate slower performance. Similar performance could be expected, as the compiled code is derived from high level NEON SIMD vector functions.

The poor performance, even using a single thread, is due to the frequent starting and stopping of threads to execute the critical calculations. Consistent threaded speed indicates shared data write back to RAM dependency. This probably increases with larger matrices as more calculations are carried out during a threaded function call.

    MFLOPS 0 to 4 Threads, N 100, 500, 1000

 ################# MP-Linpack  Raspbian RPi 3B 32 Bit ################# 

 Using NEON Intrinsics, Sun Jun 17 20:32:04 2018

 Threads      None        1        2        4

 N  100     542.22    61.00    60.67    60.74 
 N  500     480.55   311.06   316.00   303.48 
 N 1000     364.07   272.49   231.10   232.07 

 NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1

 N              100             500            1000
 NR            2.17            5.42            9.50

 RE  5.16722466e-05  6.46698638e-04  2.26586126e-03
 MA  1.19209290e-07  1.19209290e-07  1.19209290e-07
 X0 -2.38418579e-07 -5.54323196e-05 -1.26898289e-04
 XN -5.06639481e-06 -4.70876694e-06  1.41978264e-04
 
 ################# MP-Linpack Raspbian RPi 3B+ 32 Bit ################# 

 Using NEON Intrinsics, Mon Jun 18 10:00:08 2018

 Threads      None        1        2        4

 N  100     633.11    70.82    70.13    70.20 
 N  500     505.37   323.24   326.81   327.73 
 N 1000     378.29   337.34   337.01   337.80 

 SumChecks as above but note 64 bit differences - rounding effects?

 ################## MP-Linpack Gentoo RPi 3B 64 Bit ################## 

  64 Bit NEON Intrinsics, Tue Jun 19 00:04:01 2018

   MFLOPS 0 to 4 Threads, N 100, 500, 1000

 Threads      None        1        2        4

 N  100     551.48    87.43    81.66    82.68 
 N  500     359.51   258.43   242.92   255.61 
 N 1000     296.11   281.75   279.20   282.71 

 NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1

 N              100             500            1000
 NR            1.97            5.40           13.51

 RE  4.69621336e-05  6.44138840e-04  3.22485110e-03
 MA  1.19209290e-07  1.19209290e-07  1.19209290e-07
 X0 -1.31130219e-05  5.79357147e-05 -3.08930874e-04
 XN -1.30534172e-05  3.51667404e-05  1.90019608e-04

 ################## MP-Linpack Gentoo RPi 3B+ 64 Bit ##################

  64 Bit NEON Intrinsics, Mon Jun 18 23:13:26 2018

 Threads      None        1        2        4

 N  100     639.82   100.30    95.24    95.25 
 N  500     430.41   292.80   291.12   290.04 
 N 1000     349.47   313.59   312.38   313.40 

 SumChecks as above but note 32 bit differences - rounding effects?



MP-BusSpd Benchmark - MP-BusSpeed2PiA7, MP-BusSpeedPi64

This runs integer read only tests using caches and RAM, each thread accessing the same data sequentially. To start with, data is read with large address increments to demonstrate burst data transfers. Performance gains, using L1 cache, can be proportional to the number of cores, but not quite so using L2. The program is designed to produce maximum throughput over buses and demonstrates the fastest RAM speeds using multiple cores.

In the original version, each thread started reading data from the same starting point. This produced acceptable results until shared L2 caches appeared. Then it produced excessive RAM speeds, using more than one thread. With version 2, as used for the following, each thread starts reading from different addresses, providing more realistic results.

Considering just the ReadAll speeds, and allowing for MP performance variability, the usual 3B+/3B gains applied. Compared with the BusSpeed benchmark results, the 64 bit one thread performance was much slower and many old 3B cache based speeds significantly faster. In this case, disassembly code was examined to identify why. The ReadAll C code comprises a loop with 64 read statements, using AND. The 64 bit compiler produced code with 64 scalar instructions (e.g. and w3, w3, w0) and 64 loads, whereas the 32 bit compiler produced 16 four way SIMD instructions (e.g. vand q15, q15, q6) and 16 vector loads, but with lots of other adds (for indexing?).

At least, performance on reading data from RAM could be nearly doubled using multithreading.
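The access pattern described above can be pictured with a short Python sketch. It is an illustration only, not the benchmark's C source: the function name `and_read`, the word based increment and the small test buffer are assumptions made for the example. Each call ANDs together every `inc`-th word of a buffer, so a large increment touches few words per block of memory, while an increment of 1 corresponds to reading all of the data.

```python
def and_read(words, inc):
    """AND together every inc-th word; return (result, words_touched).

    Hypothetical helper: it mimics a read only integer test where the
    only work done per word is one AND into an accumulator.
    """
    acc = 0xFFFFFFFF          # all bits set, so the first AND keeps the word
    touched = 0
    for i in range(0, len(words), inc):
        acc &= words[i]       # read only: the buffer is never written
        touched += 1
    return acc, touched


if __name__ == "__main__":
    buf = [0xFFFFFFFF] * 64   # assumed 64 word buffer, all bits set
    buf[32] = 0x0000FFFF      # one word with the top half cleared
    for inc in (32, 16, 8, 4, 2, 1):
        result, touched = and_read(buf, inc)
        print(f"inc {inc:2d}: touched {touched:2d} words, result {result:#010x}")
```

In the real benchmark the 64 AND statements are written out in the loop body, which is what gives the compiler the choice between scalar and SIMD instructions noted above.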

 ################# MP-BusSpd Raspbian RPi 3B 32 Bit ################# 

  MP-BusSpd ARM V7A v2 Fri Jul 13 18:29:45 2018

   MB/Second Reading Data, 1, 2, 4 and 8 Threads

  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   2690   3768   3793   4081   4387   4223
      2T   5086   6856   7148   7710   8571   8159
      4T   8285  11814  13335  15091  16656  15720
      8T   6381   8690  10777  11997  14310  13789
122.9 1T    567    557   1059   1802   2804   3934
      2T    888    903   1746   3287   5379   7686
      4T    895    928   1810   3671   7205  13860
      8T    909    927   1837   3691   7049  13125
12288 1T    120    124    240    475    963   1906
      2T    135    123    245    505   1010   1978
      4T    135    132    259    467   1080   2135
      8T    126    124    255    500    973   2158

         End of test Fri Jul 13 18:29:57 2018

 ################# MP-BusSpd Raspbian RPi 3B+ 32 Bit ################# 

  MP-BusSpd ARM V7A v2 Fri Jul 13 20:18:36 2018

  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   3510   4345   4419   4731   5031   4928
      2T   6010   7992   8384   9018  10024   9648
      4T  10127  13748  15247  17581  19516  18252
      8T   7165  10780  13100  14043  16201  16504
122.9 1T    662    648   1247   2090   3246   4565
      2T   1030   1024   2047   3829   6317   8962
      4T   1040   1078   2167   4340   8380  15935
      8T   1052   1077   2122   4263   8362  15826
12288 1T    129    133    267    516   1044   2085
      2T    141    139    280    544   1115   2126
      4T    141    159    301    530   1075   2338
      8T    153    140    273    618   1190   2488

         End of test Fri Jul 13 20:18:48 2018

 ################## MP-BusSpd Gentoo RPi 3B 64 Bit ################## 

  MP-BusSpd armv8 64 Bit Tue Jun 19 00:06:12 2018

  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   1462   2407   2584   2038   1461   1492
      2T   4412   4081   4820   3867   2822   2928
      4T   6446   6019   8348   6814   5330   5346
      8T   2598   3924   6114   5788   3827   5016
122.9 1T    535    569   1016   1578   1425   1470
      2T    687    859   1708   3013   2829   2932
      4T    721    878   1829   3573   4369   5261
      8T    780    897   1827   3588   4949   5271
12288 1T     30    111    213    365    835   1024
      2T     45     65    143    337    798   1590
      4T     58     71    253    341    663   1546
      8T     47     97    147    443    904   1821

         End of test Tue Jun 19 00:06:25 2018

 ################## MP-BusSpd Gentoo RPi 3B+ 64 Bit ################## 

  MP-BusSpd armv8 64 Bit Tue Jun 26 12:07:50 2018

  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   3258   2799   3065   2378   1695   1731
      2T   5422   4839   5632   4506   3254   3404
      4T   7974   6313   9724   8025   6234   5524
      8T   5033   4680   6906   6331   5271   5780
122.9 1T    623    666   1188   1871   1657   1694
      2T   1010   1023   2010   3474   3270   3408
      4T    914   1044   2033   4057   6376   6781
      8T    957   1053   2166   4226   5865   6250
12288 1T    115    109    241    484    921   1048
      2T     54    100    227    457    780   2025
      4T     59     99    186    383    868   1623
      8T     67     92    230    481    736   2018

         End of test Tue Jun 26 12:08:04 2018




MP-RandMem Benchmark - MP-RandMemPiA7, MP-RandMemPi64

The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. It uses the Mutex functions as in Whetstone above, sometimes leading to no performance gains using multiple threads. Although performance via the L1 cache, L2 cache and RAM can be different, it is normally consistent, in each of these areas, during read/write tests.

There can be a lot of variability on 4 thread/1 thread performance gains and many runs might be required to provide accurate comparisons. On all tests, 3B+/3B performance gains were as expected for cache based results, with averages between 1.16 and 1.17, with RAM performance being similar. Read only MP gains were mainly greater than 3.5 times for cache tests, except on random access to L2, at around 2.4 times, understandably lower using a shared cache. There was also some increase in MP throughput using RAM. Raspbian based results indicate slightly improved performance over those using 64 bit Gentoo.
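The ratios discussed above can be recomputed from the result tables. The following Python sketch is illustrative only: the `ratio` helper and the variable names are hypothetical, the values are copied from the 32 bit Raspbian listings, and it assumes that the 12.3 KB tests run from L1 cache and the 122.9 KB tests from L2 cache.

```python
def ratio(new, old):
    """Return new/old as a performance ratio."""
    return new / old


# MB/second copied from the 32 bit Raspbian results
serrd_l1_1t_3b = 4078    # RPi 3B,  12.3 KB, 1 thread, SerRD
serrd_l1_1t_3bp = 4747   # RPi 3B+, 12.3 KB, 1 thread, SerRD
serrd_l1_4t_3b = 15622   # RPi 3B,  12.3 KB, 4 threads, SerRD
rndrd_l2_1t_3b = 827     # RPi 3B,  122.9 KB, 1 thread, RndRD
rndrd_l2_4t_3b = 2078    # RPi 3B,  122.9 KB, 4 threads, RndRD

print(f"3B+/3B, L1 serial read, 1 thread: {ratio(serrd_l1_1t_3bp, serrd_l1_1t_3b):.2f}")
print(f"4T/1T, L1 serial read, 3B:        {ratio(serrd_l1_4t_3b, serrd_l1_1t_3b):.2f}")
print(f"4T/1T, L2 random read, 3B:        {ratio(rndrd_l2_4t_3b, rndrd_l2_1t_3b):.2f}")
```

Single pairs like these only spot check the averages quoted in the text; the full comparison needs every row of each table.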

 ################# MP-RandMem Raspbian RPi 3B 32 Bit ################# 

  MP-RandMem Linux/ARM v1.0 Sun Jul 15 10:54:39 2018 

  MB/Second Using 1, 2, 4 and 8 Threads

  KB       SerRD SerRDWR   RndRD RndRDWR

 12.3 1T    4078    3814    4018    3798
      2T    8045    3768    8043    3777
      4T   15622    3724   15625    3730
      8T   15208    3723   15020    3724
122.9 1T    3289    3393     827     891
      2T    6556    3379    1512     880
      4T   12125    3364    2078     886
      8T   12309    3364    2042     886
12288 1T    1669     878      65      64
      2T    3485     872     121      65
      4T    4296     876     146      65
      8T    2435     878     147      65

    End of test Sun Jul 15 10:55:24 2018

 ################# MP-RandMem Raspbian RPi 3B+ 32 Bit ################# 

 MP-RandMem Linux/ARM v1.0 Sun Jul 15 11:03:26 2018

  KB       SerRD SerRDWR   RndRD RndRDWR

 12.3 1T    4747    4447    4776    4435
      2T    9253    4362    9378    4362
      4T   18114    4343   18080    4322
      8T   17813    4345   17788    4321
122.9 1T    3871    3893     948    1016
      2T    7612    3954    1742    1021
      4T   14399    3929    2383    1025
      8T   14089    3935    2367    1023
12288 1T    1850     860      67      68
      2T    3670     867     126      67
      4T    4097     874     146      68
      8T    2919     873     148      68

    End of test Sun Jul 15 11:04:10 2018

 ################## MP-RandMem Gentoo RPi 3B 64 Bit ################## 

 MP-RandMem armv8 64 Bit Tue Jun 19 00:08:43 2018

  KB       SerRD SerRDWR   RndRD RndRDWR
 
 12.3 1T    4260    3071    4261    3081
      2T    7500    3054    7496    3059
      4T   15092    3018   14794    3019
      8T   14315    2977   14544    2989
122.9 1T    3385    2861     867     837
      2T    6323    2653    1543     838
      4T   10638    2873    2009     835
      8T   10810    2841    1947     834
12288 1T    1607     746      71      60
      2T    1605     696     123      59
      4T    1939     766     129      58
      8T    1682     681     141      58

    End of test Tue Jun 19 00:09:34 2018

 ################## MP-RandMem Gentoo RPi 3B+ 64 Bit ################## 

  MP-RandMem armv8 64 Bit Tue Jun 26 12:09:25 2018

  KB       SerRD SerRDWR   RndRD RndRDWR

 12.3 1T    4939    3573    4941    3574
      2T    8730    3553    8704    3545
      4T   17121    3499   17197    3498
      8T   16685    3454   17097    3471
122.9 1T    3936    3347    1014     975
      2T    7334    3344    1794     976
      4T   12475    3333    2281     973
      8T   12261    3314    2323     974
12288 1T    1921     793      77      63
      2T    1896     784     113      58
      4T    1621     707     126      58
      8T    1302     764     106      57

    End of test Tue Jun 26 12:10:13 2018




OpenMP-MemSpeed Benchmark - OpenMP-MemSpeed2, NotOpenMP-MemSpeed2, OpenMP-MemSpeed264, NotOpenMP-MemSpeed264

This is the same as the Memory Speed Benchmark, but with measurements extended to test more memory and with the code compiled using the OpenMP directive and compile parameter. The NotOpenMP tests use the same code without specifying a compilation using OpenMP, and produce similar results to the Memory Speed Benchmark. These allow comparisons of MP performance gains over the full range of memory use. There were extremely wide variations in MP performance gains. There were generally improvements using data from RAM. Using data from caches, the best gains were with double precision floating point via Raspbian and the worst were with all integer tests, which showed a loss of more than 50%.

The usual proportional to MHz 3B+ versus 3B gains were provided, with data from caches, and RAM throughput was slightly faster. 64 bit/32 bit NotOpenMP performance ratios were similar to those of the Memory Speed Benchmark, but some were worse using OpenMP. As with some other benchmarks, a new gcc 7 compilation might provide an improvement, by including more efficient 64 bit instructions.

 ############## OpenMP-MemSpeed Raspbian RPi 3B 32 Bit ############## 

     Memory Reading Speed Test Not OpenMP Version 2 by Roy Longbottom

               Start of test Sun Jun 17 20:56:39 2018

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    1577   2537   3790   2360   3449   3789   2673   2694   2692
       8    1594   2547   3811   2388   3469   3812   2717   2716   2716
      16    1595   2553   3825   2393   3478   3825   2728   2728   2728
      32    1556   2435   3566   2312   3272   3566   2730   2712   2715
      64    1508   2300   3304   2177   3065   3303   2542   2485   2485
     128    1515   2305   3353   2183   3108   3356   2644   2573   2574
     256    1527   2341   3431   2226   3183   3432   2673   2615   2616
     512    1406   2083   2869   1983   2702   2873   2558   2495   2404
    1024     935   1228   1295   1194   1300   1315   1561   1360   1349
    2048     889   1091   1170   1083   1162   1167   1211   1096   1099
    4096     890   1109   1169   1089   1167   1168    911    895    903
    8192     906   1141   1188   1116   1194   1168    811    804    802
   16384     916   1159   1202   1132   1209   1206    766    761    761
   32768     928   1166   1206   1119   1224   1206    760    746    746
   65536     970   1171   1210   1140   1225   1212    811    810    808
  131072     966   1172   1207   1141   1230   1146    953    908    883

                End of test Sun Jun 17 20:57:07 2018

     Memory Reading Speed Test OpenMP Version 2 by Roy Longbottom

               Start of test Mon Jul  9 10:33:36 2018

       4    5535   2990   1372   8773   4728   1478  15869   7828   1261
       8    6068   3107   1382  10109   5056   1486  16438   8104   1258
      16    5739   3119   1317  10193   5114   1393  16624   7862   1220
      32    5689   3121   1405  10216   5150   1473  16737   8624   1302
      64    5416   3055   1303   8618   4928   1403  12254   8045   1218
     128    5396   3050   1359   9101   4932   1379   9496   8089   1249
     256    5399   3049   1361   8980   4921   1488   8361   7625   1294
     512    4418   2770   1458   6865   4226   1421   5432   5042   1130
    1024    3785   2477   1110   4361   3461   1202   1533   1573   1158
    2048    3729   2466    975   4268   3439   1200   1017   1017   1150
    4096    3714   2477   1144   4228   3370   1431    986    979   1041
    8192    3799   2368   1157   3968   3366   1484    961    950   1142
   16384    1477   2341   1079   4107   3047   1547    982    985   1037
   32768    3351   2499   1080   2089   3216   1437   1005   1001    794
   65536    3820    614   1026   3901   3078   1209   1006   1008    954
  131072     944    614    746   1160    858    765   1074   1034    566

                End of test Mon Jul  9 10:34:05 2018

 ############## OpenMP-MemSpeed Raspbian RPi 3B+ 32 Bit ############## 

     Memory Reading Speed Test Not OpenMP Version 2 by Roy Longbottom

               Start of test Mon Jun 18 10:18:27 2018

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    1839   2961   4423   2755   4024   4423   3115   3143   3140
       8    1860   2973   4449   2787   4047   4447   3169   3169   3170
      16    1862   2978   4463   2791   4058   4462   3137   3183   3182
      32    1789   2759   3988   2612   3684   3979   3188   3180   3179
      64    1753   2664   3806   2526   3541   3805   3016   3028   3046
     128    1776   2721   3968   2574   3683   3965   3073   2989   2989
     256    1797   2768   4066   2621   3779   4068   3108   3045   3045
     512    1623   2217   3289   2282   3109   3294   2987   2927   2952
    1024     997   1327   1373   1299   1387   1394   1665   1349   1351
    2048     942   1142   1175   1133   1171   1173   1204   1054   1050
    4096     955   1172   1194   1154   1192   1190    890    886    883
    8192     962   1199   1209   1179   1214   1210    802    799    799
   16384     976   1216   1217   1194   1214   1216    761    760    759
   32768     939   1218   1205   1192   1219   1213    909    864    864
   65536    1052   1221   1216   1197   1214   1213    905    870    869
  131072    1053   1227   1221   1201   1223   1220    821    800    801

                End of test Mon Jun 18 10:18:54 2018

     Memory Reading Speed Test OpenMP Version 2 by Roy Longbottom

               Start of test Mon Jul  9 10:23:59 2018

       4    5851   3452   1602  10192   5484   1719  18419   9062   1469
       8    6142   3447   1613  11721   5857   1721  19080   9581   1465
      16    6315   3608   1568  11687   5911   1699  19295   9528   1468
      32    6292   3606   1451  11704   5926   1617  19413   9486   1396
      64    5795   3522   1456  10640   5718   1627  14169   9392   1397
     128    5657   3542   1485  10485   5713   1588  11330   9181   1398
     256    5769   3501   1395  10164   5624   1597   9837   8853   1387
     512    3531   2773   1297   4798   3425   1358   2175   2131   1191
    1024    3496   2742   1313   4581   3804   1337   1653   1647   1321
    2048    2820   2658   1334   4437   3721   1382   1060   1056   1264
    4096    3986   2755   1344   4475   3747   1535    952   1040   1258
    8192    4080   2757   1023   4376   3672   1576   1008    992   1147
   16384    1434   2385   1212   4418   3500   1481    965    985   1027
   32768    1388   2107   1090   4044   3099   1355    941    933   1358
   65536    2576   1750   1159   2296   3825   1661    943    971   1509
  131072    1035    693    857   1210    918    834    983    967    881

                End of test Mon Jul  9 10:24:28 2018

 ############### OpenMP-MemSpeed Gentoo RPi 3B 64 Bit ############### 

     Memory Reading Speed Test notOpenMP 64 Bit by Roy Longbottom

               Start of test Tue Jun 19 00:12:21 2018

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    4099   2497   4141   5239   3385   4138   4545   3689   3668
       8    3935   2507   4155   5339   3399   4168   4646   3730   3732
      16    3857   2506   4171   5358   3417   4155   4680   3744   3751
      32    3628   2383   3847   4751   3187   3847   4580   3717   3729
      64    3074   2242   3416   3956   2940   3438   4432   3635   3618
     128    3042   2274   3495   3933   3008   3492   4209   3488   3498
     256    1688   2091   3149   3141   2703   3146   4078   3361   3294
     512     550    771   2042   2164   1454   2044   2089   2399   2192
    1024     419    984   1199    979    881   1197   1106   1391   1111
    2048     448    782   1035   1012    860   1034    764   1061    961
    4096     453    941    905    855   1035   1041    833    840   1031
    8192     490    882   1082    886   1063    876    772    807    792
   16384     478    893   1109   1043    958    966    720    733    734
   32768     529   1059    939   1106    975    898    695    701    495
   65536    1095    986   1017    990   1104    694    682    686    625
  131072    1094   1059   1101   1116    933   1133    608    680    628

                End of test Tue Jun 19 00:12:49 2018

     Memory Reading Speed Test OpenMP 64 Bit by Roy Longbottom

               Start of test Tue Jun 19 00:14:09 2018

       4    5824   3109   1688   8228   4588   1815  11107   5046   2049
       8    6064   3205   1690   9310   4858   1824  10003   5704   2052
      16    3584   3228   1634   8855   4902   1742  10164   5598   1856
      32    2996   3110   1601   9238   4895   1672   9022   5497   1765
      64    3697   2665   1563   8723   4798   1655  10674   5331   1758
     128    2947   3206   1672   8900   4841   1618   9060   5456   1768
     256    2780   2934   1633   9004   4878   1794   8188   5170   1805
     512    1048   2730   1750   3995   3993   1593   5896   2950   1298
    1024    1114   1912   1580   3900   2488   1205   1272   1258   1025
    2048     906    616   1358   3842   3171   1128    855    935    831
    4096     658    904   1226   1245   3142   1380    696    756    570
    8192     577    965    874   3917   3096   1405    868    742    813
   16384     477   2140    914   2886   3074   1003    742    841    896
   32768     605    792   1562   1143   2709   1019    899    861    591
   65536    1312    555   1163   3974    803   1181    908    833    780
  131072     890    572    708    980    819    707   1132   1192    450

                End of test Tue Jun 19 00:14:41 2018

 ############### OpenMP-MemSpeed Gentoo RPi 3B+ 64 Bit ############### 

     Memory Reading Speed Test notOpenMP 64 Bit by Roy Longbottom

               Start of test Mon Jun 18 23:32:35 2018

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       4    4749   2896   4773   6096   3935   4800   5266   4270   4261
       8    4625   2906   4828   6167   3957   4828   5384   4307   4318
      16    4624   2902   4840   6203   3954   4838   5422   4347   4332
      32    4263   2770   4480   5530   3706   4474   5385   4221   4228
      64    3857   2704   4232   4873   3589   4202   5002   4121   4122
     128    3899   2760   4371   4906   3720   4373   5030   4079   4111
     256    3445   2738   4357   4747   3700   4342   4970   4082   3973
     512     760   2098   3027   2577   2271   3004   3830   3325   2786
    1024     604    990   1172   1165   1308   1343    979    869   1217
    2048     607    940   1160    975   1089   1177    916    887    971
    4096     532    908   1017   1155   1156    924    967   1032   1036
    8192     579    812   1203   1180   1031   1200    749    861    864
   16384     559   1171   1026   1196   1200   1070    727    804    687
   32768     498   1107   1064   1197   1202   1135    799    649    763
   65536    1176   1060   1165   1177   1203   1230    588    959    963
  131072     984   1177   1232   1101   1207   1235    938    723   1047

                End of test Mon Jun 18 23:33:03 2018

     Memory Reading Speed Test OpenMP 64 Bit by Roy Longbottom

               Start of test Mon Jun 18 23:36:51 2018

       4    6782   3616   1934   9575   4858   2114  12917   6574   2216
       8    7385   3728   1976   8941   5652   2107  13290   6637   2392
      16    7099   3769   1945  11092   5751   2105  11593   2798    993
      32    1492   1311    781   4563   2391    836   4156   2736    930
      64    1432   1264    801   4343   2345    852   3740   2531    914
     128    4476   3711   1812   9677   5654   2070   4091   2665    846
     256    1085   1563    833   4399   1497    882   3969   2671    894
     512     700    615    603   3265   1367   2042   4460   4861   1210
    1024    3947   1881   1701   4342   3626   1630   1235   1174   1049
    2048     873   1383   1866   1161   3587   1754    915    788    718
    4096     670    810    989   1521   3601   1348    944    927    798
    8192     808    879   1280   4222   2073   2031    945    927    968
   16384     593   1368   1459   4072   1181   1464    959    718    893
   32768     777   1994   1471   1223   1269   1800    958    945    790
   65536    1736   1275   1271   1006   1249   1787    869    980   1293
  131072     988    696    808   1163    928    754    735   1273    562

                End of test Mon Jun 18 23:37:24 2018




OpenMP-MFLOPS Benchmark - OpenMP-MFLOPS, notOpenMP-MFLOPS, OpenMP-MFLOPS64, notOpenMP-MFLOPS64

This benchmark carries out the same calculations as the MP-MFLOPS Benchmarks but, in addition, calculations with eight operations per data word. As with OpenMP-MemSpeed, the single core notOpenMP and full OpenMP versions are compiled from the same code and carry out identical numbers of floating point calculations. For some unknown reason, the 32 bit Raspbian versions produce different numeric results at 32 operations per word (see First Results sumcheck).

There is variability in measured speeds, but the usual 3B+/3B performance ratios can be assumed. Multiprocessor performance gains were disappointing with the 32 bit Raspbian version, but up to scratch at 64 bits, via Gentoo. The latter benchmark was recompiled using gcc 7 to produce similar best case performance as MP-MFLOPS - see table above. For these particular benchmarks, the only real 64 bit/32 bit gains are on using 32 operations per word (at between 2.3 and 3.0 times).

The gcc 7 versions, OpenMP-MFLOPS64G7 and notOpenMP-MFLOPS64G7, can be downloaded in ompmflops7.tar.gz.
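The MFLOPS column appears to follow directly from the other columns: words multiplied by operations per word and repeat passes, divided by seconds and by one million. That relationship is inferred from the table headings rather than taken from the benchmark source, but it reproduces the reported figures. A minimal Python sketch, with a hypothetical `mflops` helper and rows copied from the Raspbian RPi 3B 32 bit Not OpenMP results:

```python
def mflops(words, ops_per_word, passes, seconds):
    """Millions of floating point operations per second for one test line."""
    return words * ops_per_word * passes / seconds / 1e6


# (words, ops/word, passes, seconds, reported MFLOPS)
rows = [
    (100000, 2, 2500, 0.763554, 655),
    (1000000, 8, 250, 1.453741, 1376),
    (10000000, 32, 25, 5.275346, 1516),
]

for words, ops, passes, secs, reported in rows:
    calculated = mflops(words, ops, passes, secs)
    print(f"{words:9d} words, {ops:2d} ops/word: "
          f"{calculated:7.1f} calculated, {reported} reported")
```

Because every test line performs the same total number of operations at a given ops/word setting, the Seconds column alone is enough to compare OpenMP and notOpenMP runs.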


 ############## OpenMP-MFLOPS Raspbian RPi 3B 32 Bit ############## 

            Not OpenMP MFLOPS Benchmark 1 Sun Jun 17 20:58:49 2018

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS       First   All
                    Words  Word   Passes                         Results  Same


 Data in & out     100000     2     2500   0.763554      655    0.929538   Yes
 Data in & out    1000000     2      250   1.206237      415    0.992550   Yes
 Data in & out   10000000     2       25   1.134379      441    0.999250   Yes

 Data in & out     100000     8     2500   1.161077     1723    0.957126   Yes
 Data in & out    1000000     8      250   1.453741     1376    0.995524   Yes
 Data in & out   10000000     8       25   1.435932     1393    0.999550   Yes

 Data in & out     100000    32     2500   5.024988     1592    0.890268   Yes
 Data in & out    1000000    32      250   5.158612     1551    0.988078   Yes
 Data in & out   10000000    32       25   5.275346     1516    0.998806   Yes

                End of test Sun Jun 17 20:59:12 2018

            OpenMP MFLOPS Benchmark 1 Sun Jun 17 21:02:32 2018


 Data in & out     100000     2     2500   0.277303     1803    0.929538   Yes
 Data in & out    1000000     2      250   1.183362      423    0.992550   Yes
 Data in & out   10000000     2       25   1.138538      439    0.999250   Yes

 Data in & out     100000     8     2500   0.445954     4485    0.957126   Yes
 Data in & out    1000000     8      250   1.299288     1539    0.995524   Yes
 Data in & out   10000000     8       25   1.407459     1421    0.999550   Yes

 Data in & out     100000    32     2500   4.305910     1858    0.890232   Yes
 Data in & out    1000000    32      250   3.822810     2093    0.988068   Yes
 Data in & out   10000000    32       25   3.757323     2129    0.998785   Yes

                End of test Sun Jun 17 21:02:51 2018


 ############## OpenMP-MFLOPS Raspbian RPi 3B+ 32 Bit ############## 

            Not OpenMP MFLOPS Benchmark 1 Mon Jun 18 10:20:24 2018

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS       First   All
                    Words  Word   Passes                         Results  Same

 Data in & out     100000     2     2500   0.682055      733    0.929538   Yes
 Data in & out    1000000     2      250   1.200001      417    0.992550   Yes
 Data in & out   10000000     2       25   1.120259      446    0.999250   Yes

 Data in & out     100000     8     2500   0.997494     2005    0.957126   Yes
 Data in & out    1000000     8      250   1.314719     1521    0.995524   Yes
 Data in & out   10000000     8       25   1.262752     1584    0.999550   Yes

 Data in & out     100000    32     2500   4.307349     1857    0.890268   Yes
 Data in & out    1000000    32      250   4.438297     1802    0.988078   Yes
 Data in & out   10000000    32       25   4.432952     1805    0.998806   Yes

                End of test Mon Jun 18 10:20:44 2018

            OpenMP MFLOPS Benchmark 1 Mon Jun 18 10:30:42 2018

 Data in & out     100000     2     2500   0.230493     2169    0.929538   Yes
 Data in & out    1000000     2      250   1.210853      413    0.992550   Yes
 Data in & out   10000000     2       25   1.158906      431    0.999250   Yes

 Data in & out     100000     8     2500   0.394372     5071    0.957126   Yes
 Data in & out    1000000     8      250   1.251015     1599    0.995524   Yes
 Data in & out   10000000     8       25   1.199460     1667    0.999550   Yes

 Data in & out     100000    32     2500   3.305651     2420    0.890232   Yes
 Data in & out    1000000    32      250   3.356769     2383    0.988068   Yes
 Data in & out   10000000    32       25   3.325095     2406    0.998785   Yes

                End of test Mon Jun 18 10:30:58 2018

 ############### OpenMP-MFLOPS Gentoo RPi 3B 64 Bit ############### 

            notOpenMP MFLOPS64 Tue Jun 19 00:21:01 2018

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS       First   All
                    Words  Word   Passes                         Results  Same

 Data in & out     100000     2     2500   0.797313      627    0.929538   Yes
 Data in & out    1000000     2      250   1.412364      354    0.992550   Yes
 Data in & out   10000000     2       25   1.317844      379    0.999250   Yes

 Data in & out     100000     8     2500   1.232307     1623    0.957117   Yes
 Data in & out    1000000     8      250   1.658661     1206    0.995518   Yes
 Data in & out   10000000     8       25   1.585769     1261    0.999549   Yes

 Data in & out     100000    32     2500   5.476343     1461    0.890215   Yes
 Data in & out    1000000    32      250   5.663824     1412    0.988088   Yes
 Data in & out   10000000    32       25   5.664788     1412    0.998796   Yes

                End of test Tue Jun 19 00:21:26 2018

            OpenMP MFLOPS64 Tue Jun 19 00:22:01 2018

 Data in & out     100000     2     2500   0.256622     1948    0.929538   Yes
 Data in & out    1000000     2      250   1.451293      345    0.992550   Yes
 Data in & out   10000000     2       25   1.290990      387    0.999250   Yes

 Data in & out     100000     8     2500   0.350070     5713    0.957117   Yes
 Data in & out    1000000     8      250   1.409139     1419    0.995518   Yes
 Data in & out   10000000     8       25   1.279655     1563    0.999549   Yes

 Data in & out     100000    32     2500   1.456351     5493    0.890215   Yes
 Data in & out    1000000    32      250   1.617333     4946    0.988088   Yes
 Data in & out   10000000    32       25   1.596874     5010    0.998796   Yes

                End of test Tue Jun 19 00:22:12 2018

 ############### OpenMP-MFLOPS Gentoo RPi 3B+ 64 Bit ############### 

            notOpenMP MFLOPS64 Mon Jun 18 23:35:18 2018

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS       First   All
                    Words  Word   Passes                         Results  Same

 Data in & out     100000     2     2500   0.811203      616    0.929538   Yes
 Data in & out    1000000     2      250   1.277145      391    0.992550   Yes
 Data in & out   10000000     2       25   1.203417      415    0.999250   Yes

 Data in & out     100000     8     2500   1.057153     1892    0.957117   Yes
 Data in & out    1000000     8      250   1.426572     1402    0.995518   Yes
 Data in & out   10000000     8       25   1.384547     1445    0.999549   Yes

 Data in & out     100000    32     2500   4.729263     1692    0.890215   Yes
 Data in & out    1000000    32      250   4.933636     1622    0.988088   Yes
 Data in & out   10000000    32       25   4.928863     1623    0.998796   Yes

                End of test Mon Jun 18 23:35:40 2018

            OpenMP MFLOPS64 Mon Jun 18 23:39:10 2018

 Data in & out     100000     2     2500   0.242835     2059    0.929538   Yes
 Data in & out    1000000     2      250   1.249640      400    0.992550   Yes
 Data in & out   10000000     2       25   1.167199      428    0.999250   Yes

 Data in & out     100000     8     2500   0.307482     6504    0.957117   Yes
 Data in & out    1000000     8      250   1.251838     1598    0.995518   Yes
 Data in & out   10000000     8       25   1.157598     1728    0.999549   Yes

 Data in & out     100000    32     2500   1.236653     6469    0.890215   Yes
 Data in & out    1000000    32      250   1.404484     5696    0.988088   Yes
 Data in & out   10000000    32       25   1.357588     5893    0.998796   Yes

                End of test Mon Jun 18 23:39:20 2018

 ############ OpenMP-MFLOPS Gentoo RPi 3B 64 Bit gcc 7 ############ 

            notOpenMP MFLOPS64 GCC7 Mon Jul  9 11:40:27 2018

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS       First   All
                    Words  Word   Passes                         Results  Same

 Data in & out     100000     2     2500   0.760105      658    0.929538   Yes
 Data in & out    1000000     2      250   1.343421      372    0.992550   Yes
 Data in & out   10000000     2       25   1.275814      392    0.999250   Yes

 Data in & out     100000     8     2500   1.209384     1654    0.957117   Yes
 Data in & out    1000000     8      250   1.584924     1262    0.995518   Yes
 Data in & out   10000000     8       25   1.570061     1274    0.999549   Yes

 Data in & out     100000    32     2500   3.340260     2395    0.890215   Yes
 Data in & out    1000000    32      250   3.687632     2169    0.988088   Yes
 Data in & out   10000000    32       25   3.684894     2171    0.998796   Yes

                End of test Mon Jul  9 11:40:46 2018

            OpenMP MFLOPS64 GCC7 Mon Jul  9 11:41:32 2018

 Data in & out     100000     2     2500   0.238010     2101    0.929538   Yes
 Data in & out    1000000     2      250   1.350017      370    0.992550   Yes
 Data in & out   10000000     2       25   1.259776      397    0.999250   Yes

 Data in & out     100000     8     2500   0.339368     5893    0.957117   Yes
 Data in & out    1000000     8      250   1.353498     1478    0.995518   Yes
 Data in & out   10000000     8       25   1.275593     1568    0.999549   Yes

 Data in & out     100000    32     2500   0.917181     8722    0.890215   Yes
 Data in & out    1000000    32      250   1.406178     5689    0.988088   Yes
 Data in & out   10000000    32       25   1.296176     6172    0.998796   Yes

                End of test Mon Jul  9 11:41:42 2018

 ############ OpenMP-MFLOPS Gentoo RPi 3B+ 64 Bit gcc 7 ############ 

            notOpenMP MFLOPS64 GCC7 Mon Jul  9 11:28:16 2018

  Test             4 Byte  Ops/   Repeat    Seconds   MFLOPS       First   All
                    Words  Word   Passes                         Results  Same

 Data in & out     100000     2     2500   0.646220      774    0.929538   Yes
 Data in & out    1000000     2      250   1.199940      417    0.992550   Yes
 Data in & out   10000000     2       25   1.158499      432    0.999250   Yes

 Data in & out     100000     8     2500   1.049060     1906    0.957117   Yes
 Data in & out    1000000     8      250   1.403440     1425    0.995518   Yes
 Data in & out   10000000     8       25   1.375663     1454    0.999549   Yes

 Data in & out     100000    32     2500   2.876464     2781    0.890215   Yes
 Data in & out    1000000    32      250   3.176167     2519    0.988088   Yes
 Data in & out   10000000    32       25   3.171387     2523    0.998796   Yes

                End of test Mon Jul  9 11:28:33 2018

            OpenMP MFLOPS64 GCC7 Mon Jul  9 11:25:46 2018

 Data in & out     100000     2     2500   0.250953     1992    0.929538   Yes
 Data in & out    1000000     2      250   1.185075      422    0.992550   Yes
 Data in & out   10000000     2       25   1.194995      418    0.999250   Yes

 Data in & out     100000     8     2500   0.332016     6024    0.957117   Yes
 Data in & out    1000000     8      250   1.182464     1691    0.995518   Yes
 Data in & out   10000000     8       25   1.157733     1728    0.999549   Yes

 Data in & out     100000    32     2500   0.799423    10007    0.890215   Yes
 Data in & out    1000000    32      250   1.260969     6344    0.988088   Yes
 Data in & out   10000000    32       25   1.196507     6686    0.998796   Yes

                End of test Mon Jul  9 11:25:55 2018




Java Benchmarks

As you probably know, Java programs can run via any Operating System, assuming that a compatible Java RunTime Environment (JRE) is available. The JRE translates a general purpose .class file into hardware dependent computer instructions. The .class files are produced using the javac command from a Java Development Kit (JDK) and these can be run via suitable Operating Systems. In this case, Java Whetstone and JavaDraw benchmarks are considered.


Java Whetstone Benchmark - whetstc.class

Details of the benchmark are provided above, and include results from the Java version.


JavaDraw Benchmark - JavaDrawPi.class, JavaDrawPC.class

The benchmark uses small to rather excessive numbers of simple objects to measure drawing performance in Frames Per Second (FPS). Five tests draw on a background of continuously changing colour shades, each test adding to the load. Two class files are provided, one produced on a PC, using javac 1.6, and the other on a Raspberry Pi, through javac 1.7. In this case, both have produced similar execution speeds. However, the latter can vary significantly using different JREs, where comparisons can be inappropriate.

As can be seen in the results, the Gentoo 64 bit versions are much slower than those using 32 bit Raspbian, probably a current driver issue. Different drivers and hardware might also have led to dissimilar 3B+/3B comparisons, averaging 1.08 times using Raspbian and 1.27 via Gentoo.
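The averages quoted above appear to be arithmetic means of the six per-test FPS ratios. That is an inference from the numbers, not a stated method. A minimal Python sketch, with a hypothetical `mean_ratio` helper and the FPS values copied from the result listings in table order:

```python
def mean_ratio(new_fps, old_fps):
    """Arithmetic mean of the per-test new/old FPS ratios."""
    ratios = [n / o for n, o in zip(new_fps, old_fps)]
    return sum(ratios) / len(ratios)


# FPS for the six tests, in the order listed in the results
raspbian_3b = [52.19, 61.66, 62.64, 60.22, 42.44, 30.54]
raspbian_3bp = [56.91, 66.25, 67.29, 66.38, 44.97, 33.51]
gentoo_3b = [32.59, 52.88, 49.97, 30.55, 9.18, 4.46]
gentoo_3bp = [39.05, 59.18, 53.75, 37.78, 16.67, 5.29]

print(f"Raspbian 3B+/3B: {mean_ratio(raspbian_3bp, raspbian_3b):.2f}")
print(f"Gentoo   3B+/3B: {mean_ratio(gentoo_3bp, gentoo_3b):.2f}")
```

The Gentoo average is pulled up by the Long Lines test, where the 3B+ result is nearly twice that of the 3B, so the mean should be read alongside the individual lines.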


   Produced by javac 1.7.0_02, run with java 1.8.0_65
   Operating System    Linux, Arch. arm, Version 4.14.34-v7+
         
 ################# JavaDraw Raspbian RPi 3B 32 Bit ################# 

  Test                              Frames      FPS

  Display PNG Bitmap Twice Pass 1      522    52.19
  Display PNG Bitmap Twice Pass 2      617    61.66
  Plus 2 SweepGradient Circles         627    62.64
  Plus 200 Random Small Circles        603    60.22
  Plus 320 Long Lines                  425    42.44
  Plus 4000 Random Small Circles       306    30.54

         Total Elapsed Time  60.1 seconds

 ################# JavaDraw Raspbian RPi 3B+ 32 Bit ################# 

  Display PNG Bitmap Twice Pass 1      570    56.91
  Display PNG Bitmap Twice Pass 2      663    66.25
  Plus 2 SweepGradient Circles         673    67.29
  Plus 200 Random Small Circles        664    66.38
  Plus 320 Long Lines                  450    44.97
  Plus 4000 Random Small Circles       336    33.51

         Total Elapsed Time  60.1 seconds


 Produced by javac 1.7.0_02, run with java 1.8.0_161
 Operating System Linux, Arch. aarch64, Version 4.14.44-v8-4fca48b7612d-bis+

 ################## JavaDraw Gentoo RPi 3B 64 Bit ##################

  Display PNG Bitmap Twice Pass 1      326    32.59
  Display PNG Bitmap Twice Pass 2      529    52.88
  Plus 2 SweepGradient Circles         500    49.97
  Plus 200 Random Small Circles        306    30.55
  Plus 320 Long Lines                   92     9.18
  Plus 4000 Random Small Circles        45     4.46

         Total Elapsed Time  60.2 seconds

 ################## JavaDraw Gentoo RPi 3B+ 64 Bit ##################

  Display PNG Bitmap Twice Pass 1      391    39.05
  Display PNG Bitmap Twice Pass 2      592    59.18
  Plus 2 SweepGradient Circles         538    53.75
  Plus 200 Random Small Circles        378    37.78
  Plus 320 Long Lines                  167    16.67
  Plus 4000 Random Small Circles        53     5.29

         Total Elapsed Time  60.1 seconds
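
The 3B+/3B averages quoted for JavaDraw can be checked from the FPS columns above. The following is a minimal Python sketch, not part of the benchmark itself. It assumes that the average is the arithmetic mean of the six per-test FPS ratios, which is an assumption that happens to reproduce the 1.08 and 1.27 figures. The names `fps` and `mean_ratio` are illustrative only.

```python
# FPS values copied from the JavaDraw listings above, in test order.
fps = {
    "Raspbian": {
        "3B":  [52.19, 61.66, 62.64, 60.22, 42.44, 30.54],
        "3B+": [56.91, 66.25, 67.29, 66.38, 44.97, 33.51],
    },
    "Gentoo": {
        "3B":  [32.59, 52.88, 49.97, 30.55, 9.18, 4.46],
        "3B+": [39.05, 59.18, 53.75, 37.78, 16.67, 5.29],
    },
}


def mean_ratio(new, old):
    """Arithmetic mean of the per-test new/old ratios (assumed method)."""
    ratios = [n / o for n, o in zip(new, old)]
    return sum(ratios) / len(ratios)


gains = {system: mean_ratio(r["3B+"], r["3B"]) for system, r in fps.items()}

for system, gain in gains.items():
    print(f"{system}: average 3B+/3B gain {gain:.2f}")
```

The Gentoo average is pulled up by the Long Lines test, where the ratio is about 1.8, so the mean is sensitive to a single slow test.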
   



OpenGL GLUT Benchmark - videogl32, videogl64

The first four tests portray moving up and down a tunnel including various independently moving objects, with and without texturing. The last two tests represent a real application for designing kitchens. The first is in wireframe format, drawn with 23,000 straight lines. The second has colours and textures applied to the surfaces. Note, in 2012, I approved a request from a Quality Engineer at Canonical, to use this OpenGL benchmark in the testing framework of the Unity desktop software, probably as the individual tests can be run for extended periods as Stress Tests, at different screen/window sizes. In this case, the benchmark was run via a script file to show performance at different window sizes, including a command to turn off VSYNC to allow measured performance to exceed 60 FPS.

The first tests tend to be limited by graphics hardware speed, where 3B+/3B gains are less than the CPU MHz ratio, with the kitchen tests approaching this 16.7% improvement. Although probably affected by different drivers, 64/32 bit comparisons suggest similar graphics speeds, but 64 bit CPU instructions produced performance gains of more than 30% on the textured kitchen.

Example Script File

export vblank_mode=0
./videogl32 Width 320, Height 240, NoEnd
./videogl32 Width 640, Height 480, NoHeading, NoEnd
./videogl32 Width 1024, Height 768, NoHeading, NoEnd
./videogl32 NoHeading

NoEnd prevents logging of configuration. Last command uses default resolution. 

 ################# OpenGL GLUT Raspbian RPi 3B 32 Bit ################# 

 GLUT OpenGL Benchmark 32 Bit Version 1, Fri Jul 27 11:56:04 2018

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    327.8    191.9     81.6     51.3     21.1     13.4
   640   480    245.1    161.1     75.1     48.5     21.0     13.5
  1024   768    110.8    102.0     63.8     45.1     21.1     13.4
  1920  1080     49.4     47.4     37.0     32.9     20.7     13.2

                   End at Fri Jul 27 11:58:18 2018

 ################# OpenGL GLUT Raspbian RPi 3B+ 32 Bit ################# 

 GLUT OpenGL Benchmark 32 Bit Version 1, Fri Jul 27 11:44:59 2018

   320   240    343.2    199.7     88.7     56.6     23.7     15.2
   640   480    241.0    168.2     79.9     52.5     23.8     15.1
  1024   768    110.5    101.7     63.8     47.1     24.2     15.4
  1920  1080     49.7     47.4     36.9     32.8     23.8     15.2

                   End at Fri Jul 27 11:47:13 2018

 ################## OpenGL GLUT Gentoo RPi 3B 64 Bit ##################

 GLUT OpenGL Benchmark 64 Bit Version 1, Tue Jul 17 19:26:36 2018

   160   120    382.3    214.5    118.7     72.3     24.9     18.5
   320   240    328.3    199.7    108.9     69.6     24.9     18.4
   640   480    220.4    162.2     89.7     62.3     24.9     18.4
  1024   768    104.1     96.5     61.1     49.9     24.5     17.9
  1920  1080     50.1     47.4     36.6     32.6     23.8     17.7

                   End at Tue Jul 17 19:29:26 2018

 ################## OpenGL GLUT Gentoo RPi 3B+ 64 Bit ##################

 GLUT OpenGL Benchmark 64 Bit Version 1, Fri Jul 27 11:28:58 2018

   160   120    427.2    239.7    132.6     81.3     28.6     21.2
   320   240    365.7    224.1    121.5     77.5     28.9     21.3
   640   480    247.0    181.6     98.6     68.3     28.5     20.9
  1024   768    116.6    107.0     68.5     56.0     28.2     20.7
  1920  1080     53.8     51.9     40.3     36.1     27.8     20.5

                   End at Fri Jul 27 11:31:47 2018
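
The kitchen test comparisons can be checked from the 1920 x 1080 rows above. This is a minimal Python sketch, assuming nominal CPU clocks of 1200 MHz for the 3B and 1400 MHz for the 3B+ (not stated in this section, but consistent with the 16.7% figure). The variable names are illustrative only.

```python
# Nominal CPU clocks (assumption, consistent with the 16.7% quoted above).
MHZ_3B, MHZ_3B_PLUS = 1200, 1400

# (WireFrm Kitchen, Texture Kitchen) FPS at 1920 x 1080, from the listings.
kitchen = {
    "Raspbian": {"3B": (20.7, 13.2), "3B+": (23.8, 15.2)},
    "Gentoo":   {"3B": (23.8, 17.7), "3B+": (27.8, 20.5)},
}

mhz_gain = MHZ_3B_PLUS / MHZ_3B

# 3B+/3B gain for each kitchen test under each Operating System.
model_gains = {
    system: tuple(new / old for new, old in zip(r["3B+"], r["3B"]))
    for system, r in kitchen.items()
}

# 64 bit Gentoo versus 32 bit Raspbian on the textured kitchen, per model.
bit64_gains = {
    model: kitchen["Gentoo"][model][1] / kitchen["Raspbian"][model][1]
    for model in ("3B", "3B+")
}

print(f"CPU MHz ratio {mhz_gain:.3f}")
for system, (wire, textured) in model_gains.items():
    print(f"{system}: wireframe {wire:.3f}, textured {textured:.3f}")
for model, gain in bit64_gains.items():
    print(f"{model}: 64/32 bit textured kitchen {gain:.2f}")
```

The kitchen gains fall between about 1.15 and 1.17, just below or at the MHz ratio, and the 64/32 bit textured kitchen ratios are above 1.3 on both models.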
   



I/O Benchmarks

Two varieties of I/O benchmarks are provided, one to measure performance of main and USB drives, and the other for LAN and WiFi network connections. The Raspberry Pi programs write and read three files at two sizes (defaults 8 and 16 MB), followed by random reading and writing of 1 KB blocks out of 4, 8 and 16 MB and finally, writing and reading 200 small files, sized 4, 8 and 16 KB. Run time parameters are provided for the size of large files and file path. The same program code is used for both varieties, the only difference being file opening properties. The drive benchmark includes extra options to use direct I/O, avoiding data caching in main memory, but includes an extra test with caching allowed.
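
The large file part of the procedure can be illustrated with a short Python sketch. This is not the benchmark's own code and makes several assumptions: the 64 KB block size, the function names `timed_write_read` and `run`, and the use of a temporary directory are all hypothetical. It also uses ordinary buffered file access, so the read speeds will normally reflect data cached in RAM, like the Cached line in the results, rather than device speed.

```python
import os
import tempfile
import time

MB = 1024 * 1024


def timed_write_read(path, size_mb, block_kb=64):
    """Write, then read back, one file of size_mb MB.

    Returns (write_mb_per_sec, read_mb_per_sec, bytes_read).
    """
    block = b"\xa5" * (block_kb * 1024)
    n_blocks = size_mb * 1024 // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    write_secs = max(time.perf_counter() - start, 1e-9)

    bytes_read = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(len(block))
            if not chunk:
                break
            bytes_read += len(chunk)
    read_secs = max(time.perf_counter() - start, 1e-9)

    return size_mb / write_secs, size_mb / read_secs, bytes_read


def run(sizes_mb=(8, 16), files_per_size=3):
    """Three files at each size, echoing the Write1..3 / Read1..3 layout."""
    with tempfile.TemporaryDirectory() as folder:
        for size in sizes_mb:
            results = [
                timed_write_read(os.path.join(folder, f"file{i}.dat"), size)
                for i in range(1, files_per_size + 1)
            ]
            writes = " ".join(f"{w:8.2f}" for w, _, _ in results)
            reads = " ".join(f"{r:8.2f}" for _, r, _ in results)
            print(f"{size:4d} {writes} {reads}")


if __name__ == "__main__":
    run()
```

Measuring true device read speed needs either direct I/O, as in DriveSpeed, or files too large to be held in the RAM cache.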


DriveSpeed Benchmark - DriveSpeed, DriveSpeed64

The first results below access a SanDisk Ultra microSDHC card using Raspbian. With this combination, performance using the Pi 3B+ might be judged as slightly faster (or no different). Next are results from running the Raspbian benchmark on a USB 3 flash drive, faster on large files, but slower on small ones. The df command results, which identify the file path, and the run command are also shown.

64 Bit Benchmark - As with earlier attempts to run DriveSpeed64, it failed, providing error reports. This appears to be due to the “do not cache” open file options, as proved by running LanSpeed64 on local drives. Results of the latter are provided below, showing high speed cached data transfers, particularly using default large file sizes. However, large file writing and reading speeds can be measured by specifying much larger files that are too large to be cached in RAM. See the run command and results on another SanDisk Ultra microSDHC card, showing similar data transfer speeds to the other one, but using 0.5 and 1 GByte files (512 and 1024 MB). Results from random access and small file tests are not influenced by the large file parameter.

 
 ################# DriveSpeed Raspbian RPi 3B 32 Bit ################# 

   DriveSpeed RasPi 1.1 Mon Jul 30 14:37:47 2018
 
 Current Directory Path: /home/pi/benchmarks/DriveSpeed
 Total MB   14845, Free MB   10483, Used MB    4363

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    18.80    18.81    11.18    23.36    23.45    23.45
  16     8.62    11.26    10.62    23.42    23.49    23.51
 Cached
   8   264.48   261.59   272.90   707.81   599.52   753.99

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.323    0.311    0.288     2.56     1.63     1.57

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      2.36     3.19     2.42     5.96    11.02    12.59
 ms/file     1.74     2.57     6.77     0.69     0.74     1.30    0.024

                End of test Mon Jul 30 14:38:18 2018

 ################# DriveSpeed Raspbian RPi 3B+ 32 Bit ################# 

   DriveSpeed RasPi 1.1 Mon Jul 30 14:53:47 2018
 
   8    19.14     6.37    10.66    23.38    23.53    23.61
  16    10.47    10.63    12.90    23.52    23.27    23.60
 Cached
   8   226.44   303.78   299.19   547.29   865.25   921.83

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.356    0.401    0.322     1.62     8.13     1.54

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      2.38     3.01     2.53     8.56     7.86    12.74
 ms/file     1.72     2.72     6.48     0.48     1.04     1.29    0.012

                End of test Mon Jul 30 14:54:19 2018

 ############### USB 3 DriveSpeed Raspbian RPi 3B+ 32 Bit ###############

 df command
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda3       19682912   21792  19661120   1% /media/pi/7CB9-D119

 Run command
 ./DriveSpeed FilePath /media/pi/7CB9-D119

   DriveSpeed RasPi 1.1 Mon Jul 30 14:56:56 2018

 Selected File Path: /media/pi/7CB9-D119/
 Total MB   19222, Free MB   19200, Used MB      21

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    17.14    12.09    16.63    37.33    39.62    39.49
  16    16.76    12.13    16.34    38.97    39.64    39.63
 Cached
   8    55.00    55.95    23.97   883.81   878.40   826.31

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.688    0.685    0.682     2.01     2.05     2.12

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.54     0.88     1.69     5.14     9.67    15.83
 ms/file     7.55     9.34     9.67     0.80     0.85     1.04    0.016

                End of test Mon Jul 30 14:57:28 2018

 ################## DriveSpeed Gentoo RPi 3B+ 64 Bit ##################

   LanSpeed RasPi 64 Bit 1.0 Mon Jul 30 16:49:15 2018

 Current Directory Path: /home/demouser/benchmarks/driveLANGENTOO
 Total MB   28414, Free MB   19958, Used MB    8456

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8   124.15   162.19   180.79   664.66   683.68   383.51
  16   193.74   210.62   184.05   401.49   511.03   456.56

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.003    0.003    0.003     4.03     7.15     3.82

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec     41.07    39.81    67.50   255.58   243.20   411.56
 ms/file     0.10     0.21     0.24     0.02     0.03     0.04    0.019

 Using Large File Command ./LanSpeed64 MB 512

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

 512    16.64    15.55    14.77    19.82    23.54    23.53
1024    17.50    16.24    16.09    20.39    23.54    23.54




LAN Benchmark - LanSpeed, LanSpeed64

As indicated above, LanSpeed writes and reads three files at two sizes (defaults 8 and 16 MB), followed by random reading and writing of 1 KB blocks out of 4, 8 and 16 MB and finally, writing and reading 200 small files, sized 4, 8 and 16 KB. There are 32 bit and 64 bit versions for the Raspberry Pi, larger varieties for Linux based PCs and a Windows EXE file that can be executed from a remote copy.

Note, mount statements are required, whereby the benchmarks are run as local programs on the Raspberry Pi and other Linux based systems. Also, Samba was installed to connect the Pi to a Windows Workgroup, in order to run the Intel EXE based benchmark.

The 32 bit benchmarks were run under Raspbian (Stretch) and the 64 bit varieties via Gentoo, with a Raspberry Pi 3B+ communicating with a PC running Windows 7 and with a dual booted Windows 10/Linux Ubuntu system. The older model 3B was also run against Windows 7 to provide comparisons.

Below is a summary of all of the test results that generally provide best case examples. Even so, it is clear that wide variations in performance make it difficult to provide accurate comparisons. Just dealing with the Pi based programs, note the slow random writing speeds to Windows 10. Considering 3B+ to 3B comparisons to Windows 7, later detailed results include some for writing and reading 512 MB files, where the 3B+ is indicated as being 3.3 to 3.4 times faster on writing, with a 2.2 times improvement on reading. There appears to be some gain on random writing and 200 short file tests, but not much with the smaller data sizes.

                       16 MB Files MBytes/Second
         Write1   Write2   Write3    Read1    Read2    Read3
 Raspbian
 3B >W7   11.42    11.44    11.44    11.67    11.67    11.67
 3B+>W7   35.46    36.18    36.22    25.79    25.74    25.59
 3B+>W10  35.55    35.95    36.00    26.95    26.95    27.65
 3B+>Ubu  34.58    34.46    34.54    27.19    27.29    27.28
 W7 >3B+  25.67    25.34    16.71    11.49     8.77     7.09
 W10>3B+  25.02    16.97    16.93    11.44     8.66     6.97
 Ubu>3B+  27.52    27.49    27.59    38.91    39.01    39.08        

 Gentoo
 3B >W7   11.22    11.30    11.30    11.63    11.61    11.56
 3B+>W7   33.73    35.52    35.19    24.70    22.54    23.39
 3B+>W10  33.58    35.30    35.42    13.70    26.50     9.31
 3B+>Ubu  33.67    34.73    34.74    20.64    26.90    27.66
 W7 >3B+  25.17    23.77    23.82    14.48    10.39     8.05
 W10>3B+  17.17    25.31    25.26    14.90    10.56     8.15
 Ubu>3B+  21.62    29.01    17.14    39.37    39.62    39.55

 Random         Read milliseconds          Write milliseconds
 From MB        4        8       16        4        8       16

 Raspbian
 3B >W7     0.014    0.685    0.829     1.49     1.22     1.35
 3B+>W7     0.005    0.659    0.857     0.85     0.91     0.99
 3B+>W10    0.005    0.660    1.118    11.79    12.84    14.38
 3B+>Ubu    0.005    0.019    0.456     0.49     0.49     0.49
 W7 >3B+    0.338    0.335    0.330     0.422    0.422    0.404
 W10>3B+    0.471    0.457    0.385     0.474    0.463    0.488
 Ubu>3B+    0.49     0.50     0.50

 Gentoo
 3B >W7     0.022    0.746    0.894     1.58     1.43     1.47
 3B+>W7     0.024    0.864    0.706     1.09     1.06     1.04
 3B+>W10    0.013    0.556    0.775    23.98    15.66    30.23
 3B+>Ubu    0.006    0.067    0.552     0.52     0.52     0.52
 W7 >3B+    0.617    0.613    0.507    0.694    0.651    0.687
 W10>3B+    0.518    0.505    0.499    0.589    0.622    0.609
 Ubu>3B+    0.87     0.63     0.61   

 200 Files      Write ms/file              Read ms/file          Delete
 File KB        4        8       16        4        8       16     secs

 Raspbian
 3B >W7      4.60     4.42     6.08     2.61     3.22     5.20    0.547
 3B+>W7      3.30     3.38     3.68     2.29     2.03     2.71    0.385
 3B+>W10     4.49     4.41     4.81     2.15     2.34     2.54    0.274
 3B+>Ubu     5.00     5.02     5.33     2.41     2.72     4.19    0.311
 W7 >3B+     4.83     4.95     6.01     2.55     2.57     2.95    0.831
 W10>3B+     4.74     5.07     5.96     3.20     2.53     3.03    0.841
 Ubu>3B+     4.12     5.02     4.96     2.41     2.53     2.83    1.479

 Gentoo
 3B >W7      4.78     5.05     6.36     3.07     3.68     6.34    0.833
 3B+>W7      3.44     3.88     4.30     2.48     2.16     2.47    0.254
 3B+>W10     4.11     4.71     6.81     1.77     2.09     2.84    0.415
 3B+>Ubu     5.37     5.56     5.62     3.16     4.32     4.43    0.317
 W7 >3B+     4.49     5.37     5.76     2.76     2.73     3.24    0.812
 W10>3B+     5.28     5.60     6.21     3.67     3.21     3.52    0.849
 Ubu>3B+     3.45     3.69     4.05     2.72     2.72     3.09    1.299
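
The 512 MB file gains quoted for the 3B+ against the 3B can be recomputed from the Write1 to Read3 speeds reported in the detailed results. The following Python sketch assumes the Average Gain is the ratio of the mean 3B+ speed to the mean 3B speed, an assumption that matches the printed figures. The names `runs` and `gain` are illustrative only.

```python
# MBytes/second for 512 MB files to a Windows 7 PC, from the detailed results.
runs = {
    "Raspbian": {
        "3B":  {"write": [11.72, 11.72, 10.82], "read": [11.60, 11.62, 11.71]},
        "3B+": {"write": [38.77, 38.79, 38.67], "read": [25.69, 25.66, 25.55]},
    },
    "Gentoo": {
        "3B":  {"write": [11.73, 11.72, 11.72], "read": [11.68, 11.65, 11.70]},
        "3B+": {"write": [38.84, 38.78, 38.71], "read": [24.98, 26.01, 26.47]},
    },
}


def gain(new, old):
    """Ratio of mean speeds; above 1.0 means the new model is faster."""
    return (sum(new) / len(new)) / (sum(old) / len(old))


gains = {
    (system, op): gain(r["3B+"][op], r["3B"][op])
    for system, r in runs.items()
    for op in ("write", "read")
}

for (system, op), g in gains.items():
    print(f"{system} {op}: 3B+ is {g:.2f} times faster")
```

Writing gains come out at about 3.3 to 3.4 times and reading gains at about 2.2 times for both Operating Systems.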
  
 ############### Raspbian 32 Bit Detailed Results ################

                  Raspberry Pi 3B To Windows 7 PC

   LanSpeed RasPi 1.0 Tue May 22 11:43:53 2018

 Selected File Path: /media/public/ray/
 Total MB  266240, Free MB   90549, Used MB  175691

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.32    11.31    11.38     8.19    11.39    11.61
  16    11.42    11.44    11.44    11.67    11.67    11.67

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.014    0.685    0.829     1.49     1.22     1.35

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.89     1.85     2.69     1.57     2.55     3.15
 ms/file     4.60     4.42     6.08     2.61     3.22     5.20    0.547

                End of test Tue May 22 11:44:31 2018

 ================= Raspberry Pi 3B+ To Windows 7 PC =================

   LanSpeed RasPi 1.0 Tue May 22 10:36:18 2018

 Selected File Path: /media/public/ray/
 Total MB  266240, Free MB   90548, Used MB  175692

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    34.04     8.24    35.16     6.72    26.62    26.68
  16    35.46    36.18    36.22    25.79    25.74    25.59

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.005    0.659    0.857     0.85     0.91     0.99

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      1.24     2.42     4.45     1.79     4.04     6.05
 ms/file     3.30     3.38     3.68     2.29     2.03     2.71    0.385

                End of test Tue May 22 10:36:46 2018

 ========================= 512 MB Files Raspbian ========================

                       MBytes/Second                           Average Gain
  Rpi    Write1   Write2   Write3    Read1    Read2    Read3   Write   Read

  3B      11.72    11.72    10.82    11.60    11.62    11.71
  3B+     38.77    38.79    38.67    25.69    25.66    25.55    3.39   2.20

 ============== Raspberry Pi 3B+ To Windows 10 Core i7 PC ==============

   LanSpeed RasPi 1.0 Wed May 23 09:47:34 2018

 Selected File Path: /media/public/
 Total MB  346679, Free MB  298782, Used MB   47897

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    33.45    35.57    35.47     3.01    26.49    26.78
  16    35.55    35.95    36.00    26.95    26.95    27.65

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.005    0.660    1.118    11.79    12.84    14.38

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.91     1.86     3.40     1.91     3.51     6.44
 ms/file     4.49     4.41     4.81     2.15     2.34     2.54    0.274

                End of test Wed May 23 09:48:03 2018
 ============= Raspberry Pi 3B+ To Ubuntu Same Core i7 PC =============

   LanSpeed RasPi 1.0 Wed May 23 09:59:57 2018

 Selected File Path: /media/public/
 Total MB  446040, Free MB  369385, Used MB   76655

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    34.49    34.64    34.27    27.62    27.61    27.76
  16    34.58    34.46    34.54    27.19    27.29    27.28

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.005    0.019    0.456     0.49     0.49     0.49

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.82     1.63     3.07     1.70     3.01     3.91
 ms/file     5.00     5.02     5.33     2.41     2.72     4.19    0.311

                End of test Wed May 23 10:00:24 2018

 #################### From Remote PC to Raspbian ######################

                  Windows 7 PC to Raspberry Pi 3B+

 CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00100F42
 AMD Phenom(tm) II X4 945 Processor Measured 3013 MHz
 Windows NT Version 6.1, build 7601, Service Pack 1

 Total MB   14845, Free MB    9886, Used MB    4960

 LanSpeed Windows 32-Bit Version 1.0, Tue May 22 12:29:16 2018

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    26.20    26.21    25.68    15.02    10.67     8.27
  16    25.67    25.34    16.71    11.49     8.77     7.09

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.447    0.435    0.445    0.485    0.488    0.489

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.85     1.65     2.73     1.60     3.19     5.55
 ms/file     4.83     4.95     6.01     2.55     2.57     2.95    0.831

                End of test Tue May 22 12:29:49 2018

 ============== Windows 10 Core i7 PC to Raspberry Pi 3B+ =============

 CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4
 Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz Measured 3711 MHz
 Windows NT Version 6.2, build 9200,

 Total MB   14845, Free MB    9885, Used MB    4960

 LanSpeed Windows 32-Bit Version 1.0, Fri May 25 11:30:05 2018

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8     5.29    12.15    24.53    14.39    10.25     7.95
  16    25.02    16.97    16.93    11.44     8.66     6.97

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.338    0.335    0.330    0.422    0.422    0.404

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.86     1.61     2.75     1.28     3.24     5.41
 ms/file     4.74     5.07     5.96     3.20     2.53     3.03    0.841

                End of test Fri May 25 11:30:36 2018
 ============ Ubuntu Same Core i7 PC to Raspberry Pi 3B+ ===========

 CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4
 Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz

 Selected File Path: /media/public/benchmarks/
 Total MB   14845, Free MB   10552, Used MB    4293

 Linux LAN/WiFi Speed Test 32-Bit Version 1.1, Wed May 23 11:06:39 2018

 Copyright (C) Roy Longbottom 2011

   8 MB File           1        2        3        4        5
 Writing MB/sec    27.59    27.20    27.23    27.16    27.22
 Reading MB/sec    38.51    38.82    38.95    38.81    38.71

  16 MB File           1        2        3        4        5
 Writing MB/sec    27.52    27.49    27.59    27.34    27.35
 Reading MB/sec    38.91    39.01    39.08    39.18    39.16

  32 MB File           1        2        3        4        5
 Writing MB/sec    27.73    27.64    27.53    27.81     3.88
 Reading MB/sec    39.14    39.20    39.32    39.34    39.33
 ---------------------------------------------------------------------
 8 MB Cached File      1        2        3        4        5
 Writing MB/sec    27.18    27.19    27.28    27.26    27.27
 Reading MB/sec    38.70    38.87    38.85    38.81    38.77
 ---------------------------------------------------------------------
 Bus Speed Block KB   64      128      256      512     1024
 Reading MB/sec 21315.19 20780.54 17696.24 15950.70 16038.91
 ---------------------------------------------------------------------
 1 KB Reads File MB >     2        4        8       16       32       64      128
 Random Read msecs     0.50     0.49     0.50     0.50     0.52     0.50     0.50
 ---------------------------------------------------------------------
 500 Files      Write              Read            Delete
 File KB     MB/sec  ms/File   MB/sec  ms/File    Seconds

       2       0.49     4.20     0.87     2.35      1.420
       4       0.99     4.12     1.70     2.41      1.381
       8       1.63     5.02     3.24     2.53      1.444
      16       3.30     4.96     5.79     2.83      1.479
      32       5.38     6.09     9.67     3.39      2.806
      64       9.50     6.90    14.87     4.41      1.479

                End of test Wed May 23 11:08:26 2018

 ########################### Gentoo 64 bit ############################

                  Raspberry Pi 3B To Windows 7 PC

   LanSpeed RasPi 64 Bit 1.0 Wed May 23 12:45:48 2018

 Selected File Path: /media/public/ray/
 Total MB  266240, Free MB   90549, Used MB  175691

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.01    11.29    11.23     8.19    11.44    11.48
  16    11.22    11.30    11.30    11.63    11.61    11.56

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.022    0.746    0.894     1.58     1.43     1.47

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.86     1.62     2.58     1.34     2.23     2.58
 ms/file     4.78     5.05     6.36     3.07     3.68     6.34    0.833

                End of test Wed May 23 12:46:27 2018
 ================== Raspberry Pi 3B+ To Windows 7 PC ==================

   LanSpeed RasPi 64 Bit 1.0 Wed May 23 17:30:12 2018

 Selected File Path: /media/public/
 Total MB  266240, Free MB   90548, Used MB  175692

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    29.76    34.98    35.11     9.04    23.03    26.44
  16    33.73    35.52    35.19    24.70    22.54    23.39

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.024    0.864    0.706     1.09     1.06     1.04

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      1.19     2.11     3.81     1.65     3.79     6.63
 ms/file     3.44     3.88     4.30     2.48     2.16     2.47    0.254

                End of test Wed May 23 17:30:40 2018

 ========================== 512 MB Files Gentoo =========================

                       MBytes/Second                           Average Gain
  Rpi    Write1   Write2   Write3    Read1    Read2    Read3   Write   Read

  3B      11.73    11.72    11.72    11.68    11.65    11.70
  3B+     38.84    38.78    38.71    24.98    26.01    26.47    3.31   2.21

 ============= Raspberry Pi 3B+ To Windows 10 Core i7 PC ============

   LanSpeed RasPi 64 Bit 1.0 Wed May 23 16:32:55 2018

 Selected File Path: /media/public/
 Total MB  346679, Free MB  298782, Used MB   47897

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    32.10    35.15    34.98     2.49    25.38     7.13
  16    33.58    35.30    35.42    13.70    26.50     9.31

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.013    0.556    0.775    23.98    15.66    30.23

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      1.00     1.74     2.40     2.31     3.92     5.78
 ms/file     4.11     4.71     6.81     1.77     2.09     2.84    0.415

                End of test Wed May 23 16:33:30 2018

 ============= Raspberry Pi 3B+ To Ubuntu Same Core i7 PC ============

   LanSpeed RasPi 64 Bit 1.0 Wed May 23 17:48:32 2018

 Selected File Path: /media/public/
 Total MB  446040, Free MB  369385, Used MB   76655

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    15.34    34.49    34.35    28.15    25.81    27.69
  16    33.67    34.73    34.74    20.64    26.90    27.66

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.006    0.067    0.552     0.52     0.52     0.52

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.76     1.47     2.92     1.30     1.89     3.70
 ms/file     5.37     5.56     5.62     3.16     4.32     4.43    0.317

                End of test Wed May 23 17:49:01 2018
 ###################### From Remote PC to Gentoo ######################

                  Windows 7 PC to Raspberry Pi 3B+

 CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00100F42
 AMD Phenom(tm) II X4 945 Processor Measured 3013 MHz
 Windows NT Version 6.1, build 7601, Service Pack 1

 Current Drive Details
 Total MB   28414, Free MB   19561, Used MB    8853

 LanSpeed Windows 32-Bit Version 1.0, Thu May 24 12:23:40 2018

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    13.30    25.68    25.36    14.91    10.61     8.23
  16    25.17    23.77    23.82    14.48    10.39     8.05

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.617    0.613    0.507    0.694    0.651    0.687

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.91     1.52     2.84     1.48     3.00     5.05
 ms/file     4.49     5.37     5.76     2.76     2.73     3.24    0.812

                End of test Thu May 24 12:24:10 2018

 ============= Windows 10 Core i7 PC to Raspberry Pi 3B+ =============

 CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4
 Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz Measured 3711 MHz
 Windows NT Version 6.2, build 9200,

 Current Drive Details
 Total MB   28414, Free MB   19593, Used MB    8821

 LanSpeed Windows 32-Bit Version 1.0, Thu May 24 11:07:33 2018

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8     3.92    24.47    25.34    14.60    10.38     7.81
  16    17.17    25.31    25.26    14.90    10.56     8.15

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.518    0.505    0.499    0.589    0.622    0.609

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.78     1.46     2.64     1.12     2.56     4.66
 ms/file     5.28     5.60     6.21     3.67     3.21     3.52    0.849

                End of test Thu May 24 11:08:05 2018
 ============= Ubuntu Same Core i7 PC to Raspberry Pi 3B+ ============

 Selected File Path: /media/public/benchmarks/
 Total MB   28414, Free MB   20523, Used MB    7892

 Linux LAN/WiFi Speed Test 64-Bit Version 1.1, Wed May 23 18:42:55 2018

 Copyright (C) Roy Longbottom 2011

   8 MB File           1        2        3        4        5
 Writing MB/sec    29.09    17.24    29.09    16.92    28.89
 Reading MB/sec    38.83    38.69    39.35    39.39    39.46

  16 MB File           1        2        3        4        5
 Writing MB/sec    21.62    29.01    17.14    21.58     7.57
 Reading MB/sec    39.37    39.62    39.55    39.61    39.56

  32 MB File           1        2        3        4        5
 Writing MB/sec    24.96    29.04    24.73    15.63    15.19
 Reading MB/sec    39.52    39.73    33.69    39.78    39.81
 ---------------------------------------------------------------------
 8 MB Cached File      1        2        3        4        5
 Writing MB/sec    17.18    29.09    28.09    17.10    28.53
 Reading MB/sec    38.92    39.37    39.34    39.45    39.16
 ---------------------------------------------------------------------
 Bus Speed Block KB   64      128      256      512     1024
 Reading MB/sec 21712.48 22474.03 18532.30 16844.04 16924.12
 ---------------------------------------------------------------------
 1 KB Reads File MB >     2        4        8       16       32       64      128
 Random Read msecs     0.78     0.87     0.63     0.61     0.56     0.55     0.55
 ---------------------------------------------------------------------
 500 Files      Write              Read            Delete
 File KB     MB/sec  ms/File   MB/sec  ms/File    Seconds

       2       0.61     3.36     0.76     2.71      1.268
       4       1.19     3.45     1.51     2.72      1.282
       8       2.22     3.69     3.01     2.72      1.290
      16       4.04     4.05     5.30     3.09      1.299
      32       3.15    10.40     9.44     3.47      1.296
      64      10.81     6.06    14.16     4.63      1.378

                End of test Wed May 23 18:44:39 2018




WiFi Benchmark

These are the same programs as the LAN Benchmarks, run on the same systems, but with the LAN cable disconnected. Running the Pi based benchmark via WiFi, on the Raspberry Pi 3B+, was initially tried under Gentoo, where performance appeared to be more typical of that using a 2.4 GHz hub, or results from the older Pi model 3B. Then I found that my BT Smart Hub had 2.4 and 5 GHz capabilities, with the latter being identified on one of my mobile phones. I moved the Pi closer to the hub, but that made no difference. Then I discovered that the WiFi sometimes needed to be disconnected and reconnected to enable high speed operation. As shown below, there were wide variations in performance, using the Pi and remote systems as host, with some being slower than expectations. At this stage, WiFi no longer worked using my Raspbian Operating System, where results were needed to confirm or contradict the findings.

Numerous suggestions for enabling WiFi were tried, without success. Then, after installation of Raspbian-Stretch, WiFi came alive. As shown in the following results summary, some of the peculiarities also occurred using Raspbian on the 3B+, including slow reading of large files from Windows based programs and slow random writing to Windows 10, but Gentoo failed to achieve 5 GHz type speeds on a number of different tests.

The Pi based benchmark programs were also run on the older model 3B, with relative 3B+ performance gains shown below. Best 3B+/3B performance gains are shown in the detailed results, using larger files and Raspbian, where 3.01 times was shown for writing and 1.60 times for reading. Performance of random and small file tests was quite similar using Raspbian and Gentoo, but note the Windows 10 performance in the W10 to 3B+ direction. Then, with the large file tests, the only similarity was 3B+ send (writing to all remote PCs and reading from Ubuntu). In other cases, Gentoo was much slower than the old model 3B on reading. Windows to Raspbian results also suffered from the same problem.

                       16 MB Files MBytes/Second               Average Gain
         Write1   Write2   Write3    Read1    Read2    Read3   Write   Read

 Raspbian
 3B >W7    4.96     5.07     5.06     5.23     6.76     6.63
 3B+>W7   11.34    13.82    14.14     8.98     9.97     9.77    2.60   1.54
 3B+>W10  11.24    13.78    14.19     8.67    10.22     8.57
 3B+>Ubu  11.31    13.30    13.62    10.72    13.25     8.93
 W7 >3B+   9.27     7.34    10.53     5.51     3.92     3.04
 W10>3B+  13.84    13.94    13.77     6.77     4.53     3.41
 Ubu>3B+  12.51    11.27    11.88    14.55    15.53    15.64

 Gentoo
 3B >W7    4.98     5.04     5.11     6.35     6.28     6.11
 3B+>W7   11.54    11.59    11.78     4.27     4.10     4.16    2.31   0.67
 3B+>W10  10.04    11.00    11.38     3.77     4.19     3.67
 3B+>Ubu   9.94    11.04    11.31     4.21     3.78     4.18
 W7 >3B+   3.91     3.97     3.98     2.75     2.13     1.74
 W10>3B+   4.08     4.06     2.57     2.07     1.76     1.53
 Ubu>3B+   4.19     4.20     4.22    10.24    11.51    11.68

 Random         Read milliseconds          Write milliseconds   Average Gain
 From MB        4        8       16        4        8       16   Read  Write

 Raspbian
 3B >W7     3.360    3.445    3.654     3.64     3.39     3.35   
 3B+>W7     2.275    2.755    2.782     2.93     2.68     2.76   1.34   1.24
 3B+>W10   10.197    6.838    2.785    20.95    18.85    16.46
 3B+>Ubu    2.429    2.778    2.829     1.39     1.39     1.39
 W7 >3B+    1.375    1.344    1.329     1.570    1.561    1.539
 W10>3B+    1.275    1.262    1.271     1.526    1.495    1.510
 Ubu>3B+    2.11     2.12     2.12 

 Gentoo
 3B >W7     3.194    3.472    3.750     3.83     3.59     3.61
 3B+>W7     2.824    2.884    2.964     3.11     2.87     2.90   1.20   1.24
 3B+>W10    2.779    2.768    2.740    20.67    20.91    20.38
 3B+>Ubu    2.991    3.160    3.385     1.67     1.61     1.62
 W7 >3B+    1.487    1.421    1.435     1.860    1.779    1.799
 W10>3B+    1.518    1.458    1.400     2.072    2.236    1.980
 Ubu>3B+    2.29     2.29     2.31   

 200 Files      Write ms/file              Read ms/file          Delete  Average Gain
 File KB        4        8       16        4        8       16     secs  Write   Read

 Raspbian
 3B >W7     13.80    15.00    18.23    12.20    16.37    14.66    2.616
 3B+>W7     10.36    11.25    11.22     9.91    10.64    11.11    1.910   1.43   1.37
 3B+>W10    11.18    11.91    13.13    10.03    10.52    11.22    1.259
 3B+>Ubu    13.46    13.46    14.05    24.57    11.88    12.19    1.509
 W7 >3B+    12.76    13.15    13.93     4.73     5.83     6.39    1.969
 W10>3B+    21.54    20.53    23.41     6.04     7.48     7.99    2.735
 Ubu>3B+     9.96    10.59    14.38     6.46     7.37     7.68    2.603

 Gentoo
 3B >W7     14.21    17.01    18.07    12.79    15.85    14.83    2.530
 3B+>W7     10.86    11.88    13.69    11.11    12.51    14.31    2.139   1.35   1.15
 3B+>W10    11.92    12.94    13.30    10.82    12.00    13.92    2.055
 3B+>Ubu    13.98    14.25    15.05    12.75    13.37    15.74    1.711
 W7 >3B+    14.17    14.91    17.20     5.36     6.35     7.71    2.442
 W10>3B+    35.64    27.95    29.61     7.09     7.78     8.88    3.572
 Ubu>3B+     9.75    11.00    12.77     7.16     7.91     8.87    2.516
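
For the Random and 200 Files tests the figures are times, so a gain above 1.0 means the 3B+ took less time than the 3B. The following Python sketch recomputes the Average Gain values from the 3B >W7 and 3B+>W7 rows above. It assumes the gain is the mean 3B time divided by the mean 3B+ time, which is an assumption that agrees with the printed values to two decimal places. The names `times_ms` and `time_gain` are illustrative only.

```python
# Milliseconds from the 3B >W7 and 3B+>W7 rows: (3B values, 3B+ values).
times_ms = {
    "Raspbian": {
        "random read":  ([3.360, 3.445, 3.654], [2.275, 2.755, 2.782]),
        "random write": ([3.64, 3.39, 3.35],    [2.93, 2.68, 2.76]),
        "files write":  ([13.80, 15.00, 18.23], [10.36, 11.25, 11.22]),
        "files read":   ([12.20, 16.37, 14.66], [9.91, 10.64, 11.11]),
    },
    "Gentoo": {
        "random read":  ([3.194, 3.472, 3.750], [2.824, 2.884, 2.964]),
        "random write": ([3.83, 3.59, 3.61],    [3.11, 2.87, 2.90]),
        "files write":  ([14.21, 17.01, 18.07], [10.86, 11.88, 13.69]),
        "files read":   ([12.79, 15.85, 14.83], [11.11, 12.51, 14.31]),
    },
}


def time_gain(old_ms, new_ms):
    """Old mean time over new mean time; above 1.0 means the new model is faster."""
    return (sum(old_ms) / len(old_ms)) / (sum(new_ms) / len(new_ms))


time_gains = {
    (system, test): time_gain(old, new)
    for system, tests in times_ms.items()
    for test, (old, new) in tests.items()
}

for (system, test), g in time_gains.items():
    print(f"{system} {test}: {g:.2f}")
```

Note the inversion compared with the MBytes/Second tables, where the gain is the 3B+ value divided by the 3B value.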
  
 ############### Raspbian 32 Bit Detailed Results ################

                  Raspberry Pi 3B To Windows 7 PC

   LanSpeed RasPi 1.0 Mon May 28 10:07:39 2018

 Selected File Path: /media/public/ray/
 Total MB  266240, Free MB   90548, Used MB  175692

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8     5.08     5.05     5.04     6.13     6.04     6.01
  16     4.96     5.07     5.06     5.23     6.76     6.63

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      3.360    3.445    3.654     3.64     3.39     3.35

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.30     0.55     0.90     0.34     0.50     1.12
 ms/file    13.80    15.00    18.23    12.20    16.37    14.66    2.616

                End of test Mon May 28 10:08:51 2018

 ================= Raspberry Pi 3B+ To Windows 7 PC =================

   LanSpeed RasPi 1.0 Mon May 28 10:33:49 2018

 Selected File Path: /media/public/ray/
 Total MB  266240, Free MB   90547, Used MB  175693

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.95    13.23    13.66     9.21    10.59     8.91
  16    11.34    13.82    14.14     8.98     9.97     9.77

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      2.275    2.755    2.782     2.93     2.68     2.76

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.40     0.73     1.46     0.41     0.77     1.47
 ms/file    10.36    11.25    11.22     9.91    10.64    11.11    1.910

                End of test Mon May 28 10:34:39 2018

 ================= Raspbian Files 128 MB 3B, 512 MB 3B+ =================

                       MBytes/Second                           Average Gain
  Rpi    Write1   Write2   Write3    Read1    Read2    Read3   Write   Read

  3B       4.62     4.88     4.73     5.71     6.14     5.94
  3B+     14.35    12.93    15.56     9.58     9.42     9.48    3.01   1.60

 ============== Raspberry Pi 3B+ To Windows 10 Core i7 PC ==============

   LanSpeed RasPi 1.0 Tue May 29 10:27:33 2018

 Selected File Path: /media/public/
 Total MB  346679, Free MB  298782, Used MB   47897

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.84    13.09    13.71     9.57     9.33    11.48
  16    11.24    13.78    14.19     8.67    10.22     8.57

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs     10.197    6.838    2.785    20.95    18.85    16.46

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.37     0.69     1.25     0.41     0.78     1.46
 ms/file    11.18    11.91    13.13    10.03    10.52    11.22    1.259

                End of test Tue May 29 10:28:24 2018
 ============= Raspberry Pi 3B+ To Ubuntu Same Core i7 PC =============

 LanSpeed RasPi 1.0 Tue May 29 11:00:24 2018

 Selected File Path: /media/public/
 Total MB 446040, Free MB 369312, Used MB 76728

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8          12.39    12.81    13.06    11.63     8.20     9.75
 16          11.31    13.30    13.62    10.72    13.25     8.93

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       2.429    2.778    2.829     1.39     1.39     1.39

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.30     0.61     1.17     0.17     0.69     1.34
 ms/file     13.46    13.46    14.05    24.57    11.88    12.19    1.509

 End of test Tue May 29 11:01:18 2018

 #################### From Remote PC to Raspbian ######################

                  Windows 7 PC to Raspberry Pi 3B+

 Current Drive Details
 Total MB 14845, Free MB 9892, Used MB 4953

 LanSpeed Windows 32-Bit Version 1.0, Tue May 29 11:24:52 2018

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8          10.13    10.00     6.91     4.11     3.10     2.51
 16           9.27     7.34    10.53     5.51     3.92     3.04

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       1.375    1.344    1.329    1.570    1.561    1.539

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.32     0.62     1.18     0.87     1.40     2.57
 ms/file     12.76    13.15    13.93     4.73     5.83     6.39    1.969

 End of test Tue May 29 11:25:41 2018

 ============== Windows 10 Core i7 PC to Raspberry Pi 3B+ =============

 Current Drive Details
 Total MB 14845, Free MB 9892, Used MB 4953

 LanSpeed Windows 32-Bit Version 1.0, Tue May 29 11:30:50 2018

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8           3.93    13.26    13.58     5.76     3.94     3.04
 16          13.84    13.94    13.77     6.77     4.53     3.41

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       1.275    1.262    1.271    1.526    1.495    1.510

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.19     0.40     0.70     0.68     1.10     2.05
 ms/file     21.54    20.53    23.41     6.04     7.48     7.99    2.735

 End of test Tue May 29 11:31:45 2018
 ============ Ubuntu Same Core i7 PC to Raspberry Pi 3B+ ===========

 Selected File Path: /media/public/benchmarks/
 Total MB 14845, Free MB 10558, Used MB 4287

 Linux LAN/WiFi Speed Test 64-Bit Version 1.1, Tue May 29 11:41:37 2018

 Copyright (C) Roy Longbottom 2011

   8 MB File             1         2         3         4         5
 Writing MB/sec      11.58     11.05      9.64      9.03     10.88
 Reading MB/sec      10.49     13.18     13.83     14.34     14.87

  16 MB File             1         2         3         4         5
 Writing MB/sec      12.53      9.54      8.26      4.38      8.92
 Reading MB/sec      13.56     14.95     15.09     15.10     14.99

  32 MB File             1         2         3         4         5
 Writing MB/sec      12.51     11.27     11.88     12.18     10.04
 Reading MB/sec      14.55     15.53     15.64     15.64     15.69

 ---------------------------------------------------------------------
 8 MB Cached File        1         2         3         4         5
 Writing MB/sec      14.98     14.39      8.67     13.08     10.69
 Reading MB/sec      10.29     13.50     14.16     14.48     14.73

 ---------------------------------------------------------------------
 Bus Speed Block KB        64       128       256       512      1024
 Reading MB/sec      20418.32  20311.32  17283.30  15721.35  15757.56

 ---------------------------------------------------------------------
 1 KB Reads File MB >      2      4      8     16     32     64    128
 Random Read msecs      2.09   2.11   2.12   2.12   2.14   2.12   2.09

 ---------------------------------------------------------------------
 500 Files     Write               Read              Delete
 File KB      MB/sec   ms/File    MB/sec   ms/File  Seconds
       2        0.21      9.84      0.34      6.08    2.632
       4        0.41      9.96      0.63      6.46    2.671
       8        0.77     10.59      1.11      7.37    2.521
      16        1.14     14.38      2.13      7.68    2.603
      32        2.54     12.89      3.42      9.59    2.717
      64        4.35     15.07      5.41     12.12    2.734

 End of test Tue May 29 11:44:44 2018

 ########################### Gentoo 64 bit ############################

                  Raspberry Pi 3B To Windows 7 PC

 LanSpeed RasPi 64 Bit 1.0 Mon May 28 11:22:59 2018

 Selected File Path: /media/public/ray/
 Total MB 266240, Free MB 90548, Used MB 175692

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8           4.94     5.06     5.05     6.02     6.31     6.29
 16           4.98     5.04     5.11     6.35     6.28     6.11

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       3.194    3.472    3.750     3.83     3.59     3.61

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.29     0.48     0.91     0.32     0.52     1.10
 ms/file     14.21    17.01    18.07    12.79    15.85    14.83    2.530

 End of test Mon May 28 11:24:12 2018
 ================= Raspberry Pi 3B+ To Windows 7 PC =================

 LanSpeed RasPi 64 Bit 1.0 Mon May 28 11:47:10 2018

 Selected File Path: /media/public/ray/
 Total MB 266240, Free MB 90547, Used MB 175693

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8          10.42     4.94    11.34     4.27     4.05     4.01
 16          11.54    11.59    11.78     4.27     4.10     4.16

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       2.824    2.884    2.964     3.11     2.87     2.90

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.38     0.69     1.20     0.37     0.66     1.14
 ms/file     10.86    11.88    13.69    11.11    12.51    14.31    2.139

 End of test Mon May 28 11:48:15 2018

 ============= Raspberry Pi 3B+ To Windows 10 Core i7 PC ============

 LanSpeed RasPi 64 Bit 1.0 Wed May 30 11:50:52 2018

 Selected File Path: /media/public/ray/
 Total MB 346679, Free MB 298782, Used MB 47897

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8           9.92    11.47    10.15     4.08     3.95     3.98
 16          10.04    11.00    11.38     3.77     4.19     3.67

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       2.779    2.768    2.740    20.67    20.91    20.38

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.34     0.63     1.23     0.38     0.68     1.18
 ms/file     11.92    12.94    13.30    10.82    12.00    13.92    2.055

 End of test Wed May 30 11:51:58 2018

 ============= Raspberry Pi 3B+ To Ubuntu Same Core i7 PC ============

 LanSpeed RasPi 64 Bit 1.0 Mon May 28 12:21:33 2018

 Selected File Path: /media/public/
 Total MB 446040, Free MB 369282, Used MB 76758

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8          10.23    10.89    10.51     3.92     4.20     4.16
 16           9.94    11.04    11.31     4.21     3.78     4.18

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       2.991    3.160    3.385     1.67     1.61     1.62

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.29     0.57     1.09     0.32     0.61     1.04
 ms/file     13.98    14.25    15.05    12.75    13.37    15.74    1.711

 End of test Mon May 28 12:22:39 2018
 ###################### From Remote PC to Gentoo ######################

                  Windows 7 PC to Raspberry Pi 3B+

 Current Drive Details
 Total MB 28414, Free MB 18979, Used MB 9435

 LanSpeed Windows 32-Bit Version 1.0, Wed May 30 11:45:22 2018

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8           4.09     4.08     4.02     2.72     2.09     1.72
 16           3.91     3.97     3.98     2.75     2.13     1.74

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       1.487    1.421    1.435    1.860    1.779    1.799

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.29     0.55     0.95     0.76     1.29     2.13
 ms/file     14.17    14.91    17.20     5.36     6.35     7.71    2.442

 End of test Wed May 30 11:46:31 2018

 ============== Windows 10 Core i7 PC to Raspberry Pi 3B+ =============

 Current Drive Details
 Total MB 28414, Free MB 18979, Used MB 9435

 LanSpeed Windows 32-Bit Version 1.0, Wed May 30 11:01:59 2018

                       MBytes/Second
 MB         Write1   Write2   Write3    Read1    Read2    Read3
  8           4.23     4.20     4.10     2.82     2.23     1.83
 16           4.08     4.06     2.57     2.07     1.76     1.53

 Random               Read                      Write
 From MB         4        8       16        4        8       16
 msecs       1.518    1.458    1.400    2.072    2.236    1.980

 200 Files           Write                       Read              Delete
 File KB         4        8       16        4        8       16     secs
 MB/sec       0.11     0.29     0.55     0.58     1.05     1.84
 ms/file     35.64    27.95    29.61     7.09     7.78     8.88    3.572

 End of test Wed May 30 11:03:20 2018

 ============= Ubuntu Same Core i7 PC to Raspberry Pi 3B+ ============

 Selected Path /media/public/benchmarks/
 MB 28414, Free 20522, Used 7892

 Linux LAN/WiFi Speed Test 64-Bit Version 1.1, Mon May 28 12:27:59 2018

   8 MB File             1         2         3         4         5
 Writing MB/sec       3.96      4.08      4.21      3.36      4.13
 Reading MB/sec       9.43     11.46     11.51     11.49     11.68

  16 MB File             1         2         3         4         5
 Writing MB/sec       4.19      4.20      4.22      4.19      4.13
 Reading MB/sec      10.24     11.51     11.68     11.92     11.88

  32 MB File             1         2         3         4         5
 Writing MB/sec       4.19      4.16      4.14      3.46      4.18
 Reading MB/sec      10.96     11.59     11.87     12.16     12.64

 ---------------------------------------------------------------------
 8 MB Cached File        1         2         3         4         5
 Writing MB/sec       4.14      4.11      4.11      4.16      4.11
 Reading MB/sec      11.21     11.60     11.81     11.69     11.77

 ---------------------------------------------------------------------
 Bus Speed Block KB        64       128       256       512      1024
 Reading MB/sec      21401.17  21459.99  18396.83  16737.76  16867.78

 ---------------------------------------------------------------------
 1 KB Reads File MB >      2      4      8     16     32     64    128
 Random Read msecs      2.27   2.29   2.29   2.31   2.31   2.31   2.31

 ---------------------------------------------------------------------
 500 Files     Write               Read              Delete
 File KB      MB/sec   ms/File    MB/sec   ms/File  Seconds
       2        0.22      9.51      0.30      6.82    2.648
       4        0.42      9.75      0.57      7.16    2.606
       8        0.74     11.00      1.04      7.91    2.650
      16        1.28     12.77      1.85      8.87    2.516
      32        2.03     16.17      3.12     10.50    2.535
      64        2.71     24.17      4.91     13.35    2.537


Stress Tests

The stress test programs have run time parameters that control duration, often with selection of an available test function and/or data size. Performance results are displayed and logged in text files as the benchmarks are running, and a specific output sampling frequency might be available.

During the tests, another program was available to measure CPU MHz and core temperature on a sampling basis. In view of the voltage related problems identified during the MultiThreading Benchmarks, core voltage measurement has been included in new 32 bit and 64 bit versions. As shown below, the voltage readings can be important when considering the effects of heating on performance.


MHz, Temperature and Core Voltage Monitor - RPiHeatMHzVolts, RPiHeatMHzVolts64G

These new programs are available in MHzVoltsTemperature.tar.gz. An example of the run time parameters is shown below, where seconds is the interval between sample measurements. Because of the coarse sampling, changes in the different readings are not necessarily synchronised. As shown in this Raspberry Pi Report, Model 3B+ thermal control has changed from that used with the 3B. Above 70°C, core frequency is reduced from 1.4 GHz to 1.2 GHz, and core voltage is also reduced. Thermal throttling is then applied on reaching 80°C. These characteristics are reflected in the following results, measured whilst running four copies of the Integer Stress Test, reported in the next section.

 Command  ./RPiHeatMHzVolts64G passes 60, seconds 16

 Temperature and CPU MHz Measurement  Start at Tue Jul 31 21:14:36 2018

 Seconds
    0.0     1400 scaling MHz,   1400 ARM MHz, core volt=1.3438V, temp=58.0°C
   16.0     1400 scaling MHz,   1400 ARM MHz, core volt=1.3500V, temp=65.0°C
   32.5     1400 scaling MHz,   1400 ARM MHz, core volt=1.3563V, temp=69.3°C
   49.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.3563V, temp=70.4°C
   65.6     1400 scaling MHz,   1199 ARM MHz, core volt=1.2375V, temp=70.9°C
   82.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=71.4°C
   98.7     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=72.0°C
  115.2     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=73.1°C
  131.7     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=74.1°C
  148.3     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=74.1°C
  164.9     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=74.7°C
  181.5     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=75.2°C
  197.9     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=75.2°C
  214.4     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=75.2°C
  230.9     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=76.3°C
  247.5     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=77.4°C
  264.0     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=77.4°C
  280.5     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=77.4°C
  297.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=77.4°C
  313.6     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=77.4°C
  330.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=77.4°C
  346.7     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=78.4°C
  363.2     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=78.4°C
  379.8     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=78.4°C
  396.4     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  413.0     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  429.6     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  446.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  462.6     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  479.2     1400 scaling MHz,   1195 ARM MHz, core volt=1.2375V, temp=80.1°C
  495.9     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  512.3     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  528.8     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  545.3     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  561.8     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=78.4°C
  578.3     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.0°C
  594.9     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=79.5°C
  611.6     1400 scaling MHz,   1195 ARM MHz, core volt=1.2375V, temp=79.5°C
  628.1     1400 scaling MHz,   1195 ARM MHz, core volt=1.2375V, temp=79.5°C
  644.6     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=80.6°C
  661.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.2375V, temp=80.6°C
  677.6     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.1°C
  694.2     1400 scaling MHz,   1194 ARM MHz, core volt=1.2375V, temp=80.6°C
  710.7     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  727.3     1400 scaling MHz,   1195 ARM MHz, core volt=1.2375V, temp=80.6°C
  743.8     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  760.3     1400 scaling MHz,   1087 ARM MHz, core volt=1.2375V, temp=80.1°C
  776.8     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  793.3     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  809.8     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  826.3     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=81.1°C
  842.9     1400 scaling MHz,   1087 ARM MHz, core volt=1.2375V, temp=80.6°C
  859.5     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  876.0     1400 scaling MHz,   1087 ARM MHz, core volt=1.2375V, temp=81.1°C
  892.5     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  909.1     1400 scaling MHz,   1140 ARM MHz, core volt=1.2375V, temp=80.6°C
  925.6     1400 scaling MHz,   1087 ARM MHz, core volt=1.2375V, temp=81.1°C
  942.2     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.6°C
  958.7     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=81.1°C
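
A log like the one above can be scanned for the thermal events described. The following is a minimal Python sketch, not part of the original programs. The regular expression is inferred only from the line format shown, the function names are hypothetical, and the 1.3 V threshold used to detect the voltage reduction is an assumption chosen to separate the two voltage levels seen in the log. The sample lines are copied from the log.

```python
import re

# Regular expression for one monitor line, inferred from the log format above.
LINE_RE = re.compile(
    r"\s*(\d+\.\d+)\s+(\d+) scaling MHz,\s+(\d+) ARM MHz, "
    r"core volt=(\d+\.\d+)V, temp=(\d+\.\d+)°C"
)

def parse_line(line):
    """Return (seconds, scaling MHz, ARM MHz, volts, deg C), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    secs, scaling, arm, volts, temp = m.groups()
    return float(secs), int(scaling), int(arm), float(volts), float(temp)

def first_sample(samples, predicate):
    """Return the first sample for which predicate is true, or None."""
    return next((s for s in samples if predicate(s)), None)

# A few lines copied from the log above.
SAMPLE_LOG = """\
    0.0     1400 scaling MHz,   1400 ARM MHz, core volt=1.3438V, temp=58.0°C
   32.5     1400 scaling MHz,   1400 ARM MHz, core volt=1.3563V, temp=69.3°C
   49.1     1400 scaling MHz,   1200 ARM MHz, core volt=1.3563V, temp=70.4°C
   65.6     1400 scaling MHz,   1199 ARM MHz, core volt=1.2375V, temp=70.9°C
  479.2     1400 scaling MHz,   1195 ARM MHz, core volt=1.2375V, temp=80.1°C
  677.6     1400 scaling MHz,   1141 ARM MHz, core volt=1.2375V, temp=80.1°C
"""

samples = [s for s in map(parse_line, SAMPLE_LOG.splitlines()) if s is not None]

clock_drop = first_sample(samples, lambda s: s[2] < 1400)  # ARM MHz below 1400
volt_drop = first_sample(samples, lambda s: s[3] < 1.3)    # assumed voltage threshold
hot = first_sample(samples, lambda s: s[4] >= 80.0)        # 80°C throttling point

print("ARM MHz first below 1400 at", clock_drop[0], "seconds")
print("Core voltage first reduced at", volt_drop[0], "seconds")
print("80°C first reached at", hot[0], "seconds")
```

On these samples the clock reduction appears one sample before the voltage reduction, which illustrates the unsynchronised readings mentioned above.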
  


Integer Stress Tests - stressIntPiA7, stressIntPi64 - Four Copies

These have six write/read and six read only tests, each with a different data pattern. The read phase comprises an equal number of additions and subtractions, with the data being unchanged afterwards and checked for correctness. Speed is measured in MB/second. Results are displayed at 10 second intervals. Run time parameters are provided for KBytes of memory used, seconds for each of the twelve tests and a log number for use in multitasking. The 32 bit version can be run from a shell script, as shown below, with lxterminal commands to use different terminal windows. lxterminal is not available (or was not found for installation) at 64 bits in Gentoo, so separate terminals need to be opened for the individual execute commands. The 32 bit benchmark can be obtained from Raspberry_Pi_2_Stress_Tests.zip and the 64 bit version from Rpi3-64-Bit-Benchmarks.tar.gz.
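
The test principle described above can be illustrated with a short Python sketch. This is not the benchmark's code, which is compiled and far faster. The function names are hypothetical, and the alternating add/subtract arrangement is an assumption about how an equal number of additions and subtractions can leave a checkable result. The six patterns are those listed in the program log.

```python
MASK = 0xFFFFFFFF  # keep arithmetic within 32 bits

def write_read_test(words, pattern):
    """Write phase: fill a buffer with the pattern, then read back and count errors."""
    data = [pattern] * words
    errors = sum(1 for w in data if w != pattern)
    return data, errors

def read_test(data, pattern):
    """Read phase: alternate additions and subtractions over the data.

    With an even number of identical words the running total returns to
    zero, so a non-zero total would indicate a read error. The data is
    not modified, and is compared with the pattern afterwards.
    """
    total = 0
    for i in range(0, len(data), 2):
        total = (total + data[i]) & MASK      # addition
        total = (total - data[i + 1]) & MASK  # subtraction
    unchanged = all(w == pattern for w in data)
    return total, unchanged

PATTERNS = [0x00000000, 0xFFFFFFFF, 0xA5A5A5A5,
            0x55555555, 0x33333333, 0xF0F0F0F0]
WORDS = 16 * 1024 // 4  # 16 KBytes of 4 byte words

results = []
for pattern in PATTERNS:
    data, errors = write_read_test(WORDS, pattern)
    total, unchanged = read_test(data, pattern)
    ok = errors == 0 and total == 0 and unchanged
    results.append((pattern, ok))
    print("Pattern %08X Result %s" % (pattern, "OK" if ok else "ERROR"))
```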

Next, in the table, are results from the first copy to start in the 64 bit Gentoo session for which the above MHz, voltage and temperature measurements were reported (the monitor started 3 seconds earlier). The test program is intended to run each part for the same number of seconds, leading to lower pass counts as the CPU speed reduces. This is followed by the performance obtained on running a single copy of the benchmarks with L1 cache, L2 cache and RAM based data. In this case, the 32 bit versions are shown to be faster than the 64 bit compilations.

Finally, there is a range of 3B+ stress test results, where no heat sink was used and the board was installed in a plastic case. The cool ones were from first runs, when the room temperature was 23°C, and the hot ones from a second test, when the room temperature was 3 to 4°C higher. The critical 80°C was breached in all cases, but not until the Read Only section with the cooler tests. Performance degradation is shown to be quite similar at 32 and 64 bits, at least with the Write/Read tests, and also when comparing MB/second and CPU MHz reductions.

                           Shell Script

 lxterminal --geometry=80x15 -e ./RPiHeatMHzVolts passes 60, seconds 16
 lxterminal --geometry=80x15 -e ./stressIntPiA7 KB 16 Secs 80 Log 21
 lxterminal --geometry=80x15 -e ./stressIntPiA7 KB 16 Secs 80 Log 22
 lxterminal --geometry=80x15 -e ./stressIntPiA7 KB 16 Secs 80 Log 23
 lxterminal --geometry=80x15 -e ./stressIntPiA7 KB 16 Secs 80 Log 24

   Gentoo Integer Stress Test RPi 64 Tue Jul 31 21:14:39 2018

  16 KBytes Cache or RAM Space, 80 Seconds Per Test, 12 Tests

 Write/Read
  1    2748 MB/sec  Pattern 00000000 Result OK    6708355 passes
  2    2529 MB/sec  Pattern FFFFFFFF Result OK    6173859 passes
  3    2504 MB/sec  Pattern A5A5A5A5 Result OK    6113617 passes
  4    2508 MB/sec  Pattern 55555555 Result OK    6124102 passes
  5    2515 MB/sec  Pattern 33333333 Result OK    6141199 passes
  6    2504 MB/sec  Pattern F0F0F0F0 Result OK    6113284 passes
 Read
  1    2806 MB/sec  Pattern 00000000 Result OK   13702300 passes
  2    2826 MB/sec  Pattern FFFFFFFF Result OK   13797100 passes
  3    2740 MB/sec  Pattern A5A5A5A5 Result OK   13378700 passes
  4    2676 MB/sec  Pattern 55555555 Result OK   13068900 passes
  5    2656 MB/sec  Pattern 33333333 Result OK   12967300 passes
  6    2658 MB/sec  Pattern F0F0F0F0 Result OK   12977800 passes
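
The MB/sec and pass counts in this log appear to be related by simple arithmetic. The Python sketch below is an inference from the logged numbers, not taken from the program source. It assumes 1 KByte is 1024 bytes, 1 MB is 1,000,000 bytes, that each test runs for the requested 80 seconds, and that a Write/Read pass counts the data twice. The function name is hypothetical.

```python
def mb_per_second(passes, kbytes, seconds, transfers_per_pass):
    """Data rate implied by a pass count (assumptions as stated above)."""
    bytes_moved = passes * kbytes * 1024 * transfers_per_pass
    return bytes_moved / seconds / 1e6

# Write/Read test 1 and Read test 1 from the log above.
write_read_1 = mb_per_second(6708355, 16, 80, 2)
read_1 = mb_per_second(13702300, 16, 80, 1)

print("Write/Read 1: %.0f MB/sec" % write_read_1)
print("Read 1:       %.0f MB/sec" % read_1)
```

Under these assumptions the calculation gives the logged 2748 and 2806 MB/sec. It also shows why a fixed running time produces fewer passes as the CPU MHz falls.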

          Single Core Speeds, 32 Bit Raspbian and 64 Bit Gentoo

        Write/Read MB/second                Read
        32 Bit            64 Bit            32 Bit            64 Bit
 KB     16    64  2048    16    64  2048    16    64  2048    16    64  2048
  
 3B+  3883  3786  1681  2991  2910  1480  4246  3625  1907  3344  2985  1800

#################################################################################

           Raspbian 32 Bit                     Gentoo 64 Bit
           Cool              Hot               Cool              Hot
           MB/s   MHz    °C  MB/s   MHz    °C  MB/s   MHz    °C  MB/s   MHz    °C

 Write/Read      1400  44.0        1400  59.1        1400  47.8        1399  65.5
        1  3699  1400  68.2  3240  1200  75.2  2927  1400  69.8  2589  1200  78.4
        2  3415  1200  70.9  3297  1200  79.0  2611  1200  70.9  2509  1141  81.1
        3  3218  1200  72.9  3247  1195  80.6  2536  1200  72.0  2375  1087  81.7
        4  3299  1200  73.1  3045  1141  81.1  2525  1200  75.2  2302  1034  81.7
        5  3288  1200  75.2  2929  1087  81.7  2533  1200  75.8  2263  1034  81.7
        6  3291  1200  76.3  2882  1033  81.7  2533  1200  77.4  2231  1034  81.7
 Read
        1  3620  1200  75.8  3343  1141  81.7  2832  1200  77.4  2646  1141  81.7
        2  3602  1200  78.4  3153  1034  81.7  2841  1200  78.4  2539  1034  81.7
        3  3592  1195  79.5  3015   980  82.2  2829  1195  80.1  2470  1034  81.7
        4  3567  1141  80.5  2938   926  82.2  2790  1141  80.6  2444  1033  82.2
        5  3500  1141  80.6  2922   980  82.7  2733  1141  80.6  2414  1034  82.2
        6  3432  1087  80.6  2876   980  82.2  2679  1087  80.1  2152   980  82.2

1 Core RW  3883  1400        3883  1400        2991  1400        2991  1400
1 Core Rd  4246  1400        4246  1400        3344  1400        3344  1400
Min RW     3218  1200        2882  1033        2525  1200        2231  1034
Min Rd     3432  1087        2876   926        2679  1087        2152   980
%1 Core RW   83    86          74    74          84    86          75    74
%1 Core Rd   81    78          68    66          80    78          64    70
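
The percentage rows at the foot of the table can be reproduced from the minimum and single core values. The following Python sketch uses figures copied from the table. The 1400 MHz reference and rounding to the nearest whole number are assumptions that happen to match the published rows, and the names are illustrative.

```python
NOMINAL_MHZ = 1400  # single core runs are shown at 1400 MHz

# Single core MB/second at 16 KB, from the "1 Core" rows.
SINGLE_CORE = {
    "32 Bit": {"RW": 3883, "Rd": 4246},
    "64 Bit": {"RW": 2991, "Rd": 3344},
}

# (system, run, test) -> (minimum MB/second, minimum MHz), from the "Min" rows.
MINIMA = {
    ("32 Bit", "Cool", "RW"): (3218, 1200),
    ("32 Bit", "Hot", "RW"): (2882, 1033),
    ("64 Bit", "Cool", "RW"): (2525, 1200),
    ("64 Bit", "Hot", "RW"): (2231, 1034),
    ("32 Bit", "Cool", "Rd"): (3432, 1087),
    ("32 Bit", "Hot", "Rd"): (2876, 926),
    ("64 Bit", "Cool", "Rd"): (2679, 1087),
    ("64 Bit", "Hot", "Rd"): (2152, 980),
}

def percent(value, reference):
    """Value as a whole number percentage of the reference."""
    return round(100.0 * value / reference)

percentages = {
    key: (percent(mbs, SINGLE_CORE[key[0]][key[2]]), percent(mhz, NOMINAL_MHZ))
    for key, (mbs, mhz) in MINIMA.items()
}

for (system, run, test), (speed_pc, mhz_pc) in percentages.items():
    print("%s %-4s %s  MB/s %d%%  MHz %d%%" % (system, run, test, speed_pc, mhz_pc))
```

The closeness of the two percentages in each pair is what supports the statement that MB/second and CPU MHz reductions are similar.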



Floating Point Stress Tests - burninfpuPi2, burninfpuPi64 - Four Copies

These use the same program test code as the MP-MFLOPS Benchmark, but just for a single CPU core, with the loop for repeat passes included in the main functions to enable faster execution (up to 17.2 GFLOPS with 4 cores, compared with 11.6). The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2, 8 or 32 operations per input data word. The same variables are used for each word and final results are checked for consistency, any errors being reported. The benchmark has input parameters for KWords, Section 1, 2 or 3 (for 2, 8 or 32 operations per word), running time in minutes and log number (0 to 99). The 32 bit benchmark can be obtained from Raspberry_Pi_2_Stress_Tests.zip and the 64 bit version from Rpi3-64-Bit-Benchmarks.tar.gz.

Below is the start of the program output where, in this case, each pass carries out the same number of calculations, resulting in longer time when the core MHz reduces. This is followed by performance, in MFLOPS, for the three increasing operations per word, using L1 cache, L2 cache and RAM, via both 32 bit and 64 bit compilations. Note that the latter obtains the highest maximum speeds, but is slower using certain test functions.

No Heatsink

The next table provides results of fifteen minute stress tests on the Raspberry Pi 3B+, in a plastic case with no heatsink, using 32 bit Raspbian and 64 bit Gentoo. In this case, the cool runs immediately followed powering on, with room temperature around 23°C, and the hot ones started shortly after the others finished. With the different timing procedures, the MFLOPS, MHz and temperature measurements shown are at approximately the same time. MFLOPS is the average speed over 15 to 17 seconds, with temperature and MHz measured instantaneously at around 16 second intervals, both varying up and down within a test.

Although the MFLOPS speed is slower, the 32 bit test appears to generate higher temperatures, with earlier degradation to 1200 MHz and further reductions on exceeding 80°C (worst cases 81.7°C, 1033 MHz, 2681 from 3439 MFLOPS). The 32 bit version was compiled to use NEON four way SIMD instructions, with the other employing the more recent 64 bit SIMD vector functions, apparently less demanding power wise.


             Part example one of four 64 bit programs

   Command  ./burninfpuPi64 KWords 4, Section 2, minutes 15, Log 21

            Burn-In-FPU RPi 64 Sat Aug 11 13:04:10 2018

 Using 16 KBytes, 8 Operations Per Word, For Approximately 15 Minutes

   Pass     4 Byte  Ops/   Repeat    Seconds   MFLOPS          First   All
             Words  Word   Passes                            Results  Same

      1       4000     8  1996000      15.03     4251    0.539296687   Yes
      2       4000     8  1996000      15.56     4104    0.539296687   Yes
      3       4000     8  1996000      16.58     3851    0.539296687   Yes
      4       4000     8  1996000      17.16     3721    0.539296687   Yes
      5       4000     8  1996000      17.40     3671    0.539296687   Yes
      6       4000     8  1996000      17.74     3600    0.539296687   Yes
      7       4000     8  1996000      17.87     3574    0.539296687   Yes
      8       4000     8  1996000      17.72     3605    0.539296687   Yes
      9       4000     8  1996000      17.59     3630    0.539296687   Yes
     10       4000     8  1996000      17.55     3640    0.539296687   Yes
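
The MFLOPS column in the pass log can be related to the other columns. The Python sketch below is illustrative only. It shows the 8 operations per word form quoted in the text, with constants a to f that are made-up values, and a hypothetical mflops function that assumes the rate is words x operations per word x repeat passes divided by seconds.

```python
def eight_ops(x, a, b, c, d, e, f):
    """One pass of the 8 operations per word form over a list of words.

    Operations per word: 3 additions inside the brackets, 3 multiplies,
    1 subtraction and 1 addition joining the terms.
    """
    return [(xi + a) * b - (xi + c) * d + (xi + e) * f for xi in x]

def mflops(words, ops_per_word, repeat_passes, seconds):
    """Millions of floating point operations per second for one logged pass."""
    return words * ops_per_word * repeat_passes / seconds / 1e6

# Hypothetical constants, chosen to give an exact result: 4 - 16 + 36 = 24.
print(eight_ops([1.0], 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

# Passes 1 and 10 from the log above (seconds are rounded in the log).
pass_1 = mflops(4000, 8, 1996000, 15.03)
pass_10 = mflops(4000, 8, 1996000, 17.55)
print("Pass 1  about %.0f MFLOPS" % pass_1)
print("Pass 10 about %.0f MFLOPS" % pass_10)
```

Because the logged seconds are rounded to two decimal places, the values calculated this way agree with the logged 4251 and 3640 MFLOPS only to within a unit or two.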


          Single Core MFLOPS, 32 Bit Raspbian and 64 Bit Gentoo

  Ops/Word   2     2     2      8     8     8     32    32    32
  K Bytes   16    64  2048     16    64  2048     16    64  2048             
  K Words    4    16   512      4    16   512      4    16   512

  32 Bit  1788  1672   413   3439  3365  1636   2011  2000  1846
  64 Bit  2070  1924   405   4360  4278  1617   1781  1775  1696       

 
  
 Pi 3B+ Floating Point Stress Tests, 16 KBytes, 8 Operations Per Word - No Heatsink

            Raspbian 32 Bit                       Gentoo 64 Bit
            Cool               Hot                Cool               Hot
      Secs MFLOPS   MHz    °C MFLOPS   MHz    °C MFLOPS   MHz    °C MFLOPS   MHz    °C

         0         1400  47.8         1400  55.8         1400  45.1         1400  61.2
        16   3396  1400  62.8   3257  1200  69.8   4315  1400  54.2   4289  1400  67.7
        33   3377  1400  67.1   2960  1199  70.4   4336  1400  59.6   3755  1200  70.9
        49   3400  1400  69.8   2844  1200  70.9   4343  1400  62.3   3667  1200  71.4
        66   3322  1199  69.8   2839  1200  71.4   4311  1400  64.5   3689  1200  72.5
        82   3181  1400  70.4   2838  1200  73.1   4296  1400  65.5   3692  1200  73.1
        99   3100  1200  70.4   2845  1200  73.6   4295  1400  68.8   3673  1200  74.1
       116   3028  1200  70.9   2865  1200  74.1   4319  1400  69.3   3671  1199  74.1
       132   3003  1200  70.9   2926  1200  75.2   4291  1200  69.8   3669  1200  74.1
       149   2939  1200  70.9   2934  1200  75.2   4150  1400  69.8   3686  1200  75.2
       166   2942  1200  70.9   2927  1200  75.8   4058  1200  70.9   3697  1200  75.8
       182   2940  1200  70.9   2926  1200  76.3   4012  1200  69.8   3660  1200  75.8
       199   2942  1200  71.4   2902  1200  76.8   3977  1200  70.9   3662  1199  76.3
       215   2931  1200  72.0   2914  1200  76.8   3927  1200  70.9   3676  1200  76.3
       232   2929  1200  72.5   2888  1200  76.3   3898  1200  70.9   3678  1200  76.8
       249   2913  1200  73.1   2890  1200  77.4   3689  1200  70.4   3679  1200  76.3
       265   2913  1200  73.1   2917  1200  77.9   3712  1200  70.4   3692  1200  77.4
       282   2933  1200  73.1   2899  1200  77.9   3823  1199  70.9   3630  1200  76.8
       299   2935  1200  73.6   2892  1200  77.9   3799  1199  70.9   3688  1199  76.8
       315   2934  1200  74.1   2911  1200  78.4   3783  1200  70.9   3687  1200  77.4
       332   2931  1200  74.1   2912  1200  79.5   3733  1200  70.4   3688  1200  77.4
       349   2829  1200  74.1   2898  1200  79.0   3747  1200  70.9   3675  1200  76.8
       365   2860  1200  74.7   2907  1200  79.5   3715  1200  70.9   3678  1200  77.9
       382   2932  1200  74.7   2912  1200  79.5   3705  1199  70.9   3691  1200  76.8
       399   2928  1200  75.2   2893  1200  79.5   3686  1200  71.4   3690  1200  77.9
       415   2930  1200  75.2   2906  1200  80.1   3698  1199  72.0   3686  1200  78.4
       432   2926  1200  75.2   2893  1195  79.5   3703  1200  71.4   3692  1200  77.9
       448   2927  1200  75.8   2881  1200  80.1   3689  1200  71.4   3691  1200  78.4
       465   2938  1200  76.3   2863  1141  80.1   3680  1200  72.0   3666  1200  77.9
       482   2937  1200  76.3   2875  1195  79.5   3686  1200  72.0   3659  1200  78.4
       498   2933  1200  76.8   2857  1200  80.6   3685  1200  72.0   3669  1200  78.4
       515   2935  1200  76.3   2834  1141  80.6   3709  1200  73.1   3667  1200  78.4
       532   2919  1200  77.4   2840  1141  80.6   3688  1200  72.5   3691  1200  78.4
       548   2914  1200  77.4   2802  1195  80.6   3692  1200  73.1   3638  1200  79.5
       565   2922  1200  77.4   2815  1195  80.6   3686  1200  72.5   3655  1200  78.4
       581   2839  1200  77.4   2815  1141  80.1   3699  1200  73.1   3645  1200  78.4
       598   2862  1200  77.4   2810  1141  80.1   3665  1200  73.1   3674  1200  79.0
       615   2840  1200  77.9   2805  1141  80.6   3652  1200  73.6   3671  1200  78.4
       631   2862  1200  78.4   2809  1141  80.6   3700  1200  73.6   3684  1200  78.4
       648   2865  1200  78.4   2791  1195  80.6   3704  1200  73.6   3678  1200  79.0
       665   2844  1200  79.0   2758  1141  80.6   3648  1200  74.1   3683  1200  79.0
       681   2790  1200  77.9   2764  1141  80.6   3697  1200  74.1   3683  1200  78.4
       698   2849  1200  79.5   2765  1141  80.6   3605  1200  74.1   3693  1200  79.0
       714   2927  1200  78.4   2768  1141  80.6   3684  1200  73.6   3654  1200  79.5
       731   2929  1200  79.5   2758  1141  80.6   3680  1200  74.1   3655  1200  79.5
       748   2908  1200  79.5   2777  1141  80.6   3647  1200  74.1   3642  1200  79.0
       764   2898  1200  79.5   2775  1141  80.6   3695  1200  74.1   3693  1200  79.0
       781   2903  1200  79.5   2746  1141  80.6   3704  1200  74.1   3602  1195  79.5
       797   2911  1200  79.0   2767  1141  80.6   3674  1199  75.2   3693  1200  79.5
       814   2884  1199  79.0   2758  1141  80.6   3679  1200  75.2   3693  1195  79.5
       830   2898  1195  79.5   2741  1141  80.6   3689  1200  75.2   3689  1200  79.0
       847   2880  1200  79.5   2744  1140  80.6   3692  1200  75.2   3660  1200  79.5
       864   2875  1200  79.5   2720  1087  80.6   3684  1200  75.2   3664  1200  79.5
       880   2879  1195  79.5   2684  1141  80.6   3671  1200  75.8   3675  1200  79.5
       897   2860  1200  80.1   2704  1033  80.6   3700  1200  74.7   3682  1200  79.5
       914   2858  1195  80.1   2709  1195  80.6   3674  1200  75.2   3685  1200  80.1
       930   2874  1200  80.1   2681  1141  81.1   3694  1200  75.8   3706  1200  79.5
       947   2931  1195  80.1   2706  1141  81.1   3683  1200  75.8   3706  1200  79.5
       964   2933  1141  80.1   2714  1141  80.6   3681  1200  76.3   3720  1200  79.5
       980   2874  1195  80.1   2709  1034  80.6   3672  1200  75.8   3713  1200  77.4
       997   2931  1195  80.1   2747  1087  81.7   3713  1200  76.3   3798  1200  77.4

 Min         2858               2681               3671               3660
 1 CP Max    3439               3439               4360               4360
 Min % Max     83                 78                 84                 84



Livermore Loops Stress Tests - liverloopsPiA7R, liverloopsPi64 - Four Copies

The Livermore Loops Benchmark was converted to act as a stress test following wrong numeric results being produced on an overclocked PC, using a Pentium Pro CPU. To run the reliability test, a seconds parameter is required, from which the program arranges the initial repeat passes, for each of the 72 test functions, needed to produce that running time. A single log file includes results from all tests. For separate logs, copies of the program can be started from different folders. The 32 bit benchmark can be obtained from Raspberry_Pi_2_Stress_Tests.zip and the 64 bit version from Rpi3-64-Bit-Benchmarks.tar.gz.

Whilst running, results from all tests are displayed, with the logged summary shown below. Later are processor MHz and temperature measurements for both 32 bit and 64 bit programs. The cool runs were made immediately after powering on, with a room temperature around 22°C, and the hot tests followed shortly afterwards. As this program has 72 different variations in code executed, temperature can go up and down, with a maximum of just about 80°C. For most of the time, in all cases, the CPU was running at 1200 MHz, and this is reflected in little difference between hot and cool overall performance ratings (provided below). Here, single core performance results are generally faster, in proportion to the MHz increase.

   
         Example Log File Entry ./liverloopsPi64 Seconds 12

 Livermore Loops Benchmark armv8 64 Bit via C/C++ Fri Aug 17 12:33:02 2018

 Reliability test  12 seconds each loop x 24 x 3
 
 Part 1 of 3 start at Fri Aug 17 12:33:03 2018
 Part 2 of 3 start at Fri Aug 17 12:38:31 2018
 Part 3 of 3 start at Fri Aug 17 12:43:14 2018

 Numeric results were as expected

 MFLOPS for 24 loops
  530.4  296.7  513.6  450.1  198.5  191.2  629.0  424.8  450.5  229.1  145.8  208.1
  105.1  128.3  247.8  219.5  374.4  440.5  284.2  237.5  260.8   78.5  307.1  176.3

           Overall Ratings
           Maximum Average Geomean Harmean Minimum
             629.0   274.7   243.8   213.4    76.8
====================================================================================
 64b Hot     622.0   272.0   241.8   212.4    77.7
 64b 1 Core  720.6   320.2   285.6   251.9    94.4

 32b Cool    428.4   210.4   187.2   164.4    66.0
 32b Hot     386.3   209.5   187.0   164.5    66.2
 32b 1 Core  462.5   243.8   215.2   185.7    65.6
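
The Overall Ratings headings refer to standard summary statistics. The Python sketch below shows how such figures are calculated, using the 24 loop MFLOPS values from the example log. It is an illustration of the definitions only: the published ratings are not reproduced from these 24 values (the maximum agrees but the other figures differ), so they are presumably derived from a different set of results, such as all 72 tests. The function name is hypothetical.

```python
import math

def overall_ratings(values):
    """Maximum, arithmetic, geometric and harmonic means, and minimum."""
    n = len(values)
    return {
        "Maximum": max(values),
        "Average": sum(values) / n,
        "Geomean": math.exp(sum(math.log(v) for v in values) / n),
        "Harmean": n / sum(1.0 / v for v in values),
        "Minimum": min(values),
    }

# MFLOPS for 24 loops, copied from the example log above.
LOOPS = [
    530.4, 296.7, 513.6, 450.1, 198.5, 191.2, 629.0, 424.8, 450.5, 229.1, 145.8, 208.1,
    105.1, 128.3, 247.8, 219.5, 374.4, 440.5, 284.2, 237.5, 260.8, 78.5, 307.1, 176.3,
]

ratings = overall_ratings(LOOPS)
for name, value in ratings.items():
    print("%-8s %7.1f" % (name, value))
```

For any set of positive values the harmonic mean cannot exceed the geometric mean, which cannot exceed the average, the same ordering as in each row of the ratings table.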

            Raspbian 32 Bit              Gentoo 64 Bit
           Cool         Hot              Cool         Hot
     Secs   MHz    °C   MHz    °C         MHz    °C   MHz    °C

        0  1400  53.7  1400  58.0        1400  52.6  1399  60.1
       15  1400  65.5  1199  69.8        1400  60.1  1400  69.3
       31  1400  69.8  1200  70.4        1400  66.6  1200  70.4
       46  1200  70.9  1200  73.1        1200  70.9  1200  74.1
       62  1200  70.4  1200  72.5        1200  70.9  1200  76.8
       77  1199  70.4  1200  71.4        1200  69.8  1200  74.1
       93  1200  70.4  1200  72.5        1200  70.9  1200  74.1

      280  1200  72.0  1200  74.1        1200  70.9  1200  74.1
      296  1200  72.0  1200  74.1        1200  72.0  1200  75.8
      311  1200  73.6  1200  75.2        1200  70.9  1200  75.2
      327  1200  74.1  1200  76.3        1200  72.0  1200  76.3
      343  1200  73.6  1200  76.3        1200  74.1  1200  77.4
      358  1200  75.8  1200  77.4        1200  73.6  1200  77.4
      374  1200  75.2  1200  78.4        1200  77.4  1195  79.5
      389  1200  74.7  1200  76.3        1200  75.2  1195  79.5

      607  1200  76.3  1200  77.4        1199  74.1  1200  76.8
      623  1200  75.8  1200  77.9        1200  75.2  1200  77.9
      639  1200  77.4  1200  79.0        1200  75.8  1200  78.4
      654  1200  77.4  1200  79.5        1200  78.4  1200  79.5
      670  1200  77.4  1200  78.4        1200  78.4  1141  80.1
      685  1200  76.3  1200  78.4        1200  76.3  1199  79.5
      701  1200  76.8  1200  78.4        1200  76.8  1200  79.0

      763  1200  76.3  1200  78.4        1200  76.8  1200  78.4
      779  1200  75.2  1200  77.9        1200  77.4  1200  78.4
      794  1200  76.8  1200  78.4        1200  75.2  1200  77.9
      810  1200  75.8  1200  78.4        1200  76.3  1199  77.4
      826  1200  75.2  1200  77.4        1200  76.3  1200  77.9
      841  1200  75.2  1200  77.4        1200  77.4  1200  77.4
      857  1200  75.2  1200  76.3        1200  74.1  1200  77.4
      872  1200  76.3  1200  77.9        1200  75.8  1200  76.8
   



OpenGL Stress Tests - videogl32, videogl64 - One Copy

These use the OpenGL GLUT Benchmarks, which have command parameters for window width and height, plus running time in minutes. The default, with no size parameters, is the current monitor resolution, in this case 1920 x 1080. The first exercise was to execute short runs to determine the system loading of the various test functions. The commands, from different terminal windows, are shown below. Examples of the usual RPiHeatMHzVolts program and stress test outputs are provided later. For those who do not know it, example vmstat results are shown. CPU utilisation is the sum of the user and system entries (us+sy), where this relates to loading across all four cores. So, 12% is the same as 48% of a single core. In this case, the Tiled Kitchen function provides the heaviest loading, judged by maximum temperatures and CPU utilisation, along with the slowest FPS speed.
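The us+sy conversion described above can be sketched in C. This is only an illustration, not part of the benchmark programs: the function names are hypothetical and the four core count is passed in as an assumption about the Pi 3B+.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from the benchmark sources.
   vmstat reports us and sy as percentages of all cores together,
   so the single core equivalent is (us + sy) multiplied by the
   number of cores. */
int single_core_equivalent(int us, int sy, int cores)
{
    return (us + sy) * cores;
}

/* Print the conversion for one vmstat sample on a four core system. */
void print_loading(int us, int sy)
{
    printf("us+sy = %d%% of four cores = %d%% of one core\n",
           us + sy, single_core_equivalent(us, sy, 4));
}
```

Calling print_loading(9, 3) uses the us and sy values from the example vmstat line, giving the 12% and 48% figures quoted in the text.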

Hot Test - The Tiled Kitchen test was then run for 15 minute test periods, when room temperature was about 22°C. The first run recorded a constant 20 FPS (rounded up or down), with temperature nearly reaching 70°C, where MHz reduces to 1200. However, the short term FPS, displayed during the tests, sometimes indicated 19 FPS, indicating that the MHz can vary quite rapidly. The hot run followed almost immediately afterwards, when the display recorded between 17 and 20 FPS. The unsynchronised variations in temperatures, voltage and CPU MHz again suggest rapid variations. Recorded temperature reached 70.9°C.

Extended Power Cable Test - As mentioned with MultiThreading Benchmarks, the CPU cores can run slowly with longer than normal power supply cables. A short videogl64 test was run with a one metre extension cable, on the 2.5A power supply. This time, core voltage measurements were included, indicating 1.2 volts, instead of 1.35 (or thereabouts), with 600 MHz and 8 FPS. The Pi 3B+ deserves a commendation for actually running in these circumstances (a permanent way of running cool?).

  Commands   ./RPiHeatMHzVolts64G passes 7 seconds 10
             ./videogl64  test n, mins 1, where n = 1 to 6
             ./vmstat 10 7  - for 7 samples every 10 seconds, example output next

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 1  0      0 530084  22812 216736    0    0     0    15 3029 1131  9  3 89  0  0

                                 Raspbian 32 bit       Gentoo 64 bit
  Test                           FPS    %CPUt   °C     FPS    %CPUt   °C

  1 Few Objects                   49     48    58.5     54     64    62.8
  2 All Objects, No Textures      47     60    59.1     52     80    63.4
  3 Few Objects, With Textures    36     80    60.1     39     96    64.5
  4 All Objects, With Textures    32    100    61.2     35     96    65.5
  5 WireFrame Kitchen             23    116    61.2     27    124    66.6
  6 Tiled Kitchen                 14    112    61.2     20    116    66.6 

  6 Tiled Kitchen 15 minutes      14    112    69.3     20    116    69.8      

                                      Hot Run

 OpenGL Reliability Test 64 Bit Version 1, Mon Aug 20 17:12:38 2018

 Display 1920 x 1080  Tiled Kitchen, Test for 15 minutes

 Normal Output                               Part RPiHeatMHzVolts Results

 Start                                      1400 ARM MHz, core volt=1.3438V, temp=62.8°C
 Test 6  Tiled Kitchen, 30 seconds, 20 FPS  1400 ARM MHz, core volt=1.3500V, temp=67.7°C
 Test 6  Tiled Kitchen, 30 seconds, 20 FPS  1400 ARM MHz, core volt=1.3500V, temp=69.3°C
 Test 6  Tiled Kitchen, 30 seconds, 20 FPS  1400 ARM MHz, core volt=1.2375V, temp=69.8°C
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS  1400 ARM MHz, core volt=1.3500V, temp=69.8°C
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS  1200 ARM MHz, core volt=1.2375V, temp=69.8°C
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS   Continued 19 FPS next 14 entries
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS   5 at 1200 MHz, 3 at 1.2375V  
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS   Temperatures 69.8°C and 70.9°C up and down

 Test 6  Tiled Kitchen, 30 seconds, 18 FPS  1400 ARM MHz, core volt=1.2375V, temp=69.8°C
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS  1400 ARM MHz, core volt=1.3500V, temp=70.4°C
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS  Continued 19 FPS 10 entries to end
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS  2 at 1200 MHz, 5 at 1.2375V
 Test 6  Tiled Kitchen, 30 seconds, 19 FPS  Temperatures 69.8°C and 70.9°C up and down
 
 Extended Power Cable Test OpenGL Reliability Program 64 Bit Version 1

 Test 6  Tiled Kitchen, 30 seconds, 8 FPS
 Test 6  Tiled Kitchen, 30 seconds, 8 FPS

 Temperature and CPU MHz Measurement

 Using 70 samples at 1 second intervals

 Seconds
    0.0     1400 scaling MHz,   1400 ARM MHz, core volt=1.2000V, temp=52.6°C
    1.0     1400 scaling MHz,    600 ARM MHz, core volt=1.2000V, temp=52.6°C
    2.4     1400 scaling MHz,    600 ARM MHz, core volt=1.2000V, temp=52.6°C
    3.8     1400 scaling MHz,   1400 ARM MHz, core volt=1.2000V, temp=52.6°C

    To end 53.2°C to 55.3°C, mainly 600 MHz, 1.2000V, some 1400 MHz, 1.3438V 
  



OpenGL + Three Copies CPU Stress Tests

Next we have the 64 bit stress tests running the OpenGL program at the same time as three copies of the Integer, Floating Point or Livermore Loops Stress Tests. All reached the critical 80°C barrier, where CPU frequency reduces from 1200 MHz. Room temperature was around 22°C. The floating point test started at a higher temperature, but this disadvantage was not maintained after a short time. The details are based on approximate 30 second intervals, with average overall OpenGL Frames Per Second and CPU performance, and with CPU MHz and temperature averages derived from not necessarily accurate samples.

The stand alone OpenGL test CPU utilisation report indicated that more than one core was being used for some of the time. This probably led to greater floating point and OpenGL performance reductions, compared with the integer tests, with similar average temperatures. The run using Livermore Loops was the least affected.

These tests were run with the main board in a simple plastic case and the CPU having no heatsink. The following section includes repeats of the tests, with the system in a FLIRC case, where the whole aluminium case becomes the heatsink. This led to significantly lower CPU temperatures during earlier Raspberry Pi 3B stress tests - see Raspberry Pi 2 and 3 Stress Tests.htm and Raspberry Pi 32 Bit and 64 Bit Benchmarks and Stress Tests.htm - Pi 1, Pi 2 and Pi 3.

                    Raspberry Pi 3B+ 64 Bit Stress Tests

    Secs     stressIntPi64+OGL     burninfpuPi64+OGL     liverloopsPi64+OGL
              MHz   FPS     °C      MHz   FPS     °C      MHz   FPS     °C

       0     1400         55.8     1400         66.1     1400         49.9
      30     1200    16   69.8     1200     9   72.5     1400    17   67.7
      60     1200    17   73.1     1200    11   75.2     1200    17   70.9
      90     1200    16   75.2     1200    10   76.3     1200    16   70.9
     120     1200    16   76.3     1200    10   75.8     1200    16   70.4
     150     1200    16   77.4     1200    11   77.4     1200    16   70.9
     180     1200    16   78.4     1200    10   77.9     1200    16   73.1
     210     1200    16   79.5     1200    10   78.4     1200    15   73.6
     240     1200    16   80.1     1200    11   79.5     1200    16   74.1
     270     1141    16   80.6     1200    12   80.1     1200    16   74.1
     300     1034    15   81.1     1195    13   80.6     1200    16   74.7
     330     1141    15   80.6     1195    11   80.6     1200    16   79.0
     360     1141    15   81.7     1087    11   80.6     1200    16   77.9
     390     1034    15   81.1     1141    12   80.6     1200    16   77.4
     420     1034    15   81.7     1141    11   81.1     1200    15   78.4
     450     1034    14   80.6     1141    11   80.6     1200    15   79.5
     480     1141    15   80.6     1141    10   80.6     1200    17   79.0
     510      980    15   81.7     1141    11   81.1     1200    16   78.4
     540     1034    15   80.6     1034    11   80.6     1200    16   79.0
     570     1034    14   81.7     1087    11   80.6     1195    16   80.6
     600     1034    15   81.1     1141    11   81.7     1034    17   80.6
     630     1034    14   81.7     1087    11   81.1     1141    16   81.1
     660     1034    14   81.1     1087    10   80.6     1195    16   80.6
     690     1034    14   81.7     1034     9   81.1     1200    15   79.5
     720     1034    14   81.7     1140    11   81.7     1141    16   80.6
     750      980    14   81.7     1141    10   81.7     1141    16   79.5
     780     1034    14   81.7     1034    10   81.1     1141    16   80.6
     810     1034    14   82.7     1034     8   82.2     1141    16   80.6
     840     1034    14   82.2     1141    10   81.1     1141    16   81.1
     870     1034    14   82.2     1033    10   81.7     1200    16   76.8
     900     1141    13   79.5     1141    10   81.1     1200    16   75.2

 Average     1093  14.9   80.0     1137  10.5   79.8     1189  16.0   76.9
 %Av/Max       78    75              81    53              85    80

 Performance     MB/sec                MFLOPS                MFLOPS

  Average Was      2376                  2880                 229.7
          Max      3168                  4360                 285.6
            %      75.0                  66.1                    80    

         Max is typical average performance testing a single core
   



FLIRC Case Stress Tests

The FLIRC case has a box-like extension under the lid that presses down on the processor, via a supplied thermal pad, enabling the whole aluminium case to become a heatsink. The Pi 3B+ was fitted in a new one and the 64 bit integer and floating point plus OpenGL tests were repeated, essentially starting at the same room temperature as the earlier exercise. The first integer test was run shortly after powering on. The next one shown was for a third run, the second one being spoiled by accidentally allowing power saving to stop the display. The floating point sessions followed after half an hour, with 9 minutes in between.

Performance using the FLIRC case was clearly superior to that from using a plastic case with no heatsink fitted on the Pi board but, with varying starting conditions, it is difficult to be precise. In the latest results, there is little sign of core temperatures reaching 70°C, with the CPU almost always at 1400 MHz. Average graphics Frames Per Second were quite close to the maximum possible with all cores fully utilised (at 19 FPS), and 27% to 75% faster than using the plastic case. Also, average MB/second, from the integer tests, was 24% to 27% faster, with floating point MFLOPS providing a 42% to 44% improvement.
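The percentage improvements quoted above come from dividing the FLIRC case averages by the plastic case averages. A minimal sketch of that arithmetic follows, using averages copied from the two sets of tables. The function names are hypothetical and this is not code from the stress test programs.

```c
#include <stdio.h>

/* Percentage gain of a new measurement over an old one.
   Illustrative helper only. */
double percent_gain(double new_value, double old_value)
{
    return 100.0 * (new_value / old_value - 1.0);
}

/* Print gains using the averages from the plastic case and
   FLIRC case tables (lower and higher FLIRC result of each pair). */
void print_flirc_gains(void)
{
    printf("Integer MB/sec   %.1f%% to %.1f%%\n",
           percent_gain(2954.0, 2376.0), percent_gain(3025.0, 2376.0));
    printf("Floating MFLOPS  %.1f%% to %.1f%%\n",
           percent_gain(4096.0, 2880.0), percent_gain(4162.0, 2880.0));
    printf("OpenGL FPS       %.1f%% to %.1f%%\n",
           percent_gain(18.9, 14.9), percent_gain(18.4, 10.5));
}
```

The same function can be applied to any pair of before and after averages, provided both were measured with the same test parameters.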

                             Raspberry Pi 3B+ 64 Bit Stress Tests

       stressIntPi64+OGL    stressIntPi64+OGL    burninfpuPi64+OGL    burninfpuPi64+OGL
  Secs     MHz   FPS     °C     MHz   FPS     °C     MHz   FPS     °C     MHz   FPS     °C

      0    1400         37.6    1400         53.7    1400         46.2    1400         50.5
     30    1400    18   47.2    1400    19   60.1    1400    19   54.2    1400    19   56.4
     60    1400    19   49.9    1400    19   63.9    1400    19   56.4    1400    19   60.1
     90    1400    19   51.5    1400    19   64.5    1400    19   58.5    1400    19   61.8
    120    1400    19   52.1    1400    19   66.6    1400    19   58.5    1400    18   62.3
    150    1399    19   53.7    1400    19   66.6    1400    19   60.1    1400    19   63.4
    180    1400    19   54.2    1400    18   68.2    1399    18   60.7    1400    18   63.4
    210    1400    19   55.8    1400    19   67.7    1400    19   61.8    1400    19   64.5
    240    1400    19   55.8    1400    18   68.8    1400    18   62.3    1400    19   65.0
    270    1399    19   56.9    1399    19   68.8    1400    19   62.3    1400    19   65.5
    300    1400    19   58.0    1400    19   69.3    1400    19   62.8    1400    19   66.1
    330    1400    18   58.0    1400    19   69.3    1400    19   62.8    1400    19   66.6
    360    1400    19   59.1    1400    19   69.3    1400    19   63.4    1400    18   66.6
    390    1400    19   59.1    1400    18   69.8    1400    18   64.5    1400    19   67.1
    420    1400    19   60.1    1200    18   69.3    1400    18   65.0    1400    19   67.7
    450    1400    19   60.7    1400    18   70.4    1400    18   65.0    1400    19   67.7
    480    1400    19   61.2    1400    19   69.8    1400    19   65.5    1400    19   67.7
    510    1400    19   60.7    1400    19   69.8    1400    19   65.5    1400    18   68.8
    540    1400    19   61.2    1400    19   69.8    1400    19   66.6    1400    19   68.8
    570    1400    19   61.2    1400    19   69.8    1400    19   66.6    1400    19   69.3
    600    1400    19   62.3    1400    18   69.8    1400    18   66.6    1400    19   68.8
    630    1400    19   62.3    1200    18   70.4    1400    18   67.1    1400    18   68.8
    660    1400    19   63.4    1400    18   70.4    1400    18   67.7    1400    19   69.3
    690    1400    19   64.5    1399    18   69.8    1400    18   68.2    1400    19   69.3
    720    1400    19   64.5    1400    18   69.8    1400    18   67.7    1400    19   69.3
    750    1400    19   65.0    1400    18   70.4    1400    18   67.7    1400    19   69.8
    780    1400    19   65.5    1400    18   69.8    1400    18   67.7    1400    19   69.3
    810    1400    19   65.5    1400    18   70.4    1400    18   68.8    1200    19   69.8
    840    1400    19   66.6    1400    18   69.8    1400    18   68.8    1400    18   69.8
    870    1400    19   66.1    1400    18   69.8    1400    18   69.3    1400    19   69.8
    900    1400    19   67.1    1400    17   70.4    1400    17   69.3    1400    19   69.8

 Average   1400  18.9   59.6    1387  18.4   68.8    1400  18.4   64.4    1393  18.8   66.8
 %Av/Max    100    95             99    92            100    92             99    94

 Performance   MB/sec               MB/sec               MFLOPS               MFLOPS

 Average Was     3025                 2954                 4162                 4096
         Max     3168                 3168                 4360                 4360
           %     95.5                 93.3                 95.5                 93.9

          Max is typical average performance testing a single core
   




Assembly Code

As shown in the code below, I have disassembled my MemSpeed type benchmarks. There are calculations using intrinsic functions and normal four way unrolled C code. Each of the four Vector Multiply Accumulate intrinsic statements should lead to execution of four multiplies and four adds (total of 16 floating point operations). The C code loop has four multiplies and four adds, but the compilers might be expected to unroll this further, where appropriate (they didn’t - is there a parameter to force this?). This led to the fastest speeds being produced by intrinsics, using the assembly code instructions shown below.
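For reference, the four way unrolled loop can be written as a small self-contained C function, with MFLOPS derived from one multiply and one add per element. This is a sketch under stated assumptions: the function names and the timing arguments are hypothetical, and the array length is assumed to be a multiple of four.

```c
#include <stddef.h>

/* Four way unrolled multiply-accumulate in the shape of the
   MemSpeed C loop: x[m] = x[m] + s * y[m].
   Assumes n is a multiple of 4. Names are illustrative only. */
void triad_unrolled4(float *x, const float *y, float s, size_t n)
{
    size_t m;

    for (m = 0; m < n; m += 4) {
        x[m]     = x[m]     + s * y[m];
        x[m + 1] = x[m + 1] + s * y[m + 1];
        x[m + 2] = x[m + 2] + s * y[m + 2];
        x[m + 3] = x[m + 3] + s * y[m + 3];
    }
}

/* MFLOPS from elements processed, passes made and elapsed seconds,
   counting one multiply and one add per element. */
double triad_mflops(size_t n, size_t passes, double seconds)
{
    return 2.0 * (double)n * (double)passes / seconds / 1.0e6;
}
```

On the unrolling question, GCC does document a -funroll-loops option. Whether it changes the instructions generated for this particular loop was not tested here.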

NeonSpeed - At 32 bit working, the Vector Multiply Accumulate intrinsics were directly converted to NEON vmla.f32 instructions using quad word registers. The 64 bit compiler converted the intrinsics to the A64 instruction “Floating-point fused multiply-add to accumulator” (fmla), using 128 bit vector registers. Next are instructions generated for normal C code, using neon and funsafe compiler directives at 32 bits and standard parameters at 64 bits, acting on single precision calculations. At 32 bits, a single SIMD NEON instruction is used, vfma.f32 (Vector Fused Multiply Accumulate), with four calculations. At 64 bits, a single fmla is generated again. The difference in speed is apparent from using a single SIMD instruction in the loop, compared with four with intrinsics.

MemSpeed - Single Precision vs Double Precision - For the 32 bit version, using the NEON compiling parameter shown, NEON instructions were not generated. Four scalar floating-point multiply-accumulate instructions (fmacs or fmacd) were produced instead, giving the slowest speeds. Adding the funsafe parameter produced the same vfma.f32 NEON instruction as NeonSpeed for four single precision calculations, but four vfma.f64 were generated for double precision. Yes, these are NEON instructions, but SISD (Single Instruction Single Data), each with data in 64 bit scalar registers.

64 Bit MemSpeed - For the four sets of calculations, the fmla vector instructions were again produced, requiring two for double precision and speed closer to that from single precision calculations.

   Program Code
   NEON Intrinsics                    MemSpeed and NEONSpeed C Code for Compilation
{                                     Single and Double Precision
   x41 = vld1q_f32(ptrx1);
   x42 = vld1q_f32(ptrx2);          for (m=0; m<kd; m=m+inc)
   x43 = vld1q_f32(ptrx3);          {
   x44 = vld1q_f32(ptrx4);            xn[m]   = xn[m]   + sumn * yn[m];
   y41 = vld1q_f32(ptry1);            xn[m+1] = xn[m+1] + sumn * yn[m+1];
   y42 = vld1q_f32(ptry2);            xn[m+2] = xn[m+2] + sumn * yn[m+2];
   y43 = vld1q_f32(ptry3);            xn[m+3] = xn[m+3] + sumn * yn[m+3];
   y44 = vld1q_f32(ptry4);          }
   z41 = vmlaq_f32(x41, y41, c4);
   z42 = vmlaq_f32(x42, y42, c4);
   z43 = vmlaq_f32(x43, y43, c4);
   z44 = vmlaq_f32(x44, y44, c4);
   vst1q_f32(ptrx1, z41);
   vst1q_f32(ptrx2, z42);
   vst1q_f32(ptrx3, z43);
   vst1q_f32(ptrx4, z44);
   ptrx1 = ptrx1 + 16;
   ptry1 = ptry1 + 16;
   ptrx2 = ptrx2 + 16;
   ptry2 = ptry2 + 16;
   ptrx3 = ptrx3 + 16;
   ptry3 = ptry3 + 16;
   ptrx4 = ptrx4 + 16;
   ptry4 = ptry4 + 16;
}
 ######################################################################

             NEON Speed Intrinsics 
   32 Bit                             64 Bit
   1173 MFLOPS                        1277 MFLOPS
.L75:                               .L13:
   add     r0, r3, #48                ldr     q4, [x3, -16]
   add     ip, r3, #32                add     x3, x3, 64
   add     lr, r3, #16                ldr     q3, [x3, -64]
   add     r10, r2, #48               add     x1, x1, 64
   add     r7, r2, #32                ldr     q2, [x3, -48]
   add     r4, r2, #16                ldr     q1, [x3, -32]
   vld1.32 {d24-d25}, [r3]            cmp     x3, x2
   vld1.32 {d18-d19}, [r0]            ldr     q16, [x1, -64]
   vld1.32 {d20-d21}, [ip]            ldr     q7, [x1, -48]
   vld1.32 {d22-d23}, [lr]            ldr     q6, [x1, -32]
   vld1.32 {d26-d27}, [r2]            ldr     q5, [x1, -16]
   vld1.32 {d6-d7}, [r10]             fmla    v4.4s, v0.4s, v16.4s
   vld1.32 {d30-d31}, [r7]            fmla    v3.4s, v0.4s, v7.4s
   vld1.32 {d28-d29}, [r4]            fmla    v2.4s, v0.4s, v6.4s
   vmla.f32        q9, q3, q8         fmla    v1.4s, v0.4s, v5.4s
   vmla.f32        q10, q15, q8       str     q4, [x3, -80]
   vmla.f32        q11, q14, q8       str     q3, [x3, -64]
   vmla.f32        q12, q13, q8       str     q2, [x3, -48]
   add     r1, r1, #1                 str     q1, [x3, -32]
   add     r2, r2, #64                bne     .L13
   cmp     r1, r5
   vst1.32 {d24-d25}, [r3]
   vst1.32 {d22-d23}, [lr]
   add     r3, r3, #64
   vst1.32 {d20-d21}, [ip]
   vst1.32 {d18-d19}, [r0]
   bne     .L75
 ######################################################################

             NEON Speed Normal
   -mfpu=neon-vfpv4                        -march=armv8-a
   -funsafe-math-optimizations
   797 MFLOPS                              681 MFLOPS
   32 Bit                                  64 Bit
.L54:                                .L37:
   vld1.32   {q9}, [r2]                 ldr       q0, [x0, x26]
   vld1.32   {q8}, [r3]                 add       w1, w1, 1
   add       r1, r1, #1                 ldr       q1, [x0, x28]
   add       r2, r2, #16                cmp       w24, w1
   cmp       r1, r4                     fmla      v0.4s, v1.4s, v2.4s
   vfma.f32  q8, q9, q7                 str       q0, [x0, x26]
   vst1.32   {q8}, [r3]                 add       x0, x0, 16
   add       r3, r3, #16                bhi       .L37
   bcc       .L54

 ######################################################################

             MemSpeed 32 Bit Single and Double Precision
             Parameters -mfpu=neon-vfpv4
   Single Precision                        Double Precision
   MFLOPS 532                              MFLOPS 238
.L45:                                .L31:
   mov       ip, r2                     fldd      d5, [r2, #-24]
   flds      s15, [r3]                  fldd      d6, [r3, #-24]
   flds      s11, [ip]                  fldd      d7, [r3, #-16]
   flds      s12, [r3, #-12]            fldd      d4, [r3, #-8]
   flds      s13, [r3, #-8]             mov       r6, r2
   flds      s14, [r3, #-4]             fmacd     d6, d5, d8
   flds      s8, [r2, #-12]             fldd      d3, [r3]
   flds      s9, [r2, #-8]              add       r2, r2, #32
   flds      s10, [r2, #-4]             fstd      d6, [r3, #-24]
   fmacs     s15, s11, s30              fldd      d6, [r2, #-48]
   fmacs     s12, s8, s30               fmacd     d7, d6, d8
   fmacs     s13, s9, s30               fstd      d7, [r3, #-16]
   fmacs     s14, s10, s30              fldd      d7, [r2, #-40]
   add       r2, r2, #16                fmacd     d4, d7, d8
   fmrs      ip, s15                    fstd      d4, [r3, #-8]
   fsts      s12, [r3, #-12]            fldd      d7, [r6]
   fsts      s13, [r3, #-8]             fmacd     d3, d7, d8
   fsts      s14, [r3, #-4]             fmrrd     r8, r9, d3
   str       ip, [r3], #16 @ float      strd      r8, [r3], #32
   cmp       r3, r6                     cmp       r3, r1
   bne       .L45                       bne       .L31

 ######################################################################

             More MemSpeed 32 Bit Single and Double Precision
             Parameters -mfpu=neon-vfpv4 -funsafe-math-optimizations
   Single Precision                        Double Precision
   MFLOPS 695                              MFLOPS 236
.L44:                                .L28:
   vld1.64   {d16-d17}, [r3:64]         fldd      d17, [r2, #-24]
   vld1.64   {d18-d19}, [r1:64]         fldd      d16, [r3, #-24]
   add       r2, r2, #1                 fldd      d18, [r3, #-16]
   add       r1, r1, #16                vfma.f64  d16, d17, d8
   cmp       r4, r2                     mov       r4, r2
   add       r3, r3, #16                fldd      d17, [r3, #-8]
   vfma.f32  q8, q9, q7                 add       r2, r2, #32
   vstr      d16, [r3, #-16]            fcpyd     d19, d16
   vstr      d17, [r3, #-8]             fldd      d16, [r3]
   bhi       .L44                       fstd      d19, [r3, #-24]
                                        fldd      d19, [r2, #-48]
                                        vfma.f64  d18, d19, d8
                                        fstd      d18, [r3, #-16]
                                        fldd      d18, [r2, #-40]
                                        vfma.f64  d17, d18, d8
                                        fstd      d17, [r3, #-8]
                                        fldd      d17, [r4]
                                        vfma.f64  d16, d17, d8
                                        fmrrd     r4, r5, d16
                                        strd      r4, [r3], #32
                                        cmp       r3, r1
                                        bne       .L28

 ######################################################################

             MemSpeed 64 Bit Single and Double Precision
             Parameters -march=armv8-a
   Single Precision                        Double Precision
   MFLOPS 726                              MFLOPS 602
.L56:                                .L34:
   ldr       q0, [x27, x0]              ldr       q5, [x2, 16]
   add       w1, w1, 1                  add       w1, w1, 1
   ldr       q1, [x23, x0]              ldr       q1, [x0, 16]
   cmp       w21, w1                    cmp       w28, w1
   fmla      v0.4s, v1.4s, v2.4s        ldr       q3, [x2], 32
   str       q0, [x27, x0]              add       x0, x0, 32
   add       x0, x0, 16                 ldr       q0, [x0, -32]
   bhi       .L56                       fmla      v1.2d, v5.2d, v2.2d
                                        fmla      v0.2d, v3.2d, v2.2d
                                        str       q1, [x0, -16]
                                        str       q0, [x0, -32]
                                        bhi       .L34

MP-MFLOPS on Raspberry Pi 3B+

The 32 bit compilations use 12 scalar add and multiply instructions and 10 fused multiply accumulate instructions of the NEON type, but limited to scalar operation (SISD - Single Instruction Single Data). All the others use NEON or 64 bit vector SIMD instructions (Multiple Data), carrying out four calculations simultaneously at single precision, with 128 operations in the execution loops, or half these at double precision. Each has its own variation of fused multiply and add or subtract instructions.

In the original single precision benchmarks, the NEON version produced significantly faster performance, where the compiler converted the 32 intrinsic calculating functions into 22 instructions, with those fused operations, and a total in-loop count of 27. Performance of the first 64 bit version was degraded through making use of only 12 vector registers, for a programming function involving 23 variables, necessitating frequent load instructions. The gcc 7 compiler made use of 25 vector registers, with out of loop loads, to achieve similar performance to the hand coded NEON benchmark. Both the 64 bit double precision benchmarks included the more efficient code, with external data loading, but best speed was, as expected, half that for single precision SIMD calculations.


   Function triadplus2
   for(i=0; i<n; i++)
   x[i] = (x[i]+a)*b-(x[i]+c)*d+(x[i]+e)*f-(x[i]+g)*h+(x[i]+j)*k
   -(x[i]+l)*m+(x[i]+o)*p-(x[i]+q)*r+(x[i]+s)*t-(x[i]+u)*v+(x[i]+w)*y;
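The operation counts quoted for this function can be checked with a portable scalar sketch. It is not the benchmark source: holding the 22 constants in a single array, in the order a b c d e f g h j k l m o p q r s t u v w y, is an assumption made to keep the example short, and the function names are hypothetical.

```c
#include <stddef.h>

#define TRIAD_TERMS 11   /* eleven (x[i] + p) * q terms in triadplus2 */

/* Scalar form of triadplus2. c[2*t] is the constant added to x[i]
   and c[2*t + 1] is the multiplier for term t. Signs alternate,
   starting with plus, as in the expression above. */
void triadplus2(float *x, size_t n, const float c[2 * TRIAD_TERMS])
{
    size_t i;
    int t;

    for (i = 0; i < n; i++) {
        float xi = x[i];
        float sum = (xi + c[0]) * c[1];
        for (t = 1; t < TRIAD_TERMS; t++) {
            float term = (xi + c[2 * t]) * c[2 * t + 1];
            sum = (t % 2) ? sum - term : sum + term;
        }
        x[i] = sum;
    }
}

/* 11 bracketed adds + 11 multiplies + 10 joining adds or subtracts. */
int triadplus2_ops_per_element(void)
{
    return TRIAD_TERMS + TRIAD_TERMS + (TRIAD_TERMS - 1);
}
```

The count of 32 operations per element matches the 32 intrinsic calculating functions, and four single precision SIMD lanes give the 128 operations per loop pass mentioned in the text.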

 ######################################################################
 
   gcc 4.9 32 bit
   SP MFLOPS                          DP MFLOPS
   797 1T 3134 4T                     798 1T 3119 4T

.L21:                              .L21:
   flds      s23, [r3]                fldd      d17, [r1]
   fadds     s15, s8, s23             faddd     d16, d17, d2
   fadds     s24, s10, s23            faddd     d18, d17, d0
   fadds     s31, s6, s23             faddd     d25, d17, d4
   fadds     s30, s4, s23             faddd     d24, d17, d6
   fnmuls    s15, s15, s7             fnmuld    d16, d3, d16
   fadds     s29, s3, s23             faddd     d23, d17, d15
   fadds     s28, s1, s23             faddd     d22, d17, d13
   fadds     s27, s0, s23             faddd     d21, d17, d11
   vfma.f32  s15, s9, s24             faddd     d20, d17, d9
   fadds     s26, s17, s23            faddd     d19, d17, d31
   fadds     s25, s18, s23            vfma.f64  d16, d18, d1
   fadds     s24, s20, s23            faddd     d18, d17, d29
   fadds     s23, s21, s23            faddd     d17, d17, d27
   vfma.f32  s15, s5, s31             vfma.f64  d16, d25, d5
   vfma.f32  s15, s14, s30            vfms.f64  d16, d24, d7
   vfma.f32  s15, s2, s29             vfma.f64  d16, d23, d14
   vfma.f32  s15, s13, s28            vfms.f64  d16, d22, d12
   vfma.f32  s15, s16, s27            vfma.f64  d16, d21, d10
   vfma.f32  s15, s12, s26            vfms.f64  d16, d20, d8
   vfma.f32  s15, s19, s25            vfma.f64  d16, d19, d30
   vfma.f32  s15, s11, s24            vfms.f64  d16, d18, d26
   vfma.f32  s15, s22, s23            vfma.f64  d16, d17, d28
   fstmias   r3!, {s15}               fstmiad   r1!, {d16}
   cmp       r3, r2                   cmp       r1, r0
   bne       .L9                      bne      .L21
    
 ######################################################################

   gcc 6 64 bit
   SP MFLOPS                               DP MFLOPS
   1793 1T to 6981 4T                      1405 1T to 4398 4T

.L65:                                .L84:
   ldr       q16, [x2, x5]              ldr       q16, [x2, x0]
   add       w6, w6, 1                  add       w3, w3, 1
   ldr       q15, [sp, 64]              cmp       w3, w6
   cmp       w3, w6                     fadd      v15.2d, v16.2d, v14.2d
   ldr       q17, [sp, 80]              fadd      v17.2d, v16.2d, v12.2d
   ldr       q0, [sp, 112]              fmul      v15.2d, v15.2d, v13.2d
   fadd      v15.4s, v16.4s, v15.4s     fmls      v15.2d, v17.2d, v11.2d
   fmul      v15.4s, v15.4s, v17.4s     fadd      v17.2d, v16.2d, v10.2d
   ldr       q17, [sp, 96]              fmla      v15.2d, v17.2d, v9.2d
   fadd      v17.4s, v16.4s, v17.4s     fadd      v17.2d, v16.2d, v8.2d
   fmls      v15.4s, v17.4s, v0.4s      fmls      v15.2d, v17.2d, v31.2d
   ldr       q0, [sp, 128]              fadd      v17.2d, v16.2d, v30.2d
   fadd      v17.4s, v16.4s, v0.4s      fmla      v15.2d, v17.2d, v29.2d
   ldr       q0, [sp, 144]              fadd      v17.2d, v16.2d, v28.2d
   fmla      v15.4s, v17.4s, v0.4s      fmls      v15.2d, v17.2d, v0.2d
   ldr       q0, [sp, 160]              fadd      v17.2d, v16.2d, v27.2d
   fadd      v17.4s, v16.4s, v0.4s      fmla      v15.2d, v17.2d, v26.2d
   ldr       q0, [sp, 176]              fadd      v17.2d, v16.2d, v25.2d
   fmls      v15.4s, v17.4s, v0.4s      fmls      v15.2d, v17.2d, v24.2d
   ldr       q0, [sp, 192]              fadd      v17.2d, v16.2d, v23.2d
   fadd      v17.4s, v16.4s, v0.4s      fmla      v15.2d, v17.2d, v22.2d
   ldr       q0, [sp, 208]              fadd      v17.2d, v16.2d, v21.2d
   fmla      v15.4s, v17.4s, v0.4s      fadd      v16.2d, v16.2d, v19.2d
   ldr       q0, [sp, 224]              fmls      v15.2d, v17.2d, v20.2d
   fadd      v17.4s, v16.4s, v0.4s      fmla      v15.2d, v16.2d, v18.2d
   ldr       q0, [sp, 240]              str       q15, [x2, x0]
   fmls      v15.4s, v17.4s, v0.4s      add       x0, x0, 16
   ldr       q0, [sp, 256]              bcc       .L84
   fadd      v17.4s, v16.4s, v0.4s
   ldr       q0, [sp, 272]
   fmla      v15.4s, v17.4s, v0.4s
   ldr       q0, [sp, 288]
   fadd      v17.4s, v16.4s, v0.4s
   fmls      v15.4s, v17.4s, v14.4s
   fadd      v17.4s, v16.4s, v13.4s
   fmla      v15.4s, v17.4s, v12.4s
   fadd      v17.4s, v16.4s, v11.4s
   fadd      v16.4s, v16.4s, v9.4s
   fmls      v15.4s, v17.4s, v10.4s
   fmla      v15.4s, v16.4s, v8.4s
   str       q15, [x2, x5]
   add       x5, x5, 16
   bhi       .L65

 ######################################################################

   gcc 7
   SP MFLOPS                               DP MFLOPS
   2800 1T to 10608 4T                     1403 1T 4492 4T

.L51:                                .L44:
   ldr       q15, [x2, x3]              ldr       q15, [x3, x2]
   add       w4, w4, 1                  add       w4, w4, 1
   cmp       w4, w6                     cmp       w4, w5
   fadd      v0.4s, v15.4s, v14.4s      fadd      v7.2d, v15.2d, v14.2d
   fadd      v17.4s, v15.4s, v12.4s     fadd      v16.2d, v15.2d, v12.2d
   fmul      v0.4s, v0.4s, v13.4s       fmul      v7.2d, v7.2d, v13.2d
   fmls      v0.4s, v17.4s, v11.4s      fmls      v7.2d, v16.2d, v11.2d
   fadd      v17.4s, v15.4s, v10.4s     fadd      v16.2d, v15.2d, v10.2d
   fmla      v0.4s, v17.4s, v9.4s       fmla      v7.2d, v16.2d, v9.2d
   fadd      v17.4s, v15.4s, v8.4s      fadd      v16.2d, v15.2d, v8.2d
   fmls      v0.4s, v17.4s, v31.4s      fmls      v7.2d, v16.2d, v31.2d
   fadd      v17.4s, v15.4s, v30.4s     fadd      v16.2d, v15.2d, v30.2d
   fmla      v0.4s, v17.4s, v29.4s      fmla      v7.2d, v16.2d, v29.2d
   fadd      v17.4s, v15.4s, v16.4s     fadd      v16.2d, v15.2d, v28.2d
   fmls      v0.4s, v17.4s, v28.4s      fmls      v7.2d, v16.2d, v27.2d
   fadd      v17.4s, v15.4s, v27.4s     fadd      v16.2d, v15.2d, v26.2d
   fmla      v0.4s, v17.4s, v26.4s      fmla      v7.2d, v16.2d, v25.2d
   fadd      v17.4s, v15.4s, v25.4s     fadd      v16.2d, v15.2d, v24.2d
   fmls      v0.4s, v17.4s, v24.4s      fmls      v7.2d, v16.2d, v23.2d
   fadd      v17.4s, v15.4s, v23.4s     fadd      v16.2d, v15.2d, v22.2d
   fmla      v0.4s, v17.4s, v22.4s      fmla      v7.2d, v16.2d, v21.2d
   fadd      v17.4s, v15.4s, v21.4s     fadd      v16.2d, v15.2d, v20.2d
   fadd      v15.4s, v15.4s, v19.4s     fadd      v15.2d, v15.2d, v18.2d
   fmls      v0.4s, v17.4s, v20.4s      fmls      v7.2d, v16.2d, v19.2d
   fmla      v0.4s, v15.4s, v18.4s      fmla      v7.2d, v15.2d, v17.2d
   str       q0, [x2, x3]               str       q7, [x3, x2]
   add       x3, x3, 16                 add       x2, x2, 16
   bcc       .L51                       bcc       .L44

 ######################################################################

   gcc6 neon
   SP MFLOPS                               C code
   2999 1T to 11563 4T                     for(i=0; i<n; i=i+4)
.L41:                                   {
   ldr       q1, [x1]                     x41 = vld1q_f32(ptrx1);
   ldr       q0, [sp, 64]                 z41 = vaddq_f32(x41, a41);
   fadd      v18.4s, v20.4s, v1.4s        z41 = vmulq_f32(z41, b41);
   fadd      v17.4s, v22.4s, v1.4s        z42 = vaddq_f32(x41, c41);
   fadd      v0.4s, v0.4s, v1.4s          z42 = vmulq_f32(z42, d41);
   fadd      v16.4s, v24.4s, v1.4s        z41 = vsubq_f32(z41, z42);
   fadd      v7.4s, v26.4s, v1.4s         z42 = vaddq_f32(x41, e41);
   fadd      v6.4s, v28.4s, v1.4s         z42 = vmulq_f32(z42, f41);
   fadd      v5.4s, v30.4s, v1.4s         z41 = vaddq_f32(z41, z42);
   fmul      v0.4s, v0.4s, v19.4s         z42 = vaddq_f32(x41, g41);
   fadd      v4.4s, v10.4s, v1.4s         z42 = vmulq_f32(z42, h41);
   fadd      v3.4s, v12.4s, v1.4s         z41 = vsubq_f32(z41, z42);
   fadd      v2.4s, v14.4s, v1.4s         z42 = vaddq_f32(x41, j41);
   fadd      v1.4s, v8.4s, v1.4s          z42 = vmulq_f32(z42, k41);
   fmls      v0.4s, v21.4s, v18.4s        z41 = vaddq_f32(z41, z42);
   fmla      v0.4s, v23.4s, v17.4s        z42 = vaddq_f32(x41, l41);
   fmls      v0.4s, v25.4s, v16.4s        z42 = vmulq_f32(z42, m41);
   fmla      v0.4s, v27.4s, v7.4s         z41 = vsubq_f32(z41, z42);
   fmls      v0.4s, v29.4s, v6.4s         z42 = vaddq_f32(x41, o41);
   fmla      v0.4s, v31.4s, v5.4s         z42 = vmulq_f32(z42, p41);
   fmls      v0.4s, v9.4s, v1.4s          z41 = vaddq_f32(z41, z42);
   fmla      v0.4s, v4.4s, v11.4s         z42 = vaddq_f32(x41, q41);
   fmls      v0.4s, v3.4s, v13.4s         z42 = vmulq_f32(z42, r41);
   fmla      v0.4s, v2.4s, v15.4s         z41 = vsubq_f32(z41, z42);
   str       q0, [x1], 16                 z42 = vaddq_f32(x41, s41);
   cmp       x1, x0                       z42 = vmulq_f32(z42, t41);
   bne       .L41                         z41 = vaddq_f32(z41, z42);
                                          z42 = vaddq_f32(x41, u41);
                                          z42 = vmulq_f32(z42, v41);
                                          z41 = vsubq_f32(z41, z42);
                                          z42 = vaddq_f32(x41, w41);
                                          z42 = vmulq_f32(z42, y41);
                                          z41 = vaddq_f32(z41, z42);
                                          vst1q_f32(ptrx1, z41);
                                          ptrx1 = ptrx1 + 4;
                                        }




System ID

The benchmarks obtain information on CPU hardware characteristics and the version of Linux from files /proc/cpuinfo and /proc/version. Below are details provided from 32 bit Raspbian and 64 bit Gentoo, and they are the same for both Raspberry Pi 3B and 3B+. Raspbian BogoMIPS appears to depend on the CPU MHz frequency governor. With the default “On-Demand” setting, 38.4 was indicated, but 89.6 with the “Performance” option, or 76.8 using an earlier version of Raspbian.
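As an illustration of how a field such as BogoMIPS can be picked out of /proc/cpuinfo, here is a minimal C sketch. It is not the code used in the benchmarks, which is not shown in this report. The function names are hypothetical and the key is matched as a simple line prefix.

```c
#include <stdio.h>
#include <string.h>

/* Copy the value from a "key : value" line, in the style of
   /proc/cpuinfo, into out. Returns 1 when the line starts with key
   and contains a colon, otherwise 0. Hypothetical helper. */
int cpuinfo_value(const char *line, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p;

    if (outlen == 0 || strncmp(line, key, klen) != 0)
        return 0;
    p = strchr(line + klen, ':');
    if (p == NULL)
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;                              /* skip blanks after the colon */
    snprintf(out, outlen, "%s", p);
    out[strcspn(out, "\r\n")] = '\0';     /* drop the trailing newline */
    return 1;
}

/* Print the first matching field from a cpuinfo style file.
   Returns 1 if the field was found, otherwise 0. */
int print_cpuinfo_field(const char *path, const char *key)
{
    char line[512], value[512];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (cpuinfo_value(line, key, value, sizeof value)) {
            printf("%s = %s\n", key, value);
            fclose(f);
            return 1;
        }
    }
    fclose(f);
    return 0;
}
```

A call such as print_cpuinfo_field("/proc/cpuinfo", "BogoMIPS") reports the value for the first core only, which is sufficient here as the entries are the same for processors 0 to 3.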

                  32 bit Raspbian                      64 Bit Gentoo

processor         0 to 3                               0 to 3
model name        ARMv7 Processor rev 4 (v7l)
BogoMIPS          38.40 or 89.6                        38.40
Features          half thumb fastmult vfp edsp neon    fp asimd evtstrm crc32 cpuid
                  vfpv3 tls vfpv4 idiva idivt vfpd32
                  lpae evtstrm crc32
CPU implementer   0x41                                 0x41
CPU architecture  7                                    8
CPU variant       0x0                                  0x0
CPU part          0xd03                                0xd03
CPU revision      4                                    4

Linux version     4.14.34-v7+ (dc4@dc4-XPS13-9333)     4.14.31-v8-b36f4e9e1984+
                  (gcc version 4.9.3 (crosstool-NG     (sakaki@chiyo) (gcc version 6.4.0
                  crosstool-ng-1.22.0-88-g8460611))    (Gentoo 6.4.0-r1 p1.3)) #1 SMP
                  #1110 SMP Mon Apr 16 15:18:51 BST    PREEMPT Sun Apr 1 14:15:34 BST
                  2018                                 2018
   

