NOTE - These benchmarks generally ran successfully on devices running Android versions up to 7. Under Android 8 they could be installed, but failed to run due to a minor incompatibility. The benchmarks have been regenerated to remove this problem. The new versions can be downloaded from a ResearchGate PDF file, which also includes details and results from later technology, including a Cortex-A73 CPU with Android 8.
Download Benchmark Apps
All have an option to save results via Email. The first set automatically selects benchmark code for ARM, Intel or MIPS processors at run time, for 32 bit architecture or 64 bit when supported. Following are older 32 bit benchmarks that are still relevant.
General
My original Android benchmarks were compiled to run only on ARM CPUs using 32 bit instructions. These are available from a copy in British Library Archives or from here. The newer ones automatically select benchmark code for ARM, Intel or MIPS processors at run time, for 32 bit architecture or 64 bit when supported. These were produced using a later version of the gcc compiler. When evaluating performance differences of 64 bit operation, the 32 bit results should be produced by the same compiler version. These are in the 32 bit zip file that can be downloaded from above. It should be noted that these are recognised, by Android, as identical to the 64 bit versions, which might need to be reinstalled. The version is identified in the output display.
The original ARM native code benchmarks will run on Intel CPUs, but slowly, via an Android based compatibility layer, called Houdini, that maps ARM instructions into those for x86 processors. The new ones use native Intel instructions. After installing Android 5.0 on the Intel tablet, the original ARM native code benchmarks were rerun. As shown in the results below, significant speed gains could be obtained.
The latest benchmarks were compiled using gcc 4.8, via Eclipse Android Development Tools. The project files, with source code, are in Android Intel-ARM Benchmarks.zip. Limited tests show that these projects can also be used to produce the benchmarks via Android Studio. The zip file now includes the projects for the above earlier tests, in a folder named Old.
All Java and native C/C++ based benchmarks use the same Java front end to run the benchmarks and display the results, an example being below.
There are Run, Information and Save buttons, the latter to email results to me and/or anyone else. The results are also saved in text based log files that also identify system characteristics. As indicated, results identify whether 32 bit or 64 bit code has been executed.
New Version of android benchmarks.htm - The revised version of this report will contain results from running a wide range of the 32/64 bit benchmarks on a particular tablet or phone, plus any from newer top end devices. For detail and results from the original benchmarks see the last report.
Strategy - These benchmarks, based on 50 years' experience, do not attempt to provide an overall performance rating (the Lies, Damned Lies and Benchmarks type), as it is meaningless in representing the diverse variety of user activities. The programs are intended to identify best and worst performance characteristics that might explain why a particular application is fast or slow.
CPU Benchmarks - The first set is the Classic Benchmarks, the original programs that set standards of performance for computers, comprising Whetstone, Dhrystone, Linpack (including NEON-Linpack) and Livermore Loops.
Memory Benchmarks - Next are programs that measure performance with data from caches and RAM. MemSpeed (including NeonSpeed variant), BusSpeed and RandMem all use the same range of data sizes between 4 KB and 64 MB. Then there is a Fast Fourier Transform benchmark with multiple data sizes.
MultiThreading Benchmarks - These all measure performance using 1, 2, 4 and 8 threads. The first are MP-Whetstone, MP-Dhrystone and MP-Linpack (including NEON-Linpack-MP). The next batch all use memory sized 12.8 KB, 128 KB and 12.8 MB, comprising MP-MFLOPS (including NEON-MFLOPS MP), MP-BusSpeed and MP-RandMem.
Older Benchmarks - These include graphics and SD drive benchmarks.
Windows 10 Tablet - The C code part of the benchmarks has been used as the basis of programs compiled, as 32 bits and 64 bits, to run on Intel processors via Windows. Results are included below for comparison purposes, but performance might not be the same as that from Android versions running on the same Intel processor model (see system W1). The benchmark execution files are in WinTablet.zip.
March 2016 - A second Windows 10 tablet was obtained, using the same Atom CPU, with the added dual boot option to use Android. This uses a 64 bit Linux kernel but, unfortunately, Android is a 32 bit variety. Results for both Operating Systems are included below.
October 2016 - Results now include some using Remix OS for PC, which runs Android applications on compatible Intel-based PCs. These include using this as a second boot option on one of the Windows 10 tablets.
May 2017 - Android 7.0 results included, with all 32 bit benchmarks being run on Cortex-A53 based P37. All processor dependent benchmark results were essentially the same as those from Android 6, except Java varieties, where the Whetstone benchmark speed improved considerably.
June 2017 - Floating point and integer arithmetic stress tests were produced. These are multithreaded programs, where number of threads, data size and running time can be defined, plus operations per word for the floating point tests. Unlike previous benchmarks, these display results continuously, over 10 second periods.
Logged Configuration
Following are examples of ARM and Intel based system information included in the log files.
Whetstone Benchmark - NativeWhetstone2.apk, Java Whetstone.apk, WhetsNN.exe
This provides an overall rating in MWIPS, plus separate results for the eight test procedures in MFLOPS (floating point) and MOPS (functions and integer). For full details and results via Windows, Linux, Android and via different programming languages, see all modern Whetstone Benchmark results (including Windows tablet versions running on desktop PCs), also British Library Archived Files Results up to 2012 and Whetstone Benchmark History and Results from the 1960's.
Below are results from the original benchmark for comparison with the new one, compiled for 32 bit systems. The initial aim was to show performance improvements of using native code on Intel Atom processors, rather than via the Houdini compatibility translation, where speeds of system A1 were around twice as fast. Note the performance differences of the original ARM version (from here) on A1, the Intel Atom based tablet, following the upgrade to Android 5.0.
The downside of the later gcc 4.8 compilation was much slower MWIPS ratings using ARM CPUs. This was due to the extremely slow speeds on the EXP tests that dominate overall running time. On a given platform, as with other CPU only benchmarks, performance tends to be proportional to CPU MHz. Considering this, the particular code appears to suit the Qualcomm Snapdragon 800 and shows no real advantage of the ARM v8-A53 over the v7 varieties.
In fact, the EXP test also uses the SQRT function. A test in the Livermore Loops Benchmark also uses this function, and produces unexpectedly worse minimum speed on the same systems (T7, T11, T22 - ARM/Intel 32 Bit Version).
Java results are also included, particularly to show the effects of Android 5 using the ART virtual machine instead of Dalvik. For this particular benchmark, there are gains and losses, but all are slower than the native compiled versions.
A5 and W2 Dual Boot Tablet - Differences in results from Microsoft and Android compilers are reflected. Atom Z8300 results for W2 are slower than W1. Later, similar results were obtained.
2016 - Note fast Core i7 results using Android via REMIX for PC and slow Java speeds with Android 6.0, which might be due to later Java, as shown with Intel/Windows Version results.
Dhrystone Benchmark - Dhrystone2i.apk, Dhry2NN.exe
The Dhrystone integer benchmark produces a performance rating in Vax MIPS (AKA DMIPS). Further details of the Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in all modern Dhrystone Benchmark results (including Windows tablet versions running on desktop PCs), with those up to late 2012 also in British Library Archives. The shown ratio, MIPS/MHz, is often quoted. It depends on compiler optimisation (or over-optimisation) but is normally constant using the same benchmark on the same range of processors.
Using native x86 code, performance of the Intel Atom based tablet A1 is 30% faster than the original ARM to Intel translated program but, on the other systems, the newer 32 bit compilations are slower. At least tablet T22 is nearly twice as fast when compiled for 64 bit operation. Following an upgrade to Android 5.0, A1 ARM to Intel translation produced performance equivalent to native code. The original can be obtained from here.
2016 - Note faster Android operation at 64 bits, and outstanding REMIX Android speeds on Core i7, similar to Windows versions.
Linpack Benchmark - LinpackDP2.apk, LinpackSP2.apk, LinpackJava.apk,
System ARM MHz Android LinpackDP LinpackSP NEONLinpack LinpackJava See MFLOPS MFLOPS SP MFLOPS MFLOPS Original ARM Version T7 v7-A9 1200 4.1.2 151.05 201.30 376.00 56.44 T22 v8-A53 1300 5.0.2 156.70 184.09 393.34 86.09 T11 v7-A15 1700 4.2.2 459.17 803.04 1334.90 143.06 T21 QU-800 2150 4.4.3 389.52 751.95 1250.14 340.44 A1 Z3745 1866 4.4.2 168.16 296.63 443.42 252.49 A1 Z3745 1866 5.0 253.83 293.20 680.85 166.09 A5 ## Z8300 1840 5.1 238.04 318.00 746.36 174.67 R1=Atom Z8300 1840 6.0.1 781.17 37.65 R2 Core i7 3900 6.0.1 3717.42 222.23 ARM/Intel 32 Bit Version T7 v7-A9 1200 4.1.2 159.34 199.84 346.78 T7 v7-A9 1200 5.1.1 160.25 198.96 346.12 89.50 T22 v8-A53 1300 5.0.2 172.28 180.64 407.08 T22 v8-A53 1300 5.1 178.04 187.03 421.86 91.28 P37 v8-A53 1500 6.0.1 207.64 219.03 480.21 23.25 P37 v8-A53 1500 7.0 208.00 220.13 474.21 112.14 T11 v7-A15 1700 4.2.2 826.36 952.88 1411.86 See above T21 QU-800 2150 4.4.3 629.92 790.83 1325.00 See above A1 Z3745 1866 4.4.2 362.63 408.87 900.17 See above A1 Z3745 1866 5.0 363.98 406.59 900.64 See above A5 ## Z8300 1840 5.1 609.39 644.32 942.12 See above R1=Atom Z8300 1840 6.0.1 632.56 682.08 1000.00 See above R2 Core i7 3900 6.0.1 3442.00 1838.99 N/A See above ARM/Intel 64 Bit Version T22 v8-A53 1300 5.0.2 338.00 479.69 505.12 T22 v8-A53 1300 5.1 347.55 492.78 520.79 See above P33 QU-810 2000 5.0.2 1277.76 R1=Atom Z8300 1840 6.0.1 875.82 1473.16 N/A See above R2 Core i7 3900 6.0.1 5152.85 3950.31 N/A See above Intel/Windows 32 Bit Version W1 Atom Z8300 1840 Win 10 615.80 See 64b W2 ## Z8300 1840 Win 10 613.50 See 64b PC Core i7 3900 Win 10 3453.72 See 64b Intel/Windows 64 Bit Version W1 Atom Z8300 1840 Win 10 638.75 254.73 W2 ## Z8300 1840 Win 10 636.00 265.66 PC Core i7 3900 Win 10 3603.86 465.32 ## A5 and W2 Same Dual Boot Tablet =Atom R1 and w1 Same Tablet R2 and PC same System |
The Livermore Loops comprise 24 kernels of numerical application, with speeds calculated in MFLOPS. A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official average. As with the other benchmarks, details and results from various hardware and software platforms are provided in all modern Livermore Loops Benchmark results (including Windows tablet versions running on desktop PCs), with results up to late 2012 in British Library Archives.
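The summary figures named above (Max, Average, Geomean, Harmean, Min) can be illustrated with a minimal Python sketch. The function name `summarise_mflops` and the three input speeds are hypothetical, chosen only to keep the arithmetic easy to follow; this is not the benchmark's own C code and is not claimed to reproduce its summary lines exactly.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def summarise_mflops(kernel_mflops):
    """Return the five summary figures for a list of per-kernel MFLOPS."""
    return {
        "max": max(kernel_mflops),
        "average": fmean(kernel_mflops),            # arithmetic mean
        "geomean": geometric_mean(kernel_mflops),   # the "official" average
        "harmean": harmonic_mean(kernel_mflops),
        "min": min(kernel_mflops),
    }


# Hypothetical speeds for three kernels (not measured results).
example = summarise_mflops([100.0, 200.0, 400.0])
for name, value in example.items():
    print(f"{name:8s} {value:8.1f}")
```

For any set of positive speeds the harmonic mean is no larger than the geometric mean, which in turn is no larger than the arithmetic mean, so one slow kernel pulls Geomean and Harmean down more than it pulls Average. The result tables show the same ordering.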
MFLOPS/MHz - The first set of the following comparisons is derived from the shown MFLOPS of the 24 kernels for each system, divided by CPU MHz, and compared to those from the T7 Cortex-A9 CPU. They can indicate the effectiveness of particular levels of hardware and compiler technology. The low minimum speeds occur in the only loop that uses the SQRT function, where the Whetstone Benchmark is also slow on the same systems. The second Cortex-A53 is running under 64 bit Android, which might make a difference. Performance of the systems with better minimum values appears enhanced by the slow T7 Cortex-A9. On average values for ARM CPUs, Qualcomm 800 and Cortex-A15 are somewhat faster. The Intel CPUs are faster on a per MHz basis, with Core i7 being far superior. Note that Android and Windows performance is quite similar for the latter.
64 Bit vs 32 Bit - At least as far as average speeds are concerned, working at 32 bits and 64 bits produces similar performance on Intel based devices but 64 bits can be much faster with ARM processors. Note that Intel CPUs can use the same SSE type SIMD instructions at both settings.
Native 32 Bit vs Original Code - The original benchmarks were compiled for ARM CPUs, producing Intel instructions via the Houdini conversion layer. In this case, performance was much better using native code compilation. ARM speeds were affected by using a later version of the compiler.
Original ARM only version can be obtained from here.
MFLOPS/MHz vs Cortex-A9              Avg    Min    Max
T11  Cortex-A15    Android 32       1.38   0.90   2.51
T22  Cortex-A53    Android 32       0.83   0.93   0.92
P37  Cortex-A53    Android 32       0.95   2.17   0.96
T21  Qualcomm 800  Android 32       1.13   2.34   1.63
A1   Atom Z3745    Android 32       1.57   3.71   1.67
A5   Atom Z8300    Android 32       1.61   3.24   1.82
R1   Atom Z8300    Android 32       1.62   3.07   1.96
W2   Atom Z8300    Windows 32       1.66   4.47   1.62
R2   Core i7       Android 32       3.23   4.10   4.22
PC   Core i7       Windows 32       3.68   5.07   4.52

64 Bit / 32 Bit                      Avg    Min    Max
T22  Cortex-A53    Android 64/32    1.47   3.61   1.96
R1   Atom Z8300    Android 64/32    1.04   1.08   0.95
R2   Core i7       Android 64/32    1.18   1.00   1.72
W2   Atom Z8300    Windows 64/32    0.97   0.78   1.09
PC   Core i7       Windows 64/32    0.96   0.74   1.06

Native/Original
A1   Atom Z3745    Android 32       1.92   2.49   3.17
T7   Cortex-A9     Android 32       1.01   0.97   0.39
T11  Cortex-A15    Android 32       1.13   0.91   0.38
T21  Qualcomm 800  Android 32       1.08   1.00   1.12
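As a worked check of the MFLOPS/MHz comparison, the following Python sketch normalises T11 Cortex-A15 against T7 Cortex-A9, using the Geomean, Min and Max figures from the ARM/Intel 32 Bit Version results listed below. It assumes the Avg column is based on the geometric mean (the official average); the helper name `per_mhz_ratio` is hypothetical.

```python
def per_mhz_ratio(mflops, mhz, ref_mflops, ref_mhz):
    """MFLOPS per MHz of one system relative to a reference system."""
    return (mflops / mhz) / (ref_mflops / ref_mhz)


# Summary figures read from the ARM/Intel 32 Bit Version results.
T7 = {"mhz": 1200, "geomean": 175.6, "min": 26.8, "max": 396.6}     # Cortex-A9
T11 = {"mhz": 1700, "geomean": 342.1, "min": 34.3, "max": 1411.4}   # Cortex-A15

ratios = {
    key: round(per_mhz_ratio(T11[key], T11["mhz"], T7[key], T7["mhz"]), 2)
    for key in ("geomean", "min", "max")
}
print(ratios)  # {'geomean': 1.38, 'min': 0.9, 'max': 2.51}
```

Under that assumption the three values agree with the T11 row of the first comparison (1.38, 0.90, 2.51).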
System   CPU    MHz   Android   MFLOPS 24 Loops

Original ARM Version

A1       Z3745  1866  4.4.2     9.5 secs
   201.2   257.3   237.5   205.6   122.5   180.0
   308.3   450.0   535.3   370.4   104.8    77.1       Max Average Geomean Harmean     Min
    80.0    95.1   153.8   136.4   202.0   268.9     535.8   201.9   172.4   146.7    48.8
   179.5   209.7   145.0    95.0   254.2    51.3

A1       Z3745  1866  5.0       9.9 secs
   374.9   274.8   327.6   295.6   247.9   227.8
   468.5   538.6   569.2   396.2   167.9   141.9       Max Average Geomean Harmean     Min
   109.6   114.5   210.5   150.5   250.6   333.4     569.8   266.6   233.5   199.8    59.9
   287.9   238.0   261.3   114.9   372.8    64.0

T7       v7-A9  1200  4.1.2     10.0 secs
   241.7   233.4   383.5   388.7    98.4   147.1
   293.1   258.5   314.6   181.1    99.1    95.3       Max Average Geomean Harmean     Min
    80.6    68.1   171.6   226.9   346.2   176.9     391.9   202.1   181.3   160.9    68.1
   202.6   184.9   119.5   102.1   200.9    88.5

T11      v7-A15 1700  4.2.2     10.0 secs
   646.8   671.1   839.9   789.7   176.2   671.6
  1078.4  1243.4  1018.8   367.0   130.0   165.9       Max Average Geomean Harmean     Min
   117.6   210.7   370.5   521.1   657.3   625.4    1252.8   476.0   375.8   288.8    90.8
   270.8   269.1   458.3   196.3   432.5   112.7

T21      QU-800 2150  4.4.3     10.0 secs
   570.4   624.2   915.6   861.4   175.5   545.4
   636.9   911.1   750.6   293.9   130.5   207.0       Max Average Geomean Harmean     Min
   115.0   159.8   330.5   327.1   608.7   592.8    1075.5   437.1   356.7   284.4   100.3
   330.2   267.3   244.2   153.8   356.2   106.2

ARM/Intel 32 Bit Version

A1       Z3745  1866  4.4.2     9.5 secs
   484.6   529.2  1031.2   929.2   274.5   365.6
   661.9   873.1   825.6   479.1   612.9   520.7       Max Average Geomean Harmean     Min
   156.8   324.4   339.4   497.8   693.1   481.8    1031.2   480.0   429.8   378.6   154.7
   373.0   329.1   388.6   181.8   650.1   169.2

A5 ##    Z8300  1840  5.1       9.6 secs
   689.4   701.4  1108.3   873.6   230.1   488.4
   662.2   770.0   876.7   404.9   439.6   428.2       Max Average Geomean Harmean     Min
   141.2   280.7   293.4   466.1   540.3   432.7    1108.3   495.8   433.6   370.6   133.2
   313.9   307.8   649.7   176.1   662.0   148.3

T11      v7-A15 1700  4.2.2     10.0 secs
   496.9   814.9   843.7   801.7   175.5   188.6
  1223.8  1411.4   760.3   452.5   132.7   120.7       Max Average Geomean Harmean     Min
   107.1   264.7    34.3   529.0   592.6   728.2    1411.4   471.2   342.1   219.5    34.3
   275.2   266.8   530.7   198.8   502.8   117.8

T21      QU-800 2150  4.4.3     10.1 secs
   640.9   814.9   813.8   808.4   201.6   182.0
   643.0  1158.9   779.9   351.4   133.1   176.2       Max Average Geomean Harmean     Min
   113.6   178.4   286.5   294.7   516.7   667.5    1159.4   446.9   356.0   280.3   112.3
   327.5   281.7   297.9   153.6   613.1   117.0

T7       v7-A9  1200  4.4.2     10.2 secs
   245.2   268.8   394.7   390.7   118.2   157.2
   297.4   308.1   344.7   226.7    90.8    74.7       Max Average Geomean Harmean     Min
    85.6    81.7    26.9   227.5   338.9   240.3     396.6   207.6   175.6   136.1    26.8
   204.9   180.6   179.9   110.8   271.4    78.5

P37      v8-A53 1500  6.0.1     9.8 secs
   201.7   293.7   331.7   327.5   135.5   137.1
   346.5   474.9   451.5   271.6   149.7    74.9       Max Average Geomean Harmean     Min
    81.2   104.5   236.3   278.4   411.1   294.2     474.9   237.4   208.3   179.9    72.7
   208.0   245.7   148.2   128.8   351.7    99.9

P37      v8-A53 1500  7.0       9.7 secs
   198.6   295.5   331.3   325.1   131.7   140.5
   341.5   475.4   452.1   241.8   149.5    74.8       Max Average Geomean Harmean     Min
    81.8   105.2   237.0   279.0   412.9   295.1     475.4   237.0   208.1   180.0    72.9
   208.8   238.9   131.1   133.1   353.3   100.4

ARM/Intel 32 Bit Version Then 64 Bit

T22      v8-A53 1300  5.0.2     9.7 secs
   163.4   243.4   272.1   270.3   109.5   111.2
   282.2   389.0   360.6   219.6   124.0    61.8       Max Average Geomean Harmean     Min
    67.6    87.4    27.3   224.2   340.1   241.9     393.4   188.3   158.3   124.6    27.1
   168.5   198.8   120.2   120.6   277.7    79.1

R1=Atom  Z8300  1840  6.0.1     9.4 secs
   746.6   767.9  1194.9   986.9   249.3   520.5
   722.7   840.9   978.5   370.5   451.5   450.1       Max Average Geomean Harmean     Min
   151.3   301.3   331.4   524.9   608.1   465.5    1194.9   501.0   435.1   366.6   126.1
   352.1   316.8   578.8   181.8   695.3   166.3

R2       Core i7 3900 6.0.1     8.4 secs
  3664.3  3433.9  2498.9  2509.6   552.5  2201.3
  4618.0  5337.8  5345.9  2426.9  1307.3  1888.8       Max Average Geomean Harmean     Min
   670.6  1211.5  2033.5  1804.4  2382.0  3571.5    5441.5  2259.0  1845.3  1445.9   356.9
   840.6   968.8  2967.6  1112.8  1591.4   356.9

ARM/Intel 64 Bit Version

T22      v8-A53 1300  5.0.2     9.7 secs
   451.4   191.4   243.2   272.4   144.9   144.5
   749.4   411.1   453.6   261.1   138.0   206.1       Max Average Geomean Harmean     Min
   122.5   130.1   215.0   249.8   411.6   395.4     772.2   265.9   232.5   206.3    97.8
   241.7   248.1   152.8   118.7   317.2   103.7

R1=Atom  Z8300  1840  6.0.1     9.4 secs
   881.6   742.9  1130.2   928.7   236.9   554.1
   869.1   795.4   854.7   433.5   198.4   604.5       Max Average Geomean Harmean     Min
   215.7   292.9   320.3   520.5   628.6   528.5    1130.2   524.1   451.1   380.6   136.1
   321.3   290.4   692.1   205.4   698.3   164.8

R2       Core i7 3900 6.0.1     8.9 secs
  9376.3  3394.8  2496.3  2523.0   559.6  2219.9
  8891.9  5719.5  5828.1  2749.2   439.9  3146.1       Max Average Geomean Harmean     Min
  1182.7  1272.5  2282.8  2332.7  2379.4  5722.9    9376.3  2933.0  2172.0  1556.1   357.0
  1068.6   966.6  2966.5  1435.5  1590.7   357.0

Intel/Windows 32 Bit Version

W1       Atom Z8300  1840 MHz  Win10
   721.4   702.3   862.7   988.7   245.3   489.6
   875.8   794.8   980.5   441.3   201.1   446.7       Max Average Geomean Harmean     Min
   201.0   240.8   299.8   499.9   603.5   459.3     988.7   504.9   448.8   395.8   189.6
   446.6   336.0   607.8   199.0   705.3   277.8

W2 ##    Z8300  1840 MHz  Win10
   749.7   731.3   894.0   988.1   251.2   489.2
   883.7   797.3   968.3   434.5   200.5   454.0       Max Average Geomean Harmean     Min
   202.9   240.7   301.1   521.4   604.3   457.0     988.1   503.5   446.9   393.5   183.5
   443.6   333.7   587.1   200.6   697.1   276.4

PC       Core i7  3900 MHz  Win10
  4752.7  3624.0  2593.7  2764.2   564.5  1590.3
  5071.8  5284.2  5569.6  2784.2   441.5  1939.4       Max Average Geomean Harmean     Min
   931.3  1205.6  2284.2  2372.1  2435.2  3500.0    5821.7  2512.1  2102.9  1712.0   441.5
  1068.8  1880.4  2819.7  1529.1  1590.4  1616.5

Intel/Windows 64 Bit Version

W1       Atom Z8300  1840 MHz  Win10
   655.2   651.9   728.1   688.9   217.3   457.6
   732.2   735.7   965.5   378.3   170.7   381.6       Max Average Geomean Harmean     Min
   196.6   196.4   213.0   434.4   522.6   420.6     965.5   433.6   375.8   320.0   117.2
   385.6   283.1   572.9   156.7   584.5   129.7

W2 ##    Z8300  1840 MHz  Win10
   743.0   734.7   834.0   808.1   233.4   547.9
   878.6   857.4  1074.4   440.8   201.7   450.4       Max Average Geomean Harmean     Min
   215.4   228.6   247.9   500.4   608.1   484.8    1074.4   500.3   433.7   369.8   143.4
   440.7   327.7   650.4   180.8   682.2   151.6

PC       Core i7  3900 MHz  Win10
  4566.1  3465.7  2459.1  2748.1   565.1  2308.3
  6142.4  5354.0  5195.9  2518.0   417.8  1838.7       Max Average Geomean Harmean     Min
   941.0  1096.4  2166.0  2180.9  2291.5  3357.0    6142.4  2514.4  2014.9  1500.5   324.9
  1005.1  1780.7  2871.2  1311.6  1600.5   324.9

## A5 and W2 Same Dual Boot Tablet
=Atom R1 and W1 Same Tablet
R2 and PC same System
This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing double precision MB/second by 8 and 16, for the two tests, and single precision speeds by 4 and 8. Assembly listings for integer tests show that Millions of Instructions Per Second (MIPS) can be found by multiplying MB/second by 0.78 with 2 adds and 0.66 for the other test. Cache sizes are indicated by varying performance as memory usage changes. For more details and older results see here, with results up to 2013 in British Library Archives.
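The conversions described above can be written out as a short Python sketch. The divisors and multipliers are those quoted in the text; the function names, dictionary keys and the 8000 MB/second input are hypothetical, used only for illustration.

```python
# Bytes read per floating point operation, as quoted in the text:
# x[m]=x[m]+s*y[m] is two operations per element pair, x[m]=x[m]+y[m] is one.
FLOP_DIVISORS = {
    ("double", "multiply_add"): 8,
    ("double", "add"): 16,
    ("single", "multiply_add"): 4,
    ("single", "add"): 8,
}

# MIPS per MB/second for the integer tests, from the assembly listings.
INT_MIPS_FACTORS = {"two_adds": 0.78, "one_add": 0.66}


def mflops_from_mb_per_sec(mb_per_sec, precision, test):
    """Convert a MemSpeed MB/second figure to MFLOPS."""
    return mb_per_sec / FLOP_DIVISORS[(precision, test)]


def mips_from_mb_per_sec(mb_per_sec, test):
    """Convert an integer test MB/second figure to MIPS."""
    return mb_per_sec * INT_MIPS_FACTORS[test]


speed = 8000.0  # hypothetical MB/second, not a measured result
print(mflops_from_mb_per_sec(speed, "double", "multiply_add"))  # 1000.0
print(mflops_from_mb_per_sec(speed, "single", "add"))           # 1000.0
print(mips_from_mb_per_sec(speed, "two_adds"))
```

The divisors follow from the data sizes: a double precision element pair is 16 bytes, so two operations per pair gives 8 bytes per operation and one operation per pair gives 16.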
The native ARM/Intel results, on Intel Atom based A1, averaged 44% faster than the original translated speeds via L1 cache data, 27% using L2 and 14% from RAM. Running under Android 5.0, the translated benchmark speeds were similar to the new version, in most cases. (Original ARM only version can be obtained from here).
Initial measurement, running the new 32 bit version on ARM CPUs, produced similar results to the original benchmark.
First results, to provide 64/32 bit comparisons on ARM CPUs, were on Tablet T22, where average 64/32 bit speed ratios were 2.20 times using cached data and 1.58 times from RAM.
The benchmark is based on, and is similar to, my original Windows MemSpeed benchmarks, where details and results can be found here. These can be compared with the new Windows tablet version, from a later compiler, with 32 bit and 64 bit results included below. Android results R1 and R2 are via REMIX for Intel PCs, running at 64 bits.
Dual Booting - Results include those for Windows and Android running on the same system. They are dual boot A5 and W2, alternative boot W1 and R1, and alternative boot PC and R2.
Following the results are processor technology comparisons with the ARM Cortex-A9 CPU, based on MB/second divided by CPU MHz, demonstrating that each has its strengths and weaknesses. See comments in comparison table.
Results are dependent on the particular compiler used. Those for the Windows version were produced by an earlier compiler and are relatively slow at 64 bits. An example of differences is for the first test, with a source code loop, in double precision, that contains four multiplies and four adds. Assembly code produced for Intel CPUs has four scalar SSE2 multiplies and four adds at 32 bits, with two SIMD SSE2 instructions of each at 64 bits. That for ARM has four fmacd floating-point multiply-accumulate instructions to double precision registers at 32 bits and two fmla fused multiply-add instructions to vector registers at 64 bits. The result is much faster performance at 64 bits.
In principle, SIMD instructions could also be used at 32 bits for Intel, but fmla is only available at 64 bits with ARM.
This benchmark carries out the same calculations as the MemSpeed Benchmark, measuring data reading speeds in MegaBytes per second, with functions accessing arrays of cache and RAM based data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] using single precision floating point, with x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] using integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing single precision MB/second by 4 and 8, for the two tests. The first set of calculations use normal functions, followed by some using NEON Intrinsic Functions. The last two columns are NEON only results. For further details and results see android neon benchmarks.htm.
On tablet A1, with the Intel Atom CPU, the 32 bit native code version produced some significant performance gains over the original ARM benchmark (available from here), but rerunning this via Android 5.0 produced much faster speeds, some better than native code compilation.
The later compiler produced some slower and some faster speeds on ARM based tablets.
Details are provided for the 64 bit version on T22. As with NEON-Linpack, many results from 32 bit and 64 bit compilations, via NEON intrinsic functions, were similar. With normal code, the 64 bit compilations were up to nearly four times faster than those at 32 bits.
Following the results are further MB per second/CPU MHz comparisons. Subject to variations due to cache occupancy, the comparisons for normal calculations are the same as MemSpeed. Then, more modern processors performed relatively better, using NEON instructions.
See comments in comparison table.
This benchmark (based on PC version with details and results here) is designed to identify reading data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read. On reading data from RAM, 64 Byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum on using 16 word increments (64 bytes). Potential maximum speed can be estimated by multiplying this minimum value by 16. With this burst rate, measured speed at 32 word and 16 word increments are likely to be the same. Cache sizes are indicated by varying speed as memory use changes. Note, with smallest L1 cache demands, measured speed can be low due to overheads when reading little data. For more details and further results see here, with results up to 2013 in British Library Archives.
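The halving of the address increment, and the multiply-by-16 estimate, can be sketched in Python as follows. This is an illustration of the arithmetic only, assuming 4 byte words and 64 byte bursts as described above; the function names and the 500 MB/second figure are hypothetical and the real benchmark is native code.

```python
WORD_BYTES = 4  # one 32 bit word, as read by the benchmark


def address_increments(start_words=32):
    """Increments in words: 32, then halved on successive tests down to 1."""
    increments = []
    inc = start_words
    while inc >= 1:
        increments.append(inc)
        inc //= 2
    return increments


def words_used_per_burst(inc_words, burst_bytes=64):
    """How many words of each fetched burst are actually read."""
    burst_words = burst_bytes // WORD_BYTES   # 16 words in a 64 byte burst
    return max(1, burst_words // inc_words)


def estimated_max_mb_per_sec(min_mb_per_sec, burst_bytes=64):
    """Potential maximum speed: the minimum measured speed times 16."""
    return min_mb_per_sec * (burst_bytes // WORD_BYTES)


for inc in address_increments():
    print(f"Inc{inc:<3d} {inc * WORD_BYTES:4d} bytes, "
          f"{words_used_per_burst(inc):2d} word(s) used per burst")

print(estimated_max_mb_per_sec(500.0))  # hypothetical 500 MB/second minimum
```

At 32 word and 16 word increments only one word of each fetched burst is used, which is why those two measurements are likely to be the same when 64 byte bursts apply.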
Comparing results from different versions, on a particular system, there can be unusual differences on burst reading speeds. Those quoted here are for the most important measurements for reading all data.
On Intel Atom based tablet A1, there was little difference between the old ARM version, with conversion, and the new 32 bit native code program, nor from using Android 5.0 instead of 4.4.
Average revised 32 bit version performance improvements, via caches/RAM, were 8%/17% for T7 Cortex-A9, 11%/27% for T11 Cortex-A15 and 27%/-8% on T21 Snapdragon 800. Corresponding T22 Cortex-A53 64/32 bit improvements were 61%/25%.
After the results are further MB per second/CPU MHz comparisons, for this integer data streaming benchmark that can demonstrate maximum data transfer speed from RAM. As the latter might not be dependent on CPU speed, direct MB/second comparisons are also provided. These are dependent on bus speed, 32 bit or 64 bit bus width and whether one or two channels are available, one problem being that it is often difficult to identify what is provided. Note that multithreaded benchmarks might be needed to fully utilise memory bandwidth - see later results.
Results of the Windows version are also included for a tablet and, for comparison purposes, a desk top PC with 4 memory channels. Intel systems have 64 bit bus widths.
Intel CPUs - Results on Atom Z8300 are similar via different compilers/Operating Systems, using Android A5, REMIX/Android R1 and R2, plus Windows W1 and W2. Of those available, 32 bit and 64 bit versions have similar performance. RAM speeds tend to be faster than those on ARM based systems, due to 64 bit bus widths. As would be expected, Core i7 speeds are superior, based on MB/second per MHz and, particularly, on RAM MB/second comparisons. See also comments in comparison table.
ARM CPUs - With 32 bit versions, MB/second per MHz comparisons, with the older Cortex-A9, tend to be worse using L1 cache but better from L2 and RAM. The only 64 bit version results available are for T22, Cortex-A53, demonstrating faster L1 cache based tests, with lower improvements from L2 and RAM.
RandMem benchmark carries out four tests at increasing data sizes to produce data transfer speeds in MBytes Per Second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower performance can be through using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results see here, with results up to 2013 in British Library Archives.
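The idea of using the same program structure for serial and random address selection can be sketched in Python. This is only an assumed illustration of the principle, with an index list deciding the access order; the names `make_indices` and `read_all` are hypothetical and the real benchmark is native code, so Python timings would not be meaningful.

```python
import random


def make_indices(count, mode, seed=1):
    """Return the access order: 0..count-1, shuffled when mode is 'random'."""
    indices = list(range(count))
    if mode == "random":
        random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    return indices


def read_all(data, indices):
    """Same loop for both modes; only the order of the indices differs."""
    total = 0
    for i in indices:
        total += data[i]
    return total


data = list(range(1024))          # stands in for an array of 32 bit integers
serial = make_indices(len(data), "serial")
scattered = make_indices(len(data), "random")

# Both orders touch every element exactly once, so the sums must agree.
print(read_all(data, serial) == read_all(data, scattered))  # True
```

Because both runs execute the same loop over the same data, any speed difference in a native version comes from the access pattern, which is the effect the benchmark is intended to expose.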
On tablet A1, with the Intel Atom processor, results for the new 32 bit version were essentially the same as the Houdini instruction conversion of original ARM code via Android 5, both averaging 30% improvement over the original Android 4 speeds on read only tests, but similar with reading and writing. The latter pattern of improvements was also apparent for 64 bit versus 32 bit benchmark modes on tablet T22, with the ARM Cortex-A53 processor, but only using cache based data. The later 32 bit benchmark produced inconsistent gains and some losses, running on the other ARM compatible systems (up to October 2015).
The benchmark code is the same as used on the Windows and Linux PC versions, with details and results here, where some of these results are also included.
Further MB per second/CPU MHz comparisons are provided below, showing the usual variability in performance.
See comments in comparison table.
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running time in milliseconds. Besides Android, the benchmarks are available to run via Windows and Linux. Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in FFTBenchmarks.htm. The Android benchmarks are only available in the later 32 or 64 bit mode. Example results are below.
Version 3 Improvements - All systems produced significant gains, using the optimised benchmark, but some struggled running the smaller FFTs.
64 Bit Differences - Initially, only one tablet was available that runs at 64 bits, a Lenovo TAB 2 A8-50F using Android 5. In this case, 64 bit and 32 bit results were similar for the non-optimised version, but averaged 40% faster with the more efficient code. Later results, using Intel CPUs, produced similar performance via 32 bit and 64 bit versions.
Double and Single Precision - Using 64 bit DP numbers, instead of 32 bit for SP, can produce much slower speeds when a lower level cache space is exceeded and also through using more RAM based data. Other than these, there are slower and faster results.
Android Upgrades - The first identified upgrades to Android 5 indicated better average performance, but with wide variations on individual tests.
Intel/Windows 10 - 32 bit and 64 bit Intel/Windows results are now included for Atom and Core i7 CPUs.
A5 and W2 Dual Boot Tablet - Android and Windows speeds are again generally similar, except for Version 3, where W2 is faster. Again, W2 results using RAM are slower than W1.
Intel CPU Windows and REMIX/Android performance was quite similar.
Single Precision and Double Precision Results in milliseconds

         T7 Nexus 7                              T11 VOYO A15        T21 Kindle HDX 7
         Cortex-A9 1.2 GHz                       Cortex-A15 1.7 GHz  Qualcomm 800 2.1 GHz
L1/L2 KB 32/1024                                 32/2048             16/2048
         Android 4.1.2       Android 5.0.2       Android 4.2.2       Android 4.4.3
         32 Bit              32 Bit              32 Bit              32 Bit
K Size        SP        DP        SP        DP        SP        DP        SP        DP

Version 1.0
     1      0.64      0.38      0.18      0.21      0.10      0.17      0.14      0.18
     2      0.77      0.97      0.40      0.67      0.22      0.36      0.33      0.53
     4      1.14      1.77      1.13      1.86      0.57      0.90      1.03      1.30
     8      3.28      4.40      3.26      5.12      2.12      2.31      2.50      3.09
    16      7.76      9.39      7.74      9.69      4.71      5.97      1.95      2.20
    32     17.80     22.26     18.09     22.73     10.76     11.37      4.18      5.77
    64     61.05    140.58     41.64     84.68     20.10     49.70     14.61     20.01
   128    153.19    289.15    139.98    274.54     77.67    213.70     33.19     60.52
   256    450.16    645.72    444.09    645.70    408.51    448.95    107.49    310.93
   512   1084.11   1457.85   1102.20   1438.29    782.85   1101.70    584.54    497.23
  1024   2388.33   3129.21   2388.56   3185.93   1799.89   2280.30    875.95    963.37

Version 3c.0
     1      0.66      0.21      0.27      0.25      0.23      0.08      0.35      0.07
     2      1.09      0.55      0.65      0.65      0.50      0.17      0.81      0.19
     4      2.67      1.38      1.67      1.45      1.07      0.41      1.66      0.41
     8      3.56      3.09      4.30      3.23      2.41      0.90      1.08      0.90
    16      7.78      9.08      8.33     10.35      5.26      3.23      3.36      2.66
    32     17.85     22.02     19.23     25.38     11.88      8.88      6.54      6.07
    64     39.52     52.11     46.41     58.90     23.75     23.08     12.57     13.56
   128     89.73    118.45    103.31    128.44     49.74     53.11     27.41     33.09
   256    203.34    258.56    221.99    267.12    100.25    120.66     63.39     72.55
   512    437.25    552.00    464.30    558.13    226.76    264.30    150.38    156.30
  1024    918.32   1175.65    933.05   1182.49    505.68    586.18    306.32    337.07

         T22 Lenovo TAB 2 A8-50F                 P37 Lenovo Moto G4
         ARM Cortex-A53 1.3 GHz                  ARM Cortex-A53 1.5 GHz
L1/L2 KB 32/512                                  32/512
         Android 5.0.2                           Android 6.0.1       Android 7.0
         64 Bit              32 Bit              32 Bit              32 Bit
K Size        SP        DP        SP        DP        SP        DP        SP        DP

Version 1.0
     1      0.20      0.21      0.21      0.21      0.21      0.21      0.17      0.18
     2      0.44      0.50      0.43      0.53      0.45      0.51      0.38      0.40
     4      1.06      1.26      1.03      1.24      1.16      1.33      0.90      1.17
     8      2.52      3.03      2.52      2.85      2.62      2.59      2.29      2.45
    16      5.89      6.41      5.68      6.60      5.06      6.09      4.95      5.64
    32     14.09     25.29     13.05     30.59     14.10     30.26     11.25     27.12
    64     49.97    109.32     45.80     92.16     52.78    113.24     40.72    105.27
   128    188.37    256.98    153.25    221.98    173.52    256.88    160.31    236.64
   256    447.62    583.33    362.62    504.60    409.24    578.50    383.80    544.43
   512    826.77   1019.84    840.44   1107.14    917.86   1265.79    876.99   1198.03
  1024   1846.27   2299.97   1835.82   2423.72   2047.09   2750.92   1972.58   2683.18

Version 3c.0
     1      0.17      0.20      0.34      0.20      0.28      0.17      0.29      0.16
     2      0.37      0.48      0.74      0.47      0.65      0.39      0.64      0.38
     4      2.55      1.07      1.62      1.06      1.42      0.85      1.44      0.86
     8      1.93      2.40      3.63      2.33      3.35      1.95      3.25      1.95
    16      4.59      5.64      8.07      9.12      8.20      8.13      6.95      7.86
    32     10.68     15.40     18.20     22.93     15.99     18.95     15.93     19.43
    64     28.17     36.16     45.33     50.41     37.84     43.62     37.29     42.46
   128     66.87     82.23    101.38    112.46     84.06     96.71     83.55     95.01
   256    148.69    193.91    222.13    264.79    190.32    217.23    186.20    213.21
   512    347.25    424.72    501.52    550.88    425.97    474.15    416.25    462.13
  1024    760.74    960.28   1085.65   1206.83    928.38   1026.33    897.72   1001.54

Intel CPUs Android
                                                 Dual Boot with W2
         A1 Asus MemoPad 7                       A5 Teclast X98 Plus
         Atom Z3745 1.86 GHz                     Atom Z8300 1.84 GHz
L1/L2/L3 24/1024 KB                              24/1024/0
         Android 4.4.2       Android 5.0         Android 5.1
         32 Bit              32 Bit              32 Bit
K Size        SP        DP        SP        DP        SP        DP

Version 1.0
     1      0.09      0.11      0.10      0.09      0.09      0.12
     2      0.21      0.29      0.16      0.23      0.18      0.31
     4      0.61      0.66      0.48      0.52      0.61      0.57
     8      1.35      1.17      1.07      1.17      1.17      1.56
    16      3.20      2.57      2.38      2.59      3.15      3.34
    32      5.41      5.75      5.30      6.02      6.65      9.20
    64     11.74     29.95     11.77     28.31     15.62     45.48
   128     67.54     99.31     54.05     97.58     49.67    110.14
   256    194.13    225.94    189.11    219.98    222.78    264.65
   512    438.49    501.59    433.06    487.49    521.72    602.38
  1024    970.84   1121.61    968.37   1116.94   1187.13   1433.75

Version 3c.0
     1      0.09      0.08      0.10      0.08      0.15      0.13
     2      0.21      0.20      0.16      0.20      0.20      0.21
     4      0.50      0.43      1.66      0.43      0.45      0.52
     8      1.12      0.96      0.87      0.96      0.97      1.05
    16      2.64      2.86      2.01      2.34      2.14      2.61
    32      4.87      5.56      4.51      5.73      4.82      6.53
    64     11.11     15.03     10.01     14.47     11.10     17.79
   128     27.29     34.77     26.80     33.71     29.95     43.74
   256     62.57     72.93     61.16     72.04     77.43     86.13
   512    132.64    157.56    131.10    152.68    152.95    185.74
  1024    282.99    332.37    274.01    363.60    314.54    460.91

Intel CPUs - Windows or Windows and Android

         W2 Teclast X98 Plus
         Atom Z8300 1.84 GHz
      KB 24/1024/0
         Windows 10
         32 Bit              64 Bit
K Size        SP        DP        SP        DP

Version 1.0
     1      0.11      0.12      0.10      0.12
     2      0.24      0.34      0.22      0.33
     4      0.65      0.74      0.72      0.74
     8      1.46      1.66      1.37      1.68
    16      3.25      3.61      3.21      3.78
    32      7.33      8.10      6.98      7.97
    64     16.40     28.29     15.96     29.96
   128     38.56    121.13     76.10    136.39
   256    232.47    266.35    259.73    298.24
   512    565.20    597.42    596.50    629.28
  1024   1205.59   1450.84   1288.20   1439.44

Version 3c.0
     1      0.08      0.09      0.09      0.08
     2      0.19      0.23      0.18      0.19
     4      0.45      0.51      0.48      0.43
     8      1.00      1.12      1.08      0.93
    16      2.67      2.68      2.51      2.50
    32      5.54      5.59      5.74      6.06
    64     10.64     14.72     12.54     14.77
   128     32.82     36.71     28.28     36.95
   256     66.71     77.48     67.25     78.47
   512    157.72    153.43    150.14    168.63
  1024    332.39    365.36    300.79    370.48

         W1 Pipo W1S Tablet                      R1/W1 Pipo W1S Tablet
         Atom Z8300 1.84 GHz                     Atom Z8300 1.84 GHz
L1/L2/L3 KB 24/1024/0                            KB 24/1024/0
         Windows 10                              REMIX/Android
         32 bit              64 bit              32 bit              64 bit
K Size        SP        DP        SP        DP        SP        DP        SP        DP

Version 1.0
     1      0.11      0.12      0.10      0.12      0.31      0.37      0.29      0.37
     2      0.24      0.45      0.23      0.35      0.84      0.85      0.65      1.04
     4      0.67      0.75      0.63      0.74      1.52      1.46      1.91      2.37
     8      1.44      1.80      1.50      1.69      2.56      2.65      4.26      5.31
    16      3.29      3.71      3.16      3.65      4.46      3.59      7.42      6.24
    32      7.32      7.83      5.94      6.98      6.12      7.93      8.26      6.98
    64     14.36     31.51     13.95     25.44     13.03     35.52     17.14     32.47
   128     46.45    120.79     50.90    115.44     69.30    105.02     73.44    117.75
   256    209.39    235.36    203.02    266.34    228.05    244.75    237.24    295.39
   512    455.89    534.68    491.49    576.91    536.19    620.66    502.33    626.71
  1024   1024.78   1195.81   1040.39   1182.20   1086.25   1287.63   1039.91   1209.47

Version 3c.0
     1      0.08      0.08      0.08      0.09      0.16      0.08      0.26      0.08
     2      0.19      0.20      0.20      0.22      0.37      0.21      0.60      0.23
     4      0.46      0.44      0.46      0.48      0.89      0.46      1.45      0.44
     8      1.20      0.97      1.06      1.07      1.58      1.03      3.21      0.97
    16      2.27      2.26      2.26      2.25      3.21      2.53      7.37      2.29
    32      5.11      5.54      5.31      5.83      5.28      6.13     11.42      5.62
    64     12.48     14.29     11.22     15.59     12.13     18.74     13.93     14.66
   128     27.62     34.25     27.47     31.65     31.28     37.99     28.97     31.81
   256     71.32     70.99     62.74     67.95     72.23     81.63     57.01     66.84
   512    143.07    144.60    140.50    146.76    155.62    196.93    122.36    140.30
  1024    298.00    322.13    289.98    334.07    295.55    450.03    271.67    302.49

         PC 2015 Top End Desktop PC              R2/PC
         Corei7-4820K 3.9 GHz                    Corei7-4820K 3.9 GHz
L1/L2/L3 32/256/10 MB                            32/256/10 MB
         Windows 10                              REMIX/Android
         32 bit              64 bit              32 bit              64 bit
K Size        SP        DP        SP        DP        SP        DP        SP        DP

Version 1.0
     1      0.02      0.02      0.02      0.02      0.02      0.02      0.02     0.018
     2      0.04      0.04      0.04      0.04      0.05      0.04      0.03     0.041
     4      0.09      0.12      0.08      0.12      0.10      0.13      0.13     0.181
     8      0.26      0.31      0.25      0.30      0.29      0.32      0.38     0.398
    16      0.65      0.77      0.62      0.76      0.71      0.81      0.88     0.936
    32      1.59      1.96      1.51      1.93      1.69      1.99      2.11     2.506
    64      4.33      4.87      3.91      4.78      4.06      4.41      4.78     5.037
   128      9.94     10.57      9.21     10.60      9.19      9.92      9.31     9.772
   256     21.87     22.00     21.01     22.06     20.68     21.92     19.70    21.974
   512     45.09     55.15     44.72     58.29     45.07     52.85     43.68    56.312
  1024    105.75    199.77    111.23    199.11    106.39    188.55    110.34   176.725

Version 3c.0
     1      0.02      0.02      0.01      0.01      0.02      0.02      0.01     0.018
     2      0.03      0.03      0.03      0.03      0.03      0.03      0.03      0.04
     4      0.07      0.08      0.06      0.07      0.07      0.08      0.06      0.09
     8      0.16      0.18      0.14      0.16      0.16      0.17      0.22     0.199
    16      0.37      0.41      0.33      0.38      0.39      0.45      0.47     0.402
    32      0.81      0.86      0.73      0.82      0.85      0.96      1.11     0.873
    64      1.76      1.86      1.56      1.75      1.82      2.05      2.18     1.888
   128      3.77      4.05      3.38      3.76      3.94      4.36      4.45     4.047
   256      8.24      9.36      7.38      8.78      8.47      9.78      8.66     9.282
   512     19.09     22.96     17.28     22.50     19.52     24.29     17.74    23.361
  1024     45.68     57.37     42.19     56.66     47.35     57.59     43.23    56.682
For more information on the Whetstone Benchmark, see the stand alone version above. The multithreading version runs multiple copies of the same shared code, with separate variables. In this case, performance of each of the eight test functions, and the overall MWIPS rating, is nearly always proportional to the number of CPU cores available. The driving program checks that calculations on every thread produce consistent numeric results.
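The consistency check described above can be sketched as follows. This is only an illustration in Python, not the benchmark's own C code: `kernel`, `run_on_threads` and `consistent` are hypothetical names, and the loop is a stand-in for the real Whetstone tests. Each thread runs the same calculation with its own variables and the driver compares the results. In standard CPython builds the threads will not run in parallel, so the sketch shows the checking logic and not the throughput scaling.

```python
import threading


def kernel(passes):
    """Stand-in floating point loop (hypothetical, not the Whetstone code)."""
    x = 1.0
    for _ in range(passes):
        x = (x + 1.0 / x) * 0.5 + 0.25
    return x


def run_on_threads(n_threads, passes):
    """Run the same kernel on every thread, each writing to its own slot."""
    results = [None] * n_threads

    def worker(index):
        results[index] = kernel(passes)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def consistent(results):
    """True when every thread produced exactly the same numeric result."""
    return all(r == results[0] for r in results)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        status = "consistent" if consistent(run_on_threads(n, 10000)) else "MISMATCH"
        print(f"{n} threads: {status}")
```

Comparing for exact equality is reasonable here because every thread executes identical operations in the same order on the same hardware.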
The gcc 4.8 based ARM/Intel version, running on the Intel Atom tablet, is rated at twice the speed of the original, due to the use of native code. The fixed point results indicate overoptimisation, but that test uses little of the overall time, which mainly depends on the Cos, Exp and third MFLOPS tests. Running the original ARM converted code version via Android 5.0 mostly produced better performance on individual tests, but a lower overall rating, due to slower Cos and Exp tests, the same as with the stand alone version above.
Also the same as the stand alone version, the new native ARM program was generally slower, running on tablets T7, T11 and T21.
On T22, with the Cortex-A53 CPU, the new 32 bit single thread tests appeared to be slower than the stand alone version, but that was not the case at 64 bits, apparently indicating a 64 bit performance gain.
A5 and W2 Dual Boot Tablet - Android and Windows speeds are significantly different on some tests, because of the different compilers and particularly their optimisation, but these tests do not affect the overall MWIPS results much. The MWIPS ratings average 18% faster via Android, but both show 2 and 4 thread performance gains of around 1.9 and 3.5 times.
Intel CPU Windows and REMIX Android, 32 bit and 64 bit versions - overall MWIPS ratings were all quite similar on a Core i7 (PC/R2) and also on an Atom (W1/R1), but there were variations on individual tests, due to the different compilers and instructions used.
MP Efficiency - For those with four cores, average throughput, compared with one core, was 4.0 times on the Core i7 with REMIX and Windows, 3.5 times on the Atom with Windows, 2.7 times with REMIX and 3.7 times with Android, then 3.9 times with ARM/Android. The Core i7 (with Hyperthreading) recorded 6.9 times with 8 threads, and the 8 core P37 6.5 times (1 to 4 cores at 1.5 GHz and 5 to 8 at 1.2 GHz).
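As a worked example of the throughput ratios quoted above, the following Python sketch divides the 4 thread MWIPS rating by the 1 thread rating for a few of the result listings in this section. The function name `mp_speedup` is hypothetical, and only overall MWIPS is used here, whereas the averages in the text may also reflect the individual tests.

```python
def mp_speedup(single_thread, multi_thread):
    """Throughput gain of a multithreaded run relative to one thread."""
    return multi_thread / single_thread


# (1T MWIPS, 4T MWIPS) copied from the MP-Whetstone listings in this section
mwips = {
    "T7 Original": (1033.7, 4122.8),
    "A1 ARM-Intel": (1916.9, 7604.9),
    "A5 ARM-Intel": (2121.9, 7368.1),
    "W1 REMIX 32 Bit": (1929.0, 5295.0),
}

for name, (one, four) in mwips.items():
    print(f"{name:16s} {mp_speedup(one, four):.2f} times")
```

A ratio near 4.0 means all four cores were fully used, while the lower REMIX figure suggests something other than the benchmark threads was limiting throughput.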
##################### T7 Original ######################

T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

Android MP-Whetstone Benchmark V1.0 17-Oct-2012 13.49

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1033.7     247.4     235.4     266.0      25.3      15.0     448.4     630.9     513.5
2T    2058.1     456.3     473.0     532.4      50.0      30.1     898.1    1198.4    1026.6
4T    4122.8     831.9     944.7    1064.6     100.7      60.1    1797.0    2392.2    2053.4
8T    4163.2    1016.0     948.2    1069.5     101.8      60.9    1808.0    2414.2    2051.5

Overall Seconds 5.28 1T, 5.34 2T, 5.42 4T, 10.81 8T

#################### T7 ARM-Intel #####################

ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.32

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T     602.2     242.3     242.3     140.2      27.2       4.9     482.8    1425.2     239.1
2T    1208.7     481.2     484.2     280.8      55.0       9.9     970.0    2869.6     478.7
4T    2398.7     805.4     966.7     562.5     109.5      19.5    1938.2    5722.5     957.1
8T    2429.1     974.6    1076.2     562.4     110.9      19.7    1981.5    5816.1     963.6

Overall Seconds 4.94 1T, 4.93 2T, 5.08 4T, 9.93 8T

#################### T11 Original #####################

T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz

Android MP-Whetstone Benchmark V1.1 06-Sep-2013 12.49

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1308.2     345.9     379.0     294.1      30.8      17.2    1351.4    1265.7     843.1
2T    2886.6     782.1     782.6     614.0      80.1      34.3    2775.2    2463.7    1667.5
4T    3086.0     998.6     788.1     610.6      79.2      44.5    3472.0    2526.4    2191.4
8T    2930.0     788.2     843.5     616.5      80.5      35.0    2846.0    2799.1    1686.2

Overall Seconds 3.54 1T, 3.30 2T, 6.62 4T, 13.16 8T

#################### T11 ARM-Intel ####################

ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.23

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T     837.2     340.1     341.7     191.2      39.1       6.2    1521.1    2532.8     629.3
2T    1676.2     596.2     683.2     387.3      77.8      12.4    3056.9    5055.1    1263.6
4T    1697.7     687.5     869.4     394.5      78.1      12.4    2980.7    6518.4    1258.8
8T    1685.2     685.9     691.0     389.7      78.3      12.4    3086.3    5113.7    1262.0

Overall Seconds 4.06 1T, 4.07 2T, 8.12 4T, 16.19 8T

#################### T21 Original #####################

T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

Android MP-Whetstone Benchmark V1.1 06-Jul-2015 10.42

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1877.1     645.2     642.6     524.1      44.0      22.3    1364.7    1572.1     898.9
2T    3668.6    1220.2    1262.4    1021.9      85.9      43.8    2663.5    3078.4    1753.4
4T    7426.9    2375.5    2474.7    2097.7     175.7      88.2    5052.6    6240.4    3555.0
8T    7706.6    2692.2    2746.2    2186.9     180.1      90.3    5822.5    6902.7    3681.3

Overall Seconds 4.44 1T, 4.62 2T, 4.64 4T, 9.00 8T

#################### T21 ARM-Intel ####################

ARM/Intel MP-Whetstone Benchmark V1.1 22-Jul-2015 12.02

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1598.0     512.1     508.7     311.7      43.6      22.1    1142.9    2123.3     598.4
2T    3161.2     960.0     996.7     614.2      86.7      43.8    2258.9    3820.9    1194.7
4T    6348.0    1593.5    2019.5    1231.5     174.2      88.5    4471.1    8139.4    2398.3
8T    6419.6    2058.2    2077.5    1252.6     175.0      88.7    4520.9    8875.0    2409.0

Overall Seconds 4.88 1T, 5.00 2T, 5.05 4T, 9.92 8T

###################### P37 32 Bit ######################

P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1
Single Channel RAM, LPDDR3 933 MHz, 7.5 GB/second
8 x 32 KB L1 cache, 512 KB shared L2 cache

ARM/Intel MP-Whetstone Benchmark V1.2 14-Nov-2016 11.41
Compiled for 32 bit ARM v7a

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1050.5     304.5     268.3     171.7      35.2      17.7     459.4     905.5     338.1
2T    2134.1     540.5     524.8     350.5      68.1      34.9    1316.8    1881.0     679.3
4T    4214.0    1090.4    1022.0     689.4     136.1      70.4    2283.5    3850.4    1348.4
8T    7490.8    1969.8    1759.1    1243.8     244.5     125.3    4038.0    6074.2    2392.9

Overall Seconds 4.67 1T, 4.65 2T, 4.71 4T, 5.75 8T

Android 7.0

ARM/Intel MP-Whetstone Benchmark V1.2 11-May-2017 10.28
Compiled for 32 bit ARM v7a

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1069.2     300.7     252.9     176.7      31.5      19.4     646.9     942.2     338.6
2T    2103.2     543.2     490.9     343.7      64.1      38.7    1101.2    1830.5     675.9
4T    4212.2    1072.1     958.5     686.7     128.7      77.5    2251.5    3802.1    1354.9
8T    7564.2    1931.6    1744.2    1242.6     231.8     137.1    4243.9    6856.4    2461.7

Overall Seconds 3.99 1T, 4.06 2T, 4.06 4T, 4.94 8T

###################### T22 32 Bit ######################

T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.30
Compiled for 32 bit ARM v7a

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T     676.4     275.9     281.9     147.9      35.4       5.3     600.3     901.0     285.5
2T    1362.5     533.8     561.7     298.0      70.9      10.8    1203.1    1838.9     574.0
4T    2698.6     903.9    1071.7     594.4     141.2      21.5    2346.1    3305.5    1138.5
8T    2830.1    1463.2    1393.0     614.2     152.5      21.9    3243.9    4418.3    1171.4

Overall Seconds 4.95 1T, 4.94 2T, 5.11 4T, 10.09 8T

###################### T22 64 Bit ######################

ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.34
Compiled for 64 bit ARM v8a

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1524.8     328.6     348.8     297.6      37.3      19.9   1462579    1867.2    1238.0
2T    3062.5     688.8     697.9     596.0      75.5      39.8   2097113    3726.7    2481.3
4T    6085.4    1214.9    1360.5    1185.4     150.5      79.4   2449153    7055.0    4951.8
8T    6222.4    1495.2    1545.6    1204.2     152.2      80.6   3869846    9218.8    5154.1

Overall Seconds 4.92 1T, 4.90 2T, 5.05 4T, 9.97 8T

#################### A1 Original #######################

A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

Android MP-Whetstone Benchmark V1.1 04-Feb-2015 11.39

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T     953.7     363.0     382.4     267.8      21.0      13.2     413.1    1842.4     392.3
2T    1921.2     726.0     663.5     541.4      42.6      27.0     816.1    3662.6     793.3
4T    3820.6    1419.2    1514.6    1081.5      84.1      54.0    1543.8    6292.4    1588.5
8T    4003.8    1912.9    1872.4    1114.1      86.5      56.4    2053.1    8292.6    1599.7

Overall Seconds 4.88 1T, 4.87 2T, 4.96 4T, 10.05 8T

################## A1 V1 Android 5.0 ###################

Android MP-Whetstone Benchmark V1.1 05-Nov-2015 11.06

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T     748.8     405.9     411.8     367.0      11.3      11.1     898.0    2129.1     459.8
2T    1468.5     822.0     827.5     744.8      22.4      22.2    1088.8    4228.4     924.5
4T    2781.0    1242.8    1638.6    1415.5      40.3      44.3    3404.6    8283.2    1852.1
8T    3050.7    1854.5    1831.0    1566.7      45.4      45.3    4519.7   10332.5    1844.5

Overall Seconds 5.00 1T, 5.09 2T, 5.72 4T, 10.30 8T

#################### A1 ARM-Intel ######################

ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 17.35

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1916.9     691.4     691.3     497.2      35.3      27.6   10209.8    2787.3    1351.8
2T    3800.3    1377.6    1381.2     980.0      70.1      54.7   20248.0    5252.8    2748.7
4T    7604.9    2713.2    2711.8    1977.1     140.2     110.0   33906.3    9526.5    5550.8
8T    7798.1    3141.5    3627.2    2064.2     141.2     110.2   59590.6   12743.7    5711.5

Overall Seconds 4.94 1T, 5.00 2T, 5.06 4T, 10.11 8T

########### A5 ARM-Intel Dual Boot With W2 #############

Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
Android 5.1, 4 GB DDR 3 1600
4 x 24 KB L1, 2 x 1 MB L2

ARM/Intel MP-Whetstone Benchmark V1.2 14-Apr-2016 17.09
Compiled for 32 bit Intel x86

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    2121.9     695.0     695.7     483.5      39.6      34.8   10102.2    2700.8    1358.9
2T    4123.2    1319.0    1351.2     903.1      78.9      67.2   19593.6    5336.0    2604.5
4T    7368.1    2394.0    2375.9    1668.8     139.0     119.8   35711.8    9359.2    4603.0
8T    7391.0    2397.4    2769.0    1658.4     137.7     121.8   36643.4    9953.9    4670.9

Overall Seconds 4.88 1T, 5.04 2T, 5.84 4T, 11.52 8T

#################### W1 REMIX 32 Bit ###################

R1 Intel Atom Z8300 quad core 1.84 GHz
Android 6.0.1, 4 GB DDR 3 1600
4 x 24 KB L1, 2 x 1 MB Shared L2

ARM/Intel MP-Whetstone Benchmark V1.2 21-Oct-2016 14.34
Compiled for 32 bit Intel x86

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1929.0     566.4     615.3     440.7      38.1      28.7    9518.0    2440.1    1235.3
2T    3528.5     912.9    1188.8     832.1      65.0      57.9   13330.0    4114.1    2272.6
4T    5295.0    1821.0    1784.7    1305.4      95.6      88.5   23671.1    6465.3    3461.3
8T    6406.2    2158.8    2247.6    1588.9     128.2     117.4   24747.2    8243.7    4403.3

Overall Seconds 4.81 1T, 5.38 2T, 7.72 4T, 14.07 8T

#################### W1 REMIX 64 Bit ###################

R1 Intel Atom Z8300 quad core 1.84 GHz
Android 6.0.1, 4 GB DDR 3 1600
4 x 24 KB L1, 2 x 1 MB Shared L2

ARM/Intel MP-Whetstone Benchmark V1.2 11-Nov-2016 21.33
Compiled for 64 bit Intel x86_64

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    2189.0     524.1     488.1     402.0      44.7      41.7 1351656.1    1894.8    1758.8
2T    4036.7    1108.5    1178.5     780.0      78.1      73.2 4361015.9    4752.1    3140.7
4T    5652.4    1694.5    1270.9    1191.6     111.8      95.4 2680231.8    5593.2    4688.4
8T    7075.1    2126.0    2068.2    1522.4     147.6     134.8 3600866.1    6987.4    5694.7

Overall Seconds 4.84 1T, 5.22 2T, 8.26 4T, 14.49 8T

################# W1 Windows 10 32 bit #################

Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
Windows 10, 4 GB DDR 3 1600

MP-Whetstone Benchmark From C/C++ 18.00.21005.1 for x86

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1816.7     568.3     580.3     477.8      34.9      26.9    1395.8    1100.4    7327.8
2T    3469.7    1145.9    1086.9     905.6      66.1      52.6    2684.4    2118.7   13383.7
4T    6337.0    2026.1    2029.6    1658.4     121.2      95.1    4886.7    3800.8   24933.3
8T    6900.2    2162.4    2326.0    1870.2     134.7      98.8    6089.9    4071.4   29659.9

Overall Seconds 4.80 1T, 5.02 2T, 5.53 4T, 13.07 8T

################# W1 Windows 10 64 bit #################

MP-Whetstone Benchmark From C/C++ 18.00.21005.1 for x64

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1994.3     537.7     536.4     476.9      42.3      28.8    1420.0    1099.2    7305.8
2T    3760.6    1080.6    1075.4     894.9      79.9      53.3    2842.5    2115.5   12762.4
4T    6946.5    1850.0    1883.3    1655.9     146.8     101.3    4946.3    3787.9   25246.0
8T    7556.2    1891.4    2159.3    1867.7     163.1     104.8    5362.5    4283.3   26001.8

Overall Seconds 4.89 1T, 5.19 2T, 5.66 4T, 13.26 8T

######## W2 Windows 10 32 bit Dual Boot With A5 ########

Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
Windows 10, 4 GB DDR 3 1600
4 x 24 KB L1, 2 x 1 MB L2

MP-Whetstone Benchmark From C/C++ 18.00.21005.1 for x86
Start of test Fri Apr 15 16:28:12 2016

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1776.5     561.5     581.1     466.3      34.1      26.2    1402.2    1093.2    6981.4
2T    3364.9    1014.1    1020.8     832.8      65.6      51.6    2643.0    2027.2   12415.1
4T    6316.1    1987.1    2016.5    1655.2     121.2      94.2    4860.8    3793.2   24941.8
8T    6563.4    2372.8    2031.4    1850.4     122.8      96.6    5667.8    3844.8   28561.7

Overall Seconds 4.75 1T, 5.06 2T, 5.39 4T, 11.56 8T

######## W2 Windows 10 64 bit Dual Boot With A5 ########

MP-Whetstone Benchmark From C/C++ 18.00.21005.1 for x64
Start of test Fri Apr 15 16:38:09 2016

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    1954.1     506.3     538.0     469.7      40.4      29.1    1411.3    1091.8    7280.9
2T    3615.7    1011.5     989.7     873.6      77.1      51.7    2477.6    1907.0   13107.0
4T    6941.8    1877.9    1879.3    1652.7     147.1     100.9    4946.8    3789.6   25046.5
8T    7124.5    2128.2    1975.4    1705.5     149.7     103.3    5058.7    4284.8   28862.8

Overall Seconds 4.95 1T, 5.36 2T, 5.59 4T, 11.72 8T

==================================================================
Top end 2015 PC - Core i7-4820K at 3.9 GHz
==================================================================

32 Bit

MP-Whetstone Benchmark From C/C++ 18.00.21005.1 for x86

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    5273.9    1114.8    1119.2     921.1     129.4      90.7    3404.0    5351.3   22213.6
2T   11031.8    2238.4    2304.9    1938.0     271.1     189.4    6973.5   11713.2   46821.3
4T   21347.8    4713.1    4718.0    3879.9     493.4     375.2   14335.7   21161.6   89584.4
8T   39679.6    9374.0    9397.5    7687.6     874.8     726.5   24631.8   23418.6   93465.8

Overall Seconds 4.97 1T, 4.76 2T, 4.99 4T, 5.59 8T

64 Bit

MP-Whetstone Benchmark From C/C++ 18.00.21005.1 for x64

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    6200.6    1236.5    1236.2     870.8     206.0     108.8    3359.1    4767.4     23413
2T   13050.4    2603.8    2606.2    1891.4     432.6     217.5    7076.8   10041.6     46840
4T   25336.0    5195.2    5211.7    3707.1     832.8     422.9   13626.9   16962.6     78346
8T   46141.7   10293.2   10379.0    7242.4    1332.7     814.2   24394.5   23451.3     93588

Overall Seconds 4.82 1T, 4.60 2T, 4.91 4T, 5.50 8T

#################### PC REMIX 32 Bit ###################

R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3
800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1,

ARM/Intel MP-Whetstone Benchmark V1.2 21-Oct-2016 12.50
Compiled for 32 bit Intel x86

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    5425.2    1343.1    1343.4     868.1     131.8      87.8     55255     11089      4899
2T   10969.5    2773.7    2475.7    1735.9     274.5     175.4    114023     23300     10637
4T   22989.7    5587.5    5609.8    3889.2     547.5     362.3    131855     44619     19739
8T   41099.9     10957     10752    7683.9     881.4     702.7    235813     46954     23348

Overall Seconds 4.91 1T, 4.80 2T, 4.74 4T, 5.76 8T

#################### PC REMIX 64 Bit ###################

R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3
800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1,

ARM/Intel MP-Whetstone Benchmark V1.2 11-Nov-2016 14.38
Compiled for 64 bit Intel x86_64

Using 1, 2, 4 and 8 Threads

       MWIPS    MFLOPS    MFLOPS    MFLOPS       Cos       Exp     Fixpt        If     Equal
                     1         2         3      MOPS      MOPS      MOPS      MOPS      MOPS

1T    6033.0    1343.3    1342.8     831.9     162.5     109.3  33291632     11076      5540
2T   12746.2    2673.1    2827.1    1834.6     330.7     231.9  30432979     23301      9592
4T   25953.4    5598.0    5642.9    3788.8     662.1     473.8  44736026     34693     23308
8T   46218.9     11093     11108    7685.5      1035     889.3  99650183     46841     23415

Overall Seconds 5.14 1T, 5.07 2T, 5.04 4T, 6.10 8T
For further details, see Dhrystone Benchmark above and Android MultiThreading Benchmark Apps, which includes further results. This multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second. Then all threads are run using the same pass count, running time being extended when there are more threads than CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread, but some variables can be used by all threads. The shared variables are probably responsible for the failure to increase throughput much using multiple threads or, in the case of A1, with the Atom CPU, for reduced throughput using more than one thread.
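The calibrate-then-run procedure described above can be sketched as below. This is a Python illustration under stated assumptions, not the benchmark's own code: `calibrate`, `run_threads` and `busy_work` are hypothetical names, and the doubling strategy and the one-tenth timing threshold are choices made for the sketch, since the text only says that a single thread calibration selects a pass count giving about 1 second.

```python
import threading
import time


def calibrate(run_passes, target_seconds=1.0, clock=time.perf_counter):
    """Estimate the pass count that takes about target_seconds on one thread."""
    passes = 1
    while True:
        start = clock()
        run_passes(passes)
        elapsed = clock() - start
        # Once a trial is long enough to time reliably, scale up linearly.
        if elapsed >= target_seconds / 10:
            return max(1, round(passes * target_seconds / elapsed))
        passes *= 2


def run_threads(run_passes, passes, n_threads):
    """Run the same pass count on every thread; return wall-clock seconds."""
    threads = [
        threading.Thread(target=run_passes, args=(passes,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def busy_work(passes):
    """Hypothetical stand-in for the benchmark's inner loop."""
    total = 0
    for i in range(passes * 1000):
        total += i & 7
    return total


if __name__ == "__main__":
    passes = calibrate(busy_work, target_seconds=0.05)
    for n in (1, 2, 4, 8):
        seconds = run_threads(busy_work, passes, n)
        # Throughput counts the work of all threads over the elapsed time.
        print(f"{n} threads: {seconds:.3f} s, {n * passes / seconds:.0f} passes/s")
```

Because every thread uses the same pass count, elapsed time stays near the target while threads fit on separate cores and grows once there are more threads than cores, which is the pattern visible in the Overall Seconds lines of the listings.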
On all the initial results shown, there was little difference in performance between the original and the new 32 bit version but T22, with the Cortex-A53, produced significant gains at 64 bits.
T21, the Kindle Fire with a Quad Core Qualcomm Snapdragon 800 CPU, failed to run using the new ARM/Intel version, and obtained a rather excessive score with 8 threads via the original benchmark (but similar to a possible 4 x 2850).
ARM vs Intel MP - Note that the systems using ARM processors increased performance with multiple threads but those with Intel CPUs did not.
32 Bit vs 64 Bit - The latter was typically 70% faster via Android and REMIX/Android but much less using the Windows compilations.
                        VAX MIPS or DMIPS

                                           Threads
System   CPU      MHz      Android      1      2      4      8   None
                                                                  See
Original ARM Version
A1       Z3745    1866 x4  4.4.2     2360   1394   1334   1321   1840
A1       Z3745    1866 x4  5.0       2411   1633   1313   1298   2488
T7       v7-A9    1200 x4  4.1.2     1584   2749   3836   3569   1610
T22      v8-A53   1300 x4  5.0.2     1686   2943   4232   4323   1683
T11      v7-A15   1700 x2  4.2.2     2271   4281   4326   4171   3189
T21      QU-800   2150 x4  4.4.3     2850   4395   7736  11821   3854

ARM/Intel 32 Bit Version
A1       Z3745    1866 x4  4.4.2     2365   1322   1323   1319   2451
A5 ##    z8300    1840 x4  5.1       2256   1155   1163   1054   2318
T7       v7-A9    1200 x4  4.1.2     1464   2399   3575   3737   1317
T22      v8-A53   1300 x4  5.0.2     1412   2559   4038   4291   1423
P37      v8-A53   1500 x8  6.0.1     1720   2923   4839   2618   1649
P37      v8-A53   1500 x8  7.0       1575   2899   4955   2697   1722
T11      v7-A15   1700 x2  4.2.2     2295   4057   3902   4096   2551
T21      QU-800   2150 x4  4.4.3     Failed to run               3319
P38      v8-A57   2700 x4  6.0.1     3094   5612   6849   3776
        +V8-A53   1300 x4
R1=Atm   Z8300    1840 x4  6.0.1     2174   1150   1170   1139   2390
R2       Core i7  3900 x4  6.0.1     9919   5685   5305   6076  10489

ARM/Intel 64 Bit Version
T22      v8-A53   1300 x4  5.0.2     2548   4311   5560   5613   2569
R1=Atm   Z8300    1840 x4  6.0.1     3900   1677   1709   1666   3769
R2       Core i7  3900 x4  6.0.1    16740   7595   7271   8612  17003

Intel/Windows 32 Bit Version
W1       Z8300    1840 x4  Win10     3284   1477   1235   1313   3044
W2 ##    Z8300    1840 x4  Win10     2521   1730   1333   1285   2906
PC       Core i7  3900 x4  Win10    12776   7175   6116   7876  12090

Intel/Windows 64 Bit Version
W1       Z8300    1840 x4  Win10     3745   1625   1400   1436   3291
W2 ##    Z8300    1840 x4  Win10     3717   1566   1386   1441   3195
PC       Core i7  3900 x4  Win10    15129   8535   7278   8769  11686

## A5 and W2 Same Dual Boot Tablet
=Atm R1 and W1 Same Tablet
R2 and PC Same PC
R1 and R2 Android via REMIX
This is a multithreading version of the above. Further details and results can be found here. The benchmark is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads, the programming code for zero threads being the same as the above example. Multithreading performance, using this standard linear equation solver, is severely degraded by overheads. The zero thread results are the only ones of real use, with the others fairly constant, probably running one thread at a time and limited by RAM speed.
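To give a feel for the size of the threading overhead at N = 100, the following Python sketch converts MFLOPS figures into time per solve. It assumes the conventional Linpack operation count of 2n³/3 + 2n² floating point operations per solve. Whether this benchmark uses exactly that count is an assumption here, and the function names are hypothetical. The two MFLOPS values are the T7 Original results for N = 100 from the listings in this section, with no threads and with one thread.

```python
def linpack_flops(n):
    """Conventional Linpack operation count for one n x n solve (assumed)."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def solve_milliseconds(n, mflops):
    """Time for one solve implied by a reported MFLOPS figure."""
    return linpack_flops(n) / (mflops * 1.0e6) * 1000.0


if __name__ == "__main__":
    unthreaded = solve_milliseconds(100, 413.47)  # T7 Original, Threads None
    threaded = solve_milliseconds(100, 45.95)     # T7 Original, 1 thread
    print(f"no threads : {unthreaded:.2f} ms per solve")
    print(f"one thread : {threaded:.2f} ms per solve")
    print(f"difference : {threaded - unthreaded:.2f} ms")
```

On these assumptions the calculation itself needs under 2 milliseconds at N = 100, so most of the threaded time is overhead, which would explain why the threaded results improve at the larger matrix sizes.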
Performance of A1, with the Intel CPU and using native Intel compilation, is shown to be twice as fast as the Houdini ARM to Intel converted version, except at N = 1000, which is mainly dependent on calculations from data in RAM. Then, when running the ARM only version, using Android upgraded to 5.0, the performance difference was considerably reduced.
On ARM CPUs, speeds obtained from 32 bit and 64 bit compilations were similar, because the programs use a limited number of identical NEON intrinsic functions. For the same reason, the new ARM/Intel version produced similar results to the original.
32 Bit vs 64 bit - Results from 64 bit versions were generally slightly faster than those compiled for 32 bits.
Android vs Windows - Intel based Android and REMIX/Android speeds were around three times faster than Windows results on the Atom CPU and twice as fast on the Core i7.
The program checks that the same numeric results are produced, irrespective of the number of threads used, at each matrix size. Due to rounding effects, these results differ slightly between ARM and Intel hardware, as shown below.
MFLOPS 0 to 4 Threads, N 100, 500, 1000 ##################### T7 Original ###################### Android Linpack NEON SP MP Benchmark 31-Jan-2013 12.14 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Threads None 1 2 4 N 100 413.47 45.95 48.22 48.34 N 500 253.08 187.51 189.69 189.94 N 1000 148.76 135.49 136.08 136.17 #################### T7 ARM-Intel ##################### ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.40 Threads None 1 2 4 N 100 385.49 28.79 29.06 29.25 N 500 272.07 184.85 183.70 183.18 N 1000 147.09 131.92 132.44 130.05 #################### T11 Original ##################### Android Linpack NEON SP MP Benchmark 13-Aug-2013 23.28 T11 Samsung EXYNOS 5250 1.7 GHz Cortex-A15, Android 4.2.2 Threads None 1 2 4 N 100 1399.82 54.86 55.31 54.66 N 500 1154.21 434.16 434.06 436.97 N 1000 571.26 482.57 487.25 485.80 #################### T11 ARM-Intel #################### ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.44 Threads None 1 2 4 N 100 1497.90 61.13 63.13 61.87 N 500 1399.10 491.49 489.29 494.69 N 1000 586.14 499.00 504.97 497.49 #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Android Linpack NEON SP MP Benchmark 26-Jul-2015 11.46 Threads None 1 2 4 N 100 1311.08 12.38 12.93 15.05 N 500 2271.56 344.04 419.52 381.73 N 1000 837.30 540.99 523.52 564.87 #################### T21 ARM-Intel #################### ARM/Intel Linpack NEON SP MP Benchmark 26-Jul-2015 11.51 Threads None 1 2 4 N 100 1308.07 14.89 11.77 11.63 N 500 2341.17 407.96 481.02 415.12 N 1000 901.21 551.80 566.77 564.31 ###################### P37 32 Bit ###################### P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1 Single Channel RAM, LPDDR3 933 MHz, 7.5 GB/second 8 x 32 KB L1 cache, 512 KB shared L2 cache ARM/Intel Linpack NEON SP MP Benchmark 1.2 14-Nov-2016 12.09 Compiled for 32 bit ARM v7a Threads None 1 2 4 N 100 555.85 26.39 26.62 26.78 N 500 459.23 224.55 207.08 217.47 N 1000 359.47 270.92 275.58 
272.08 Android 7.0 ARM/Intel Linpack NEON SP MP Benchmark 1.2 09-May-2017 11.18 Compiled for 32 bit ARM v7a Threads None 1 2 4 N 100 560.74 25.96 26.35 26.41 N 500 501.69 234.14 237.16 236.78 N 1000 393.49 305.86 310.71 309.85 ###################### T22 32 Bit ###################### T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.52 Compiled for 32 bit ARM v7a Threads None 1 2 4 N 100 460.74 22.35 23.16 23.82 N 500 480.63 336.52 339.94 303.66 N 1000 470.02 405.86 403.01 405.98 ###################### T22 64 Bit ###################### ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.57 Compiled for 64 bit ARM v8a Threads None 1 2 4 N 100 548.67 27.70 33.93 37.00 N 500 470.04 285.95 297.79 301.67 N 1000 519.02 441.84 443.47 441.91 #################### A1 Original ####################### Android Linpack NEON SP MP Benchmark 07-Feb-2015 18.42 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Threads None 1 2 4 N 100 452.39 21.00 23.48 17.48 N 500 663.38 275.56 88.66 312.71 N 1000 617.04 380.60 191.26 195.61 ################## A1 V1 Android 5.0 ################### Android Linpack NEON SP MP Benchmark 05-Nov-2015 11.49 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 662.21 25.84 25.59 25.43 N 500 1022.76 317.51 310.52 311.49 N 1000 861.75 549.32 558.52 547.91 #################### A1 ARM-Intel ###################### ARM/Intel Linpack NEON SP MP Benchmark 1.2 06-Nov-2015 22.11 Compiled for 32 bit Intel x86 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 979.81 49.01 42.69 45.34 N 500 1160.24 369.43 349.04 334.87 N 1000 716.94 560.86 535.46 486.61 ########## A5 ARM-Intel Dual Boot With W2 ############ Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Android 5.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 ARM/Intel Linpack NEON SP MP Benchmark 1.2 14-Apr-2016 17.22 Compiled for 32 bit Intel x86 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 1131.44 16.52 
16.05 17.00 N 500 1427.56 234.84 231.15 266.46 N 1000 874.35 474.20 423.36 577.54 #################### W1 REMIX 32 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 Android Linpack NEON SP MP Benchmark 11-Nov-2016 21.35 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 764.63 23.72 18.72 8.77 N 500 1387.27 153.52 153.30 145.98 N 1000 880.43 360.42 357.60 348.40 ARM/Intel Linpack NEON SP MP Benchmark 1.2 21-Oct-2016 14.38 Compiled for 32 bit Intel x86 Threads None 1 2 4 N 100 1095.33 53.33 57.76 57.01 N 500 1589.75 493.68 512.28 511.92 N 1000 886.08 638.19 635.86 638.70 #################### W1 REMIX 64 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel Linpack NEON SP MP Benchmark 1.2 14-Aug-2016 22.33 Compiled for 64 bit Intel x86_64 Threads None 1 2 4 N 100 1221.20 60.54 65.60 64.04 N 500 1405.14 567.66 554.66 568.40 N 1000 1058.21 729.60 734.22 747.03 ################# W1 Windows 10 32 bit ################# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10, 4 GB DDR 3 1600 Linpack Single Precision MultiThreaded Benchmark 32 Bit, N=500, Wed Dec 23 21:01:12 2015 Threads 0 1 2 4 MFLOPS 740.71 256.40 226.44 163.99 Linpack Double Precision MultiThreaded Benchmark 32 Bit, N=500, Wed Dec 23 21:00:30 2015 Threads 0 1 2 4 MFLOPS 480.73 194.42 196.76 148.52 ################# W1 Windows 10 64 bit ################# Linpack Single Precision MultiThreaded Benchmark 64 Bit, N=500, Wed Dec 23 21:17:19 2015 Threads 0 1 2 4 MFLOPS 707.50 263.47 240.46 197.31 Linpack Double Precision MultiThreaded Benchmark 64 Bit, N=500, Wed Dec 23 21:16:42 2015 Threads 0 1 2 4 MFLOPS 488.12 205.02 202.39 165.47 ######## W2 Windows 10 32 bit Dual Boot With A5 ######## Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 Linpack Single Precision MultiThreaded Benchmark 
32 Bit, N=500, Fri Apr 15 16:23:55 2016 Threads 0 1 2 4 MFLOPS 626.40 231.31 183.87 129.48 Linpack Double Precision MultiThreaded Benchmark 32 Bit, N=500, Fri Apr 15 16:23:21 2016 Threads 0 1 2 4 MFLOPS 412.89 221.03 148.56 94.62 ######## W2 Windows 10 64 bit Dual Boot With A5 ######## Linpack Single Precision MultiThreaded Benchmark 64 Bit, N=500, Fri Apr 15 16:36:10 2016 Threads 0 1 2 4 MFLOPS 662.15 241.59 228.59 195.97 ResidN 3.96 3.96 3.96 3.96 Linpack Double Precision MultiThreaded Benchmark 64 Bit, N=500, Fri Apr 15 16:35:42 2016 Threads 0 1 2 4 MFLOPS 527.64 195.54 180.62 154.02 #################### PC REMIX 32 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, Android Linpack NEON SP MP Benchmark 11-Nov-2016 14.40 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 3829.87 113.83 90.99 52.76 N 500 6053.91 1024.25 1014.78 985.31 N 1000 6601.66 2628.01 2568.70 2522.01 ARM/Intel Linpack NEON SP MP Benchmark 1.2 21-Oct-2016 12.51 Compiled for 32 bit Intel x86 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 4738.29 284.27 288.92 289.43 N 500 7078.15 3328.75 3287.02 3288.17 N 1000 7556.05 5459.01 5478.02 5461.30 #################### PC REMIX 64 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel Linpack NEON SP MP Benchmark 1.2 11-Nov-2016 14.42 Compiled for 64 bit Intel x86_64 MFLOPS 0 to 4 Threads, N 100, 500, 1000 Threads None 1 2 4 N 100 5622.61 318.61 317.19 320.32 N 500 7355.32 3448.71 3577.17 3541.12 N 1000 7734.14 5566.40 5622.47 5653.65 #################### PC Windows 32 Bit ################## Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Windows 10, Linpack Single Precision MultiThreaded Benchmark 32 Bit, N=500, Tue Nov 
15 11:29:25 2016 Threads 0 1 2 4 MFLOPS 4018.79 1674.30 1583.93 1199.23 Linpack Double Precision MultiThreaded Benchmark 32 Bit, N=500, Tue Nov 15 11:29:03 2016 Threads 0 1 2 4 MFLOPS 3307.45 1521.69 1453.19 1185.62 #################### PC Windows 64 Bit ################## Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Windows 10, Linpack Single Precision MultiThreaded Benchmark 64 Bit, N=500, Tue Nov 15 11:37:57 2016 Threads 0 1 2 4 MFLOPS 4036.32 1891.33 1782.15 1345.03 Linpack Double Precision MultiThreaded Benchmark 64 Bit, N=500, Tue Nov 15 11:37:24 2016 Threads 0 1 2 4 MFLOPS 3370.00 1692.80 1590.42 1304.35 ################### Numeric Results ################### NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1 Single Precision N 100 500 1000 ARM NR 1.60 3.96 11.32 RE 3.80277634e-05 4.72068787e-04 2.70068645e-03 MA 1.19209290e-07 1.19209290e-07 1.19209290e-07 X0 -1.38282776e-05 5.26905060e-05 1.62243843e-04 XN -7.51018524e-06 3.26633453e-05 -6.65783882e-05 Intel NR 1.68 3.96 11.39 RE 4.00543213e-05 4.72545624e-04 2.71725655e-03 MA 1.19209290e-07 1.19209290e-07 1.19209290e-07 X0 -1.38282776e-05 5.26905060e-05 1.62243843e-04 XN -7.51018524e-06 3.26633453e-05 -6.65783882e-05 Double Precision Intel SSE2 5.76 1.27986510e-012 2.22044605e-016 5.59552404e-014 3.39728246e-014 |
This is a multithreading version of the above. See here for further results. In the original MP-BusSpdi benchmark, all threads read data from the beginning. With large shared caches, this could lead to exaggerated data transfer speeds for RAM based data when using multiple threads. The revised MP-BusSpd2i attempts to avoid this by giving the threads staggered starting points, with each thread still reading all the data, and by running for much longer to produce consistent scores. Performance using a single thread is similar to that of the non-threaded version, and it is clear that multiple threads are needed to demonstrate maximum throughput. As usual, maximum RAM speeds can be estimated from burst transfer results, as 16 times the Inc16 MB/second figure. Some results are provided below.
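The burst transfer estimate can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative and not part of the benchmark, the Inc16 values and quoted bandwidths are copied from the result listings below, and the comparison assumes 1 GB/s = 1000 MB/s. The factor of 16 presumably reflects that the Inc16 test uses one word in every 16 while the memory system still transfers the whole burst, but that reading is an assumption.

```python
# Sketch of the burst-transfer RAM speed estimate quoted in the results.
# BURST_FACTOR = 16 follows the "16 times Inc16" rule in the text; the
# function names are illustrative, not taken from the benchmark source.

BURST_FACTOR = 16


def estimate_max_ram_speed(inc16_mb_per_sec):
    """Return the estimated maximum RAM speed in MB/second."""
    return BURST_FACTOR * inc16_mb_per_sec


def exceeds_rated_bandwidth(estimate_mb_per_sec, rated_gb_per_sec):
    """True when the estimate is above the quoted peak bandwidth.

    Assumes 1 GB/s = 1000 MB/s, which is a simplification.
    """
    return estimate_mb_per_sec > rated_gb_per_sec * 1000


if __name__ == "__main__":
    # (label, Inc16 MB/second at the RAM-sized data set, quoted GB/s)
    samples = [
        ("T7 long version", 68, 5.3),
        ("T21 original short version", 1018, 14.9),
        ("T21 long version", 605, 14.9),
    ]
    for label, inc16, rated in samples:
        estimate = estimate_max_ram_speed(inc16)
        verdict = "above rated bandwidth" if exceeds_rated_bandwidth(estimate, rated) else "plausible"
        print(f"{label}: {inc16} x 16 = {estimate} MB/second ({verdict})")
```

The T21 original short version is the case labelled "Impossible Maximum RAM Speed" in the listings, where the estimate exceeds the quoted 14.9 GB/s.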
MP-BusSpdi.apk can be downloaded from here.
On A1, with the Intel Atom CPU, the original ARM code, converted to Intel instructions by Houdini, was initially slightly slower than the native code compilations, but the difference was made up when running under Android 5.
Results for the original version, running on ARM CPUs, are not all shown, as they were similar to those for the new version. See here. On T22, with the Cortex-A53, performance could be more than twice as fast, reading all data, using the 64 bit compilation.
The problem associated with shared caches is probably best identified by wide variations in the burst reading tests, which are not apparent in the long running versions (see T7 and T21 below).
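One possible way to put a number on this variation is the ratio of the largest to the smallest Inc16 result across the 2, 4 and 8 thread runs at the RAM-sized data set. The sketch below uses values copied from the T7 and T21 listings. The `spread` measure is an illustrative choice, not something the benchmark reports.

```python
# Illustrative measure of variation between multi-thread burst results.
# "spread" is a hypothetical helper, not part of the benchmark output.


def spread(values):
    """Ratio of the largest to the smallest value (1.0 means no variation)."""
    return max(values) / min(values)


# Inc16 MB/second for 2, 4 and 8 threads at the RAM-sized data set,
# copied from the result listings for T7 and T21.
INC16_RAM = {
    "T7 short version, 12288 KB": [111, 85, 107],
    "T7 long version, 49152 KB": [67, 68, 68],
    "T21 original short version, 12288 KB": [766, 1018, 680],
    "T21 long version, 49152 KB": [541, 605, 564],
}

if __name__ == "__main__":
    for label, values in INC16_RAM.items():
        print(f"{label}: max/min = {spread(values):.2f}")
```

On these figures the short versions vary by 30% or more between thread counts, while the long versions stay within about 12%.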
Following the main tables are comparisons of the Read All speeds for the revised benchmarks. They are based on MB/second/MHz for cache based data and MB/second for data in RAM.
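The MB/second/MHz figures appear to be the RdAll speeds divided by the CPU clock quoted in each system description. The sketch below reproduces three single thread, 12.3 KB entries on that assumption. The function name is illustrative and the input values are copied from the long version listings.

```python
# Sketch: normalise a RdAll speed by CPU clock, as in the comparison table.
# The helper name is illustrative; clock speeds are those quoted in the
# system descriptions of the listings.


def mb_per_sec_per_mhz(mb_per_sec, cpu_mhz):
    """Return data transfer speed per MHz of CPU clock."""
    return mb_per_sec / cpu_mhz


# (label, RdAll MB/second at 12.3 KB with 1 thread, CPU MHz)
SAMPLES = [
    ("T7 Cortex-A9", 3263, 1200),
    ("T21 Snapdragon 800", 5614, 2150),
    ("A1 Atom Z3745", 7313, 1860),
]

if __name__ == "__main__":
    for label, speed, mhz in SAMPLES:
        print(f"{label}: {mb_per_sec_per_mhz(speed, mhz):.2f} MB/second/MHz")
```

Rounded to two decimal places, these match the 2.72, 2.61 and 3.93 shown for T7, T21 and A1 in the comparison table.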
MP Efficiency - With L1 cache based data, the ratios shown for 4 thread gains over 1 thread are more than 3.5 on ARM CPUs but much lower on Intel processors, although they can be similar using L3 cache. There were also some significant gains reading data from RAM. However, the comparison was influenced by the relatively faster Intel speeds using one thread.
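The "4T gain" rows can be checked by dividing the 4 thread RdAll speed by the 1 thread speed. The following sketch does this for a few L1 cache sized (12.3 KB) results copied from the long version listings. The helper name is illustrative.

```python
# Sketch: 4-thread gain over 1 thread, as in the "4T gain" rows.
# Values are RdAll MB/second at 12.3 KB from the long version listings.


def thread_gain(speed_one_thread, speed_n_threads):
    """Return how many times faster the multi-thread run is."""
    return speed_n_threads / speed_one_thread


# label: (1 thread MB/second, 4 thread MB/second)
L1_RDALL = {
    "T7 Cortex-A9": (3263, 11777),
    "T21 Snapdragon 800": (5614, 21068),
    "A1 Atom Z3745": (7313, 23509),
    "A5 Atom Z8300": (6925, 19631),
}

if __name__ == "__main__":
    for label, (one_thread, four_threads) in L1_RDALL.items():
        print(f"{label}: 4T gain = {thread_gain(one_thread, four_threads):.2f}")
```

The two ARM examples come out above 3.5 and the two Atom examples below it, in line with the L1 row of the table (3.61, 3.75, 3.21 and 2.83).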
64 Bit vs 32 Bit - Windows tests indicated similar performance at 32 and 64 bits, but under Android the 64 bit compilations were much faster than the 32 bit ones, even on Intel CPUs running REMIX.
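The "64/32 Bit L1" ratios can be reproduced from the single thread RdAll speeds at 12.3 KB. This sketch assumes that is how they were derived. The values are copied from the listings and the helper name is illustrative.

```python
# Sketch: 64 bit over 32 bit speed ratio for L1 cache sized data.
# Values are 1 thread RdAll MB/second at 12.3 KB from the listings.


def ratio_64_over_32(speed_32_bit, speed_64_bit):
    """Return the 64 bit speed as a multiple of the 32 bit speed."""
    return speed_64_bit / speed_32_bit


# label: (32 bit MB/second, 64 bit MB/second)
RDALL_1T = {
    "Android T22 Cortex-A53": (2343, 5841),
    "REMIX R1 Atom Z8300": (6481, 16766),
    "REMIX R2 Core i7": (23523, 83709),
    "Windows W1 Atom Z8300": (6743, 6580),
    "Windows W2 Atom Z8300": (6175, 6915),
    "Windows PC Core i7": (19743, 21032),
}

if __name__ == "__main__":
    for label, (bits32, bits64) in RDALL_1T.items():
        print(f"{label}: 64/32 = {ratio_64_over_32(bits32, bits64):.2f}")
```

The Android and REMIX ratios are all about 2.5 or higher, while the Windows ratios stay close to 1.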
Some of the above might be due to the different compilers used.
#################### T7 ARM-Intel ##################### T7, ARM Cortex-A9 1.2 GHz, DDR3-1333, 5.3 GB/s Android 4.1.2, 4 x 32 KB L1 cache, 1 MB shared L2 cache ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.35 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2853 3392 3376 3511 3551 3494 2T 2857 3389 3542 5540 5730 5595 4T 7257 10326 10289 10997 11373 11100 8T 6584 10325 10485 11175 11322 11189 122.9 1T 362 379 347 546 623 978 2T 516 530 508 726 1227 1840 4T 598 658 548 1181 1556 2657 8T 721 733 736 1181 1548 2653 12288 1T 58 57 84 123 173 334 2T 111 111 182 248 348 664 4T 87 85 276 463 687 1290 8T 154 107 147 429 441 1242 Total Elapsed Time 12.7 seconds ########## T7 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.59 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2166 2774 3181 3307 3377 3263 2T 3924 5188 5207 5754 5759 5805 4T 7570 10011 10252 11165 11375 11777 8T 3510 4786 9011 8318 11351 11544 122.9 1T 383 409 359 558 663 983 2T 525 541 520 741 1241 1814 4T 739 752 753 1219 1590 2776 8T 735 741 753 1218 1607 2737 49152 1T 56 51 81 126 172 330 2T 65 67 107 196 335 620 4T 70 68 108 215 426 835 8T 70 68 109 215 428 851 Total Elapsed Time 48.2 seconds Maximum RAM Speed Estimate = 68 x 16 = 1088 MB/second #################### T11 ARM-Intel #################### T11 Samsung EXYNOS 5250 1.7 GHz Cortex-A15, Android 4.2.2 Dual core, 2 x 32 KB L1 cache, 1 MB shared L2 cache ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.45 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2165 3591 4256 5587 5998 6109 2T 4121 6469 9530 11381 11846 11936 4T 4106 6438 8827 6793 9802 12080 8T 4098 6390 9534 10141 10996 11603 122.9 1T 464 740 1173 2395 3276 3340 2T 579 989 1934 3994 5431 5792 4T 579 988 1930 3873 5469 5821 8T 580 985 1915 3999 5408 5812 12288 1T 134 172 211 462 602 1904 2T 269 343 387 934 1217 2685 4T 252 231 374 768 991 
2625 8T 231 254 367 781 1104 2782 Total Elapsed Time 12.1 seconds ########## T11 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 17.07 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 3499 4539 5499 5505 6134 6045 2T 3775 7202 8377 10605 10457 11319 4T 3982 6676 7687 9326 9707 10807 8T 2546 3643 7891 8003 10725 11097 122.9 1T 672 901 1336 2784 3274 3334 2T 568 969 1931 3894 5427 5221 4T 574 971 1912 3831 5256 4811 8T 559 971 1917 3878 5387 5162 49152 1T 140 142 193 575 989 1499 2T 221 223 342 769 1379 2355 4T 228 223 344 783 1382 2376 8T 223 223 342 787 1385 2352 Total Elapsed Time 49.9 seconds Maximum RAM Speed Estimate = 223 x 16 = 3568 MB/second #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s L1 caches 4 x 16 KB, L2 cache shared 2048 KB Android MP-BusSpd v7 Benchmark V1.1 29-Jun-2015 18.37 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2580 2206 5048 5176 5679 5989 2T 4062 5175 9340 9868 10971 11281 4T 4688 10324 16552 17196 21714 23708 8T 8467 9834 16698 18183 21936 23693 122.9 1T 1152 1052 2068 3035 3927 5723 2T 1710 1840 3094 5001 7963 11475 4T 2047 2002 5031 9267 14698 22920 8T 2235 2275 5223 9348 14234 21783 12288 1T 262 382 508 867 1466 2661 2T 464 766 1049 1754 3186 5735 4T 612 1018 1796 3149 5892 9095 8T 575 680 1277 2308 4987 7948 Total Elapsed Time 12.7 seconds Impossible Maximum RAM Speed 1018 x 16 = 16288 MB/second #################### T21 ARM-Intel #################### ARM/Intel MP-BusSpd v7 Benchmark V1.1 23-May-2015 17.05 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 1840 2073 3512 3554 4829 5243 2T 3432 4591 7128 7651 9120 9821 4T 4398 7855 13752 15428 18530 20235 8T 6692 9507 13857 16110 18143 18796 122.9 1T 860 753 2011 2841 3205 5282 2T 1505 1609 3076 5038 8089 10421 4T 1924 1981 4299 7588 14614 
20754 8T 1909 1988 4264 7980 13884 19027 12288 1T 270 379 538 856 1626 2859 2T 471 677 1098 1849 3304 5924 4T 549 787 1066 1874 6274 10781 8T 713 853 1649 2258 4664 8321 Total Elapsed Time 13.1 seconds ########## T21 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.39 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2247 2616 4010 4443 4909 5614 2T 3558 4725 7241 9048 9747 10892 4T 6074 8303 13442 16937 18525 21068 8T 3998 5106 14314 13615 18200 20740 122.9 1T 874 1198 2024 2935 4529 5345 2T 1686 1702 3174 5357 7688 10545 4T 1988 2139 4465 8171 14969 21169 8T 1972 2139 4468 8195 15261 21132 49152 1T 292 406 516 899 1663 2929 2T 449 541 962 1569 2851 4776 4T 495 605 1109 2439 4161 8243 8T 530 564 1156 2149 4172 7907 Total Elapsed Time 48.0 seconds Maximum RAM Speed Estimate = 605 x 16 = 9680 MB/second #################### P37 32 Bit V1.2 #################### P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1 Single Channel RAM, LPDDR3 933 MHz, 7.5 GB/second 8 x 32 KB L1 cache, 512 KB shared L2 cache ARM/Intel MP-BusSpd2 Benchmark V1.2 14-Nov-2016 12.11 Compiled for 32 bit ARM v7a MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2060 2433 2430 2487 2555 2625 2T 3966 4727 4886 4964 5091 5167 4T 6843 8675 9208 9581 10025 10254 8T 5360 6326 13507 10947 15929 16546 122.9 1T 666 672 1231 2000 2368 2524 2T 1029 1036 1993 3570 4766 5089 4T 1062 1098 2144 4166 7694 9835 8T 1737 1793 3540 6473 10502 14201 49152 1T 164 172 339 658 1247 2014 2T 289 307 591 1124 2192 3839 4T 410 353 813 1692 3015 6058 8T 429 426 842 1495 2949 5790 Total Elapsed Time 56.3 seconds Maximum RAM Speed Estimate = 426 x 16 = 6816 MB/second Android 7.0 ARM/Intel MP-BusSpd2 Benchmark V1.2 11-May-2017 10.35 Compiled for 32 bit ARM v7a MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2151 2396 2448 2516 2589 2632 2T 4042 4460 4824 4893 5336 5192 4T 6828 8657 
9409 9755 10120 10339 8T 5401 6897 13508 11464 15960 16792 122.9 1T 674 692 1267 2019 2402 2584 2T 1031 1043 1999 3591 4737 5047 4T 1064 1164 2168 4185 7761 9879 8T 1734 1857 3429 6438 10447 15287 49152 1T 163 172 337 674 1236 2098 2T 297 282 566 1101 2175 3735 4T 431 390 751 1470 3053 5716 8T 406 369 786 1621 2897 6031 Total Elapsed Time 57.0 seconds ###################### T22 32 Bit ###################### T22, Tab 2 A8-50, 1.3 GHz quad core 64 bit ARM Cortex-A53 Single Channel RAM, LPDDR3 666 MHz, 5.3 GB/second 4 x 32 KB L1 cache, 512 KB L2 cache ARM/Intel MP-BusSpd Benchmark V1.2 12-Aug-2015 16.13 Compiled for 32 bit ARM v7a MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 1849 2140 2079 2211 2270 2297 2T 3663 4252 4294 4400 4370 4580 4T 4630 5574 5691 5893 6015 6083 8T 5331 5775 6033 6622 7968 8023 122.9 1T 597 621 1119 1815 2135 2237 2T 869 943 1644 2992 3740 4412 4T 949 951 1922 3736 6468 7779 8T 948 978 1911 3717 6464 7542 12288 1T 123 174 344 678 1215 1840 2T 243 310 672 1332 2383 3974 4T 302 285 594 1282 2271 4606 8T 279 295 654 1198 2749 4660 Total Elapsed Time 12.8 seconds ########## T22 Long Version ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.14 Compiled for 32 bit ARM v7a MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 1877 2124 2176 2266 2296 2343 2T 3625 4198 4341 4468 4536 4613 4T 5733 7541 8293 8830 8024 9042 8T 2985 3829 7438 6117 8108 8923 122.9 1T 604 625 1142 1846 2150 2284 2T 924 950 1793 3277 4270 4504 4T 962 989 1939 3765 6798 8862 8T 965 993 1933 3748 6651 8239 49152 1T 165 175 344 677 1285 1979 2T 234 238 482 961 1907 3547 4T 266 298 562 1224 2296 4478 8T 272 275 538 1098 2149 4282 Total Elapsed Time 48.8 seconds Maximum RAM Speed Estimate = 298 x 16 = 4768 MB/second ###################### T22 64 Bit ###################### ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.18 Compiled for 64 bit ARM v8a MB/Second Reading Data, 1, 2, 4 and 8 Threads KB 
Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2610 2472 2586 2727 2748 5841 2T 4404 4681 4994 5369 5420 11297 4T 6546 8125 9105 10243 10319 20610 8T 3380 4023 7919 7146 9871 19852 122.9 1T 604 621 1110 1872 2446 5100 2T 919 948 1855 3433 4853 10037 4T 961 974 1984 3924 7491 14935 8T 963 942 1931 3915 7572 14689 49152 1T 173 177 340 692 1300 2653 2T 266 241 479 968 1883 3724 4T 304 277 556 1130 2126 4328 8T 279 278 544 1138 2179 4275 Total Elapsed Time 49.4 seconds #################### A1 Original ####################### A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s 4 x 24 KB L1, 2 x 1 MB L2 Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 13.02 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 3990 4458 6123 6512 6438 6729 2T 3894 5699 8948 10299 11800 12555 4T 5046 7109 11952 14750 15533 23304 8T 4533 7464 13097 16970 21674 22225 122.9 1T 1304 1613 2291 2661 3667 5063 2T 2568 3145 4529 5365 7440 10147 4T 4117 4801 7963 7495 8239 18911 8T 3130 5016 7355 8543 11648 15845 12288 1T 190 265 601 1203 2316 3832 2T 244 448 995 1771 3599 6575 4T 427 584 860 1741 3439 7449 8T 395 510 855 1613 3547 6776 Total Elapsed Time 13.5 seconds ################## A1 V1 Android 5.0 ################### Android MP-BusSpd v7 Benchmark V1.1 05-Nov-2015 11.52 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5509 6152 6796 6937 7060 7056 2T 4635 6757 9294 11284 12612 13486 4T 4545 9383 15861 21378 15369 23493 8T 4473 8723 15965 18476 23438 22747 122.9 1T 1467 1782 2386 2737 3799 5299 2T 2225 3460 4683 5421 7507 10514 4T 2493 5703 8165 9941 11313 11259 8T 4119 5481 6992 8726 12919 17166 12288 1T 213 253 589 1176 2309 3903 2T 252 396 842 1668 3325 6759 4T 404 437 1130 1659 4562 6911 8T 414 507 836 1902 3607 6670 Total Elapsed Time 13.9 seconds #################### A1 ARM-Intel ###################### ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.28 MB/Second 
Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5925 6494 6778 6979 7047 7026 2T 3966 7029 9689 11689 12856 13654 4T 4438 8698 16739 22057 23946 25729 8T 4455 8619 15787 19934 22576 20804 122.9 1T 1490 1975 2360 2802 3818 5330 2T 2881 3798 4647 5531 7536 10546 4T 4452 6338 5910 10217 14650 19903 8T 4096 5075 6264 9213 12610 15821 12288 1T 206 273 593 1198 2343 3935 2T 276 455 842 1821 3319 6591 4T 445 730 1401 2076 4457 7525 8T 424 539 954 1829 3688 7064 Total Elapsed Time 13.0 seconds ########## A1 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.50 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5431 6110 6780 6262 6655 7313 2T 3550 4464 7375 9825 11777 12442 4T 2027 4442 4399 8841 17611 23509 8T 983 2477 5063 4433 8568 15867 122.9 1T 1499 1991 2357 2839 3818 5382 2T 2816 3808 4708 5592 7557 10677 4T 4316 6313 7991 9816 14335 19993 8T 4235 5610 7917 8791 12828 19661 49152 1T 215 275 611 1183 2328 3922 2T 276 435 787 1671 3323 6507 4T 398 455 884 1754 3490 6971 8T 376 511 867 1746 3512 7510 Total Elapsed Time 48.6 seconds Maximum RAM Speed Estimate = 511 x 16 = 8176 MB/second ########### A5 ARM-Intel Dual Boot With W2 ############# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Android 5.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 ARM/Intel MP-BusSpd2 Benchmark V1.2 14-Apr-2016 17.28 Compiled for 32 bit Intel x86 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5322 6275 6475 6901 6959 6925 2T 4625 4163 6792 8964 10879 11027 4T 2221 3775 4091 8006 15158 19631 8T 1178 1840 3907 3884 8002 15691 122.9 1T 1438 1891 2342 2601 3477 4957 2T 2509 3489 4597 5115 6807 9275 4T 3591 4849 6905 8356 11204 14596 8T 3868 5327 7014 7860 10754 15998 49152 1T 179 205 391 802 1372 3023 2T 238 310 495 1204 2397 4559 4T 240 336 653 1170 2008 4969 8T 291 321 681 1316 2378 5329 Total Elapsed Time 50.3 seconds Maximum RAM Speed Estimate = 336 x 16 = 5376 
MB/second #################### W1 REMIX 32 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel MP-BusSpd Benchmark V1.2 21-Oct-2016 14.29 Compiled for 32 bit Intel x86 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5659 5848 5977 6263 6100 6481 2T 4075 6144 7960 9632 10899 11283 4T 3766 6335 7923 9544 10679 11425 8T 3531 6367 7693 7739 8336 7918 122.9 1T 1389 1492 2456 2702 1564 5013 2T 2080 2904 2943 3073 4785 7541 4T 1995 2761 4446 4114 5075 8011 8T 1673 2504 2711 3097 6693 8366 12288 1T 190 230 453 877 1681 2396 2T 222 246 405 1287 2291 3926 4T 180 299 588 1469 2951 5002 8T 303 380 701 1265 2476 6796 Total Elapsed Time 14.2 seconds Maximum RAM Speed Estimate = 380 x 16 = 6080 MB/second #################### W1 REMIX 64 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel MP-BusSpd2 Benchmark V1.2 11-Nov-2016 21.25 Compiled for 64 bit Intel x86_64 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 3870 3871 4281 4386 4382 16766 2T 3290 3312 5048 5924 6729 22511 4T 4909 6232 6866 7231 7745 27366 8T 2662 3012 6328 6364 8818 26211 122.9 1T 1506 1534 2471 2433 3510 9204 2T 2071 2479 3727 4428 5757 17952 4T 2636 2833 5013 4918 7263 22352 8T 2552 3360 5211 6178 7819 23389 49152 1T 243 245 565 1037 1469 3522 2T 329 370 565 1425 2421 4783 4T 329 387 673 1501 3148 4866 8T 402 433 858 1681 2838 6987 Total Elapsed Time 53.8 seconds Maximum RAM Speed Estimate = 433 x 16 = 6928 MB/second ################# W1 Windows 10 32 bit ################# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10 4 GB DDR3 1600 dual channel 12.8 GB/s MP-BusSpeed From C/C++ 18.00.21005.1 for x86 Start of test Wed Dec 23 20:57:34 2015 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 6170 6348 6836 6869 
7029 6743 2T 1859 3059 5657 7800 9685 10880 4T 989 1804 3289 5900 10157 16055 8T 473 843 1578 3101 5665 10124 122.9 1T 1476 1532 2319 2679 3515 4824 2T 2234 2733 4337 5226 6710 9655 4T 3428 4628 6956 8606 10978 16225 8T 2675 3965 6432 8355 11139 15714 49152 1T 241 273 565 1090 2130 3848 2T 346 409 734 1591 3082 5762 4T 499 496 947 1887 3818 7634 8T 476 500 930 1888 3932 7625 End of test Wed Dec 23 20:58:22 2015 Maximum RAM Speed Estimate = 500 x 16 = 8000 MB/second ################# W1 Windows 10 64 bit ################# MPbusSpeed64 From C/C++ 18.00.21005.1 for x64 Start of test Wed Dec 23 21:15:07 2015 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5222 6158 6233 6523 6404 6580 2T 1882 3670 6113 8124 9540 10760 4T 1089 1817 3378 6083 10832 15242 8T 505 837 1846 3250 5899 9788 122.9 1T 1424 1540 2285 2544 3490 4854 2T 2567 2756 4233 4920 6579 9820 4T 3444 4858 6699 8186 11628 16690 8T 2593 3644 5671 7370 9304 13630 49152 1T 240 268 566 1097 2070 3860 2T 342 411 754 1448 2940 5836 4T 451 494 894 1902 3804 7526 8T 424 503 935 1830 3710 7180 End of test Wed Dec 23 21:15:55 2015 ######## W2 Windows 10 32 bit Dual Boot With A5 ######## Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 MP-BusSpeed From C/C++ 18.00.21005.1 for x86 Start of test Fri Apr 15 16:19:46 2016 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5387 5874 6023 6023 6158 6175 2T 2051 3414 5527 6968 9063 9875 4T 1105 1897 3213 5706 9238 13066 8T 452 830 1874 3063 5620 8967 122.9 1T 1266 1286 2041 2420 3084 4283 2T 2258 2657 3976 4624 5973 8438 4T 3163 4119 5893 7241 10447 15588 8T 2540 3404 5628 8170 8647 12274 49152 1T 139 170 319 592 986 2063 2T 202 225 442 802 1633 3542 4T 295 359 597 1220 2489 5001 8T 282 313 651 1159 2359 5166 End of test Fri Apr 15 16:20:38 2016 Maximum RAM Speed Estimate = 313 x 16 = 5008 MB/second ######## W2 Windows 10 64 bit Dual Boot With A5 
######## MPbusSpeed64 From C/C++ 18.00.21005.1 for x64 Start of test Fri Apr 15 16:31:03 2016 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5414 5881 5982 6593 6320 6915 2T 2004 3844 6095 8469 10032 11237 4T 977 1709 3311 6239 12238 17994 8T 498 862 1737 3185 5915 10456 122.9 1T 1515 1537 2447 2750 3625 5040 2T 2330 2730 4064 4923 6364 9105 4T 3702 4830 7300 8835 11707 16740 8T 2587 3613 5718 7715 9699 16216 49152 1T 183 198 429 834 1652 3143 2T 244 303 565 1144 2221 4537 4T 346 324 644 1284 2552 5123 8T 306 307 618 1249 2421 4874 End of test Fri Apr 15 16:31:54 2016 #################### PC REMIX 32 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel MP-BusSpd2 Benchmark V1.2 21-Oct-2016 12.32 Compiled for 32 bit Intel x86 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 13032 13915 24235 25197 22774 23523 2T 12780 25046 41965 50097 47757 52797 4T 27568 24981 36907 46686 50510 64687 8T 14880 22221 47422 54616 80188 96729 122.9 1T 7133 6612 9381 15623 21204 26016 2T 7641 13474 22117 24280 44150 51649 4T 19935 25520 43348 41204 69425 101560 8T 31478 38036 59094 79377 96106 103008 49152 1T 712 1034 2181 4347 8729 13516 2T 1510 2074 2393 8057 15548 27128 4T 2952 2228 6703 13593 27804 42109 8T 4961 4460 8805 25670 49205 68560 Total Elapsed Time 53.2 seconds #################### PC REMIX 64 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel MP-BusSpd2 Benchmark V1.2 11-Nov-2016 14.29 Compiled for 64 bit Intel x86_64 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 11234 11268 11549 9728 11075 83709 2T 13975 18788 21241 20376 21981 126069 4T 11950 16021 25702 25888 22591 129598 8T 7847 11333 22999 26446 39027 
137208 122.9 1T 7270 7472 9070 11037 11565 57013 2T 12151 13359 18497 21814 22939 110321 4T 23054 19821 35736 42796 23494 145387 8T 25125 32352 39249 44178 46373 261178 49152 1T 651 966 1872 3496 7749 18057 2T 930 1979 3815 6002 11796 33883 4T 2876 3639 7142 13308 26695 60051 8T 3802 4639 12125 22329 39597 106907 Total Elapsed Time 56.2 seconds ============================================== Top end 2015 PC - Core i7-4820K at 3.9 GHz Quad core, 8 threads, 10 MB shared L3 cache RAM 1600 MHz, quad channel, 51.2 GB/sec ============================================== Intel/Windows 32 Bit Version MP-BusSpeed From C/C++ 18.00.21005.1 for x86 Start of test Sun Feb 14 18:30:05 2016 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 14262 14567 19724 19553 19374 19743 2T 10737 12187 18359 23285 31442 31491 4T 5537 7660 13862 24507 32888 42530 8T 3967 6138 14340 22999 39199 60117 122.9 1T 7263 7213 11664 16448 19425 20552 2T 10361 9428 20446 31143 34263 40155 4T 18846 21063 38732 54792 57770 56587 8T 22328 32794 54749 69742 79276 80967 49152 1T 668 1031 2141 4185 8650 14974 2T 1210 1726 3867 7731 15627 28522 4T 2161 3177 6122 11449 25009 41192 8T 4728 4106 9842 23118 43257 61779 End of test Sun Feb 14 18:31:00 2016 Intel/Windows 64 Bit Version MPbusSpeed64 From C/C++ 18.00.21005.1 for x64 Start of test Sun Feb 14 18:46:52 2016 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 14760 14788 21402 20729 20934 21032 2T 12570 19878 27089 35589 37688 41618 4T 7000 11473 21725 34776 51827 74198 8T 3728 6525 14160 23059 40659 66975 122.9 1T 7571 7448 11828 16724 20283 21671 2T 13291 13676 22360 32586 39872 42740 4T 18270 21303 37555 62890 78583 84191 8T 21030 30880 53098 71255 91804 103575 49152 1T 663 1037 2159 4187 8611 15218 2T 1207 1720 3908 6418 15470 27796 4T 2319 2382 7002 13639 23754 46951 # 8T 4728 5602 12178 21784 35170 80274 # End of test Sun Feb 14 18:47:43 2016 # Some data from sharesd 10 MB L3 
cache ######### Comparison MB/sec/MHz and RAM MB/sec ######### Unless indicated all are quad core CPUs, Core i7 runs up to 8 threads using HyperThreading dual 8 core T7 T11 T21 A1 A5 P37 KB Cortex Cortex Qualcom Atom Atom Cortex A9 A15 800 Z3745 z8300 A53 MB/sec/MHz 12.3 1T 2.72 3.56 2.61 3.93 4.81 1.75 2T 4.84 6.66 5.07 6.69 7.66 3.44 4T 9.81 6.36 9.80 12.64 13.63 6.84 8T 9.62 6.53 9.65 8.53 10.90 11.03 122.9 1T 0.82 1.96 2.49 2.89 3.44 1.68 2T 1.51 3.07 4.90 5.74 6.44 3.39 4T 2.31 2.83 9.85 10.75 10.14 6.56 8T 2.28 3.04 9.83 10.57 11.11 9.47 RAM MB/sec 49152 1T 330 1499 2929 3922 3023 2014 2T 620 2355 4776 6507 4559 3839 4T 835 2376 8243 6971 4969 6058 8T 851 2352 7907 7510 5329 5790 4T gain L1 3.61 1.79 3.75 3.21 2.83 3.91 L2 2.82 1.44 3.96 3.71 2.94 3.90 RAM 2.53 1.59 2.81 1.78 1.64 3.01 ======================================================== 64 bit compilations compared with 32 bit ======================================================== Android REMIX/Android 8HT 8HT T22 32 T22 64 R1 32 R1 64 R2 32 R2 64 KB Cortex Cortex Atom Atom Corei7 Corei7 A53 A53 Z8300 Z8300 4820K 4820K MB/sec/MHz 12.3 1T 1.80 4.49 3.52 9.11 6.03 21.46 2T 3.55 8.69 6.13 12.23 13.54 32.33 4T 6.96 15.85 6.21 14.87 16.59 33.23 8T 6.86 15.27 4.30 14.25 24.80 35.18 122.9 1T 1.76 3.92 2.72 5.00 6.67 14.62 2T 3.46 7.72 4.10 9.76 13.24 28.29 4T 6.82 11.49 4.35 12.15 26.04 37.28 8T 6.34 11.30 4.55 12.71 26.41 66.97 RAM MB/sec 49152 1T 1979 2653 2396 3522 13516 18057 2T 3547 3724 3926 4783 27128 33883 4T 4478 4328 5002 4866 42109 60051 8T 4282 4275 6796 6987 68560 106907 4T gain L1 3.86 3.53 1.76 1.63 2.75 1.55 L2 3.88 2.93 1.60 2.43 3.90 2.55 RAM 2.26 1.63 2.09 1.38 3.12 3.33 64/32 Bit L1 2.49 2.59 3.56 ======================================================== 64 bit compilations compared with 32 bit ======================================================== Windows 8HT 8HT W1 32 W1 64 W2 32 W2 64 PC 32 PC 64 KB Atom Atom Atom Atom Corei7 Corei7 Z8300 Z8300 z8300 z8300 4820K 4820K MB/sec/MHz 12.3 
1T 4.68 4.57 3.36 3.76 5.06 5.39 2T 7.56 7.47 5.37 6.11 8.07 10.67 4T 11.15 10.58 7.10 9.78 10.91 19.03 8T 7.03 6.80 4.87 5.68 15.41 17.17 122.9 1T 3.35 3.37 2.33 2.74 5.27 5.56 2T 6.70 6.82 4.59 4.95 10.30 10.96 4T 11.27 11.59 8.47 9.10 14.51 21.59 8T 10.91 9.47 6.67 8.81 20.76 26.56 RAM MB/sec 49152 1T 3848 3860 2063 3143 14974 15218 2T 5762 5836 3542 4537 28522 27796 4T 7634 7526 5001 5123 41192 46951 # 8T 7625 7180 5166 4874 61779 80274 # # Core i7 results - some data from sharesd 10 MB L3 cache 4T gain L1 2.38 2.32 2.12 2.60 2.15 3.53 L2 3.36 3.44 3.64 3.32 2.75 3.88 RAM 1.98 1.95 2.42 1.63 2.75 3.09 64/32 Bit L1 0.98 1.12 1.07 Android/Win 0.75 1.99 1.19 3.98 |
This is an ARM/Intel version of the longer running MP-RndMem Benchmark, produced because the original short version gave inconsistent performance measurements. It is a multithreading variety of RandMem above. For further details and more results see here.
On tablet A1, with the Intel Atom CPU, the initial Houdini ARM to Intel conversion speeds were significantly slower than the results from the native code compilations. This problem was overcome via Android 5 procedures, when most results were faster.
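As a small illustration of this comparison, the sketch below counts how many of the four measurements in one row (12.29 KB, 1 thread) are faster than the native ARM/Intel compilation. The values are copied from the A1 listings below. The helper name is illustrative, and a single row is only a sample of the full tables.

```python
# Sketch: element-wise comparison of one result row against a baseline.
# Columns are SerRD, SerRDWR, RndRD, RndRDWR in MB/second at 12.29 KB,
# 1 thread, copied from the A1 listings.


def count_faster(candidate, baseline):
    """Count the measurements where candidate beats baseline."""
    return sum(1 for c, b in zip(candidate, baseline) if c > b)


NATIVE_ARM_INTEL = [4643, 3593, 4710, 3641]    # A1 ARM-Intel native code
ORIGINAL_ANDROID_44 = [1337, 2505, 1337, 2509]  # A1 Original, via Houdini
ORIGINAL_ANDROID_50 = [5580, 5533, 5554, 5455]  # A1 V1 Android 5.0

if __name__ == "__main__":
    for label, row in [
        ("Original, Android 4.4", ORIGINAL_ANDROID_44),
        ("Original, Android 5.0", ORIGINAL_ANDROID_50),
    ]:
        faster = count_faster(row, NATIVE_ARM_INTEL)
        print(f"{label}: {faster} of {len(row)} faster than native")
```

For this row, none of the Android 4.4 measurements beat the native compilation, while all four Android 5.0 measurements do.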
On ARM based tablets, the new ARM/Intel compilations were generally slower than the originals, which were produced by an earlier compiler, but most of the difference was regained with the 64 bit version.
Intel/Windows Versions - Maximum data size of 12.3 MB was fine for early Android devices but performance can be affected by shared L2 caches on later ones. The Core i7 test, at this size, is mainly using the 10 MB L3 cache.
Later Intel Comparisons - Later measurements demonstrated inconsistent performance on Intel Atom CPUs. For example, a second run of the tests could be faster, and there were also variations between Windows and REMIX (Android) benchmarks and between 32 bit and 64 bit versions. On the other hand, all these comparisons were fairly consistent on the Intel Core i7 tests.
##################### T7 Original ###################### T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, 4 x 32 KB L1 cache, 1 MB shared L2 cache Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.17 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 3120 3060 3128 3078 2T 6098 3003 6083 3004 4T 11354 2948 11188 2942 8T 11403 2857 10412 2872 122.9 1T 996 983 661 699 2T 1868 984 1012 697 4T 2600 982 1483 699 8T 2534 976 1459 694 12288 1T 335 286 91 80 2T 640 288 113 82 4T 892 286 130 82 8T 925 287 127 81 Total Elapsed Time 44.7 seconds #################### T7 ARM-Intel ##################### ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.59 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 3060 2001 2867 1904 2T 5459 1879 5463 1867 4T 10797 1852 10537 1856 8T 10090 1802 10608 1813 122.9 1T 968 823 588 547 2T 1749 785 902 618 4T 2716 812 1328 672 8T 2733 810 1407 673 12288 1T 329 274 90 82 2T 636 272 112 82 4T 849 271 128 82 8T 869 271 126 81 Total Elapsed Time 45.4 seconds #################### T11 Original ##################### T11 Samsung EXYNOS 5250 1.7 GHz Cortex-A15, Android 4.2.2 2 x 32 KB L1 cache, 1 MB shared L2 cache Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.13 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 6696 4438 6594 4483 2T 12338 3078 12263 3573 4T 12419 2834 12166 2907 8T 12314 2903 11991 2934 122.9 1T 3371 2916 1639 1748 2T 6409 1922 2052 1097 4T 6155 1892 2027 1186 8T 6045 2105 2015 1192 12288 1T 1394 1048 153 133 2T 2245 985 285 123 4T 2277 1002 285 132 8T 2165 1001 286 127 Total Elapsed Time 44.0 seconds #################### T11 ARM-Intel #################### ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 12.07 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 6315 4486 6345 4484 2T 11837 2910 11846 3112 4T 11864 2835 11553 2858 8T 11821 3003 11805 3198 122.9 1T 3963 2681 1670 1704 2T 6672 1782 2040 1125 4T 6493 1817 2033 1218 8T 
6673 1738 2038 1303 12288 1T 1805 1081 177 145 2T 2543 1066 279 137 4T 2600 1065 276 136 8T 2662 1073 281 138 Total Elapsed Time 43.7 seconds #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s L1 caches 4 x 16 KB, L2 cache shared 2048 KB Android MP-RndMem2 Benchmark V2.1 08-Jul-2015 16.33 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 5088 5325 4262 4711 2T 9752 4902 8895 4570 4T 17379 4653 17434 4096 8T 19771 4698 17358 4424 122.9 1T 2714 2578 1923 2163 2T 5614 2502 3483 2107 4T 10859 2219 4835 1972 8T 10654 2410 4904 1923 12288 1T 1798 952 186 204 2T 3489 974 341 195 4T 6515 943 563 196 8T 6218 922 563 187 Total Elapsed Time 42.3 seconds #################### T21 ARM-Intel #################### ARM/Intel MP-RndMem Benchmark V1.1 09-Jul-2015 11.48 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 4186 3777 4055 3933 2T 9324 3541 7710 3619 4T 16594 3350 15731 3142 8T 18117 3291 16187 3262 122.9 1T 2423 2043 1610 1683 2T 5235 2029 3013 1641 4T 10148 1935 4662 1565 8T 10015 1834 4611 1474 12288 1T 1363 886 171 186 2T 2643 845 325 187 4T 5197 823 534 184 8T 4801 835 542 184 Total Elapsed Time 42.6 seconds ###################### P37 32 Bit ###################### P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1 Single Channel RAM, LPDDR3 933 MHz, 7.5 GB/second 8 x 32 KB L1 cache, 512 KB shared L2 cache ARM/Intel MP-RndMem Benchmark V1.2 14-Nov-2016 12.13 Compiled for 32 bit ARM v7a MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 3464 2779 3249 2792 2T 6473 2549 6471 2574 4T 12671 2355 12644 2243 8T 20039 2055 19677 1837 122.9 1T 3142 2667 843 847 2T 6072 2463 1552 785 4T 11678 2098 2400 675 8T 15639 2228 3822 668 12288 1T 2404 887 71 70 2T 4058 899 141 69 4T 5665 867 258 67 8T 7169 881 410 66 Total Elapsed Time 49.2 seconds Android 7.0 ARM/Intel MP-RndMem Benchmark V1.2 
17-Mar-2017 10.43 Compiled for 32 bit ARM v7a MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 3497 2803 3267 2770 2T 6443 2600 6495 2585 4T 12818 2264 12751 2318 8T 20056 2121 19918 2160 122.9 1T 3148 2672 824 865 2T 6104 2493 1562 800 4T 11723 2203 2423 698 8T 16376 2120 3930 733 12288 1T 2554 931 73 72 2T 4276 909 148 70 4T 6703 872 267 68 8T 6425 914 407 67 Total Elapsed Time 47.9 seconds #################### T22 Original ###################### T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 4 x 32 KB L1 cache, 512 KB shared L2 cache Android MP-RndMem2 Benchmark V2.1 11-Nov-2015 13.03 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 3401 3874 3435 3892 2T 6777 3817 6592 3773 4T 13025 3729 12630 3685 8T 12848 3654 12113 3654 122.9 1T 3257 3583 827 946 2T 6416 3572 1481 943 4T 11897 3564 2205 934 8T 11106 3550 2173 945 12288 1T 2397 1734 82 93 2T 4652 1725 161 94 4T 5834 1748 287 94 8T 4774 1743 276 93 Total Elapsed Time 45.9 seconds ###################### T22 32 Bit ###################### ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.13 Compiled for 32 bit ARM v7a MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 2894 2438 2887 2433 2T 5665 2402 5663 2403 4T 10922 2369 11100 2310 8T 10065 2293 10648 2265 122.9 1T 2681 2368 757 758 2T 5351 2360 1398 769 4T 10056 2308 2121 772 8T 8838 2351 1916 742 12288 1T 2309 1662 80 78 2T 3986 1683 164 73 4T 5419 1684 283 82 8T 4658 1694 279 82 Total Elapsed Time 44.6 seconds ###################### T22 64 Bit ###################### ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.15 Compiled for 64 bit ARM v8a 12.29 1T 4445 3109 4455 3089 2T 8010 3100 8072 3105 4T 15909 3057 14711 3040 8T 14764 3036 14570 3037 122.9 1T 3457 2888 842 876 2T 6537 2924 1524 876 4T 11095 2892 2119 861 8T 11729 2916 2080 874 12288 1T 2475 1679 81 78 2T 4155 1713 163 73 4T 5503 1711 285 89 8T 4519 1717 281 89 Total Elapsed Time 48.1 seconds 
#################### A1 Original ####################### A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s 4 x 24 KB L1, 2 x 1 MB L2 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.14 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 1337 2505 1337 2509 2T 2637 2513 2657 2521 4T 3535 2420 3484 2454 8T 3195 2403 3088 2406 122.9 1T 1305 2280 963 1758 2T 2581 2285 1945 1748 4T 3588 2130 3125 1740 8T 3211 2269 2949 1745 12288 1T 1248 1962 101 215 2T 2469 1940 191 214 4T 3462 1954 323 214 8T 3127 1926 318 212 Total Elapsed Time 43.7 seconds ################## A1 V1 Android 5.0 ################### Android MP-RndMem2 Benchmark V2.1 05-Nov-2015 11.55 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 5580 5533 5554 5455 2T 10460 5393 8625 5336 4T 15584 5013 12183 5211 8T 14687 4850 9754 4882 122.9 1T 4180 4368 2557 2522 2T 8301 4276 5072 2511 4T 15613 4238 7764 2425 8T 14496 4259 7278 2466 12288 1T 3360 2180 239 239 2T 6219 2140 379 240 4T 6758 2135 418 238 8T 6991 2131 418 232 Total Elapsed Time 47.6 seconds #################### A1 ARM-Intel ###################### ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.54 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 4643 3593 4710 3641 2T 8583 3552 8761 3564 4T 12707 3450 12496 3384 8T 10410 3389 10796 3408 122.9 1T 3733 2874 2408 2150 2T 7259 2871 4781 2165 4T 11726 2897 7656 2133 8T 11673 2853 7100 2113 12288 1T 3153 2087 226 238 2T 5782 2073 327 238 4T 6451 1997 447 236 8T 6471 2071 446 233 Total Elapsed Time 41.5 seconds ########### A5 ARM-Intel Dual Boot With W2 ############# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Android 5.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 ARM/Intel MP-RndMem Benchmark V1.2 14-Apr-2016 17.41 Compiled for 32 bit Intel x86 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 4395 3558 4562 3346 2T 8094 3465 7975 3372 4T 11923 3377 
11375 3220 8T 10165 3207 10220 3205 122.9 1T 3519 2796 2360 1993 2T 6875 2591 4233 1970 4T 10225 2761 5943 1935 8T 10158 2755 6363 2052 12288 1T 2586 1846 187 192 2T 3890 1728 310 213 4T 5035 1986 373 194 8T 3972 1887 359 186 Total Elapsed Time 44.0 seconds #################### W1 REMIX 32 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel MP-RndMem Benchmark V1.2 21-Oct-2016 14.32 Compiled for 32 bit Intel x86 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 4504 3504 4322 3382 2T 7137 2799 5874 3446 4T 8441 2526 7759 3049 8T 7693 1763 8478 1300 122.9 1T 2947 2777 2389 2086 2T 5791 2196 3345 1799 4T 6721 1821 4257 1475 8T 7466 1129 4926 1201 12288 1T 3026 2278 201 239 2T 3850 1687 326 218 4T 4451 1772 304 215 8T 5007 1407 407 160 Total Elapsed Time 47.0 seconds #################### W1 REMIX 64 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel MP-RndMem Benchmark V1.2 11-Nov-2016 21.30 Compiled for 64 bit Intel x86_64 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 3501 2736 3655 2561 2T 5999 2462 6015 1922 4T 7295 1306 5998 1930 8T 7895 983 7769 1607 122.9 1T 2851 2036 2273 1861 2T 4950 1772 2973 1623 4T 6384 1405 4053 1292 8T 6409 1046 4598 1049 12288 1T 2362 1826 207 225 2T 3609 1356 349 185 4T 3711 1378 288 174 8T 4910 1131 436 120 Total Elapsed Time 51.0 seconds ################# W1 Windows 10 32 bit ################# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10 4 GB DDR3 1600 dual channel 12.8 GB/s MPRandMem32 From C/C++ 18.00.21005.1 for x86 Start of test Mon Dec 12 16:17:43 2016 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 4227 5149 4457 5086 2T 7978 5490 7846 5379 4T 10589 5292 10543 5208 8T 7912 5066 8068 5137 122.9 1T 3571 3893 2345 2380 2T 6453 3867 4227 2327 4T 11784 3845 6403 
2385 8T 11449 3950 6431 2373 12288 1T 2948 2750 222 227 2T 4889 2761 408 229 4T 6290 2771 532 231 8T 6256 2724 534 269 End of test Mon Dec 12 16:18:27 2016 ################# W1 Windows 10 64 bit ################# MPRandMem64 From C/C++ 18.00.21005.1 for x64 Start of test Mon Dec 12 16:22:12 2016 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 3816 4658 3884 4495 2T 7060 4531 6971 4390 4T 12603 4383 12604 4334 8T 12435 4179 12493 4215 122.9 1T 3212 3594 2431 2248 2T 5919 3437 4220 2302 4T 11178 3459 6838 2299 8T 10630 3539 6775 2280 12288 1T 2789 2689 228 229 2T 4688 2663 424 242 4T 6079 2670 561 250 8T 6061 2667 562 270 End of test Mon Dec 12 16:22:55 2016 ######## W2 Windows 10 32 bit Dual Boot With A5 ######## Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 MPRandMem32 From C/C++ 18.00.21005.1 for x86 Start of test Mon Dec 12 16:29:17 2016 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 4151 4929 4126 5104 2T 7501 5063 7496 4887 4T 10549 4933 10620 5206 8T 7259 5126 7278 5072 122.9 1T 3576 3997 2358 2372 2T 6223 3629 3763 2206 4T 11064 3709 6300 2234 8T 11442 3464 5399 2334 12288 1T 2691 2043 195 203 2T 3706 1999 315 217 4T 5382 2098 371 205 8T 5067 1925 352 197 End of test Mon Dec 12 16:30:01 2016 ######## W2 Windows 10 64 bit Dual Boot With A5 ######## MPRandMem64 From C/C++ 18.00.21005.1 for x64 Start of test Mon Dec 12 16:26:52 2016 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 3606 4076 3535 4068 2T 5461 3879 6031 3761 4T 11092 3779 10265 4028 8T 9485 3753 9284 3728 122.9 1T 2465 2726 1897 1916 2T 4836 2957 3673 2066 4T 8259 3168 4491 1974 8T 10424 3125 6583 2052 12288 1T 2246 1655 187 188 2T 3245 1769 301 187 4T 4933 1560 360 186 8T 4345 1790 344 175 End of test Mon Dec 12 16:27:38 2016 #################### PC REMIX 32 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB 
L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel MP-RndMem Benchmark V1.2 21-Oct-2016 12.49 Compiled for 32 bit Intel x86 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 25329 28404 22502 27578 2T 45352 28049 43404 29901 4T 67532 27226 66231 26721 8T 73022 27909 70942 29210 122.9 1T 24237 24426 12519 8183 2T 40910 24130 22546 8612 4T 67966 22138 28955 7129 8T 74659 18872 46730 7929 12288 1T 14375 12505 1139 1127 2T 27645 11799 2248 1105 4T 48129 11772 3564 1078 8T 72818 12119 4256 775 Total Elapsed Time 43.6 seconds #################### PC REMIX 64 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel MP-RndMem Benchmark V1.2 11-Nov-2016 14.35 Compiled for 64 bit Intel x86_64 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 26834 29490 27107 28870 2T 54416 29824 53434 25831 4T 105809 27746 56139 20591 8T 85898 19779 84818 21910 122.9 1T 23931 25524 11601 8270 2T 48842 25062 23859 8412 4T 98110 22674 47244 7154 8T 89250 16270 53559 5951 12288 1T 15175 12540 1077 1127 2T 29600 11483 2342 1095 4T 43737 10585 2200 904 8T 78035 11667 4351 755 Total Elapsed Time 46.1 seconds ============================================== Top end 2015 PC - Core i7-4820K at 3.9 GHz Quad core, 8 threads, 10 MB shared L3 cache RAM 1600 MHz, quad channel, 51.2 GB/sec ============================================== Intel/Windows 32 Bit Version MPRandMem32 From C/C++ 18.00.21005.1 for x86 Start of test Tue Feb 23 16:05:00 2016 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 26590 29369 27321 27593 2T 52121 30063 48980 27757 4T 66651 29464 72519 27466 8T 58774 28464 57426 26236 122.9 1T 25876 28670 13416 8815 2T 46692 28183 21803 8767 4T 82678 28469 46885 8497 8T 83158 28482 49158 8677 12288 1T 16527 13042 1196 1191 2T 27888 12767 2389 1188 4T 49291 13049 3393 1191 8T 
84109 12954 4176 1192 End of test Tue Feb 23 16:05:41 2016 Intel/Windows 64 Bit Version MPRandMem64 From C/C++ 18.00.21005.1 for x64 Start of test Tue Feb 23 16:06:04 2016 MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 26322 28220 25930 28695 2T 54658 30081 39512 27874 4T 99694 29950 89274 27925 8T 88620 29773 85848 27924 122.9 1T 25196 27993 13424 8633 2T 44627 28207 21816 8785 4T 65329 28108 44155 8620 8T 91445 28208 53751 8715 12288 1T 17662 13110 1301 1198 2T 32242 12856 2595 1198 4T 57536 13117 4905 1197 8T 85697 13079 4645 1197 End of test Tue Feb 23 16:06:46 2016 |
The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, to exercise L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread carries out the same calculations but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations are carried out and can be run. The numeric results start with values of 1.0, with subsequent calculations reducing the values, the amount depending on the number of calculations. Further details, results and links to download the original MP-MFLOPS benchmark can be found here, with more details of the latest MP-MFLOPS2 compilations here. The newer versions have longer running times that avoid the inconsistent speeds produced by the original.
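The calculation and data layout described above can be sketched as follows. This is a minimal Python illustration, not the benchmark's native C/C++ source. The coefficient values in `COEFFS`, the alternating sign pattern for longer chains, and the names `ops_per_word`, `update` and `run` are all assumptions made for the example. It counts operations per word for a chain of (x + p) * q terms, gives each thread its own segment of the array, and checks that every word ends with the same value whatever the thread count.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (add, multiply) constants; the benchmark's real values are not given here.
COEFFS = [(0.000020, 0.999980), (0.000011, 1.000011), (0.000005, 1.000012)]


def ops_per_word(terms):
    """Each (x + p) * q term costs 2 operations; joining n terms costs n - 1 more."""
    return 3 * terms - 1


def update(x, coeffs):
    """(x + a) * b - (x + c) * d + (x + e) * f ..., with alternating signs (assumed)."""
    total = 0.0
    for k, (p, q) in enumerate(coeffs):
        term = (x + p) * q
        total = total + term if k % 2 == 0 else total - term
    return total


def run(words, threads, coeffs, passes=10):
    """Start every word at 1.0 and let each thread update its own segment."""
    data = [1.0] * words
    chunk = words // threads

    def work(t):
        lo = t * chunk
        hi = words if t == threads - 1 else lo + chunk
        for _ in range(passes):
            for i in range(lo, hi):
                data[i] = update(data[i], coeffs)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(work, range(threads)))
    return data


if __name__ == "__main__":
    for words in (3200, 32000, 3200000):
        print(words, "words =", words * 4 / 1000, "KB")  # 4 bytes per word
    one = run(3200, 1, COEFFS[:1])
    four = run(3200, 4, COEFFS[:1])
    print(ops_per_word(1), "ops/word, consistent across threads:", one == four)
```

Under this counting, one term gives 2 operations per word, the three terms shown give 8, and 11 terms would give 32. Python threads do not run this arithmetic in parallel, so the sketch only illustrates the partitioning and the consistency check, not the measured speeds.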
Using Tablet A1, with the Intel Atom CPU, the original ARM only version was much slower than the native code variety at 32 operations per word, and running it via Android 5.0 was not much faster. On ARM based systems, there was little difference between the original and later compilations.
Tablet T22 results, from the 64 bit compilation, showed that it could be much faster than the 32 bit benchmark, up to 3.7 times at 2 operations per word. The reason is that 64 bit vector SIMD instructions were produced, instead of scalars.
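The 64 bit gain quoted above can be reproduced from the single thread T22 figures listed in the results for this section. The following is a small sketch of that arithmetic; the names `T22_32BIT`, `T22_64BIT` and `speedups` are chosen for the example only.

```python
# Single thread MFLOPS from the T22 results (12.8 KB, 128 KB and 12800 KB data).
T22_32BIT = {"2 ops": [190, 190, 184], "32 ops": [670, 672, 664]}
T22_64BIT = {"2 ops": [705, 701, 636], "32 ops": [1398, 1394, 1362]}


def speedups(new, old):
    """Ratio of new to old MFLOPS for each operations-per-word setting."""
    return {k: [round(n / o, 2) for n, o in zip(new[k], old[k])] for k in new}


if __name__ == "__main__":
    for ops, ratios in speedups(T22_64BIT, T22_32BIT).items():
        print(ops, ratios)
```

At 2 operations per word the ratio peaks at about 3.7 with L1 cache based data, and at 32 operations per word it is close to 2.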
MFLOPS/MHz Comparisons - These are provided to compare different CPU technology. None of these are particularly good, the best being the Cortex-A53 at 64 bits, producing just over 1 result per cycle per CPU core.
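As an illustration of how such ratios are obtained, the sketch below divides single thread MFLOPS (32 operations per word, 12.8 KB data) by the CPU clock speeds quoted in the result headings. The function name `mflops_per_mhz` and the `SAMPLES` list are assumptions for the example; the figures are taken from the results in this section.

```python
def mflops_per_mhz(mflops, mhz):
    """Floating point results produced per clock cycle."""
    return mflops / mhz


# (system, single thread MFLOPS at 32 ops/word and 12.8 KB, clock MHz)
SAMPLES = [
    ("T7 Cortex-A9, 32 bit", 598, 1200),
    ("A1 Atom Z3745, 32 bit", 1061, 1860),
    ("T22 Cortex-A53, 32 bit", 670, 1300),
    ("T22 Cortex-A53, 64 bit", 1398, 1300),
]

if __name__ == "__main__":
    for name, mflops, mhz in SAMPLES:
        print(f"{name:24s} {mflops_per_mhz(mflops, mhz):.2f} MFLOPS/MHz")
```

Only the 64 bit Cortex-A53 entry exceeds one result per cycle with these inputs.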
Intel/Windows Versions - The compiler used for these appears to be somewhat more advanced than that used for Intel/Android, implementing full SIMD SSE instructions for 64 bit and 32 bit benchmarks. The result is that a Z8300 Atom CPU core produced up to 1.66 MFLOPS/MHz. The maximum speed of a Core i7, using SSE instructions, is 4 multiplies and 4 linked adds per cycle (8 MFLOPS/MHz). This benchmark demonstrated more than 5.5 MFLOPS/MHz.
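The figures in the paragraph above can be related to the stated SSE maximum with a short calculation. This is a sketch only: the names `SSE_LANES`, `PEAK_PER_CYCLE` and `efficiency` are assumptions for the example, and the MFLOPS values are the 64 bit Windows single thread results at 32 operations per word with 128 KB data.

```python
SSE_LANES = 4                    # four single precision words per 128 bit SSE register
PEAK_PER_CYCLE = SSE_LANES * 2   # 4 multiplies + 4 linked adds per cycle


def efficiency(mflops, mhz, peak_per_cycle=PEAK_PER_CYCLE):
    """Return (results per cycle, fraction of the stated peak)."""
    per_cycle = mflops / mhz
    return per_cycle, per_cycle / peak_per_cycle


if __name__ == "__main__":
    # Core i7-4820K at 3900 MHz.
    per_cycle, fraction = efficiency(22201, 3900)
    print(f"Core i7: {per_cycle:.2f} MFLOPS/MHz, {fraction:.0%} of peak")
    # Atom Z8300 at 1840 MHz turbo; only the per cycle figure is quoted for this CPU.
    print(f"Atom Z8300: {3060 / 1840:.2f} MFLOPS/MHz")
```

On these inputs the Core i7 reaches about 5.7 of the 8 results per cycle that SSE allows, roughly 70 percent of the stated maximum.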
A5 and W2 Dual Boot Tablet - The Windows compilation is much faster than the Android version, as SSE SIMD type instructions are used. For comparable performance see A5 results below in section NEON-MFLOPS-MP Benchmark. This uses hand coded NEON intrinsic functions, rather than compiler generated machine code. Speeds from the 64 bit version appear to be somewhat faster than the 32 bit variety. However, note that there can be wide variations in recorded results.
REMIX/Android vs Windows - Windows was faster at 32 bits but performance was similar at 64 bits, for both Atom Z8300 and Core i7.
Other 64 Bit vs 32 Bit - REMIX/Android produced significantly increased speeds at 64 bits, on the Atom and Core i7.
##################### T7 Original ###################### T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, Android MP-MFLOPS2 Benchmark V2.1 05-Feb-2015 11.37 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 182 156 114 598 578 572 2T 365 321 194 1194 1163 1141 4T 716 655 233 2367 2316 2240 8T 717 682 233 2347 2371 2246 Total Elapsed Time 135.5 seconds #################### T7 ARM-Intel ##################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.44 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 188 156 116 598 578 574 2T 365 319 197 1195 1161 1145 4T 682 709 237 2372 2345 2249 8T 678 731 237 2361 2381 2254 Total Elapsed Time 135.0 seconds #################### T11 Original ##################### T11 Samsung EXYNOS 5250 1.7 GHz Cortex-A15, Android 4.2.2 Android MP-MFLOPS2 Benchmark V2.1 29-Apr-2015 10.22 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 845 817 544 1546 1539 1512 2T 1593 1668 648 3140 3067 2977 4T 1974 1775 645 2963 3093 2845 8T 1935 2059 652 3108 3147 2985 Total Elapsed Time 58.5 seconds #################### T11 ARM-Intel #################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 20.30 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 695 756 536 1537 1501 1476 2T 1319 1527 645 3151 3077 3000 4T 1604 1567 657 3035 3095 2997 8T 1604 1639 658 3108 3125 2996 Total Elapsed Time 59.1 seconds #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Android MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 15.35 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 718 781 590 1214 1220 1228 2T 1572 1583 1118 2406 2436 2442 4T 2338 2959 1836 4867 4911 4859 8T 3148 3266 1866 4870 4916 4888 Total 
Elapsed Time 56.4 seconds #################### T21 ARM-Intel #################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 16.50 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 822 768 636 1232 1228 1231 2T 1662 1637 1184 2460 2463 2446 4T 2509 3216 1659 4519 4762 4900 8T 2965 3193 1881 4847 4925 4880 ###################### P37 32 Bit ###################### P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1 Single Channel RAM, LPDDR3 933 MHz, 7.5 GB/second 8 x 32 KB L1 cache, 512 KB shared L2 cache ARM/Intel MP-MFLOPS2 Benchmark V2.2 14-Nov-2016 12.16 Compiled for 32 bit ARM v7a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 229 226 217 811 810 797 2T 451 446 422 1615 1617 1591 4T 884 857 646 3213 3199 3159 8T 1309 1276 714 5192 5164 5030 Total Elapsed Time 90.7 seconds Android 7.0 ARM/Intel MP-MFLOPS2 Benchmark V2.2 11-May-2017 10.39 Compiled for 32 bit ARM v7a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 229 227 220 814 813 801 2T 455 450 435 1626 1623 1609 4T 891 867 687 3225 3219 3181 8T 1283 1307 708 5156 5241 5142 Total Elapsed Time 90.1 seconds ###################### T22 32 Bit ###################### T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.17 Compiled for 32 bit ARM v7a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 190 190 184 670 672 664 2T 377 378 370 1343 1345 1329 4T 707 755 725 2657 2669 2621 8T 722 736 714 2640 2672 2631 Total Elapsed Time 113.0 seconds ###################### T22 64 Bit ###################### ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.24 Compiled for 64 bit ARM v8a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 705 701 636 
1398 1394 1362 2T 1376 1395 942 2794 2797 2757 4T 2063 2602 962 5491 5546 5336 8T 2474 2611 957 5367 5500 5417 Total Elapsed Time 51.6 seconds #################### A1 Original ####################### A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s Android MP-MFLOPS2 Benchmark V2.1 04-Feb-2015 11.03 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 502 501 476 575 575 573 2T 1012 975 921 1133 1140 1115 4T 1571 1627 979 2238 2255 2258 8T 1550 1890 1007 2235 2239 2217 Total Elapsed Time 117.4 seconds ################## A1 V1 Android 5.0 ################## Android MP-MFLOPS2 Benchmark V2.1 05-Nov-2015 11.59 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 607 586 559 556 553 555 2T 1174 1153 1057 1111 1115 1112 4T 1539 2220 992 2181 2207 2179 8T 1736 2097 1011 2184 2194 2178 Total Elapsed Time 119.2 seconds #################### A1 ARM-Intel ###################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.24 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 695 696 661 1061 1061 1055 2T 1335 1382 1058 2088 2086 2102 4T 1832 2635 979 3993 4125 4145 8T 2026 2557 1007 3842 4044 4110 Total Elapsed Time 65.8 seconds ########### A5 ARM-Intel Dual Boot With W2 ############# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Android 5.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 ARM/Intel MP-MFLOPS2 Benchmark V2.2 14-Apr-2016 17.53 Compiled for 32 bit Intel x86 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 422 450 401 945 964 939 2T 795 849 754 1809 1859 1815 4T 1161 1514 1084 3043 3159 3144 8T 1141 1376 1065 3173 3241 3234 Total Elapsed Time 78.8 seconds #################### W1 REMIX 32 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 
3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel MP-MFLOPS2 Benchmark V2.2 21-Oct-2016 14.27 Compiled for 32 bit Intel x86 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 386 449 427 922 930 917 2T 579 738 733 1658 1642 1636 4T 894 1011 839 2326 2146 2121 8T 974 1084 1039 2239 2355 2433 Total Elapsed Time 90.6 seconds #################### W1 REMIX 64 Bit ################### R1 Intel Atom Z8300 quad core 1.84 GHz Android 6.0.1, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB Shared L2 ARM/Intel MP-MFLOPS2 Benchmark V2.2 14-Aug-2016 22.35 Compiled for 64 bit Intel x86_64 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 1365 1369 926 2478 2525 2438 2T 2628 2746 1403 4420 4439 4382 4T 2505 3654 1462 5398 6022 5754 8T 2619 3133 1570 6133 6500 6224 Total Elapsed Time 34.0 seconds ################# W1 Windows 10 32 bit ################# Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10, 4 GB DDR 3 1600 MP-MFLOPS From C/C++ 18.00.21005.1 for x86 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 1467 1388 1215 2537 2529 2486 2T 2773 2825 1659 4937 4958 4740 4T 3334 4845 1512 8453 8813 8694 8T 2818 5068 1575 8338 8896 8627 ################# W1 Windows 10 64 bit ################# MP-MFLOPS From C/C++ 18.00.21005.1 for x64 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 1470 1471 1252 2936 3060 2996 2T 2775 2982 1653 5593 5860 5680 4T 3610 5290 1520 9401 10488 10326 8T 3132 5178 1562 8957 8365 10433 ######## W2 Windows 10 32 bit Dual Boot With A5 ######## Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84 Windows 10, 4 GB DDR 3 1600 4 x 24 KB L1, 2 x 1 MB L2 MP-MFLOPS From C/C++ 18.00.21005.1 for x86 Start of test Sat May 21 19:10:08 2016 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 
12.8 128 12800 MFLOPS 1T 1415 1346 968 2368 2274 2227 2T 2336 2436 857 4460 4433 4181 4T 2718 4196 1046 7192 7984 7678 8T 3073 3220 1071 6133 8773 6413 ######## W2 Windows 10 64 bit Dual Boot With A5 ######## MP-MFLOPS From C/C++ 18.00.21005.1 for x64 Start of test Fri Apr 15 16:41:27 2016 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 1560 1584 1034 2952 2965 2877 2T 2590 2757 1160 5369 5862 5333 4T 3852 5094 1090 9407 10478 10331 8T 3480 4973 1133 7748 10417 7742 #################### PC REMIX 32 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel MP-MFLOPS2 Benchmark V2.2 21-Oct-2016 12.47 Compiled for 32 bit Intel x86 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 3593 3565 3355 5610 5870 5859 2T 6858 7298 6767 10848 11732 11689 4T 7267 14299 7480 18157 23093 20018 8T 10919 13727 11940 22555 22935 22929 Total Elapsed Time 12.1 seconds #################### PC REMIX 64 Bit ################### R2 Core i7 4820K quad core + HT at 3900 MHz Turbo 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1, ARM/Intel MP-MFLOPS2 Benchmark V2.2 11-Nov-2016 14.34 Compiled for 64 bit Intel x86_64 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 13176 8885 6002 21867 22182 21447 2T 21999 22460 11030 42151 43598 45387 4T 24740 31790 15002 82615 86988 87136 8T 24161 41857 27639 78321 89838 85588 Total Elapsed Time 3.4 seconds ################# PC Windows 10 32 bit ################# Top end 2015 PC - Core i7-4820K at 3.9 GHz MP-MFLOPS From C/C++ 18.00.21005.1 for x86 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 11945 10323 6088 21760 21813 21691 2T 18020 20096 11072 34309 
43919 45673 4T 25662 42897 13955 55831 89194 90429 8T 22256 49955 14299 80928 90240 88848 ################# PC Windows 10 64 bit ################# MP-MFLOPS From C/C++ 18.00.21005.1 for x64 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 14218 12522 6044 22097 22201 22087 2T 21473 24706 11189 42464 44797 46061 4T 24241 28250 15774 59471 90548 81144 8T 27512 57442 14238 82808 92377 92959 ################ Comparison MFLOPS/MHz ################ FPU Add & Multiply using 1, 2, 4 and Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 Threads 32 Bit Only Android T7 1T 0.16 0.13 0.10 0.50 0.48 0.48 Cortex 2T 0.30 0.27 0.16 1.00 0.97 0.95 A9 4T 0.57 0.59 0.20 1.98 1.95 1.87 T11 1T 0.41 0.44 0.32 0.90 0.88 0.87 Cortex 2T 0.78 0.90 0.38 1.85 1.81 1.76 A15 4T 0.94 0.92 0.39 1.79 1.82 1.76 T21 1T 0.38 0.36 0.30 0.57 0.57 0.57 Qualcomm 2T 0.77 0.76 0.55 1.14 1.15 1.14 800 4T 1.17 1.50 0.77 2.10 2.21 2.28 A1 1T 0.37 0.37 0.36 0.57 0.57 0.57 Atom 2T 0.72 0.74 0.57 1.12 1.12 1.13 Z3745 4T 0.98 1.42 0.53 2.15 2.22 2.23 A5 1T 0.23 0.24 0.22 0.51 0.52 0.51 Atom 2T 0.43 0.46 0.41 0.98 1.01 0.99 z8300 4T 0.63 0.82 0.59 1.65 1.72 1.71 P37 1T 0.15 0.15 0.14 0.54 0.54 0.53 Cortex 2T 0.30 0.30 0.28 1.08 1.08 1.06 A53 4T 0.59 0.57 0.43 2.14 2.13 2.11 8 core 8T 0.87 0.85 0.48 3.46 3.44 3.35 ########################################################### 32 Bit and 64 Bit Android T22 32b 1T 0.15 0.15 0.14 0.52 0.52 0.51 Cortex 2T 0.29 0.29 0.28 1.03 1.03 1.02 A53 4T 0.54 0.58 0.56 2.04 2.05 2.02 T22 64b 1T 0.37 0.37 0.36 0.57 0.57 0.57 Cortex 2T 0.72 0.74 0.57 1.12 1.12 1.13 A53 4T 0.98 1.42 0.53 2.15 2.22 2.23 REMIX/Android R1 32b 1T 0.21 0.24 0.23 0.50 0.51 0.50 Atom 2T 0.31 0.40 0.40 0.90 0.89 0.89 Z8300 4T 0.49 0.55 0.46 1.26 1.17 1.15 R1 64b 1T 0.74 0.74 0.50 1.35 1.37 1.33 Atom 2T 1.43 1.49 0.76 2.40 2.41 2.38 Z8300 4T 1.36 1.99 0.79 2.93 3.27 3.13 R2 32b 1T 0.92 0.91 0.86 1.44 1.51 1.50 Core i7 2T 1.76 1.87 1.74 
2.78 3.01 3.00 4820K 4T 1.86 3.67 1.92 4.66 5.92 5.13 8HT 8T 2.80 3.52 3.06 5.78 5.88 5.88 R2 64b 1T 3.38 2.28 1.54 5.61 5.69 5.50 Core i7 2T 5.64 5.76 2.83 10.81 11.18 11.64 4820K 4T 6.34 8.15 3.85 21.18 22.30 22.34 8HT 8T 6.20 10.73 7.09 20.08 23.04 21.95 Windows W1 32b 1T 0.80 0.75 0.66 1.38 1.37 1.35 Atom 2T 1.51 1.54 0.90 2.68 2.69 2.58 Z8300 4T 1.81 2.63 0.82 4.59 4.79 4.73 W1 64b 1T 0.80 0.80 0.68 1.60 1.66 1.63 Atom 2T 1.51 1.62 0.90 3.04 3.18 3.09 Z8300 4T 1.96 2.88 0.83 5.11 5.70 5.61 W2 32b 1T 0.77 0.73 0.53 1.29 1.24 1.21 Atom 2T 1.27 1.32 0.47 2.42 2.41 2.27 z8300 4T 1.48 2.28 0.57 3.91 4.34 4.17 W2 64b 1T 0.85 0.86 0.56 1.60 1.61 1.56 Atom 2T 1.41 1.50 0.63 2.92 3.19 2.90 z8300 4T 2.09 2.77 0.59 5.11 5.69 5.61 PC 32b 1T 3.06 2.65 1.56 5.58 5.59 5.56 Core i7 2T 4.62 5.15 2.84 8.80 11.26 11.71 4820K 4T 6.58 11.00 3.58 14.32 22.87 23.19 8HT 8T 5.71 12.81 3.67 20.75 23.14 22.78 PC 64b 1T 3.65 3.21 1.55 5.67 5.69 5.66 Core i7 2T 5.51 6.33 2.87 10.89 11.49 11.81 4820K 4T 6.22 7.24 4.04 15.25 23.22 20.81 8HT 8T 7.05 14.73 3.65 21.23 23.69 23.84 |
NEON-MFLOPS-MP carries out the same calculations as the MP-MFLOPS benchmarks above, but with NEON intrinsic functions used for all calculations. For further results see here. The effect of using these functions, instead of leaving it to the compiler, is that 32 bit performance, on ARM based systems, was similar between the original and new benchmarks.
The T22 NEON 64 bit compilation produced a small performance gain over 32 bit results at 2 operations per word, but nearly double the speed at 32 operations, the latter benefiting from the availability of sufficient registers for all the variables.
On the Intel Atom based tablet A1, via the ARM to Intel conversion layer, performance was similar via Android 4 and 5, but the native code version was more than twice as fast at 32 operations per word.
MFLOPS/MHz Comparisons are also provided, including examples of maximum speeds from the non-NEON version, demonstrating NEON gains of more than three times in some cases. A result submitted for P33, with an ARM Cortex-A57, produced the best single core performance (at November 2015) of 3.47 results per cycle at 64 bits, followed by the Cortex-A53 at 2.13. This is still disappointing compared with Intel desktop processors, such as the Core 2 onwards, at 6 per clock cycle out of a maximum of 8, with SSE SIMD code (See Linux results).
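The NEON gains and results per cycle mentioned above follow from the single thread figures in the two sets of results. The sketch below shows the arithmetic for the T22 and P33 entries, using 12.8 KB data; the names `gain` and `per_cycle` are assumptions for the example.

```python
def gain(neon_mflops, plain_mflops):
    """How many times faster the NEON intrinsics version ran."""
    return neon_mflops / plain_mflops


def per_cycle(mflops, mhz):
    """Floating point results per clock cycle."""
    return mflops / mhz


if __name__ == "__main__":
    # T22 32 bit, single thread: NEON-MFLOPS2-MP versus MP-MFLOPS2.
    print(f"T22 32 bit, 2 ops/word:  {gain(619, 190):.2f} times")
    print(f"T22 32 bit, 32 ops/word: {gain(1444, 670):.2f} times")
    # 64 bit NEON results per cycle at 32 ops/word, single thread.
    print(f"P33 Cortex-A57 at 2000 MHz: {per_cycle(6943, 2000):.2f}")
    print(f"T22 Cortex-A53 at 1300 MHz: {per_cycle(2766, 1300):.2f}")
```

The gain of more than three times appears at 2 operations per word, where the non-NEON 32 bit compilation is slowest.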
Intel REMIX/Android - For some reason, this native ARM/Intel and 64 bit/32 bit version failed to run. In this case, the compiler probably failed to translate NEON intrinsic functions into appropriate Intel instructions. The original benchmark had pure ARM code, translated by the Houdini interpreter, and that ran successfully, results being included below. This demonstrated up to 4.11 MFLOPS/MHz using a single core.
Following the performance details are the numeric results of calculations from the fixed parameters used in the new version, for both ARM and Intel. It seems that Tablet T11 has an intermittent fault, as it occasionally fails to calculate a correct answer or causes the Tablet to crash and reboot. Now, this also appears to happen using the older version.
The benchmark appeared to run successfully with an Energy Saving On setting, where performance was much slower and the CPU speed was measured as 1000 MHz instead of 1700 MHz (see results below).
 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1200 MHz, Android 4.1.2,

 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      532     402     124    1135    1044     960
 2T     1255     798     213    2041    1987    1916
 4T     2441    1553     229    4185    4034    3450
 8T     1922    2403     226    3774    3996    3346

 Total Elapsed Time 4.5 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.24

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      657     407     132    1077    1074    1053
 2T     1265     817     222    2147    2150    2078
 4T     2024    1695     234    4214    4276    3555
 8T     2435    2495     234    4196    4100    3523

 Total Elapsed Time 39.0 seconds

 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 1.7 GHz Cortex-A15, Android 4.2.2
 Dual Core

 Android NEON-MFLOPS-MP Benchmark V1.1 13-Sep-2013 13.44

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1847    1415     597    3772    4096    3545
 2T     3649    3309     664    8065    7966    7505
 4T     3670    3922     658    7753    8148    7490
 8T     5664    5570     681    8092    8355    7672

 Total Elapsed Time 13.0 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.07

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1965    1630     582    3792    4077    3521
 2T     3789    2690     663    8497    8133    7297
 4T     5714    4883     654    8364    8192    7554
 8T     5414    6316     673    7976    8437    6635

 Total Elapsed Time 13.0 seconds

 ######## T11 ARM-Intel Power Saving On 1.0 GHz ########

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-Nov-2015 16.55

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1935    1290     645    2516    2397    2339
 2T     3664    2644     684    4945    4780    4657
 4T     3436    3337     690    4911    4931    4674
 8T     3133    3543     689    4818    4959    4651

 Total Elapsed Time 19.2 seconds

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NEON-MFLOPS2-MP Benchmark V2.1 25-Jul-2015 18.44

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2757    2576     771    2808    2825    2800
 2T     5662    5525    1516    5631    5664    5570
 4T     6550    7846    1945   11167   11281   10939
 8T    10273   10928    1981   10851   11211   11350

 Total Elapsed Time 40.0 seconds

 #################### T21 ARM-Intel ####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 28-Jun-2015 16.32

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     3049    2857     622    2923    2874    2098
 2T     5508    4887    1009    5477    5736    4349
 4T     5643    5282    1410   11244   11601    8564
 8T     9294   11156    1681   11288   11605    8946

 Total Elapsed Time 14.0 seconds

 ###################### P37 32 Bit ######################

 P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1
 Single Channel RAM, LPDDR3 933 MHz, 7.5 GB/second
 8 x 32 KB L1 cache, 512 KB shared L2 cache

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 14-Nov-2016 12.18
 Compiled for 32 bit ARM v7a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      740     660     399    1739    1729    1691
 2T     1334    1228     566    3449    3416    3328
 4T     2188    2139     675    6671    6674    6463
 8T     2489    3261     722   10379   10466    9768

 Total Elapsed Time 22.1 seconds

 Android 7.0

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 11-May-2017 10.44
 Compiled for 32 bit ARM v7a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      716     686     432    1740    1740    1703
 2T     1367    1255     614    3457    3427    3358
 4T     2389    2131     726    6814    6682    6644
 8T     2914    2776     744   10082    9994    9712

 Total Elapsed Time 21.8 seconds

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.35
 Compiled for 32 bit ARM v7a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      619     613     575    1444    1446    1426
 2T     1174    1206     889    2894    2902    2839
 4T     1585    1616     901    5679    5726    5596
 8T     2075    2130     944    5400    5585    5519

 Total Elapsed Time 25.8 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.38
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      726     745     647    2766    2774    2639
 2T     1397    1402     903    5523    5552    5371
 4T     1871    1930     898   10780   10479   10439
 8T     2496    2876    1011    9736   10679    9900

 Total Elapsed Time 15.1 seconds

 ##################### P33 64 Bit #####################

 P33 Quad-core 2 GHz Qualcomm Snapdragon 810, Android 5.0.2
 4 x Cortex-A57 and 4 x Cortex-A53

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 16-Sep-2015 17.59
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2811    3126    1089    6943    6589    6342
 2T     2488    4114    1541   12084   10559    8809
 4T     4759    5480    2038   16516   14826   11960
 8T     4840    8985    2452   22082   23563   12461

 Total Elapsed Time 7.6 seconds

 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android NEON-MFLOPS2-MP Benchmark V2.1 07-Feb-2015 18.38

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1796    1520    1025    1231    1228    1227
 2T     3354    2959    1047    2427    2445    2445
 4T     4627    5508     978    4690    4791    4733
 8T     3861    6307    1030    4611    4869    4742

 Total Elapsed Time 88.3 seconds

 ################## A1 V1 Android 5.0 ##################

 Android NEON-MFLOPS2-MP Benchmark V2.1 05-Nov-2015 12.09

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1969    1913     832    1230    1245    1225
 2T     3537    3632    1046    2482    2487    2445
 4T     3388    6497     982    4546    4847    4819
 8T     4197    6863    1026    4640    4899    4828

 Total Elapsed Time 87.7 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.17

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2151    1962    1064    2619    2694    2650
 2T     4421    3849    1048    5296    5463    5343
 4T     5886    6652     982    9592   10735   10362
 8T     3744    7284    1018    9085   10791    9493

 Total Elapsed Time 13.8 seconds

 ################### W1 REMIX Original ##################

 R1 Intel Atom Z8300 quad core 1.84 GHz
 Android 6.0.1, 4 GB DDR 3 1600
 4 x 24 KB L1, 2 x 1 MB Shared L2

 Android NEON-MFLOPS-MP Benchmark V1.1 11-Nov-2016 21.39

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      392     414     388    1964    1954    2084
 2T     1790    2301    1133    3237    3775    3774
 4T     2130    2386    1068    4165    3541    4188
 8T     2110    2047    1026    4438    4091    3631

 #################### W1 REMIX 32 Bit ###################

 R1 Intel Atom Z8300 quad core 1.84 GHz
 Android 6.0.1, 4 GB DDR 3 1600
 4 x 24 KB L1, 2 x 1 MB Shared L2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 21-Oct-2016 14.40
 Compiled for 32 bit Intel x86

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1322    1342     965    2377    2517    2354
 2T     2261    2627    1155    4140    4316    4329
 4T     2187    2656    1361    5494    6082    5693
 8T     1978    2673    1613    5888    6050    6119

 Total Elapsed Time 17.7 seconds

 #################### W1 REMIX 64 Bit ###################

 R1 Intel Atom Z8300 quad core 1.84 GHz
 Android 6.0.1, 4 GB DDR 3 1600
 4 x 24 KB L1, 2 x 1 MB Shared L2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 11-Nov-2016 21.40
 Compiled for 64 bit Intel x86_64

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS

 Can't run - Not an ARMv7 CPU

 Total Elapsed Time 0.0 seconds

 #################### A5 ARM Intel ######################

 Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
 Android 5.1, 4 GB DDR 3 1600
 4 x 24 KB L1, 2 x 1 MB L2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 14-Apr-2016 17.57
 Compiled for 32 bit Intel x86

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1501    1551    1030    2520    2485    2301
 2T     2300    2957    1161    4699    4999    4632
 4T     3106    5126    1097    7929    8173    8015
 8T     2692    4623    1108    7830    8432    7989

 Total Elapsed Time 15.7 seconds

 ################### PC REMIX Original ##################

 R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3
 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1,

 Android NEON-MFLOPS-MP Benchmark V1.1 11-Nov-2016 14.44

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     7381    6891    4206   16044   14885   15134
 2T     8892    8294    6078   25814   15291   15897
 4T    20783   20566   12919   55052   33458   58857
 8T    14049   16003   13811   49462   46915   53373

 #################### PC REMIX 32 Bit ###################

 R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3
 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1,

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 21-Oct-2016 12.53
 Compiled for 32 bit Intel x86

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS

 Can't run - CPU doesn't support NEON

 Total Elapsed Time 0.0 seconds

 #################### PC REMIX 64 Bit ###################

 R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
 4 x 32 KB L1, 4 x 256 KB L2, 10 MB L3
 800 MHz RAM, 4 channels, 51.2 GB/s, Android 6.0.1,

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 11-Nov-2016 14.45
 Compiled for 64 bit Intel x86_64

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS

 Can't run - Not an ARMv7 CPU

 Total Elapsed Time 0.0 seconds

 ################ Comparison MFLOPS/MHz ################

                 2 Ops/Word            32 Ops/Word       Not NEON
           KB    12.8    128  12800    12.8    128  12800    12.8
      Threads

 32 Bit Only Android

 T7        1T    0.55   0.34   0.11    0.90   0.90   0.88    0.50
 Cortex    2T    1.05   0.68   0.19    1.79   1.79   1.73    1.00
 A9        4T    1.69   1.41   0.20    3.51   3.56   2.96    1.98

 T11       1T    1.16   0.96   0.34    2.23   2.40   2.07    0.90
 Cortex    2T    2.23   1.58   0.39    5.00   4.78   4.29    1.85
 A15       4T    3.36   2.87   0.38    4.92   4.82   4.44    1.79

 T21       1T    1.42   1.33   0.29    1.36   1.34   0.98    0.57
 Qualcomm  2T    2.56   2.27   0.47    2.55   2.67   2.02    1.14
 800       4T    2.62   2.46   0.66    5.23   5.40   3.98    2.10

 A1        1T    1.16   1.05   0.57    1.41   1.45   1.42    0.57
 Atom      2T    2.38   2.07   0.56    2.85   2.94   2.87    1.12
 Z3745     4T    3.16   3.58   0.53    5.16   5.77   5.57    2.15

 A5        1T    0.82   0.84   0.56    1.37   1.35   1.25    0.51
 Atom      2T    1.25   1.61   0.63    2.55   2.72   2.52    0.98
 z8300     4T    1.69   2.79   0.60    4.31   4.44   4.36    1.65

 P37       1T    0.49   0.44   0.27    1.16   1.15   1.13    0.54
 Cortex    2T    0.89   0.82   0.38    2.30   2.28   2.22    1.08
 A53       4T    1.46   1.43   0.45    4.45   4.45   4.31    2.14
 8 core    8T    1.66   2.17   0.48    6.92   6.98   6.51    3.46

 ###########################################################

 32 Bit and 64 Bit Android

 T22 32b   1T    0.48   0.47   0.44    1.11   1.11   1.10    0.52
 Cortex    2T    0.90   0.93   0.68    2.23   2.23   2.18    1.03
 A53       4T    1.22   1.24   0.69    4.37   4.40   4.30    2.04

 T22 64b   1T    0.56   0.57   0.50    2.13   2.13   2.03    0.57
 Cortex    2T    1.07   1.08   0.69    4.25   4.27   4.13    1.12
 A53       4T    1.44   1.48   0.69    8.29   8.06   8.03    2.15

 P33       1T    1.41   1.56   0.54    3.47   3.29   3.17     N/A
 Cortex    2T    1.24   2.06   0.77    6.04   5.28   4.40
 A57 64b   4T    2.38   2.74   1.02    8.26   7.41   5.98

 REMIX/Android

 R1 32b    1T    0.72   0.73   0.52    1.29   1.37   1.28    0.50
 Atom      2T    1.23   1.43   0.63    2.25   2.35   2.35    0.90
 Z8300     4T    1.19   1.44   0.74    2.99   3.31   3.09    1.26

 R1 64b    1T    Can't run - Not an ARMv7 CPU
 Atom      2T
 Z8300     4T

 R2 32b    1T    Can't run - Not an ARMv7 CPU
 Core i7   2T
 4820K     4T
 8HT       8T

 R2 64b    1T    Can't run - Not an ARMv7 CPU
 Core i7   2T
 4820K     4T
 8HT       8T

 Original Houdini Interpreted                             Windows

 R2 32b    1T    1.89   1.77   1.08    4.11   3.82   3.88    5.56
 Core i7   2T    2.28   2.13   1.56    6.62   3.92   4.08   11.71
 4820K     4T    5.33   5.27   3.31   14.12   8.58  15.09   23.19

 Windows Not applicable

 ##################### New Results #####################

 Results x 100000, 12345 indicates ERRORS

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1

 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759

 T11   44934   12345   99850   36770   79897   99759

 Android NEON-MFLOPS-MP Benchmark V1.1

 1T    86735   98519   99984   79897   97638   99975
 2T    86735   98519   99984   79897   97638   99975
 4T    86735   98519   99984   79897   97638   99975
 8T    86735   98519   99984   79897   97638   99975

 Android NEON-MFLOPS2-MP Benchmark V2.1

 1T    40015   66980   99522   35216   54898   99234
 2T    40015   66980   99522   35216   54898   99234
 4T    40015   66980   99522   35216   54898   99234
 8T    40015   66980   99522   35216   54898   99234
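The MFLOPS/MHz comparison figures above appear to be the measured MFLOPS divided by the quoted CPU clock in MHz, rounded to two decimal places. The following minimal Python sketch reproduces a few entries under that assumption. The function name is hypothetical and this is not code from the benchmarks.

```python
def mflops_per_mhz(mflops, cpu_mhz):
    """Normalise a measured MFLOPS figure by the CPU clock in MHz."""
    return round(mflops / cpu_mhz, 2)


# (device, MFLOPS at 1 thread, 12.8 KB, quoted clock in MHz), taken from the results above
examples = [
    ("T7 ARM-Intel, 2 Ops/Word", 657, 1200),
    ("T11 ARM-Intel, 2 Ops/Word", 1965, 1700),
    ("T22 64 Bit, 32 Ops/Word", 2766, 1300),
]

for name, mflops, mhz in examples:
    print(name, mflops_per_mhz(mflops, mhz))
```

Dividing by clock speed removes the MHz advantage of the faster devices, which makes it easier to compare architectures, such as 32 bit against 64 bit compilations on T22.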
OpenGL Benchmark - JavaOpenGL1.apk

The benchmark does not rely on complex visual scenes or mathematical functions. The objective is to generate moderate to excessive loading via multiple simple objects. It uses all Java code, with OpenGL ES GL10 statements, to measure graphics performance in Frames Per Second (FPS). Four tests draw a background of 50 cubes, first as wireframes then colour shaded. The third test views the cubes in and out of a tunnel with slotted sides and roof, also containing rotating plates. The last test adds textures to the cubes and plates. The 50 cubes are redrawn 15, 30 and 60 times, with randomised positions, colours and rotational settings. With 6 x 2 triangles per cube, minimum triangles per frame for the three sets of tests are 9000, 18000 and 36000. An example of the last scene is on the right. The tunnel is provided to show 3D effects, the plates rotating in fixed positions. The numerous cubes are in the distant background, the tunnel slots showing that they are still there, with size varying according to proximity. The cubes appear more as jumping objects, with changing colours and position. Android 5 has switched to the ART virtual machine for Java, instead of Dalvik. First results indicate severe degradation in performance with this benchmark. Further details and results can be found here. This includes information on Vertical Synchronisation (VSYNC), which limits Frames Per Second (FPS) to 60 and can lead to heavier loading reducing speed in 50% steps, as is apparent in the results below.
Links to my Windows and Linux OpenGL benchmarks are also provided.
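The minimum triangle counts quoted above follow directly from the cube arithmetic. As a quick check, here is a short Python sketch. The constant and function names are illustrative only and are not taken from the benchmark source.

```python
CUBES = 50
TRIANGLES_PER_CUBE = 6 * 2  # six faces, two triangles per face


def min_triangles_per_frame(redraws):
    """Triangles drawn per frame when the 50 cubes are redrawn `redraws` times."""
    return CUBES * TRIANGLES_PER_CUBE * redraws


for redraws in (15, 30, 60):
    print(redraws, "redraws:", min_triangles_per_frame(redraws), "triangles")
```

The tunnel and plates in the later tests add further triangles, which is presumably why the results tables label the loads as 9000+, 18000+ and 36000+.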
On tablets A1 and T7, Android was upgraded to version 5.0, leading to a reduction in measured speeds of up to 50%, possibly suggesting that VSYNC had changed to 30 FPS. The graphics in A5 appear to be slightly faster than A1, but maximum speed appears to be similarly restricted to 30 FPS.
Except for tablet T15, none of the results are particularly good at the heavier loading. T15 results were also produced via Android 5, with several measurements at near 60 FPS, suggesting that speed reductions on the other tablets are not solely dependent on Android 5.
P37, with Adreno graphics and Android 6, was also slower than T21, with an inferior Adreno GPU and Android 4. So was the W1/R1 Intel Atom based REMIX/Android 6 tablet. The powerful Intel Core i7 REMIX speeds were some of the fastest but disappointing for high end GeForce graphics (all effects of the change to Java via ART?).
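The stepped speed reductions attributed to VSYNC can be illustrated with a simplified model. It assumes a 60 Hz display and that a frame which misses one refresh has to wait for the next. This is a general property of synchronised double buffering and not something taken from the benchmark code, and the function name is hypothetical.

```python
import math


def vsync_fps(render_ms, refresh_hz=60.0):
    """Idealised FPS when every frame must wait for the next display refresh."""
    refresh_ms = 1000.0 / refresh_hz
    # Number of whole refresh intervals consumed by one frame (at least one).
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return refresh_hz / intervals


for render_ms in (10, 20, 40, 60):
    print(render_ms, "ms per frame ->", vsync_fps(render_ms), "FPS")
```

In this model the first step down is from 60 to 30 FPS, a 50% reduction, with later steps at 20 and 15 FPS. Measured averages, such as those in the tables below, can fall between the steps when individual frames take different numbers of refresh intervals.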
 ########################## T7 ##########################

 T7 Nexus 7 Quad 1200 MHz Cortex-A9, Android 4.1.2
 nVidia ULP GeForce Graphics 12 core, 416 MHz

 Android Java OpenGL Benchmark 06-Mar-2013 21.51

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      42.18     43.57    33.38    23.54
  18000+      23.68     23.47    19.91    13.38
  36000+      12.05     11.95    11.00     7.10

 Screen Pixels 1280 Wide 736 High
 Total Elapsed Time 121.0 seconds

 #################### T7 Android 5.0 ####################

 Android Java OpenGL Benchmark 12-Oct-2015 16.06

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      22.61     23.23    17.71    13.46
  18000+      12.03     12.11    10.36     7.57
  36000+       6.14      6.01     5.64     4.03

 Screen Pixels 1280 Wide 736 High
 Total Elapsed Time 121.5 seconds

 ########################## T11 #########################

 T11 Samsung EXYNOS 5250 Dual 1.7 GHz Cortex-A15, Android 4.2.2
 Mali-T604 Quad Core GPU

 Android Java OpenGL Benchmark 09-Aug-2013 09.42

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      39.13     41.52    32.19    27.25
  18000+      22.03     20.73    19.69    16.30
  36000+      12.24     12.23    10.75     8.68

 Screen Pixels 1920 Wide 1032 High
 Total Elapsed Time 120.8 seconds

 ########################## T15 #########################

 T15 HTC Nexus 9, dual core Denver CPU 2400 MHz, Android 5.0.1
 Kepler DX1 Graphics

 Android Java OpenGL Benchmark 28-Jan-2015 22.38

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      59.79     59.84    59.84    57.79
  18000+      59.97     59.26    52.64    32.74
  36000+      31.33     30.95    29.02    17.59

 Screen Pixels 2048 Wide 1440 High
 Total Elapsed Time 121.0 seconds

 ########################## T21 #########################

 T21 Quad Core 2.2 GHz Snapdragon 800, Android 4.4.3
 GPU Qualcomm Adreno 330, 578 MHz

 Android Java OpenGL Benchmark 27-Jul-2015 16.50

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      35.05     35.45    25.60    21.58
  18000+      18.04     18.05    15.32    12.73
  36000+       9.28      9.33     8.47     6.91

 Screen Pixels 1200 Wide 1803 High
 Total Elapsed Time 120.8 seconds

 ########################## P37 #########################

 P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1
 GPU Adreno 405 550 MHz

 Android Java OpenGL Benchmark 17-Oct-2016 10.01

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      27.46     27.68    21.16    17.96
  18000+      14.56     14.60    12.47    10.36
  36000+       7.17      7.21     6.56     5.37

 Screen Pixels 1776 Wide 1080 High
 Total Elapsed Time 121.0 seconds

 Android 7.0

 Android Java OpenGL Benchmark 17-Mar-2017 10.39

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      18.49     18.74    14.45    11.73
  18000+       9.70      9.75     8.40     6.31
  36000+       4.78      4.78     4.45     3.48

 Screen Pixels 1776 Wide 1080 High
 Total Elapsed Time 121.3 seconds

 ########################## T22 #########################

 T22 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53
 Android 5.0, GPU Mali T720 MP2

 Android Java OpenGL Benchmark 26-Aug-2015 16.24

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      22.55     22.11    16.67    14.27
  18000+      11.55     11.60     9.98     8.27
  36000+       5.92      5.98     5.48     4.48

 Screen Pixels 800 Wide 1216 High
 Total Elapsed Time 120.9 seconds

 ########################## A1 ##########################

 A1 Asus MemoPad 7, Quad Core 1.86 GHz Intel Atom Z3745
 Intel HD Graphics, Android 4.4.2

 Android Java OpenGL Benchmark 21-Dec-2014 16.30

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      37.95     37.64    29.86    23.63
  18000+      19.44     19.70    17.26    13.26
  36000+       9.99      9.93     9.35     7.17

 Screen Pixels 1280 Wide 736 High
 Total Elapsed Time 120.6 seconds

 #################### A1 Android 5.0 ####################

 Android Java OpenGL Benchmark 10-Oct-2015 13.44

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      25.87     25.89    20.27    16.29
  18000+      13.43     13.56    11.72     9.38
  36000+       6.92      6.73     6.32     4.98

 Screen Pixels 800 Wide 1216 High
 Total Elapsed Time 120.9 seconds

 #################### A5 Android 5.1 ######################

 Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
 Intel HD Graphics, Android 5.1

 Android Java OpenGL Benchmark 21-May-2016 13.00

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      29.77     30.17    22.58    18.54
  18000+      16.09     16.03    13.70    10.78
  36000+       8.31      8.27     7.79     5.76

 Screen Pixels 2048 Wide 1440 High
 Total Elapsed Time 121.0 seconds

 ####################### W1 REMIX ######################

 R1 Intel Atom Z8300 quad core 1.84 GHz
 Android 6.0.1, HD Graphics

 Android Java OpenGL Benchmark 14-Aug-2016 22.40

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      19.87     20.29    15.75    12.98
  18000+      11.57     11.68     9.90     7.71
  36000+       6.12      6.14     5.64     4.22

 Screen Pixels 1920 Wide 996 High
 Total Elapsed Time 121.4 seconds

 ####################### PC REMIX ######################

 R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
 Android 6.0.1, GeForce GTX 650, 64-Bit Windows 10

 Android Java OpenGL Benchmark 14-Aug-2016 14.23

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+      59.96     59.95    60.00    56.78
  18000+      58.21     58.49    53.97    36.68
  36000+      33.45     33.47    31.29    20.46

 Screen Pixels 1920 Wide 996 High
 Total Elapsed Time 120.3 seconds
OpenGL Drawing Benchmark - JavaDraw.apk

This all Java benchmark uses small to rather excessive numbers of simple objects to measure drawing performance, again via Frames Per Second (FPS). Five tests draw on a background of continuously changing colour shades. The image on the right is after four tests.
Further details and results can be found here, including links to an offline version that runs on PCs via Windows and Linux.
As with Java OpenGL, speeds are limited to 60 FPS by imposed VSYNC. In general, there was not a great deal of difference in performance on the initial systems shown here. In the cases of Android upgrades to version 5, performance was virtually identical on tablet A1, but T7 speed was much faster on the tests least dependent on CPU speed.
March 2016 - Results from W1, the Windows 10 based tablet, indicate that VSYNC is not imposed, producing the fastest speeds at this time. Windows/Android dual boot tablet W2/A5 confirms the faster Windows performance (via Java). However, the Android version runs at full screen, as opposed to a fixed 1280 x 720 with the Windows variety. The latter was recompiled to use full screen, producing much slower speeds (see below). Windows results from the PC, with a reasonably powerful graphics card, are also shown, to reflect the huge difference in performance.
A5 and W2 Dual Boot Tablet - At Screen pixels 2048 x 1440, the Windows speed was slower than via Android, on the first test, but faster on others. A second test on W2, at 1280 x 720, demonstrates faster speed using a smaller window.
REMIX Android vs Windows - Unlike Android, Windows based tests were not limited to 60 FPS by VSYNC and the PC results shown, in particular, indicated superior performance. As with the OpenGL benchmark, P37 was relatively slow (more ART/Java issues?).
 ########################## T7 ##########################

 T7 Nexus 7 Quad 1200 MHz Cortex-A9, Android 4.2.1
 nVidia ULP GeForce Graphics 12 core, 416 MHz

 Android Java Drawing Benchmark 12-Apr-2013 19.50

 Test                              Frames      FPS

 Display PNG Bitmap Twice             204    20.38
 Plus 2 SweepGradient Circles         165    16.48
 Plus 200 Random Small Circles        145    14.50
 Plus 320 Long Lines                  113    11.30
 Plus 4000 Random Small Circles        39     3.81

 Screen pixels 1280 Wide 736 High
 Total Elapsed Time 50.4 seconds

 Maximum 19.2 Million Pixels Per Second

 #################### T7 Android 5.0 ####################

 Android Java Drawing Benchmark 01-Oct-2015 12.24

 Test                              Frames      FPS

 Display PNG Bitmap Twice             487    48.70
 Plus 2 SweepGradient Circles         297    29.66
 Plus 200 Random Small Circles        231    23.02
 Plus 320 Long Lines                  149    14.85
 Plus 4000 Random Small Circles        39     3.90

 Screen pixels 1280 Wide 736 High
 Total Elapsed Time 50.1 seconds

 ########################## T11 #########################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Mali-T604 quad core GPU

 Android Java Drawing Benchmark 09-Aug-2013 09.39

 Test                              Frames      FPS

 Display PNG Bitmap Twice             558    55.74
 Plus 2 SweepGradient Circles         277    27.66
 Plus 200 Random Small Circles        244    24.36
 Plus 320 Long Lines                  169    16.84
 Plus 4000 Random Small Circles        68     6.72

 Screen pixels 1920 Wide 1032 High
 Total Elapsed Time 50.4 seconds

 Maximum 110 Million Pixels Per Second

 ########################## T21 #########################

 T21 2.2 GHz Quad Core Snapdragon 800, Android 4.4.3
 GPU Qualcomm Adreno 330, 578 MHz

 Android Java Drawing Benchmark 27-Jul-2015 16.47

 Test                              Frames      FPS

 Display PNG Bitmap Twice             533    53.24
 Plus 2 SweepGradient Circles         248    24.73
 Plus 200 Random Small Circles        218    21.72
 Plus 320 Long Lines                  158    15.75
 Plus 4000 Random Small Circles        57     5.61

 Screen pixels 1200 Wide 1803 High
 Total Elapsed Time 50.3 seconds

 ########################## T22 #########################

 T22 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53
 Android 5.0, GPU Mali T720 MP2

 Android Java Drawing Benchmark 26-Aug-2015 16.21

 Test                              Frames      FPS

 Display PNG Bitmap Twice             558    55.72
 Plus 2 SweepGradient Circles         368    36.70
 Plus 200 Random Small Circles        286    28.52
 Plus 320 Long Lines                  178    17.76
 Plus 4000 Random Small Circles        50     4.99

 Screen pixels 800 Wide 1216 High
 Total Elapsed Time 51.5 seconds

 ########################## P37 #########################

 P37, 8 Core ARM Cortex-A53 1500/1200 MHz, Android 6.0.1
 GPU Adreno 405 550 MHz

 Android Java Drawing Benchmark 17-Oct-2016 09.59

 Test                              Frames      FPS

 Display PNG Bitmap Twice             246    24.53
 Plus 2 SweepGradient Circles         158    15.77
 Plus 200 Random Small Circles        130    12.98
 Plus 320 Long Lines                   98     9.71
 Plus 4000 Random Small Circles        27     2.66

 Screen pixels 1776 Wide 1080 High
 Total Elapsed Time 50.4 seconds

 Android 7.0

 Android Java Drawing Benchmark 17-Mar-2017 10.32

 Test                              Frames      FPS

 Display PNG Bitmap Twice             236    23.57
 Plus 2 SweepGradient Circles         149    14.85
 Plus 200 Random Small Circles        132    13.19
 Plus 320 Long Lines                  103    10.24
 Plus 4000 Random Small Circles        41     4.06

 Screen pixels 1776 Wide 1080 High
 Total Elapsed Time 50.3 seconds

 ########################## A1 ##########################

 A1 Asus MemoPad 7, Quad Core 1.86 GHz Intel Atom Z3745
 Intel HD Graphics, Android 4.4.2

 Android Java Drawing Benchmark 21-Dec-2014 16.35

 Test                              Frames      FPS

 Display PNG Bitmap Twice             599    59.79
 Plus 2 SweepGradient Circles         486    48.55
 Plus 200 Random Small Circles        383    38.25
 Plus 320 Long Lines                  219    21.88
 Plus 4000 Random Small Circles        64     6.38

 Screen pixels 1280 Wide 736 High
 Total Elapsed Time 50.1 seconds

 #################### A1 Android 5.0 ####################

 Android Java Drawing Benchmark 10-Oct-2015 13.42

 Test                              Frames      FPS

 Display PNG Bitmap Twice             595    59.40
 Plus 2 SweepGradient Circles         458    45.79
 Plus 200 Random Small Circles        383    38.27
 Plus 320 Long Lines                  199    19.81
 Plus 4000 Random Small Circles        56     5.60

 Screen pixels 800 Wide 1216 High
 Total Elapsed Time 50.1 seconds

 #################### A5 Android 5.1 ####################

 Same Tablet as W2
 Teclast X98 Plus, Intel Atom Z8300 1.44 GHz, Turbo 1.84
 Intel HD Graphics, Android 5.1

 Android Java Drawing Benchmark 02-Mar-2016 17.37

 Test                              Frames      FPS

 Display PNG Bitmap Twice             447    44.62
 Plus 2 SweepGradient Circles         212    21.12
 Plus 200 Random Small Circles        171    17.02
 Plus 320 Long Lines                   93     9.25
 Plus 4000 Random Small Circles        32     3.13

 Screen pixels 2048 Wide 1440 High
 Total Elapsed Time 50.4 seconds

 ####################### W1 REMIX #######################

 R1 Intel Atom Z8300 quad core 1.84 GHz
 Android 6.0.1, HD Graphics

 Android Java Drawing Benchmark 14-Aug-2016 22.38

 Test                              Frames      FPS

 Display PNG Bitmap Twice             594    59.39
 Plus 2 SweepGradient Circles         375    37.47
 Plus 200 Random Small Circles        315    31.43
 Plus 320 Long Lines                  210    20.96
 Plus 4000 Random Small Circles        66     6.57

 Screen pixels 1920 Wide 1032 High
 Total Elapsed Time 50.1 seconds

 ############## W1 Windows 10 1280 x 720 ##############

 Intel Atom Z8300 quad core 1.44 GHz Turbo 1.84
 Windows 10, Intel HD Graphics Gen8

 Java Drawing Benchmark, Dec 27 2015, 21:51:45
 Produced by javac 1.7.0_2

 Test                              Frames      FPS

 Display PNG Bitmap Twice Pass 1      872    87.13
 Display PNG Bitmap Twice Pass 2      991    98.95
 Plus 2 SweepGradient Circles         961    95.98
 Plus 200 Random Small Circles        782    78.08
 Plus 320 Long Lines                  605    60.44
 Plus 4000 Random Small Circles       164    16.32

 Total Elapsed Time 60.1 seconds

 Operating System Windows 10, Arch. x86, Version 10.0
 Java Vendor Oracle Corporation, Version 1.8.0_66
 Intel64 Family 6 Model 76 Stepping 3, GenuineIntel, 4 CPUs

 ############## W2 Windows 10 1280 x 720 ##############

 Same Tablet as A5
 Teclast X98 Plus, Intel Atom Z8300 1.44 GHz, Turbo 1.84
 Windows 10, Intel HD Graphics Gen8

 Java Drawing Benchmark, Mar 2 2016, 21:30:58
 Produced by javac 1.7.0_2

 Test                              Frames      FPS

 Display PNG Bitmap Twice Pass 1      748    74.78
 Display PNG Bitmap Twice Pass 2      833    83.24
 Plus 2 SweepGradient Circles         828    82.78
 Plus 200 Random Small Circles        690    68.99
 Plus 320 Long Lines                  560    55.94
 Plus 4000 Random Small Circles       163    16.30

 Total Elapsed Time 60.0 seconds

 Operating System Windows 10, Arch. x86, Version 10.0
 Java Vendor Oracle Corporation, Version 1.8.0_66
 Intel64 Family 6 Model 76 Stepping 3, GenuineIntel, 4 CPUs

 ############ W2 Windows 10 2048 x 1440 #############

 Java Drawing Benchmark, Mar 3 2016, 12:22:42
 Produced by javac 1.7.0_2
 2048 x 1440

 Test                              Frames      FPS

 Display PNG Bitmap Twice Pass 1      275    27.42
 Display PNG Bitmap Twice Pass 2      301    30.01
 Plus 2 SweepGradient Circles         296    29.54
 Plus 200 Random Small Circles        286    28.51
 Plus 320 Long Lines                  225    22.45
 Plus 4000 Random Small Circles       118    11.72

 Total Elapsed Time 60.3 seconds

 Operating System Windows 10, Arch. x86, Version 10.0
 Java Vendor Oracle Corporation, Version 1.8.0_66
 Intel64 Family 6 Model 76 Stepping 3, GenuineIntel, 4 CPUs

 ####################### PC REMIX ######################

 R2 Core i7 4820K quad core + HT at 3900 MHz Turbo
 Android 6.0.1, GeForce GTX 650, 64-Bit Windows 10

 Android Java Drawing Benchmark 14-Aug-2016 14.19

 Test                              Frames      FPS

 Display PNG Bitmap Twice             582    55.49
 Plus 2 SweepGradient Circles         601    60.01
 Plus 200 Random Small Circles        415    41.41
 Plus 320 Long Lines                  303    30.25
 Plus 4000 Random Small Circles        43     4.20

 Screen pixels 396 Wide 674 High
 Total Elapsed Time 50.8 seconds

 ################ PC REMIX Full Screen #################

 Android Java Drawing Benchmark 14-Aug-2016 14.21

 Test                              Frames      FPS

 Display PNG Bitmap Twice             553    55.21
 Plus 2 SweepGradient Circles         539    53.86
 Plus 200 Random Small Circles        330    32.91
 Plus 320 Long Lines                  212    21.19
 Plus 4000 Random Small Circles        39     3.88

 Screen pixels 1920 Wide 996 High
 Total Elapsed Time 50.2 seconds

 ########### PC Windows 10 GeForce GTX 650 ###########

 Core i7-4820K at 3.9 GHz

 Java Drawing Benchmark, Mar 7 2016, 10:56:24
 Produced by javac 1.7.0_2
 2048 x 1440

 Test                              Frames      FPS

 Display PNG Bitmap Twice Pass 1     5237   523.39
 Display PNG Bitmap Twice Pass 2     5477   547.04
 Plus 2 SweepGradient Circles        5484   548.07
 Plus 200 Random Small Circles       5144   513.58
 Plus 320 Long Lines                 4736   473.32
 Plus 4000 Random Small Circles       735    73.49

 Total Elapsed Time 60.0 seconds

 Operating System Windows 10, Arch. x86, Version 10.0
 Java Vendor Oracle Corporation, Version 1.8.0_60
 Intel64 Family 6 Model 62 Stepping 4, GenuineIntel, 8 CPUs
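The "Maximum ... Million Pixels Per Second" figures quoted for T7 and T11 in the drawing results above appear to be consistent with screen width x height x FPS for the first test. The short Python sketch below checks that assumption. The function name is hypothetical and the formula is inferred from the numbers, not taken from the benchmark source.

```python
def million_pixels_per_second(width, height, fps):
    """Pixel throughput if every frame repaints the whole screen."""
    return width * height * fps / 1e6


# First test (Display PNG Bitmap Twice) from the T7 and T11 results
print("T7 ", round(million_pixels_per_second(1280, 736, 20.38), 1))
print("T11", round(million_pixels_per_second(1920, 1032, 55.74), 1))
```

The same calculation gives a way of comparing results obtained at different screen sizes, such as the 1280 x 720 and 2048 x 1440 Windows runs on W2.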
This program measures CPU MHz samples over 30 seconds, with 300 reports at 100 millisecond intervals (timing functions and overheads increase this time to 120 ms or above). The procedure is: open a benchmark; open the MHz program and run; switch to the benchmark from recent screens and run; save the benchmark results; switch to the MHz program from recent screens and save results when finished. Further details and results can be found here and here.
Note - This program might not measure the CPU MHz that controls reductions in speed (throttling), introduced to reduce power consumption when temperature increases too much. No simple programming functions appear to be available for logging via a single app. Installing CPU Z might enable independent measurements to be noted. CPU Z can also provide CPU temperature measurement, with a range of values for a number of sensors on different systems. Research might be needed to find which are appropriate for CPU cores.
Below is an example, over the first 18 seconds, whilst running NEON-MFLOPS-MP benchmark (taking 14.6 seconds). In this case, MHz is fairly constant, but the frequency can vary a lot on other devices, or might run at a constant low value, if power saving is switched on.
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 15-Nov-2015 17.21
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      785     771     669    2862    2851    2739
 2T     1485    1499     895    5654    5729    5606
 4T     1937    2074     995   10862   11024   10636
 8T     2678    3021    1012    9971   10730   10534

 Total Elapsed Time 14.6 seconds

 Android CPU MHz 100 ms Sampling 15-Nov-2015 17:21:45

  0.00 1300    0.12 1300    0.23  299    0.38  299    0.53 1300
  0.67 1300    0.88 1300    1.01 1300    1.15 1300    1.29 1300
  1.43 1300    1.57 1300    1.70 1300    1.83 1300    1.97 1300
  2.11 1300    2.25 1300    2.39 1300    2.53 1300    2.67 1300 X

  2.81 1300    2.95 1300    3.09 1300    3.22 1300    3.36 1300
  3.49 1300    3.63 1300    3.76 1300    3.90 1300    4.04 1300
  4.18 1235    4.30 1300    4.42 1300    4.59 1300    4.76 1300
  4.91 1300    5.08 1300    5.23  819    5.40 1300    5.55  299 X

  5.74 1300    5.91 1300    6.09 1300    6.25 1300    6.41 1300
  6.59 1300    6.76 1300    6.92 1300    7.08 1300    7.24 1300
  7.40 1300    7.52 1300    7.68  299    7.88  442    8.06 1300
  8.24 1300    8.40  819    8.56 1300    8.71 1300    8.86 1300 X

  9.01 1300    9.16 1300    9.32 1300    9.48 1300    9.64 1300
  9.80 1300    9.97 1300   10.13 1300   10.27 1300   10.43 1300
 10.57 1300   10.72 1300   10.88 1300   11.02 1300   11.17 1300
 11.33 1300   11.47 1300   11.62 1300   11.78 1300   11.92 1300 X

 12.07 1300   12.22 1300   12.37 1300   12.53 1300   12.68 1300
 12.84 1300   12.99 1300   13.15 1300   13.30 1300   13.46 1300
 13.61 1300   13.76 1300   13.92 1300   14.08 1300   14.24 1300
 14.40 1300   14.56 1300   14.72 1300   14.88 1300   15.04 1300 X

 15.20 1300   15.36 1300   15.52 1300   15.69 1300   15.86 1300
 16.02 1300   16.21 1300   16.39 1300   16.55 1300   16.71 1300
 16.85 1300   17.00 1300   17.14 1300   17.30 1300   17.45 1300
 17.60 1300   17.75 1300   17.91 1300   18.05 1300   18.21 1300
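The sampling approach described above can be sketched as a simple loop that reads the current CPU frequency and then sleeps for the nominal interval. The Python below is only an illustration of the idea, not the Android app's code. It assumes the Linux cpufreq file `scaling_cur_freq` for core 0 is present and readable, which might not be the case on later Android versions, and the function names are hypothetical.

```python
import time

# Assumed location of the current frequency of core 0 (value is in kHz).
CPU0_FREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"


def read_cpu_mhz(path=CPU0_FREQ):
    """Read the current frequency of one core and convert kHz to MHz."""
    with open(path) as f:
        return int(f.read().strip()) // 1000


def sample_mhz(reader=read_cpu_mhz, samples=300, interval_s=0.1):
    """Return (elapsed seconds, MHz) pairs, sleeping interval_s between reads."""
    start = time.monotonic()
    log = []
    for _ in range(samples):
        log.append((round(time.monotonic() - start, 2), reader()))
        time.sleep(interval_s)
    return log


if __name__ == "__main__":
    for seconds, mhz in sample_mhz(samples=10):
        print(f"{seconds:6.2f} {mhz:5d}")
```

Because the sleep is added on top of the time taken to read and log each value, the real interval is longer than the nominal 100 ms, in line with the 120 ms or more noted above.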
The program runs the second most demanding OpenGL drawing benchmark test, except that CPU MHz is displayed, along with Frames Per Second (FPS) and running time in minutes, the MHz figure being the average of one measurement per frame. Default running arrangements are 60 passes of one second each, producing two columns of results that are displayed and saved on the Internal Drive. These results are to demonstrate any reductions as the battery capacity reduces. Before running, Display/Power Settings should be changed to never switch off and the CPU set to run at maximum speed, if possible.
Three buttons are provided. Besides Run and the usual Email option to save results, there is a Time button, enabling manual input of the number of seconds for each pass. If a flat battery turns the device off, then, after recharging and rebooting, restarting the program reads and displays the saved results, ready for emailing. NOTE: some Android versions will not open a log file for saving results.
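As an illustration of how one line of results could be derived, the following Python sketch summarises a single pass from one MHz reading per frame. It assumes FPS is frames drawn divided by the pass length and that MHz is the plain average of the per-frame readings, as described above. The function name and the sample data are hypothetical.

```python
def summarise_pass(frame_mhz, pass_seconds):
    """Return (FPS, average MHz) for one pass, given one MHz reading per frame."""
    frames = len(frame_mhz)
    if frames == 0:
        return 0.0, 0
    fps = frames / pass_seconds
    return round(fps, 1), round(sum(frame_mhz) / frames)


# Made-up example: 1440 frames in a 120 second pass, alternating readings
readings = [2100, 1900] * 720
print(summarise_pass(readings, 120))
```

Averaging over every frame smooths out the rapid MHz variations seen on screen, so a pass only shows a lower figure if the clock is reduced for a significant part of the pass.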
Following are results from a test set to run for 2 hours (60 x 120 seconds) and run twice. Displayed MHz, whilst the test was running, showed rapid variations that affected the final speed, and FPS had a similar variation, but the averages per run were fairly constant over the 4 hours.
Note - the later CPU Stress Tests might be more effective. Also, the CPU MHz app might not work on later systems.
 T21 Quad Core Qualcomm Snapdragon 800, Android 4.4.3
 GPU Qualcomm Adreno 330, 578 MHz

 Up to 60 120 second runs, MHz 1 sample/frame
 Log File /storage/emulated/0/BatteryTest.txt

 Android Battery Test

 28-Jul-2015 11.08                 28-Jul-2015 13.28

 Run   FPS   MHz  Run   FPS   MHz  Run   FPS   MHz  Run   FPS   MHz

   1  12.0  2100    2  12.1  1937    1  12.1  2014    2  12.2  2005
   3   7.6  1874    4  12.1  1966    3  12.0  1975    4  12.0  2005
   5  12.2  1993    6  12.2  1996    5  12.1  1962    6  12.1  1948
   7  12.2  1996    8  12.2  1966    7  12.2  1979    8  12.2  2004
   9  12.0  1935   10  12.3  1925    9  12.1  1959   10  12.1  2060
  11  12.3  1983   12  12.0  2015   11  12.0  2017   12  12.2  1992
  13  11.9  2013   14  12.1  2000   13  12.1  1987   14  12.2  1964
  15  12.1  1934   16  12.1  2005   15  12.1  1973   16  12.1  1978
  17  12.0  1948   18  12.0  2000   17  12.1  1998   18  12.1  1977
  19  11.9  1979   20  12.0  1972   19  12.0  2007   20  12.0  1956
  21  12.0  1997   22  12.0  1994   21  11.9  1966   22  12.1  1975
  23  12.2  2035   24  12.1  2013   23  12.0  1978   24  12.0  2012
  25  12.2  1981   26  12.1  1977   25  12.1  1988   26  12.1  2010
  27  12.2  1976   28  12.2  1991   27  12.1  2004   28  12.0  1994
  29  12.2  2000   30  12.3  1984   29  12.1  1989   30  12.2  2004
  31  12.2  1986   32  12.3  1964   31  12.2  2009   32  12.1  1979
  33  12.2  1955   34  12.1  1980   33  12.1  1945   34  12.0  1951
  35  12.1  2002   36  12.2  2045   35  12.1  1997   36  12.1  2022
  37  12.1  1993   38  12.2  2010   37  12.2  2038   38  12.1  2024
  39  12.1  1947   40  12.1  1959   39  12.1  1997   40  12.1  2049
  41  11.9  1949   42  12.0  1993   41  12.2  1996   42  12.1  1994
  43  12.1  1953   44  12.2  2005   43  12.0  1978   44  12.0  1985
  45  12.4  1928   46  12.3  1989   45  11.9  1947   46  12.2  1982
  47  12.1  1987   48  12.1  1969   47  12.1  2022   48  11.8  1964
  49  12.1  1992   50  12.1  1999   49  11.9  1985   50  12.1  1991
  51  12.2  1929   52  12.1  1955   51  12.1  1988   52  12.0  2002
  53  12.4  1950   54  12.3  1990   53  12.0  2009   54  11.9  2018
  55  12.3  1930   56  12.2  1922   55  12.0  1994   56  12.0  1974
  57  12.4  1952   58  12.1  1977   57  12.1  1950   58  12.1  2009
  59  12.1  1986   60  12.2  1962   59  12.0  1976   60  12.0  2010

 Total Elapsed Time 7614.9 seconds Total Elapsed Time 7202.9 seconds
DriveSpeed is primarily intended for measuring the performance of SD cards and internal drives, but can also be used to test USB drives. It carries out four tests.
Test 1 - Write and read three files each of 8 MB and 16 MB; results given in MBytes/second
Test 2 - Write 8 MB, where the read can be cached in RAM; results given in MBytes/second
Test 3 - Random write and read of 1 KB blocks from files of 4 to 16 MB; results are average times in milliseconds
Test 4 - Write and read 200 files of 4 KB to 16 KB; results in MB/sec, msecs/file and delete seconds.
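The kind of measurement made by Test 1 can be sketched as below. This is an illustrative Python outline only, not the benchmark's own C code; the function name `write_read_speed`, the 64 KB block size and the temporary file name are assumptions made for the example. Note that the read here immediately follows the write, so it will usually be served from RAM cache, which is the distortion described for some of the results.

```python
import os
import tempfile
import time


def write_read_speed(directory, size_mb=8, block_kb=64):
    """Write then read one file; return (write MB/sec, read MB/sec)."""
    block = bytes(block_kb * 1024)            # block of zero bytes to write
    blocks = (size_mb * 1024) // block_kb
    path = os.path.join(directory, "drivespeed_sketch.tmp")  # hypothetical name

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                  # ask the OS to push data to the device
    write_secs = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_kb * 1024)
            if not chunk:
                break
            total += len(chunk)
    read_secs = max(time.perf_counter() - start, 1e-9)

    os.remove(path)
    if total != blocks * block_kb * 1024:
        raise IOError("file read back with the wrong length")
    return size_mb / write_secs, size_mb / read_secs
```

Calling `write_read_speed(tempfile.gettempdir())` returns two MB/second figures for an 8 MB file in the system temporary directory. Uncached read speeds would need the file to be kept and read again after a reboot.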
The first DriveSpeed benchmark has two run buttons, RunS for an SD card and RunI for the internal drive, the file path being identified by standard functions. The external SD test worked on earlier Android tablets but failed on later Android versions. RunS ran but provided distorted reading speeds by caching data in RAM. An extra button was added to prevent large files from being deleted, along with a read only option to measure uncached speeds after rebooting.
DriveSpd2 requires input of the file path to use and this might be identified using a file browser app. The file path can sometimes be selected for internal drives, SD cards and USB devices but there are complications associated with permissions and caching.
Running these benchmarks can require a lot of experimentation. Lots of paths, results and explanations are here and here. Following are example DriveSpd2 results from T22 (Lenovo Tab 2 A8-50) testing an external SD card and from T11 (Voyo A15) testing a USB 3 flash drive, followed by read only benchmark results.
Intel/Windows Versions - Results for Tablet W1 main drive are below, with USB 3 and SD card speeds here, along with some via Windows and Linux.
 ########################## T22 #########################

 T22 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53

   Android DriveSpeed2 Benchmark 1.0 28-Aug-2015 12.56

           Data Not Cached
             MBytes/Second
   MB   Write1  Write2  Write3   Read1   Read2   Read3

    8      3.7     3.7     3.6    20.3    20.6    20.4
   16      2.6     3.7     3.7    20.5    20.5    20.5
 Cached
    8     52.4   107.8    13.2   228.8   226.3   226.7

 Random          Write                    Read
 From MB       4       8      16       4       8      16
 msecs      4.65    4.91   18.23    0.01    0.01    0.66

 200 Files       Write                    Read            Delete
 File KB       4       8      16       4       8      16    secs
 MB/sec     0.07    0.18    0.49    2.16    3.79    6.51
 msecs     59.14   44.59   33.61    1.90    2.16    2.52   2.099

          Total Elapsed Time   85.4 seconds
          File Path Used - /storage/sdcard1/
          Drive MB 15258 Free 14687

 ########################## T11 #########################

 T11 Samsung EXYNOS 5250 Dual 1.7 GHz Cortex-A15,

   Android DriveSpeed2 Benchmark 1.0 10-Dec-2013 12.52

           Data Not Cached
             MBytes/Second
   MB   Write1  Write2  Write3   Read1   Read2   Read3

    8     40.9    46.6    46.2   100.7    95.9    71.4
   16     45.2    51.9    51.1    98.8    70.7    66.2
 Cached
    8    150.4   127.7    50.9   687.6   688.7   709.2

 Random          Write                    Read
 From MB       4       8      16       4       8      16
 msecs      0.91    0.90    0.82    0.01    0.01    0.02

 200 Files       Write                    Read            Delete
 File KB       4       8      16       4       8      16    secs
 MB/sec     0.56    1.18    1.85    4.20   13.33   34.79
 msecs      7.29    6.96    8.88    0.98    0.61    0.47   0.149

          Total Elapsed Time   24.8 seconds
          File Path Used - /mnt/udisk/
          Drive MB 30517 Free 30466

 ###################### Read Only #######################

 Android DriveSpeed Benchmark Internal Drive Read Only

                  MBytes/Second
 Device  Write1  Write2  Write3   Read1   Read2   Read3

 T7         0.0     0.0     0.0    41.7    42.8    39.0
 T11        0.0     0.0     0.0    53.7    53.5    53.9
 T21        0.0     0.0     0.0   102.9   104.0   103.6
 T22        0.0     0.0     0.0   127.7   145.7   139.9
 A1         0.0     0.0     0.0   155.7   128.6   156.2

 ################## W1 DriveSpeed32.exe W1 Windows 10 #################

 Current Directory Path: C:\Test
 Total MB 58722, Free MB 45286, Used MB 13436

 Windows Storage Speed Test 32-Bit Version 1.2, Mon Jan 04 16:09:25 2016
 Copyright (C) Roy Longbottom 2011

  8 MB File             1        2        3        4        5
 Writing MB/sec    100.68   101.04   110.81   105.04   113.32
 Reading MB/sec    154.58   155.78   132.18   153.97   153.86

 16 MB File             1        2        3        4        5
 Writing MB/sec    115.96   117.50   118.53   113.16   116.46
 Reading MB/sec    150.29   155.47   156.13   150.62   157.92

 32 MB File             1        2        3        4        5
 Writing MB/sec    118.84   118.26   123.01   123.42   125.39
 Reading MB/sec    146.70   153.65   146.41   148.77   155.54
 ---------------------------------------------------------------------
 8 MB Cached File       1        2        3        4        5
 Writing MB/sec    176.10   292.34   462.14   201.19   452.46
 Reading MB/sec    599.06   830.94   992.19   878.99  1033.57
 ---------------------------------------------------------------------
 Bus Speed Block KB    64      128      256      512     1024
 Reading MB/sec    101.09   107.71   123.43   139.70   136.62
 ---------------------------------------------------------------------
 1 KB Blocks File MB >    2      4      8     16     32     64    128
 Random Read  msecs    0.22   0.18   0.18   0.18   0.18   0.19   0.19
 Random Write msecs    0.13   0.13   0.13   0.13   0.14   0.19   0.21
 ---------------------------------------------------------------------
 500 Files       Write             Read           Delete
 File KB    MB/sec  ms/File   MB/sec  ms/File    Seconds
      2       0.56     3.68     3.00     0.68      0.629
      4       0.84     4.85     6.79     0.60      0.541
      8       1.92     4.27    13.34     0.61      0.502
     16       1.01    16.17    22.14     0.74      0.528
     32       1.95    16.81    38.21     0.86      0.527
     64       3.75    17.50    59.57     1.10      0.490

 End of test Mon Jan 04 16:10:53 2016
Reliability/Stress tests were run using the ARM CPUs on various Raspberry Pi systems, including 32 bit and 64 bit Operating Systems. Besides attempting to identify any false calculations or system crashes, a main purpose was to demonstrate performance reductions as the CPUs became overheated and to identify processor clock throttling. This was aided by the availability of programmable functions that measure CPU MHz and temperature. The Raspberry Pi tests exercised multiple processor cores by running a number of copies of the same programs via script files.
Running multiple copies of the same program does not appear to be possible using Android. So, multithreaded versions were produced, one using floating point calculations and the other integers. Earlier Android CPU benchmarks did not display results until the end of executing all tests. With long running stress tests, it is desirable to display running time and performance on an on-going basis. In this case, unreported calibration phases attempt to set run time parameters that lead to initial reportable test periods of around 10 seconds. This can be longer, if the initial pass takes more than 10 seconds, such as when other programs are running at the same time (as in the screen shot below).
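The calibration idea described above can be sketched as follows. This is a minimal Python outline under stated assumptions, not the author's implementation: the name `calibrate_passes`, the doubling strategy and the 0.1 second minimum timing period are all hypothetical choices for illustration.

```python
import time


def calibrate_passes(run_pass, target_secs=10.0, timer=time.perf_counter):
    """Estimate the repeat count that makes one reported period last target_secs.

    run_pass(n) must execute the test calculations n times.
    """
    repeats = 1
    while True:
        start = timer()
        run_pass(repeats)
        elapsed = timer() - start
        if elapsed >= 0.1:      # assumed minimum period for reliable timing
            break
        repeats *= 2            # too short to time, so double the work and retry
    # Scale the measured rate up (or down) to the target period, never below 1.
    return max(1, round(repeats * target_secs / elapsed))
```

If the very first timed pass already takes longer than the target, the function returns 1, which corresponds to the case above where the initial reported period exceeds 10 seconds because other programs are running.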
Besides the CPU slowing down due to heating effects, the mobile devices, of course, run slower as the battery becomes discharged. In the event of the latter, or if CPU MHz throttling cannot avoid overheating, the CPU should turn off automatically (OR WORSE! - WATCH IT). It is recommended that stress testing is limited to one or a number of 15 minute sessions, to allow results to be saved and judgments to be made on whether to continue.
Experience from running the CPU MHz Benchmark and the Raspberry Pi Stress Tests suggests that the functions required to obtain effective CPU MHz can vary between systems. This also applies to the measurement of CPU temperature. Hence, there can be no simple program to monitor these. In some cases, manual measurements can be noted after installing CPU-Z from Google Play. One difficulty there is that a number of temperature measurements might be provided, without any indication of their location.
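For illustration, one approach that works on many Linux based systems is to read the kernel's cpufreq and thermal zone files. The sketch below is in Python and assumes the standard Linux sysfs locations; as noted above, on a particular Android device these files may be missing, unreadable or reported in different units, so every reading is treated as optional. The function names are hypothetical.

```python
from pathlib import Path


def read_number(path):
    """Return the integer held in a sysfs-style file, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def cpu_mhz(cpu=0, root="/sys/devices/system/cpu"):
    """Current clock of one core in MHz (the file reports kHz), or None."""
    khz = read_number(f"{root}/cpu{cpu}/cpufreq/scaling_cur_freq")
    return None if khz is None else khz / 1000.0


def zone_temperatures(root="/sys/class/thermal"):
    """Map each readable thermal zone to degrees C (assumes millidegrees)."""
    base = Path(root)
    if not base.is_dir():
        return {}
    temps = {}
    for zone in sorted(base.glob("thermal_zone*")):
        raw = read_number(zone / "temp")
        if raw is None:
            continue
        try:
            label = (zone / "type").read_text().strip()
        except OSError:
            label = "unknown"
        temps[f"{zone.name}:{label}"] = raw / 1000.0
    return temps
```

The zone labels returned by `zone_temperatures()` are whatever the kernel reports, which illustrates the difficulty mentioned above: several temperatures can be available with little indication of where each sensor is located.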
The screenshot, below, of both stress tests, was from P37, a Moto G phone running Android 7. This has the option to run two programs at the same time, via a split screen. Besides performance, note the displayed sumchecks. An indication is given if data is not of the expected value.
The source code and project files are included in Android Intel-ARM Benchmarks.zip.
[Screenshot: floating point and integer stress tests running in split screen on P37]
Buttons
RunB - Run Benchmark - Runs most combinations of number of threads, data sizes and calculations per data word for the FPU tests. This is mainly to help to decide which options to use for stress testing. The benchmark runs using fixed parameters, carrying out exactly the same number of calculations using all thread combinations and data sizes. The pass count changes according to the number of calculations per word, for the FPU tests.
RunS - Run Stress Tests - Default running time is 15 minutes, with the middle data size, intended for containment in L2 cache, using 8 threads and 32 operations per word in the FPU tests.
SetS - Specify run time parameters for stress test - These are 1, 2, 4, 8, 16 or 32 threads; 2, 8 or 32 operations per word for FPU tests; 12.8 or 16 KB, 128 or 160 KB, 12.8 or 16 MB for FPU or Integer tests; and running time in minutes.
Info - Test description and details - This is essentially the same as the details provided here.
Save - This offers details of the results and identified CPU hardware and Operating System for E-mail. Default addressee is the program author via results@roylongbottom.org.uk but this can be changed or additional addresses added.
Timing
When benchmarking, the running time of each pass is provided, reducing, where appropriate, on doubling the thread count. Cumulative running time is provided for the stress tests, demonstrating the number of passes carried out in the specified running time. The time per pass increases as the CPU slows down due to heating effects or a discharged battery.
Benchmark - This is essentially the same program as used for the MP-MFLOPS Benchmark which, besides carrying out calculations with 2 and 32 floating point operations per data word, includes a further function with 8 operations. As a reminder, the benchmark runs using fixed parameters, carrying out exactly the same number of calculations using 1, 2, 4 and 8 threads. Note the sumchecks of numeric results of calculations, where every word is checked for identical values and results of zero are reported if any are incorrect. The number of calculations, and associated sumchecks, vary using different memory sizes and varying speeds of operation of caches and RAM.
Stress Test - As indicated earlier, the stress test runs multiple times, using the same run time parameters for number of threads, data size, floating point operations per data word and operations per pass, for the specified number of minutes. Then, the number of repeat passes can be fewer if CPU MHz is reduced. The calculated sumchecks should be identical for all threads. In the event of any comparison failures, the reported sumcheck is shown as zero.
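The checking logic described above can be sketched as follows. This Python outline only illustrates the rule that every word, and every thread, should produce the same value and that zero is reported otherwise. The arithmetic, constants and function names are hypothetical, and Python threads do not load multiple cores in the way the native benchmark does.

```python
from concurrent.futures import ThreadPoolExecutor


def thread_work(words, ops_per_word):
    """Apply identical arithmetic to every word and return the data array."""
    data = [0.999999] * words                  # assumed initial value
    for _ in range(ops_per_word // 2):         # one add and one multiply per step
        data = [(x + 0.000020) * 0.999980 for x in data]
    return data


def sumcheck(data):
    """Scaled check value, or 0 if any word differs from the first one."""
    first = data[0]
    if any(x != first for x in data):
        return 0
    return int(first * 100000)


def stress_pass(threads=8, words=1000, ops_per_word=32, work=thread_work):
    """Run one pass on all threads; return the common sumcheck or 0 on failure."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda _: sumcheck(work(words, ops_per_word)),
                                range(threads)))
    # All threads carried out the same calculations, so the results must agree.
    return results[0] if len(set(results)) == 1 else 0
```

A stress test would call `stress_pass` repeatedly until the requested number of minutes has elapsed, reporting the returned value alongside the measured speed for each pass.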
Below are results from one minute stress tests using 16 and 32 threads, demonstrating similar throughput of around 6 GFLOPS. This is followed by details from 15 minute runs on various systems using 8 threads, including the same T22 system, that still produced a consistent performance of around 6 GFLOPS. All tests were carried out with fully charged batteries and power connected.
The table demonstrates a wide variation in the number of passes carried out in 15 minutes, where some are influenced by the calibration calculations for 10 seconds test duration, in this case the first pass shown as taking between 9.8 and 11.5 seconds. Besides speed reductions due to heating effects, or little change at the end, there can be short term reductions due to other system activity (worst case like downloading and installing updates).
P37 produced the highest performance degradation for initial tests, at 43%. The next three had similar beginning and end performance, with the occasional short term hiccup. The first T21 session produced slightly slower speed at the end. Repeating this shortly afterwards produced a 12% degradation. Kindle3 was run with the tablet in direct sunlight, with surrounding air around 30°C. This led to a 57% performance degradation. The last set of results were somewhat inconsistent over the whole period.
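The quoted percentages appear to correspond to the drop from the Maximum to the Minimum MFLOPS rows of the table. That is an assumption rather than a stated method, but it reproduces the 43% and 57% figures, as the following short Python calculation shows (the function name is hypothetical).

```python
def degradation_percent(maximum, minimum):
    """Percentage drop from the fastest to the slowest pass."""
    return 100.0 * (maximum - minimum) / maximum


# (Maximum, Minimum) MFLOPS taken from the 15 minute table
sessions = {
    "P37": (5451, 3115),
    "T21 Kindl2": (4897, 4334),
    "T21 Kindl3": (4894, 2113),
}

for name, (fastest, slowest) in sessions.items():
    print(f"{name}: {degradation_percent(fastest, slowest):.1f}%")
```

On the same basis the second T21 session works out at about 11.5%, close to the 12% quoted.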
 Benchmark Mode Results

 ARM/Intel MP-FPU Stress Test V1.0 30-May-2017 19.39
            Compiled for 32 bit ARM v7a

                       MFLOPS            Numeric Results
             Ops/    KB    KB    MB      KB     KB     MB
  Secs  Thrd Word  12.8   128  12.8    12.8    128   12.8

   8.6   T1     2   228   227   220   40392  76406  99700
   4.4   T2     2   451   449   434   40392  76406  99700
   2.4   T4     2   882   882   736   40392  76406  99700
   2.0   T8     2  1182  1250   758   40392  76406  99700
  16.3   T1     8   477   477   466   54760  85092  99819
   8.2   T2     8   951   949   925   54760  85092  99819
   4.2   T4     8  1856  1879  1830   54760  85092  99819
   2.8   T8     8  2738  2941  2744   54760  85092  99819
  38.1   T1    32   811   813   801   35218  66014  99520
  19.1   T2    32  1625  1621  1605   35218  66014  99520
   9.7   T4    32  3190  3222  3186   35218  66014  99520
   6.1   T8    32  4909  5179  5135   35218  66014  99520

          End Time 30-May-2017 19.41

 Stress Test 16 Threads

 ARM/Intel MP-FPU Stress Test V1.0 01-Jun-2017 11.43
            Compiled for 64 bit ARM v8a

             Data             Ops/           Nmeric
  Seconds    Size   Threads   Word   MFLOPS  Results

     11.9   128 KB     16      32     6058    35951
     22.1   128 KB     16      32     6012    35951
     32.8   128 KB     16      32     5717    35951
     43.1   128 KB     16      32     5988    35951
     53.3   128 KB     16      32     5991    35951
     63.6   128 KB     16      32     5962    35951

          End Time 01-Jun-2017 11.46

 Stress Test 32 Threads

 ARM/Intel MP-FPU Stress Test V1.0 01-Jun-2017 11.40
            Compiled for 64 bit ARM v8a

             Data             Ops/           Nmeric
  Seconds    Size   Threads   Word   MFLOPS  Results

     11.8   128 KB     32      32     6087    35951
     22.0   128 KB     32      32     6040    35951
     32.2   128 KB     32      32     6001    35951
     42.4   128 KB     32      32     6020    35951
     52.7   128 KB     32      32     6001    35951
     63.1   128 KB     32      32     5897    35951

          End Time 01-Jun-2017 11.43

 Various Systems, all 8 Threads, 32 Ops/word, 128 KB, 15 Minutes

 System      P37    T22     A1     A5    T21    T21    T21    T11
 Device     moto   Leno   Asus    Tec Kindl1 Kindl2 Kindl3   Voyo
 CPU         A53    A53   Atom   Atom  QC800  QC800  QC800    A15
 Cores         8      4      4      4      4      4      4      2
 GHz     1.5+1.2    1.3   1.86   1.44    2.2    2.2    2.2      2

 Test Secs
 Start      11.5   10.2   10.0    9.8   10.5   10.5   10.6   10.4
 End        17.0   10.3   10.0    9.1   11.1   11.7   23.8   12.2

 Pass    -------------------------- MFLOPS --------------------------
    1       5435   6025   4131   3329   4766   4853   4810   2758
    2       5451   5937   4110   3183   4856   4876   4826   2226
    3       5451   6005   4114   3097   4886   4886   4846   2937
    4       5349   5919   4107   3168   4889   4882   4729   3045
    5       5396   5995   4137   3138   4863   4897   4833   3052
    6       5332   5997   4117   3154   4895   4766   4712   3032
    7       5334   5985   4103   3161   4877   4690   4717   3023
    8       5431   6009   4097   3214   4889   4610   4864   3056
    9       5195   5977   4099   3193   4894   4609   4873   2726
   10       5415   5879   4144   3153   4898   4574   4876   3033
   11       5278   5994   4087   3149   4805   4610   4891   2592
   12       5315   5989   4109   3140   4835   4592   4878   3046
   13       5311   5977   4136   3151   4862   4617   4874   1617
   14       5142   5991   4106   3173   4890   4557   4856   2216
   15       5069   5964   4138   3113   4890   4569   4894   2719
   16       5017   4618   4101   3118   4899   4546   4805   3037
   17       5102   5879   4128   3161   4869   4553   4604   2729
   18       5073   5945   4098   3135   4871   4520   4733   2727
   19       5064   5963   4144   3170   4869   4533   4652   2973
   20       5104   5976   4131   3139   4885   4558   4605   2672
   21       4625   5824   4139   3152   4882   4512   4594   2699
   22       4558   5984   4145   3106   4892   4547   4559   2924
   23       4572   5934   4164   3128   4870   4508   4535   2739
   24       4701   5968   4128   3132   4860   4497   4524   2626
   25       4674   5975   4083   3121   4870   4550   4488   2987
   26       4298   5979   4079   3139   4734   4525   4443   2675
   27       4384   5963   4124   3034   4697   4485   4413   2623
   28       4343   5981   4106   3118   4781   4483   4416   2928
   29       4442   5965   4135   3180   4866   4514   4441   2692
   30       4147   5974   4141   3130   4817   4492   4436   2619
   31       4246   5998   4099   3032   4837   4505   4422   2744
   32       4530   6008   4046   3393   4872   4469   3390   2892
   33       3903   5951   4120   3380   4876   4488   4259   2615
   34       3979   5990   4098   3350   4864   4519   3228   2617
   35       4639   5973   4120   3388   4858   4488   3572   2833
   36       3934   5953   4107   3364   4889   4499   3408   2801
   37       4021   5921   4118   3372   4842   4474   3150   2579
   38       3872   5983   4138   3401   4855   4515   3377   2624
   39       4002   5925   4109   3397   4853   4464   2772   2613
   40       4212   5996   4141   3384   4832   4474   2996   2838
   41       3997   5970   4109   3397   4854   4460   2892   2800
   42       3998   5986   4084   3397   4856   4446   2686   2645
   43       3878   5992   4116   3302   4878   4432   2691   2523
   44       3907   5965   4150   3400   4854   4485   2695   2589
   45       3955   5922   4113   3402   4818   4429   2696   2840
   46       3795   5944   4132   3368   4862   4475   2702   2765
   47       3843   5938   4098   3359   4786   4432   2690   2652
   48       3799   5979   4118   3379   4817   4464   2690   2492
   49       3532   5947   4125   3374   4876   4438   2202   2619
   50       3115   5986   4121   3375   4804   4435   2162   2798
   51       3962   5979   3728   3399   4840   4435   2162   2694
   52       3922   5980   4084   3401   4697   4404   2165   2520
   53       3822   5977   4120   3383   4776   4448   2148   2621
   54       3669   5967   4067   3364   4732   4383   2113   2607
   55       3777   5991   4141   3389   4673   4444   2126   2702
   56       3591   5964   4111   3390   4739   4423   2170   2830
   57       3660   5992   4113   3372   4700   4428   2137   2627
   58       3883   5966   4115   3397   4684   4445   2163   2510
   59       3727   5972   4114   3395   4723   4436   2158   2522
   60       3710   6002   4105   3209   4700   4356   2152   2792
   61       3951   6009   4046   2951   4722   4408          2699
   62       3628   5807   4082   3109   4745   4381          2546
   63       3572   5929   4124   3069   4728   4390          2527
   64       3743   5963   4113   3144   4714   4442          2522
   65              5954   4133   3145   4699   4405          2785
   66              5949   4142   3074   4688   4360          2698
   67              5964   4112   3087   4688   4374          2532
   68              5903   4107   3152   4685   4334          2468
   69              5956   4088   3037   4527   4370          2630
   70              5962   4136   3146   4664   4399          2793
   71              5981   4146   3158   4658   4407          2598
   72              5985   4107   3119   4647   4382          2508
   73              5937   4086   3134   4618   4372          2512
   74              5944   4130   3162   4658   4387          2504
   75              5965   4086   3143   4602   4395          2787
   76              5971   4153   3163   4652   4382          2607
   77              5987   4130   3155   4588   4383          2547
   78              5957   4150   3145   4581   4391          2512
   79              5920   4137   3128   4596   4381
   80              5984   4109   3141   4631
   81              5989   4146   3121   4623
   82              5959   4120   3174   4609
   83              5957   4140   3184   4533
   84              5982   4102   3143   4634
   85              5902   4111   3171
   86              5954   3787   3144
   87              6000   4097   3167
   88                     4101   3121
   89                     4084   3162
   90                     4155   3141
   91                            3027
   92                            3336
   93                            3397
   94                            3365

 Average    4403   5948   4108   3217   4779   4498   3715   2690
 Maximum    5451   6025   4164   3402   4899   4897   4894   3056
 Minimum    3115   4618   3728   2951   4527   4334   2113   1617
This test writes data, comprising two data patterns out of 24 variations (such as binary 0000, 0101, 0011, 1111), then reads it via alternate additions and subtractions. This leaves the original data unchanged, which is checked for correctness, with any errors reported. As with the Floating Point Stress Test, buttons are provided to run a quick benchmark or a long running stress test, and one to set parameters for the latter. Performance is measured in MB/second.
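The principle can be sketched as below. This is a simplified Python illustration, not the benchmark's code: it uses a single 32 bit pattern per pass rather than two, and the function name and word count are assumptions. Alternately adding and subtracting an even number of identical words returns the running total to its starting value, so the total doubles as a sumcheck, and the data array itself is then compared with the pattern that was written.

```python
MASK = 0xFFFFFFFF  # keep arithmetic within 32 bits, as for unsigned C integers


def integer_pass(pattern, words=4096):
    """Write a pattern, read it with alternate add/subtract, verify the data.

    Returns (sumcheck, error_count). words must be even.
    """
    if words % 2:
        raise ValueError("words must be even")
    data = [pattern] * words                    # write phase
    total = pattern
    for i in range(0, words, 2):                # read phase
        total = (total + data[i]) & MASK        # addition
        total = (total - data[i + 1]) & MASK    # subtraction restores the total
    errors = sum(1 for value in data if value != pattern)
    return total, errors


for pattern in (0x00000000, 0xFFFFFFFF, 0x5A5A5A5A):
    check, errors = integer_pass(pattern)
    print(f"{check:08X} errors={errors}")
```

With correct memory and arithmetic, the sumcheck printed for each pass equals the pattern that was written.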
Benchmark - Below is an example of results, the program using all thread and data size combinations, and the first 6 data patterns. Note fastest speeds are with all threads using different sections of 160 KB.
Stress Test - Following the benchmark output are some stress test results, all at the default parameter settings and mainly with the systems connected to the power source. As with MP-FPU-Stress.apk, the number of passes in 15 minutes varies, depending on the initial calibrated time and whether speed changes due to the CPU clock speed reducing at higher temperatures.
Results include some with the devices running without the power supply connected. One (T21) showed similar performance whether driven by battery or power supply, but the battery was probably fully charged.
 Benchmark Mode Results

 ARM/Intel MP-Int Stress Test V1.0 21-Jun-2017 16.50
            Compiled for 32 bit ARM v7a

                   MB/second
                KB     KB     MB      Same     All
  Secs Thrds    16    160     16  Sumcheck   Tests

   9.1     1  2970   2855   2336  00000000     Yes
   4.7     2  5770   5605   4523  FFFFFFFF     Yes
   3.0     4 10876  10907   5534  5A5A5A5A     Yes
   2.4     8 14361  16162   6156  AAAAAAAA     Yes
   2.3    16 16522  18100   6091  CCCCCCCC     Yes
   2.3    32 15948  17827   6187  0F0F0F0F     Yes

          End Time 21-Jun-2017 18.41

 Various Systems, 8 Threads, 160 KB, 15 Minutes

 System      P37    T22    T11    T11     A1     A5    T21    T21
 Device     moto   Leno   Voyo   Voyo   Asus    Tec Kindl2 Kindl2
 CPU         A53    A53    A15    A15   Atom   Atom  QC800  QC800
 Cores         8      4      2      2      4      4      4      4
 GHz     1.5+1.2    1.3      2      2   1.86   1.44    2.2    2.2

 Test Secs
 start       9.5   10.1    9.7    8.5    9.7    7.2    9.4    9.0
 end        13.5    9.7   14.2   12.9    9.0    7.1   10.3    9.9

 Pass                           Battery                     Battery
    1      20037  16331  12745  11029  25433  22184  14100   9778
    2      19149  16111  12102  10888  26589  21509  14361  14046
    3      19451  16127  10349  10629  26185  20577  14570  13577
    4      19308  16073  10938   8464  26727  18492  14433  14111
    5      19386  16308  10988   9600  26541  18574  14449  12458
    6      19714  16511  10713   6264  26841  19075  14468  12866
    7      19376  16283  10186   6286  26982  18206  14468  14298
    8      19327  16110   9845   6453  26761  18080  14488  13913
    9      19224  16036   9792   6239  26804  16131  14101  14174
   10      20331  16409  10116   6267  26563  15385  14097  13272
   11      19945  16324  10797   6302  26765  14799  13961  12987
   12      19101  15923   9830   6348  26946  12244  13875  13444
   13      19478  16066   9630   6469  26928  17727  13381  14341
   14      19482  16472   9043   6358  26708  16036  13173  14083
   15      18492  16146   9873   6441  26985  12831  13121  14339
   16      18664  15971  10445   6272  26678  18164  13071  13381
   17      18476  16296   9875   6337  26818  18394  13017  13738
   18      16615  16371   8887   6402  27028  18204  13033  13732
   19      15829  16419   9170   6441  27069  18688  13078  13842
   20      16755  16205   9185   6399  26640  18542  13067  13650
   21      14564  16059  10952   6297  26796  18379  12936  13692
   22      16996  15787   9597   6418  26967  18645  12896  13573
   23      14891  16051   9359   6201  26830  19071  12966  13614
   24      17154  16219   9178   6244  26765  18641  12759  13540
   25      14580  15907   8707   6373  26817  18589  12908  13484
   26      17185  15995  10194   6327  26875  18765  12891  13518
   27      14063  15978   9824   6362  26716  17421  12781  13392
   28      15158  16004   8697   6422  26725  18432  12771  13301
   29      14347  16341   8705   6292  26909  16909  12779  13060
   30      13116  16060   8774   6420  26854  15801  12689  13445
   31      13267  16475  10325   6281  26888  13700  12768  13196
   32      13814  16123   9327   6411  26948  20494  12785  13248
   33      14348  16107   8643   6313  26960  20499  12723  13073
   34      12555  16150   8445   6334  25360  21228  12794  12926
   35      12579  16043   8702   6450  26332  21266  12613  12942
   36      14506  16026   9960   5991  26047  21142  12613  13107
   37      14338  16309   9510   6435  26233  20585  12594  13225
   38      12474  16409   8837   6389  26708  23052  12523  13167
   39      14030  15855   8564   6435  26985  23206  12551  13142
   40      14399  16140   8594   6322  26869  23224  12503  13108
   41      13122  15976   8876   6174  26730  23089  12447  13092
   42      12340  16181  10637   6233  26445  23183  12443  13055
   43      12880  16184   9317   6376  26554  23255  12555  12914
   44      12454  16184   8423   6423  26970  23393  12473  13159
   45      12220  16341   8614   6447  26637  23267  12408  12927
   46      11486  16351   8183   6391  27022  23291  12423  13078
   47      13306  16196  10820   6327  26650  23321  12383  12895
   48      13629  16254   8968   6312  26702  23192  12234  13033
   49      11897  16351   8463   6286  26558  23306  12353  12884
   50      14640  16069   8036   6364  26757  23229  12344  12942
   51      11354  16054  10331   6440  26781  23166  12320  12918
   52      13217  16080   9005   6360  26759  23270  12261  12868
   53      12672  15856   8373   6425  26863  23259  12216  12919
   54      11752  16150   8603   6441  27063  23352  12255  12808
   55      12783  16147   8228   6384  26641  23302  12283  12923
   56      12984  16340   9258   6424  26832  23249  12201  12838
   57      11459  16434   9690   6247  26491  23395  12228  12829
   58      13042  16378   8510   6461  26706  22927  12265  12856
   59      11289  16019   8259   6372  26866  23063  12241  12821
   60      14140  16443   8089   6360  26919  23335  12212  12743
   61      11527  16392   9949   6249  26639  23257  12227  12683
   62      12224  16332   9177   6332  26721  23148  12081  12581
   63      11942  16209   8383   6430  26806  23201  12302  12732
   64      11836  15867   8339   6366  26724  23254  12210  12706
   65      12680  16161   8215   6435  26560  23295  12143  12740
   66      11171  16228   8719   6338  26823  23278  12166  12731
   67      13207  16193   9821   6422  27002  23175  12107  12792
   68      11382  16320   8978   6410  26768  23358  12077  12673
   69      11365  16309   8073   6353  26942  23214  12135  12575
   70      13366  16169   7896   6426  26681  22885  12111  12817
   71      10909  16759   9942   6360  26608  23360  12208  12645
   72      13351  15734   9088   6300  26883  23238  12098  12650
   73             16595   8319         26708  23302  12652  13016
   74             16291                26859  23308  12639  13126
   75             15874                27044  23292  12656  13135
   76             15990                26904  23344  12775  13155
   77             16167                26865  23252  12664  13198
   78             16320                26734  23181  12554  13108
   79             16416                26781  23351  12423  12995
   80             15805                27054  23130  12554  12889
   81             16097                26781  23135  12464  13011
   82             16405                26911  23331  12538  12960
   83             16120                26803  19419  12370  13119
   84             16338                26739  20700  12478  13037
   85             16211                26756  21231  11963  12552
   86             16311                26798  21087  12061  12426
   87             16082                26798  21321  12123  12465
   88             16141                26969  20290  11974  12387
   89             16125                26974  21290  11936  12448
   90             15928                26737  20739  11999  12499
   91             16068                26703  21205  11932  12403
   92             16321                26724  21281  12033  12422
   93             16329                26595  21068  11915  12455
   94             16398                27019  20730         12305
   95                                  26742  21275         12404
   96                                  26705  21495         12317
   97                                  27052  20938
   98                                  26899  21087
   99                                  26613  21154
  100                                  26466  21066
  101                                  26338  20967
  102                                  26626  19663
  103                                         20117
  104                                         20996
  105                                         21445
  106                                         20935
  107                                         20882
  108                                         21328
  109                                         21347
  110                                         20954
  111                                         21175
  112                                         21107
  113                                         20924
  114                                         20812
  115                                         21636
  116                                         21501
  117                                         21184
  118                                         21303
  119                                         21259
  120                                         21161
  121                                         21398
  122                                         20517
  123                                         20803
  124                                         21484
  125                                         21334
  126                                         20291

 Average   14780  16186   9356   6615  26739  20978  12713  13046
 Maximum   20331  16759  12745  11029  27069  23395  14570  14341
 Minimum   10909  15734   7896   5991  25360  12244  11915   9778
 End       11365  16398   8319   6300  26626  20291  11915  12317

 End Pass
 Seconds    13.5    9.7   14.2   12.9    9.0    7.1   10.3    9.9
 Passes       72     94     73     72    102    126     93     96
The following series of tests comprised running both the floating point and integer stress tests at the same time, both using the default parameters with 8 threads. All tests were run using battery power. Results provided are for the first three test runs and the last three of the overall 15 minutes. Note that the two test programs had different running times.
The first (P37) has 8 cores. On starting, each of the two stress tests, as might be expected, initially ran at around half speed. After 15 minutes, both produced similar performance degradations, essentially the same as the single program tests using power supplies. Following a slight delay, the second tests started running at slightly decreased temperatures and faster speed, but produced slower end speeds. Run 3 started in a similar manner, then went haywire, with FPU tests running at a crawl and the other speeding up. The fourth test runs fitted the normal pattern, each ending with performance equivalent to a quarter of the maximum of that running a single program.
The second system (T21) has a quad core CPU and produced fairly consistent performance over this particular hour of testing.
Next log details are provided to demonstrate that a device can handle 64 threads, using 32 from each of the stress tests. In this case (with P37), performance over 5 minutes was similar to that at the start of the 8 thread test, using both apps.
                   P37 Octa-core Cortex-A53             T21 Quad Core Snapdragon 800

              Secs MB/sec %max  Secs MFLOPS %max     Secs MB/sec %max  Secs MFLOPS %max

 Max 1 program      20037             5435                 14570             4899

 Run 1 Start    17  10441   52    11   2790   51        9   7192   49    21   2702   55
                16  10677         11   2819             8   7313         20   2482
                17  10163         11   2862             8   6848         21   2517
       End      22   7703         15   2018             9   6718         25   2482
                25   7030         16   1886             9   6419         24   2517
                25   6913   35    16   1899   35        9   6564   45    24   2479   51

 Run 2 Start    18   8713   43    16   1969   36       10   6414   44    24   2140   44
                20   7848         16   1964            10   6669         23   2268
                20   7711         16   1949            10   6312         24   2179
       End      26   5966         20   1529            10   6513         27   1883
                27   5865         20   1522            10   6299         25   2092
                27   5733   29    20   1569   29       12   5339   37    27   1900   39

 Run 3 Start    21   6957   35    18   1746   32       10   6619   45    23   2247   46
                23   6445         20   1553            10   6609         24   2120
                24   6135         18   1680            10   6888         24   2168
       End      17   8548         53    581            10   6738         26   1996
                17   8542         84    367            12   5353         28   1849
                18   8414   42    74    413    8       12   5519   38    28   1811   37

 Run 4 Start    22   6275   31    12   1738   32       10   6880   47    22   2341   48
                24   5941         12   1699            10   6821         23   2183
                25   5532         14   1437            10   6563         24   2117
       End      26   5309         15   1396            12   5357         28   1834
                26   5277         15   1370            12   5629         28   1845
                27   5081   25    16   1276   23       12   5572   38    28   1853   38

 P37 Both Stress Tests 32 Threads Each

 ARM/Intel MP-Int Stress Test V1.0                ARM/Intel MP-FPU Stress Test V1.0
 25-Jul-2017 10.53                                25-Jul-2017 10.54
 Compiled for 32 bit ARM v7a                      Compiled for 32 bit ARM v7a

       Data                  Same      All              Data          Ops/         Numeric
 Secs   KB Threads MB/sec  Sumcheck  Threads      Secs   KB Threads   Word  MFLOPS Results

   17  160    32    11626  00000000    Yes          15  128    32      32    2771   42157
   35  160    32    10456  00000000    Yes          28  128    32      32    2492   42157
   52  160    32    10743  00000000    Yes          38  128    32      32    2845   42157
   71  160    32    10198  00000000    Yes          50  128    32      32    2758   42157
   89  160    32    10534  00000000    Yes          61  128    32      32    2677   42157
  106  160    32    10752  00000000    Yes          72  128    32      32    2926   42157
  125  160    32    10168  FFFFFFFF    Yes          84  128    32      32    2549   42157
  142  160    32    11094  FFFFFFFF    Yes          94  128    32      32    3017   42157
  160  160    32    10389  FFFFFFFF    Yes         104  128    32      32    2881   42157
  178  160    32    10408  FFFFFFFF    Yes         117  128    32      32    2474   42157
  195  160    32    11203  FFFFFFFF    Yes         127  128    32      32    2920   42157
  214  160    32     9938  FFFFFFFF    Yes         139  128    32      32    2712   42157
  230  160    32    11622  5A5A5A5A    Yes         150  128    32      32    2826   42157
  249  160    32     9857  5A5A5A5A    Yes         161  128    32      32    2816   42157
  267  160    32    10381  5A5A5A5A    Yes         173  128    32      32    2496   42157
  285  160    32    10317  5A5A5A5A    Yes         183  128    32      32    3067   42157
  305  160    32     9808  5A5A5A5A    Yes         194  128    32      32    2779   42157
                                                   207  128    32      32    2410   42157
       End Time 25-Jul-2017 11.00                  217  128    32      32    3020   42157
                                                   228  128    32      32    2743   42157
       Average      10537                          240  128    32      32    2474   42157
                                                   250  128    32      32    3147   42157
       Started a little earlier                    263  128    32      32    2423   42157
                                                   273  128    32      32    2927   42157
                                                   284  128    32      32    2876   42157
                                                   297  128    32      32    2422   42157
                                                   305  128    32      32    3758   42157

                                                        End Time 25-Jul-2017 11.00

                                                        Average              2786

                                                        Ended slightly later
Running the stress tests did not reveal any real data comparison failures, although one did appear to occur before a flat battery led to a switch off. Also, there were a couple of inexplicable program crashes, where, of course, the recorded results are lost. However, there is an issue regarding false error reports.
All of my Android CPU benchmarks arrange for starting, stopping and displaying results via Java code, with those executing native machine code produced from compiled C. The original benchmarks only display results when all processing is finished and do not appear to demonstrate the peculiar behaviour of the stress tests.
When running these stress tests, rotating the device causes the initial starting display to be shown again. Then, after pressing the Run button, errors, as shown below, are indicated. Before this, running VMSTAT, via a Terminal Emulator app, indicates that the benchmark code had not stopped executing. Hence, it seems that two copies of the program were running at the same time, confusing reported results. The same effect is reproduced by pressing the Run button whilst the program is executing.
AVOID RUNNING WHEN THE PHONE/TABLET IS HAND HELD
The following are examples of false errors, when it was known that tests had been restarted after a stoppage caused by rotating the device. For a clean restart, the offending program can normally be killed by tapping the “RECENTS” (square) button and swiping the app off the screen, then restarted via the main display. However, on one tablet (T21), it had to be stopped via the Settings, App, Force Stop button.
 ARM/Intel MP-FPU Stress Test V1.0 26-Jun-2017 11.25

             Data             Ops/             Nmeric
  Seconds    Size   Threads   Word     MFLOPS  Results

     32.2   128 KB      8      32       1410        0  Zero indicates errors found
     54.1   128 KB      8      32       1172        0  Time/pass much greater than 10 seconds
     83.4   128 KB      8      32       1195        0
    121.9   128 KB      8      32       1385        0
    135.4   128 KB      8      32       1507    49805  Unexpected result
    153.8   128 KB      8      32       1367        0
    189.7   128 KB      8      32       1159        0
    204.5   128 KB      8      32        693        0  Shorter time but worse performance
    323.5   128 KB      8      32        692        0
    350.8   128 KB      8      32   34222848    99999  Measured test time near zero in 27 seconds, 99999 reflects initial data
    410.1   128 KB      8      32       1005        0
    433.9   128 KB      8      32       1255    66014  Expected result

 ARM/Intel MP-Int Stress Test V1.0 26-Jun-2017 11.25

             Data                         Same     All
  Seconds    Size   Threads     MB/sec  Sumcheck Threads

      8.7   160 KB      8         4568  00000000   Yes    Test seconds as expected around 10
     25.3   160 KB      8         6375  00000000   Yes    to 11 seconds
     32.4   160 KB      8         3451  00000000   Yes    Yes means all threads correct result
     39.5   160 KB      8         3492  00000000   Yes
     46.6   160 KB      8   1951607840  00000000   Yes    Impossible MB/sec suggests test did
     79.5   160 KB      8         4728  FFFFFFFF   Yes    not run
     98.0   160 KB      8         5205  FFFFFFFF   Yes
    114.6   160 KB      8         5760  FFFFFFFF   Yes    Should be six times 00000000
    134.4   160 KB      8         4797  5A5A5A5A   Yes    then six times FFFFFFFF
    158.9   160 KB      8         3537  5A5A5A5A   Yes    then six times 5A5A5A5A
    174.1   160 KB      8         4891  5A5A5A5A   Yes    etc.
    237.4   160 KB      8   1951607840  5A5A5A5A   No 1   Impossible MB/sec, 1 thread wrong
    374.6   160 KB      8    279600040  CCCCCCCC   No 8   Impossible MB/sec, 8 threads wrong
    397.3   160 KB      8         3204  CCCCCCCC   Yes
    415.7   160 KB      8         6304  0F0F0F0F   No 4   4 threads wrong, were they CCCCCCCC
    420.5   160 KB      8   1951607840  CCCCCCCC   Yes
 T7 Nexus 7 quad core CPU 1.3 GHz, 1.2 GHz > 1 core
 Device Asus Nexus 7
 RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
 Screen pixels w x h 1280 x 736 MHz Twelve-core Nvidia GeForce ULP graphics 416 MHz
 Android Build Version 4.1.2
 Processor : ARMv7 Processor rev 9 (v7l)
 processor : 0 BogoMIPS : 1993.93
 processor : 1 BogoMIPS : 1993.93
 processor : 2 BogoMIPS : 1993.93
 processor : 3 BogoMIPS : 1993.93
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x2
 CPU part : 0xc09 - Cortex-A9
 CPU revision : 9
 Hardware : grouper - nVidia Tegra 3 T30L
 Revision : 0000
 Linux version 3.1.10
 Runs at 1.2 GHz

 T11 Voyo A15, Samsung EXYNOS 5250 Dual core 2.0 GHz Cortex-A15,
 Device Urbetter VOYO A15
 Mali-T604 GPU, 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/s
 Screen pixels w x h 1920 x 1032
 Android Build Version 4.2.2 - Jelly Bean
 Processor : ARMv7 Processor rev 4 (v7l)
 processor : 0 BogoMIPS : 992.87
 processor : 1 BogoMIPS : 997.78
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x0
 CPU part : 0xc0f
 CPU revision : 4
 Hardware : SMDK5250
 Linux version 3.4.35Ut
 Runs at 1.7 GHz

 T15 HTC Nexus 9, dual core Denver CPU 2400 MHz
 Screen pixels w x h 2048 x 1440
 Android Build Version 5.0.1
 Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
 processor : 0 & 1
 Features : fp asimd aes pmull sha1 sha2 crc32
 CPU implementer : 0x4e
 CPU architecture: AArch64
 CPU variant : 0x0
 CPU part : 0x000
 CPU revision : 0
 Hardware : Flounder
 Revision : 0000
 MTS version : 33410787
 Linux version 3.10.40

 T21 Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800 (Krait 400)
 2 x 32 Bit LPDDR3-1866 Memory, 14.9 GB/s, GPU Qualcomm Adreno 330, 578 MHz
 Device Amazon KFTHWI
 Screen pixels w x h 1200 x 1803
 Android Build Version 4.4.3
 Processor : ARMv7 Processor rev 0 (v7l)
 processor : 0, 1, 2, 3
 BogoMIPS : 38.40
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
 CPU implementer : 0x51
 CPU architecture: 7
 CPU variant : 0x2
 CPU part : 0x06f
 CPU revision : 0
 Hardware : Qualcomm MSM8974
 Revision : 0000
 Linux version 3.4.0-perf (gcc version 4.7)

 T22 Lenovo Tab 2 A8-50, 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53
 1 GB LPDDR3, GPU Mali T720 MP2
 Device LENOVO Lenovo TAB 2 A8-50F
 Screen pixels w x h 800 x 1216
 Android Build Version 5.0.2
 Processor : AArch64 Processor rev 3 (aarch64)
 processor : 0, 1, 2
 BogoMIPS : 26.0
 Features : fp asimd aes pmull sha1 sha2 crc32
 CPU implementer : 0x41
 CPU architecture: AArch64
 CPU variant : 0x0
 CPU part : 0xd03
 CPU revision : 3
 Hardware : MT8161
 Linux version 3.10.65

 P33 Sony Xperia Z3+ E6533, Quad-core 1.5 GHz & Quad-core 2 GHz
 Qualcomm Snapdragon 810 64-bit CPU
 Screen pixels w x h 1080 x 1776
 Android Build Version 5.0.2
 Processor : AArch64 Processor rev 1 (aarch64)
 processor : 0 to 7
 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
 CPU implementer : 0x41
 CPU architecture: 8
 CPU variant : 0x1
 CPU part : 0xd07
 CPU revision : 1
 Hardware : Qualcomm Technologies, Inc MSM8994
 Linux version 3.10.49

 P36 LGE LG-H811 Qualcomm Snapdragon 808, 1.8 GHz 64-bit Hexa-Core
 Device LGE LG-H811
 Screen pixels w x h 1440 x 2392
 Android Build Version 5.1
 Processor : AArch64 Processor rev 2 (aarch64)
 processor : 0, 1, 2, 3, 4, 5
 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
 CPU implementer : 0x41
 CPU architecture: 8
 CPU variant : 0x1
 CPU part : 0xd07
 CPU revision : 2
 Hardware : Qualcomm Technologies, Inc MSM8992
 Revision : 000b
 Linux version 3.10.49-

 P37 Lenovo Moto G4 Snapdragon 617, Octa-core Cortex-A53
 Cores 4x1.5 GHz 4x1.2 GHz, 2 GB RAM 933 MHz, GPU Adreno 405 550 MHz
 Device Motorola Moto G (4)
 Screen pixels w x h 1080 x 1776
 Android Build Version 6.0.1
 CPU part : 0xd03
 CPU revision : 4
 Hardware : Qualcomm Technologies, Inc MSM8952
 Revision : 82a0
 Processor : ARMv7 Processor rev 4 (v7l)
 Device : athene_13mp
 Radio : EMEA
 MSM Hardware : MSM8952
 CPU variant : 0x0
 CPU part : 0xd03
 CPU revision : 4
 processor : 5, 6, 7
 model name : ARMv7 Processor rev 4 (v7l)
 BogoMIPS : 38.00
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 evtstrm
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x0
 CPU part : 0xd03
 CPU revision : 4
 Linux version 3.10.84-g061c37c

 P37 Later
 Android Build Version 7.0
 Linux version 3.10.84-g478d03a

 P38 Samsung Galaxy Note 4 Snapdragon 805, 4x2.7 GHz Cortex A57 + 4x1.3 GHz Cortex A53
 Device Samsung SM-N910C
 Screen pixels w x h 1440 x 2560
 Android Build Version 6.0.1
 processor : 4 to 7
 model name : ARMv7 Processor rev 0 (v7l)
 BogoMIPS : 76.00
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x1
 CPU part : 0xd07
 CPU revision : 0
 Hardware : Samsung EXYNOS5433
 Revision : 0015
 Serial : bfc12ce406b30041
 Linux version 3.10.9-9186796

 P39 Galaxy Tab S2 SM-T710 EXYNOS 5433, 4x1.9 GHz Cortex A57 + 4x1.3 GHz Cortex A53
 Device Samsung SM-T710
 Screen pixels w x h 1536 x 2048
 Android Build Version 6.0.1
 processor : 4 to 7
 model name : ARMv7 Processor rev 0 (v7l)
 BogoMIPS : 76.00
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x1
 CPU part : 0xd07
 CPU revision : 0
 Hardware : Samsung EXYNOS5433
 Revision : 0008
 Serial : 5f827412e6280033
 Linux version 3.10.9-8374498

 P40 Moto X 1st XT1049, dual core 1.7 GHz Qualcomm Snapdragon S4 Pro MSM8960
 Device Motorola XT1049
 Screen pixels w x h 720 x 1184
 Android Build Version 5.1
 Processor : ARMv7 Processor rev 0 (v7l)
 processor : 0, 1
 BogoMIPS : 13.53
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4
 CPU implementer : 0x51
 CPU architecture: 7
 CPU variant : 0x2
 CPU part : 0x04d
 CPU revision : 0
 Hardware : msm8960dt
 Revision : 8300
 Serial : 0001000c044ef01d
 Device : ghost
 Radio : 4
 Linux version 3.4.42-gd5fa9d8

 P41 Moto G Play XT1607, quad core 1.2 GHz Cortex A53 MSM8916 Snapdragon 410
 Device Motorola
Moto G Play Screen pixels w x h 720 x 1184 Android Build Version 6.0.1 CPU revision : 0 Hardware : Qualcomm Technologies, Inc MSM8916 Revision : 81b0 Serial : e5c8122300000000 Device : harpia Radio : US MSM Hardware : MSM8916 processor : 0 to 3 model name : ARMv7 Processor rev 0 (v7l) BogoMIPS : 38.00 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 evtstrm CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 0 Linux version 3.10.49-g41f86a8 A1 Asus MemoPad 7 ME176CEX, 1.86 GHz Atom Intel Atom Z3745 Device Asus K013 Screen pixels w x h 800 x 1216 Android Build Version 4.4.2 Processor : ARMv7 processor rev 1 (v7l) BogoMIPS : 1500.0 Features : neon vfp swp half thumb fastmult edsp vfpv3 CPU implementer : 0x69 CPU architecture: 7 CPU variant : 0x1 CPU part : 0x001 CPU revision : 1 Hardware : placeholder Revision : 0001 Linux version 3.10.20 Mainly runs at 1.86 GHz Turbo Boost A4 Intel(R) Atom x5-Z8300 1.84 GHz (turbo) Device Intel cht_cr_rvp Screen pixels w x h 800 x 1216 Android Build Version 5.1.1 : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf nonstop_tsc_s3 pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms bogomips : 2879.90 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 76 model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz stepping : 3 microcode : 0x358 cpu MHz : 1840.000 cache size : 1024 KB physical id : 0 siblings : 4 core id : 3 cpu 
cores : 4 apicid Linux version 3.14.37 A5 Same tablet as W2 - Intel Atom Z8300 1.44 GHz, Turbo 1.84 Device Teclast X98 Plus(A5C8) Screen pixels w x h 2048 x 1440 Android Build Version 5.1 Processor : ARMv7 processor rev 1 (v7l) BogoMIPS : 1500.0 Features : neon vfp swp half thumb fastmult edsp vfpv3 vfpv4 idiva idivt CPU implementer : 0x69 CPU architecture: 7 CPU variant : 0x1 CPU part : 0x001 CPU revision : 1 Hardware : placeholder Revision : 0001 Linux version 3.14.37-x86_64-L1-R429 R1 Same as tablet W! running via Remix for PC with Android 6 Intel Z8300 quad core 1.44 GHz Turbo 1.8 Device PIPO W1S Screen pixels w x h 396 x 674 Android Build Version 6.0.1 - 64 bit flags etc. As A4 above processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 76 model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz stepping : 3 microcode : 0x34f cpu MHz : 1599.975 cache size : 1024 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 apicid : 6 initial apicid Linux version 4.4.14-android-x86_64 R2 Same as PC - Core i7 4820K quad core + HT at 3900 MHz Turbo Screen pixels w x h 396 x 674 Android Build Version 6.0.1 - 64 bit flags: numerous bogomips : 7421.92 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz stepping : 4 microcode : 0x416 cpu MHz : 2471.484 cache size : 10240 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes Linux version 4.4.14-android-x86_64 W1 Pipo W1S Tablet. 
Intel Z8300 quad core 1.44 GHz Turbo 1.84 Same as R1 above Windows 10, 4 GB DDR 3 1600 CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000406C3 Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz Measured 1440 MHz Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow, AMD64 processor architecture, 4 CPUs Windows NT Version 6.2, build 9200, Memory 4020 MB, Free 2520 MB W2 Same tablet as A5 Teclast X98 Plus, Intel Atom Z8300 1.44 GHz, Turbo 1.84 CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000406C3 Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz Measured 1440 MHz Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow, Intel processor architecture, 4 CPUs Windows NT Version 6.2, build 9200, Memory 4021 MB, Free 2540 MB User Virtual Space 4096 MB, Free 4083 MB 64 Bit AMD64 processor architecture, 4 CPUs User Virtual Space 134217728 MB, Free 134217716 MB PC Core i7 4820K quad core + HT at 3900 MHz Turbo Same as R2 above CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4 Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz Measured 3711 MHz Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow, AMD64 processor architecture, 8 CPUs Windows NT Version 6.2, build 9200, Memory 32705 MB, Free 30584 MB User Virtual Space 134217728 MB, Free 134217715 MB |
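The ARM listings above identify each core only by its hexadecimal "CPU implementer" and "CPU part" codes. As a rough illustration of how those fields can be read back, the following Python sketch parses /proc/cpuinfo style text and looks the codes up. It is not part of the benchmark apps: the function names and the two small lookup tables are assumptions made for this example, and the tables cover only a few of the codes seen in these listings, not a complete registry.

```python
# Hypothetical helper, not taken from the benchmark sources.
# Partial lookup tables (assumption: only a few well-known codes are listed).
IMPLEMENTERS = {0x41: "ARM", 0x4E: "NVIDIA", 0x51: "Qualcomm", 0x69: "Intel"}
ARM_PARTS = {
    0xC09: "Cortex-A9",
    0xC0F: "Cortex-A15",
    0xD03: "Cortex-A53",
    0xD07: "Cortex-A57",
}


def parse_cpuinfo(text):
    """Split 'key : value' lines into a dict; a repeated key keeps its last value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_core(fields):
    """Return a readable name from the implementer and part codes."""
    # Take the first token only, so a value such as '0xc09 - Cortex-A9' still parses.
    impl = int(fields["CPU implementer"].split()[0], 16)
    part = int(fields["CPU part"].split()[0], 16)
    vendor = IMPLEMENTERS.get(impl, "implementer 0x%02x" % impl)
    if impl == 0x41 and part in ARM_PARTS:
        return vendor + " " + ARM_PARTS[part]
    return vendor + " part 0x%03x" % part


# Fields copied from the T11 listing above.
SAMPLE = """\
Processor : ARMv7 Processor rev 4 (v7l)
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc0f
CPU revision : 4
Hardware : SMDK5250
"""

if __name__ == "__main__":
    print(describe_core(parse_cpuinfo(SAMPLE)))
```

Codes that are not in the tables are reported in hexadecimal rather than guessed. Note that the A1 and A5 listings report ARMv7 fields with implementer 0x69 on Intel Atom hardware, so a decoded name is only as reliable as the cpuinfo text the device chooses to present.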