Roy Longbottom's Android Native ARM + Intel Benchmarks

For latest results see Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM, Intel and MIPS.

Contents


General                     Logged Configuration          Whetstone Benchmark
Dhrystone Benchmark         Linpack Benchmark             Livermore Loops Benchmark
MemSpeed Benchmark          BusSpeed Benchmark            RandMem Benchmark
MP-MFLOPS Benchmarks        MP-MFLOPS Benchmark Results   MP-Whetstone Benchmark
MP-Dhrystone Benchmark      MP-BusSpeed Benchmark         MP-RandMem Benchmark
NEON-Linpack Benchmark      NeonSpeed Benchmark           NEON-MFLOPS-MP Benchmark
NEON-Linpack-MP Benchmark   FFT Benchmarks                System Details

Download Benchmark Apps


A Settings, Security option may need changing to allow installation of non-Market applications.

 NativeWhetstone2.apk    First standard benchmark
 Dhrystone2i.apk         First integer benchmark
 LinpackDP2.apk          First computational benchmark
 LinpackSP2.apk          Single precision Linpack
 LivermoreLoops2.apk     First supercomputer benchmark
 MemSpeedi.apk           Floating Point Cache and RAM Test
 BusSpeedv7i.apk         Integer Bus, Cache and RAM Test
 RandMemi.apk            Random/Serial Access Cache and RAM Test
 MP-MFLOPSi.apk          CPU, Cache, RAM MFLOPS Test
 MP-MFLOPS2i.apk         Long Running MP-MFLOPS
 MP-WHETSi.apk           Whetstone Floating and Fixed Point Tests
 MP-Dhryi.apk            Dhrystone Integer Benchmark
 MP-BusSpdi.apk          Multithreaded BusSpeed Benchmark
 MP-RndMemi.apk          Multithreaded RandMem Benchmark
 NEON-Linpacki.apk       Linpack Benchmark using ARM NEON Intrinsic Functions
 NeonSpeedi.apk          NEON Memory Speed Test Using Intrinsic Functions
 NEON-MFLOPS2i-MP.apk    MP-MFLOPS using ARM NEON Intrinsic Functions
 NEON-Linpacki-MP.apk    Linpack MP Benchmark using NEON Intrinsic Functions
 MP-BusSpd2i.apk         Long running version with staggered start
 fft1.apk                Original FFT Benchmark
 fft3c.apk               Optimised FFT Benchmark




All the above were produced using gcc 4.8, via Eclipse, running under Linux Ubuntu 14.04.

General

Intel Atom processors are appearing in a number of Android devices. For these devices, Android has a compatibility layer, called Houdini, that maps ARM instructions into x86 instructions when running existing ARM apps that are compiled to native code rather than running via Java. This is known to produce poor performance, and raises questions about battery drain.

My existing Android benchmarks were produced on Linux Ubuntu based PCs, using Eclipse. Many use a Java front end, with C/C++ code compiled for use via the Java Native Interface (JNI). These projects can be downloaded from Android Benchmarks.zip, Android Graphics Benchmarks.zip, Android NEON Benchmarks.zip, and Android MP Benchmarks.zip.

The JNI directory contains the C/C++ code and an Application.mk file that tells the compiler which platforms to produce machine code for. For the original benchmarks, the mk file had the parameter APP_ABI := armeabi-v7a, for ARM V7 CPUs, or APP_ABI := armeabi armeabi-v7a, to include earlier technology, the appropriate one being selected at run time.

I was surprised to find that gcc 4.8 provided parameters to produce native Intel code, and others. Those currently available are arm64-v8a, armeabi, armeabi-v7a, mips, mips64, x86 and x86_64. I use APP_ABI := all, so that the programs can at least run on ARM and Intel CPUs. Although the Atom is a 64 bit CPU, the currently installed Android 4.4 will not run x86_64 compilations. Eclipse projects for the new compilations are in Android Intel-ARM Benchmarks.zip.

Initial comparisons provided are for tablets with Intel Atom, ARM Cortex-A9 and ARM Cortex-A15 CPUs, plus the BlueStacks Emulator running under Windows 7 on a 3.0 GHz Phenom and under Windows 8 on a 3.7 GHz Core i7. The results are for the original ARM only compilations and the latest with ARM and Intel native instructions.

These benchmarks should also run on 64 bit CPUs with 64 bit versions of Android. Some slight changes are being included in the programs to identify which section of the software is being used. They are being run on a Lenovo Tab 2 A8-50, 8 Inch Tablet, with a 1.3 GHz MediaTek mt8161 quad core processor (64 bit ARM Cortex-A53) and Android 5.0.2. Further details are in Android 64 Bit Benchmarks.htm and results are included below.
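
As a minimal sketch of how a program built with APP_ABI := all might report which compilation is running, the C below selects a label at compile time from the predefined macros that GCC and Clang set for each target. The function names and label strings are hypothetical, chosen to resemble the "Compiled for ..." lines in the later logs; the benchmarks' actual method is not shown in the source.

```c
#include <stdio.h>

/* Return a label for the instruction set this file was compiled for.
   The macros tested are the usual GCC/Clang predefined ones; the label
   strings are illustrative, not those printed by the benchmarks. */
static const char *compiled_abi(void)
{
#if defined(__aarch64__)
    return "64 bit ARM v8a";
#elif defined(__arm__)
    return "32 bit ARM";
#elif defined(__x86_64__)
    return "64 bit Intel x86_64";
#elif defined(__i386__)
    return "32 bit Intel x86";
#elif defined(__mips64)
    return "64 bit MIPS";
#elif defined(__mips__)
    return "32 bit MIPS";
#else
    return "unknown architecture";
#endif
}

/* Print a line in the style of the "Compiled for ..." headings
   that appear in the later result logs. */
static void print_compiled_for(void)
{
    printf("Compiled for %s\n", compiled_abi());
}
```

Calling print_compiled_for() once at start-up would be enough, since each ABI gets its own copy of the compiled library and the label is fixed when that copy is built.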



Logged Configuration

All the benchmarks were run on an Asus MeMO Pad 7 ME176CX that has a quad core Intel Atom Z3745, rated at 1.33 GHz but mainly running at the Turbo Boost speed of 1.86 GHz. All benchmarks have an option to save results via Email, and this includes details of the system used. Following are example details provided for this Asus MeMO Pad 7. Similar details of other Android devices are in Android Benchmarks.htm. Those provided later are a brief summary.


 Intel CPU Code

 Device Asus K013
 Screen pixels w x h 800 x 1216
 Android Build Version      4.4.2
 d : 0, siblings : 4, core id : 3, cpu cores : 4, apicid : 6, initial apicid : 6
 fdiv_bug : no, f00f_bug : no, coma_bug : no, fpu : yes, fpu_exception : yes
 cpuid level : 11, wp : yes
 flags : fpu vme + numerous others including up to SSE4
 bogomips : 2666.77
 clflush size : 64
 cache_alignment : 64
 address sizes : 36 bits physical, 48 bits virtual
 processor : 3
 vendor_id : GenuineIntel
 cpu family : 6
 model : 55
 model name : Intel(R) Atom(TM) CPU  Z3745  @ 1.33GHz
 stepping : 8
 microcode : 0x81b
 cpu MHz : 1862.000
 cache size : 1024 KB
 physical i
 Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP
 PREEMPT Tue Sep 16 10:49:37 CST 2014

 With ARM CPU Code

 Screen pixels w x h 800 x 1216
 Android Build Version      4.4.2
 Processor : ARMv7 processor rev 1 (v7l)
 BogoMIPS : 1500.0
 Features : neon vfp swp half thumb fastmult edsp vfpv3
 CPU implementer : 0x69
 CPU architecture: 7
 CPU variant : 0x1
 CPU part : 0x001
 CPU revision : 1
 Hardware : placeholder
 Revision : 0001
 Serial : 0000000000000001
 Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP
 PREEMPT Tue Sep 16 10:49:37 CST 2014
 
   



Whetstone Benchmark - NativeWhetstone2.apk

This provides an overall rating in MWIPS, plus separate results for the eight test procedures in MFLOPS (floating point) and MOPS (functions and integer). For full details and results via Windows, Linux, Android and different programming languages, see Whetstone Benchmark Results on PCs.

Native Intel code produced average performance gains of 1.93 times using Atom A1. The original version ran slowly on the Phenom based BlueStacks Android emulator (BS1), which was not the case with the later BlueStacks version running on the 3.7 GHz Core i7 (BS2). Both were much faster on the newer benchmark, apparently running native Intel instructions rather than converting ARM instructions. With the later ARM code, MWIPS was much lower on the Cortex CPUs, entirely due to the slow EXP functions test.

July 2015 - ARM/Intel version speeds are similar to the original on ARM CPUs reported here, except for the EXP tests on T7 and T11, which have a significant impact on the overall MWIPS rating.

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Results at 32 and 64 bits were not that different.



 System   ARM   MHz Android MWIPS  ------MFLOPS-------   ------------MOPS--------------
 See      CPU        Build           1      2      3     COS   EXP  FIXPT      IF  EQUAL

 Original ARM Version
 A1    Z3745   1866  4.4.2 1075.4  373.8  311.5  284.5  21.9  14.2 1421.1  1839.2  797.0
 T7    v7-A9   1200  4.1.2 1115.0  271.3  250.7  256.4  25.8  14.6 1190.0  1797.0 1198.7
 T22   v8-A53  1300  5.0.2 1433.7  348.0  319.3  308.2  36.3  19.8 1551.4  1861.9  611.0
 T11   v7-A15  1700  4.2.2 1477.7  363.9  220.6  307.5  39.7  18.0 1690.5  2527.9 1127.9
 T21   QU-800  2150  4.4.3 2035.1  665.7  640.0  531.6  45.2  23.1 3535.2  3180.4 2120.0
 BS1 Emul Phen 3000  2.3.4  103.6   36.9   32.6   37.7   1.8   1.4  130.2   414.0  374.1
 BS2 Emul i7   3700  4.4.2  844.5  428.6  351.8  343.6  14.6  10.9 1909.1   533.5  478.8

 ARM/Intel 32 Bit Version
 A1    Z3745   1866  4.4.2 1888.4  665.8  504.4  492.0  35.7  27.5 3191.4  3585.8 2146.7
 T7     v7-A9  1200  4.1.2  731.1  273.6  253.0  252.8  28.0   5.0 1185.2  2383.4 1192.1
 T11   v7-A15  1700  4.2.2  907.4  363.3  327.1  303.1  33.6   6.3 1506.9  2476.5 1122.6
 T21   QU-800  2150  4.4.3 1973.8  679.6  648.4  525.6  44.7  21.9 3516.7  3147.2 1567.7
 T22   v8-A53  1300  5.0.2  834.7  348.9  312.7  310.9  36.7   5.4 1556.7  1867.2  570.5
 BS1 Emul Phen 3000  2.3.4 2992.3  897.2  707.4  623.6  76.3  37.8 3705.9  4423.1 2281.5
 BS2 Emul i7   3700  4.4.2 5086.9 1066.7 1120.0  963.2 166.4  56.4 6300.0 11436.5 3786.9

 ARM/Intel 64 Bit Version
 T22   v8-A53  1300  5.0.2 1494.2  347.1  307.0  305.9  37.5  20.6 1552.2  1863.7 1239.1
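
The 1.93 times average gain quoted for A1 can be approximated from the two A1 rows of the table above. The sketch below averages the nine column ratios (MWIPS plus the eight tests), which gives about 1.93; that this is how the quoted figure was derived is an assumption, and the array and function names are hypothetical.

```c
#include <stddef.h>

/* A1 (Atom Z3745) rows copied from the results table:
   MWIPS, MFLOPS 1-3, then COS, EXP, FIXPT, IF and EQUAL MOPS. */
static const double a1_original[9] = {
    1075.4, 373.8, 311.5, 284.5, 21.9, 14.2, 1421.1, 1839.2, 797.0 };
static const double a1_native[9] = {
    1888.4, 665.8, 504.4, 492.0, 35.7, 27.5, 3191.4, 3585.8, 2146.7 };

/* Arithmetic mean of the element-by-element ratios newer[i] / older[i]. */
static double mean_ratio(const double *newer, const double *older, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += newer[i] / older[i];
    return sum / (double)n;
}
```

Here mean_ratio(a1_native, a1_original, 9) comes to roughly 1.93, while the MWIPS ratio on its own is 1888.4 / 1075.4, or about 1.76.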
   



Dhrystone Benchmark - Dhrystone2i.apk

The Dhrystone integer benchmark produces a performance rating in Vax MIPS (AKA DMIPS). Further details of the Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in Dhrystone Results.htm. The ratio MIPS/MHz is often quoted, but this depends on compiler optimisation (or over-optimisation).
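
The two figures in the table below are related by simple arithmetic, sketched here in C. The divisor of 1757 Dhrystones per second is the conventional VAX 11/780 reference score that defines one Vax MIPS; the function names are hypothetical and the benchmark's own source is not reproduced.

```c
/* Dhrystones per second scored by the VAX 11/780 reference machine,
   which defines 1 Vax MIPS (1 DMIPS). */
#define VAX_DHRYSTONES_PER_SECOND 1757.0

/* Convert a raw Dhrystones per second score into Vax MIPS. */
static double vax_mips(double dhrystones_per_second)
{
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND;
}

/* Efficiency ratio: Vax MIPS divided by the measured CPU clock in MHz. */
static double mips_per_mhz(double mips, double cpu_mhz)
{
    return mips / cpu_mhz;
}
```

For example, mips_per_mhz(1840.0, 1866.0) is about 0.99, the value listed for the original ARM version on A1.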

The new version, with native Intel code, produces a 33% gain in performance on A1, with the BlueStacks Emulator 9.2 times faster on BS1. ARM Cortex speeds are somewhat slower.

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation produced a significant improvement.


 System   ARM    MHz   Android       Vax       MIPS
 See                                MIPS       /MHz

 Original ARM Version
 A1    Z3745   1866     4.4.2       1840       0.99
 T7    v7-A9   1200     4.1.2       1610       1.34
 T22   v8-A53  1300     5.0.2       1683       1.29
 T11   v7-A15  1700     4.2.2       3189       1.88
 T21   QU-800  2150     4.4.3       3854       1.79
 BS1 Emul Phen 3000     2.3.4        484       0.16
 BS2 Emul i7   3700     4.4.2        746       0.20

 ARM/Intel 32 Bit Version
 A1    Z3745   1866     4.4.2       2451       1.31
 T7    v7-A9   1200     4.1.2       1317       1.10
 T22   v8-A53  1300     5.0.2       1423       1.09
 T11   v7-A15  1700     4.2.2       2551       1.50
 T21   QU-800  2150     4.4.3       3319       1.54
 BS1 Emul Phen 3000     2.3.4       4464       1.49
 BS2 Emul i7   3700     4.4.2       8841       2.39

 ARM/Intel 64 Bit Version
 T22   v8-A53  1300     5.0.2       2569       1.98

   



Linpack Benchmark - LinpackDP2.apk, LinpackSP2.apk

The Linpack benchmark speed is measured in MFLOPS, officially for double precision floating point calculations. A version was produced using NEON functions, which only provide single precision operation. So, for comparison purposes, an available C code option to define single precision data was used to produce a new version, and this has usually led to a higher MFLOPS speed. Results from various hardware and software platforms can be found in Linpack Results.htm.

Performance of the Linpack benchmark is almost entirely dependent on the calculation x[i]=x[i]+c*y[i]. Later ARM processors support vfpv4, which includes fused multiply-accumulate instructions, possibly doubling performance. Compilation of these seems to have appeared in compiler gcc 4.8. Tablet T11 has vfpv4 but T7 does not - See System Details. The result is that the T11 DP benchmark runs much faster on the recompiled code (same with T21). The Intel native code compilation, running on A1, was more than twice as fast as the original, produced by gcc 4.4. Some of the gain is due to the new compiler, as shown by the GCC 4.8 ARM version that still runs via conversion of ARM instructions, and the rest to native Intel code.
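
The calculation named above can be written as a short C loop. This is a sketch rather than the benchmark's source: the function names and argument order are hypothetical, and the MFLOPS helper simply counts one multiply and one add per element.

```c
#include <stddef.h>

/* The calculation named in the text, applied to n elements.
   Each element costs one multiply and one add (2 floating point ops);
   with fused multiply-accumulate hardware the compiler may combine
   them into a single instruction. */
static void daxpy(size_t n, double c, const double *y, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + c * y[i];
}

/* MFLOPS for `calls` calls of daxpy on n elements taking `seconds`. */
static double daxpy_mflops(size_t n, size_t calls, double seconds)
{
    return 2.0 * (double)n * (double)calls / seconds / 1.0e6;
}
```

Because the loop body is a single multiply followed by an add on the same element, it is a natural candidate for a fused instruction, which is why the availability of vfpv4 matters so much to the results.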

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation increased speed by almost 2 times with double precision calculations and 2.7 times at single precision.

September 2015 - New best score from P33, with a 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with SP speed of 1277 MFLOPS at 64 bits.

BlueStacks is particularly fast running with the native Intel version.

 
 System   ARM    MHz   Android  LinpackDP  LinpackSP
 See                              MFLOPS     MFLOPS

 Original ARM Version
  A1    Z3745   1866     4.4.2    168.16     296.63
  T7    v7-A9   1200     4.1.2    151.05     201.30
  T22   v8-A53  1300     5.0.2    156.70     184.09
  T11   v7-A15  1700     4.2.2    459.17     803.04    
  T21   QU-800  2150     4.4.3    389.52     751.95
  BS1   Emul Ph 3000     2.3.4     16.61      26.53
  BS2   Emul i7 3700     4.4.2    138.85     227.42 

 GCC 4.8 ARM Version
  A1    Z3745   1866     4.4.2    282.29

 ARM/Intel 32 Bit Version
  A1    Z3745   1866     4.4.2    362.63     408.87
  T7    v7-A9   1200     4.1.2    159.34     199.84
  T22   v8-A53  1300     5.0.2    172.28     180.64
  T11   v7-A15  1700     4.2.2    826.36     952.88
  T21   QU-800  2150     4.4.3    629.92     790.83             
  BS1   Emul Ph 3000     2.3.4   1808.57    1474.70
  BS2   Emul i7 3700     4.4.2   3390.95    1886.36

 ARM/Intel 64 Bit Version
  T22   v8-A53  1300     5.0.2    340.18     482.43
  P33   QU-810  2000     5.0.2              1277.76 
    



Livermore Loops Benchmark - LivermoreLoops2.apk

The Livermore Loops comprise 24 kernels of numerical applications, with speeds calculated in MFLOPS. A summary is also produced, with maximum, minimum and various mean values, the geometric mean being the official average. As with the other benchmarks, details and results are provided, in this case in Livermore Loops Results.htm.
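
The three mean values in the summary can be sketched in C as follows. The function names are hypothetical and this is not the benchmark's code. The geometric mean is more usually computed as the exponential of the mean of the logarithms; bisection is used here only so that the sketch needs no maths library.

```c
#include <stddef.h>

/* Arithmetic mean: the "Average" column. */
static double arithmetic_mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Harmonic mean: n divided by the sum of reciprocals. */
static double harmonic_mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += 1.0 / v[i];
    return (double)n / sum;
}

/* Geometric mean: the value g for which the product of v[i]/g is 1.
   It lies between the minimum and maximum, so bisection finds it.
   All values must be positive. */
static double geometric_mean(const double *v, size_t n)
{
    double lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    for (int step = 0; step < 200; step++) {
        double mid = 0.5 * (lo + hi);
        double product = 1.0;
        for (size_t i = 0; i < n; i++)
            product *= v[i] / mid;
        if (product > 1.0)
            lo = mid;   /* mid is below the geometric mean */
        else
            hi = mid;   /* mid is at or above the geometric mean */
    }
    return 0.5 * (lo + hi);
}
```

For positive values the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, matching the ordering of the Average, Geomean and Harmean columns in the results.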

This time, the new compiler produces some slower results on Tablet T11. The Atom, running native code, is faster on average, and 2.56 times faster than via the Houdini ARM conversion layer. T21 MFLOPS can also be different.

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Here, 64 bit/32 bit geometric mean performance ratio is 1.5.


 System   ARM    MHz   Android        
 See                               Max  Average  Geomean Harmean   Min                         

 Original ARM Version
  A1    Z3745   1866     4.4.2    535.8   201.9   172.4   146.7    48.8
  T7    v7-A9   1200     4.1.2    391.9   202.1   181.3   160.9    68.1
  T11   v7-A15  1700     4.2.2   1252.8   476.0   375.8   288.8    90.8 
  T21   QU-800  2150     4.4.3   1075.5   437.1   356.7   284.4   100.3
  BS2   Emul i7 3700     4.4.2    321.7   134.4   118.1   101.8    29.3

 ARM/Intel 32 Bit Version
  A1    Z3745   1866     4.4.2   1031.2   480.0   429.8   378.6   154.7
  T22   v8-A53  1300     5.0.2    393.4   188.3   158.3   124.6    27.1
  T7    v7-A9   1200     4.1.2    396.6   207.6   175.6   136.1    26.8
  T11   v7-A15  1700     4.2.2   1411.4   471.2   342.1   219.5    34.3 
  T21   QU-800  2150     4.4.3   1159.4   446.9   356.0   280.3   112.3
  BS2   Emul i7 3700     4.4.2   5422.6  2232.1  1784.4  1372.7   350.5

 ARM/Intel 64 Bit Version
  T22   v8-A53  1300     5.0.2    772.2   265.9   232.5   206.3    97.8 
  



MemSpeed Benchmark - MemSpeedi.apk

This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing double precision MB/second by 8 and 16, for the two tests, and single precision speeds by 4 and 8. Assembly listings for integer tests show that Millions of Instructions Per Second (MIPS) can be found by multiplying MB/second by 0.78 with 2 adds and 0.66 for the other test. Cache sizes are indicated by varying performance as memory usage changes. For more details and further results see MemSpeed in Android Benchmarks.htm.
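
The conversions described above are simple enough to sketch in C. One reading that reproduces the quoted divisors is that each element involves two array values (16 bytes in double precision, 8 bytes in single precision) and either two or one floating point operations; that interpretation, and the function names, are assumptions.

```c
/* MB/second to MFLOPS for the two floating point tests:
   x[m]=x[m]+s*y[m] performs 2 operations per element,
   x[m]=x[m]+y[m]   performs 1 operation per element. */
static double mflops_dp_multiply_add(double mb_per_sec) { return mb_per_sec / 8.0; }
static double mflops_dp_add(double mb_per_sec)          { return mb_per_sec / 16.0; }
static double mflops_sp_multiply_add(double mb_per_sec) { return mb_per_sec / 4.0; }
static double mflops_sp_add(double mb_per_sec)          { return mb_per_sec / 8.0; }

/* MB/second to integer MIPS, using the factors quoted in the text. */
static double mips_int_two_adds(double mb_per_sec) { return mb_per_sec * 0.78; }
static double mips_int_one_add(double mb_per_sec)  { return mb_per_sec * 0.66; }
```

For example, the 4635 MB/second single precision figure in the T21 Original log converts to 1158.75 MFLOPS, close to the maximum SP MFLOPS reported in that log.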

The native ARM/Intel results, on the Intel Atom based A1, averaged 44% faster via L1 cache data, 27% using L2 and 14% from RAM. Results on tablets T7, T11 and T21 showed some gains and some losses. The benefit of Intel native code is particularly demonstrated by results using the BlueStacks App Player, running on an Intel Core i7 based PC.

August 2015 - Results provided for 64 bit T22. The 64 bit compilation was nearly twice as fast as the 32 bit version with double precision floating point calculations, using cached data, and provided a 33% increase from RAM. Corresponding single precision ratios were 2.6 and 2.0 times and integer ratios of 2.2 and 1.5.


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

  Android MemSpeed Benchmark 1.1 01-Feb-2015 10.06

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   2773   1745   2821   5993   3274   3094 L1
      32   3088   1690   2451   4849   2769   2896
      64   3066   1694   2245   3883   2434   2568 L2 
     128   3084   1695   2261   3886   2466   2524
     256   3158   1732   2285   3964   2264   2176
     512   2666   1721   2295   3959   2505   2561
    1024   2938   1659   2163   3567   2356   2443
    4096   2775   1653   2123   3055   2307   2395 RAM
   16384   2827   1659   2121   3208   2321   2411
   65536   2840   1661   2112   3248   2314   2406

          Total Elapsed Time   10.8 seconds
 
 #################### A1 ARM-Intel ######################
 ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 11.46

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   3287   1859   4560   9789   4688   7316
      32   3233   1856   3807   6633   3990   4030
      64   3304   1860   2965   4457   2996   3894
     128   3303   1855   3006   4463   3113   3992
     256   3306   1860   2978   4463   3093   3946
     512   3307   1862   2964   4377   3097   3958
    1024   3031   1778   2766   3993   2867   3472
    4096   2863   1776   2692   3129   2763   3046
   16384   2857   1776   2702   3063   2768   3050
   65536   2865   1765   2702   3176   2782   3087

          Total Elapsed Time   10.1 seconds

 
 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2000 MHz Cortex-A15, Android 4.2.2
                Measured 1700 MHz

  Android MemSpeed Benchmark 1.1 09-Aug-2013 17.04

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   7296   4159   3513   9375   5453   6211 L1
      32   7253   4540   3882   7364   4873   4839
      64   6902   4265   3878   7026   4373   4274 L2
     128   6735   4032   2480   4005   2797   3288
     256   5859   3775   2192   4527   3263   3676
     512   5795   3781   3568   6282   3819   3818
    1024   2609   1757   1754   2607   1805   1825
    4096   1614   1422   1471   1654   1342   1441 RAM
   16384   1624   1412   1474   1642   1336   1443
   65536   1617   1408   1479   1368   1321   1423

          Total Elapsed Time   10.7 seconds
 
 #################### T11 ARM-Intel ####################

 ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 12.26

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   6540   4359   4580  10119   6292   6502
      32   8185   5132   4682   8729   4622   4465
      64   5770   3530   3473   5780   3447   3782
     128   5311   3386   3475   5225   3441   3451
     256   5667   3642   3678   5805   3643   3726
     512   5047   3318   3334   4869   3303   3337
    1024   2015   1469   1423   2050   1452   1386
    4096   1535   1322   1342   1598   1381   1385
   16384   1505   1379   1406   1584   1387   1384
   65536   1509   1306   1332   1585   1387   1382

          Total Elapsed Time   10.8 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

 Android MemSpeed Benchmark 1.1 02-Jun-2015 11.01

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   8922   4635   3566  12412   5648   3774 L1
      32   5116   3542   2773   7594   4827   3657 L2
      64   5174   3393   2684   5652   3757   3130
     128   5286   3387   2648   5443   3758   3194
     256   4937   3446   2889   7469   4624   3449
     512   4941   3459   2915   7452   4566   3724
    1024   4837   3449   2848   7065   4455   3722
    4096   2840   2606   2343   2581   2458   2567 RAM
   16384   2606   2423   2232   2395   2238   2338
   65536   2653   2453   2257   2457   2312   2420

          Total Elapsed Time    9.7 seconds

     Maximum SP MFLOPS 1159 Integer MIPS 2802


 #################### T21 ARM-Intel ####################

 ARM/Intel MemSpeed Benchmark 1.1 02-Jun-2015 11.27

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   8074   4831   2603  11252   5065   3892 L1
      32   5302   4138   3709   7252   4985   3693 L2
      64   4801   3510   2832   5739   3684   3015 
     128   4502   3783   3577   5991   3914   3547
     256   4907   3913   3934   6876   4280   4056
     512   4686   3883   3921   6236   4215   4060
    1024   4716   3808   3823   6131   4185   3942 
    4096   2691   2603   2679   2249   2634   2709 RAM
   16384   2227   2223   2420   1798   2191   2445
   65536   2099   2106   2306   1738   2040   2346

          Total Elapsed Time    9.9 seconds

     Maximum SP MFLOPS 1207 Integer MIPS 2898

 
 ###################### T22 32 Bit ######################

  ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.16
           Compiled for 32 bit ARM v7a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   1940    971   1693   2470   1278   2084 L1
      32   1879    955   1676   2378   1255   1967
      64   1801    938   1615   2254   1218   1912 L2
     128   1706    941   1620   2279   1224   1872
     256   1818    935   1570   2291   1155   1875
     512   1633    884   1451   2008   1132   1704
    1024   1276    781   1181   1454    938   1324 RAM
    4096   1335    808   1260   1533   1010   1386
   16384   1342    813   1270   1487   1013   1419
   65536   1346    809   1274   1546   1031   1252

          Total Elapsed Time   11.7 seconds


 ###################### T22 64 Bit ######################

 ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.29
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   4092   2198   3951   5293   3611   4408
      32   3753   2496   3630   4651   3300   3992
      64   3407   2388   3368   3715   3023   3677
     128   3496   2462   3521   4137   3139   3844
     256   3535   2481   3573   4199   3322   3911
     512   3054   2248   3126   3556   2548   3372
    1024   1714   1704   2029   2069   1854   2099
    4096   1832   1595   1841   1914   1780   1897
   16384   1844   1601   1850   1925   1798   1891
   65536   1859   1608   1837   1921   1795   1812

          Total Elapsed Time   10.2 seconds


 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 1 GB DDR3 RAM 
          Measured 1200 MHz

 Android MemSpeed Benchmark 17-Oct-2012 20.19

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   1735    888   2456   2726   1364   2818 L1
      32   1448    760   1474   1700   1039   1648
      64   1318    719   1290   1468    952   1385 L2
     128   1279    715   1289   1443    944   1336
     256   1268    714   1279   1435    943   1313
     512   1158    691   1204   1321    892   1228
    1024    729    553    735    772    632    742
    4096    445    392    425    442    421    439 RAM
   16384    435    390    428    435    412    431
   65536    445    404    393    450    432    449

          Total Elapsed Time   12.2 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.24

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   1856   1019   2537   2913   1459   2544
      32   1416    832   1327   1508    920   1345
      64   1286    779   1198   1418    908   1296
     128   1282    781   1195   1424    912   1305
     256   1278    774   1190   1433    878   1298
     512   1197    752   1122   1340    862   1216
    1024    833    626    822    903    695    857
    4096    463    420    456    463    440    459
   16384    459    426    453    455    435    458
   65536    463    430    411    462    443    452

          Total Elapsed Time   11.5 seconds

 
 #################### BS2 Original ######################
 
 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8

 Android MemSpeed Benchmark 1.1 25-Apr-2015 12.58

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   1523   1777    731   1406   1939   1163
      32   1306   1641    787   1641   1939   1023
      64   1524   1230    511   1422   1662   1143
     128   1524   1707    787   1641   1641    948
     256   1456   1670    853   1525   1708   1094
     512   1527   1642    853   1642   1779    948
    1024   1528   1646    853   1646   1713   1094
    4096   1535   1809    853   1809   1945   1194
   16384   1638   1638    819   1774   1872   1170
   65536   1404   1747    819   1747   1820   1156

          Total Elapsed Time   12.5 seconds

 #################### BS2 ARM-Intel #####################

 ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.47

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16  35555   9309  14065  30476  19393  19394
      32  30476  19394  14222  35555  18518  17066
      64  26666  16623  17778  30476  18286  16410
     128  26667  17778  17778  29092  18286  19051
     256  25098  16675  16327  27354  19395  18825
     512  25100  13063  12190  26666  19395  17793
    1024  24631  17589  16415  24623  16415  16415
    4096  24638  17783  16644  24638  17093  17783
   16384  14745  12639  11000  14000  13611  12834
   65536  14043  11359  12336  15490  10649  10649

          Total Elapsed Time   12.6 seconds
   



BusSpeed Benchmark - BusSpeedv7i.apk

This benchmark is designed to identify data being read in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is halved on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Measured reading speed then reduces from a maximum, when all data is read, to a minimum at 16 word increments (64 bytes). Potential maximum speed can be estimated by multiplying this minimum value by 16. With this burst size, measured speeds at 32 word and 16 word increments are likely to be the same. Cache sizes are indicated by varying speed as memory use changes. Note that, with the smallest L1 cache demands, measured speed can be low due to overheads when reading little data. For more details and further results see BusSpeed in Android Benchmarks.htm.
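
The access pattern described above can be sketched in C as a strided read. This is an illustration only: the function names are hypothetical, and combining the words with AND is an assumption made so that the reads cannot be optimised away, not a statement of what the benchmark does.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 4 byte word, move on by `inc_words` words, and repeat.
   inc_words = 32, 16, 8, 4, 2 and 1 correspond to the Inc32, Inc16,
   Inc8, Inc4, Inc2 and Read All columns of the results. */
static uint32_t read_with_increment(const uint32_t *data, size_t words,
                                    size_t inc_words)
{
    uint32_t result = 0xffffffffu;
    for (size_t i = 0; i < words; i += inc_words)
        result &= data[i];
    return result;
}

/* Number of words actually read by one pass; multiply by 4 for bytes. */
static size_t words_read(size_t words, size_t inc_words)
{
    return (words + inc_words - 1) / inc_words;
}

/* Estimate of potential maximum RAM speed from the 16 word increment
   result, assuming 64 byte bursts as described in the text. */
static double estimated_max_mb_per_sec(double inc16_mb_per_sec)
{
    return inc16_mb_per_sec * 16.0;
}
```

As a usage example, the A1 Original log shows 272 MB/second at 16 word increments with 65536 KB, so estimated_max_mb_per_sec(272.0) gives 4352 MB/second, compared with 3886 MB/second measured when reading all data.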

The native code ARM/Intel version provided no real performance improvement on tablet A1, with the Atom Z3745 CPU. In ARM mode, there was also little difference on tablets T21, T11 and T7. The main reason for these similarities is that the long sequence of identical C arithmetic statements is easy to convert for efficient processing. BlueStacks speeds on the Intel CPU were again outstanding.

August 2015 - Results provided for 64 bit T22. Reading all data, 64/32 bit comparison ratios were up to 2.0 from L1 cache, 1.5 from L2 cache and 1.25 from RAM.


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android BusSpeed Benchmark 1.1 v7 21-Dec-2014 16.06

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   4178   3473   6270   6713   6759   6869 L1
      32   1420   1529   2252   2686   3702   5108
      64   1385   1498   2276   2629   3657   5108 L2
     128   1394   1542   2278   2614   3640   5092
     256   1410   1576   2258   2607   3259   5110
     512   1417   1574   2274   2602   3700   5119
    1024    349    428    888   1431   2848   4306 RAM
    4096    215    265    593   1181   2289   3891
   16384    210    266    596   1181   2278   3897
   65536    220    272    600   1193   2346   3886

          Total Elapsed Time    5.1 seconds

 #################### A1 ARM-Intel ######################

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   4845   5705   6403   6926   7094   7167 L1
      32   1407   1716   2255   2646   3713   5094
      64   1395   1703   2257   2689   3754   4843 L2
     128   1283   1571   2108   2620   3671   5135
     256   1416   1753   2288   2679   3687   5178
     512   1439   1372   2251   2510   3679   5183
    1024    350    409    942   1696   2792   4403
    4096    213    253    564   1188   2173   3631 RAM
   16384    219    259    600   1189   2330   3920
   65536    218    259    599   1102   2323   3716

          Total Elapsed Time    5.1 seconds


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz
  2 GB DDR3-1600 RAM, dual channel, 12.8 GB/sec

 Android BusSpeed Benchmark 1.1 v7 09-Aug-2013 17.07

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3193   3451   4412   5272   5389   6191 L1
      32   1298   1558   1990   3478   4264   4420
      64    804    928   1209   2442   3263   3426 L2
     128    784    904   1175   2321   3148   3333
     256    780    908   1181   2336   3142   3327
     512    788    907   1165   2312   3120   3300
    1024    360    387    384    803   1348   1744
    4096    145    146    194    507    648   1378 RAM
   16384    141    136    190    507    638   1373
   65536    142    141    191    506    643   1371

          Total Elapsed Time    5.3 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel BusSpeed Benchmark 1.1 v7 23-Apr-2015 12.15

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   2085   3208   4055   4553   5272   5758
      32   1282   1811   2498   4182   4867   5163
      64    600    864   1309   2974   3504   3841
     128    614    892   1310   3027   3500   3826
     256    614    892   1337   3050   3509   3828
     512    618    888   1319   3042   3382   3811
    1024    425    479    444   1244   1803   2291
    4096    146    146    191    590   1050   1751
   16384    141    139    186    585   1039   1725
   65536    139    139    187    585   1039   1721

          Total Elapsed Time    5.3 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

 Android BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   1382   1350   3122   4300   4938   5283 L1
      32   1106   1118   2026   2637   3786   5210 L2
      64   1064   1118   2058   2679   3820   5251
     128   1123   1170   2081   2688   3669   4166
     256   1121   1196   2109   2623   3873   3429
     512    940   1127   2050   2684   3777   4795
    1024    951   1124   2038   2655   3759   4950
    4096    239    375    472    806   1486   2679 RAM
   16384    239    370    464    806   1476   2656
   65536    239    368    495    854   1537   2792

          Total Elapsed Time    5.0 seconds


 #################### T21 ARM-Intel ####################

 ARM/Intel BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   1328   1442   2797   4291   4699   5685 L1
      32   1165   1100   1933   2848   3603   5844 L2
      64   1147   1055   2007   2846   3586   5890
     128   1181   1136   2008   2711   3600   5878
     256   1185   1126   2018   2716   3568   5873
     512   1022   1026   1805   2525   3378   5611
    1024    796    843   1584   2202   3088   5053
    4096    199    294    431    657   1166   2409 RAM
   16384    200    299    430    659   1167   2408
   65536    205    301    436    668   1173   2380

          Total Elapsed Time    5.2 seconds


 ###################### T22 32 Bit ######################

 T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 10.57
           Compiled for 32 bit ARM v7a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16    874    932   1814   2302   2355   2263 L1
      32    758    803   1309   1820   2323   2386
      64    653    671   1203   1741   2206   2332 L2
     128    603    620   1107   1693   2222   2351
     256    574    589   1075   1711   2211   2327
     512    332    372    681   1075   1863   2120
    1024    137    193    371    578   1322   2129 RAM
    4096    172    179    351    567   1151   2126
   16384    172    178    351    504   1117   2136
   65536    172    177    349    478    882   2129

          Total Elapsed Time    5.3 seconds


 ###################### T22 64 Bit ######################

 T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 11.02
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   3188   3635   3937   4327   4372   4462
      32   1478   1607   2246   3382   3853   4144
      64    600    622   1163   2011   2972   3585
     128    558    575   1056   1889   2892   3525
     256    538    550   1028   1826   2837   3260
     512    371    425    813   1490   2403   3202
    1024    136    196    382    728   1423   2750
    4096    170    177    346    669   1340   2652
   16384    169    174    341    678   1352   2663
   65536    168    174    341    676   1347   2611

          Total Elapsed Time    5.2 seconds


 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, 1 GB DDR3 RAM 

 Android BusSpeed Benchmark 19-Oct-2012 17.29

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   2723   2420   3044   3364   3499   3500 L1
      32   1054   1087   1061   1382   1565   2145
      64    436    433    419    652    751   1160 L2
     128    345    337    337    542    633    943
     256    329    309    322    522    614    961
     512    339    299    311    506    574    937
    1024    170    168    180    269    349    629
    4096     59     55     84    127    176    338 RAM
   16384     56     56     83    125    173    335
   65536     56     56     82    125    174    334

          Total Elapsed Time    5.7 seconds
 
 #################### T7 ARM-Intel #####################

 ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.30

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   2940   3344   3625   3866   3862   3893
      32    698    707    682   1071   1208   1826
      64    448    477    465    726    851   1357
     128    367    355    292    542    657   1070
     256    334    344    341    546    651   1059
     512    326    336    336    531    629   1025
    1024    169    175    197    309    411    749
    4096     58     58     83    131    191    395
   16384     56     57     83    129    189    392
   65536     56     48     82    129    187    388

          Total Elapsed Time    5.6 seconds


 #################### BS2 Original ######################

 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8 

 Android BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.57

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   1428   1280   1280   1422   1333   1489
      32   1428   1280   1280   1365   1706   1602
      64   1066   1481   1600   1463   1463   1707
     128   1666   1365   1489   1463   1463   1833
     256   1429   1706   1293   1425   1466   1823
     512   1333   1463   1603   1425   1468   1565
    1024   1280   1463   1710   1468   1565   1730
    4096   1282   1367   1475   1730   1310   1617
   16384    412    943    958   1258   1398   1677
   65536    449    958   1078   1304   1677   1677

          Total Elapsed Time    6.8 seconds

 #################### BS2 ARM-Intel #####################

 ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.49

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16  13333  12800  22222  13675  18285  14224
      32  10666  10666  12190  21333  21367  21334
      64   6666   6666  10666  13333  21333  21337
     128   6826   6400  10240  17067  21335  18290
     256   4266   5120   8533  13654  18290  20483
     512   2667   2667   5335   9103  16386  20515
    1024   2560   2560   5692   9105  15608  22806
    4096   2673   2752   5470   9175  17126  21880
   16384    741    943   2070   4404   8808  14680
   65536    542    838   1572   3595   6710  11930

          Total Elapsed Time    6.5 seconds
   



RandMem Benchmark - RandMemi.apk

The RandMem benchmark carries out four tests at increasing data sizes, producing data transfer speeds in MBytes per second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower random access can be. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results see RandMem in Android Benchmarks.htm.
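
The serial and random tests can be pictured with the C sketch below. It is only an illustration, not the benchmark source: the function names, the index array approach and the small random number generator are assumptions. Both tests run the same read loop, and only the order of the indices differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: indices 0, 1, 2, ... for serial access. */
void fill_serial(uint32_t *idx, size_t n)
{
    assert(idx != NULL);
    for (size_t i = 0; i < n; i++)
        idx[i] = (uint32_t)i;
}

/* Hypothetical helper: the same indices, shuffled (Fisher-Yates) with a
   small linear congruential generator, so accesses are scattered. */
void fill_random(uint32_t *idx, size_t n, uint32_t seed)
{
    fill_serial(idx, n);
    for (size_t i = n; i > 1; i--) {
        seed = seed * 1664525u + 1013904223u;
        size_t j = (size_t)(seed >> 8) % i;   /* 0 <= j < i */
        uint32_t t = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = t;
    }
}

/* Same read loop for both tests: only the contents of idx differ. */
uint32_t sum_indexed(const uint32_t *data, const uint32_t *idx, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}
```

Timing sum_indexed with each index array, over increasing data sizes, would give figures comparable in kind to the Serial Read and Random Read columns. The sum is returned so that a compiler cannot remove the loop.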

On the A1 Atom based tablet, the native code ARM/Intel version showed gains of around 25% on all reading tests, but no difference on the read/write tests. The same benchmark, running on tablets T11 and T21, showed some improvement with cache based data, but comparative performance on T7 was variable.

August 2015 - Results provided for 64 bit T22 showing 32 bit and 64 bit versions were not that different overall, each one slightly faster on some tests.


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

  Android RandMem Benchmark 1.1 01-Feb-2015 10.12

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     3434     5064     3462     5113 L1
       32     2833     4042     2652     3645
       64     2837     4058     2068     2561 L2
      128     2822     4041     1809     2205
      256     2828     4040     1435     1755
      512     2816     3997     1245     1456
     1024     2578     3256      379      445
     4096     2412     1946      209      268 RAM
    16384     2485     2039      179      217
    65536     2457     2041      140      170

          Total Elapsed Time   11.8 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 17.27

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     4291     5626     4584     5630
       32     3217     3792     3492     3783
       64     3677     4253     2629     2644
      128     3666     4241     2299     2289
      256     3688     3930     1829     1850
      512     3682     4189     1522     1592
     1024     3285     3558      562      667
     4096     2999     2007      272      274
    16384     3019     2065      210      220
    65536     2989     2068      141      186

          Total Elapsed Time    8.8 seconds


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

 Android RandMem Benchmark 1.1 13-Aug-2013 17.29

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     2881     2478     3388     3650 L1
       32     4301     2968     3197     3249
       64     3669     2511     2201     2249 L2
      128     3566     2560     1571     1566
      256     3557     2461     1334     1256
      512     3524     2547     1136     1098
     1024     1933     1144      534      513
     4096     1993     1064      184      173 RAM
    16384     1970     1086      141      144
    65536     1973     1117      106      104

          Total Elapsed Time    9.1 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 20.42

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     3642     3102     5464     4114
       32     5462     3409     4096     3737
       64     4800     2785     2028     2064
      128     4308     2575     1572     1589
      256     4381     2574     1332     1260
      512     4311     2544     1215     1097
     1024     2033     1156      513      471
     4096     1891     1042      213      178
    16384     2028     1032      154      139
    65536     2033     1055      109      106

          Total Elapsed Time    9.2 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

 Android RandMem Benchmark 1.1 10-Jun-2015 12.43

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     4407     4704     3995     4900
       32     2611     3071     2207     2703
       64     2496     2797     1821     2139
      128     2080     3173     1668     1758
      256     2425     3183     1439     1520
      512     2359     3116     1193     1355
     1024     2366     3117      368      382
     4096     2293     2280      201      209
    16384     2293     2237      170      175
    65536     2299     2261      146      150

          Total Elapsed Time    8.5 seconds


 #################### T21 ARM-Intel ####################

 ARM/Intel RandMem Benchmark 1.1 10-Jun-2015 12.45

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     5005     4626     4067     4863
       32     3253     2994     2246     2622
       64     3223     2855     1986     2072
      128     2861     3128     1912     1776
      256     3246     3174     1666     1523
      512     3195     3111     1469     1372
     1024     3190     3079      369      383
     4096     3027     2381      212      213
    16384     3065     2300      174      177
    65536     3080     2281      150      150

          Total Elapsed Time    8.6 seconds


 ###################### T22 32 Bit ######################

  T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.29
           Compiled for 32 bit ARM v7a

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     2807     3606     2753     3595 L1
       32     2719     3433     1429     1930
       64     2615     3266      914     1166 L2
      128     2592     3243      705      828
      256     2570     3223      637      720
      512     2367     2684      237      347
     1024     2137     1855      120      163 RAM
     4096     1918     1658       83       97
    16384     2152     1665       74       85
    65536     2104     1652       72       64

          Total Elapsed Time   11.6 seconds


 ###################### T22 64 Bit ######################

  T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.32
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     3865     3033     3798     3027
       32     3622     2760     3105     2734
       64     3094     2803     1011     1077
      128     3074     2740      776      801
      256     3050     2771      718      693
      512     2420     2463      270      371
     1024     1322     1853      131      164
     4096     1754     1598       87      100
    16384     1791     1586       75       91
    65536     1856     1609       57       68

          Total Elapsed Time   14.6 seconds


 ##################### T7 Original ######################

 T7,   ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

  Android RandMem Benchmark 20-Oct-2012 11.14

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     2788     3041     2795     3041 L1
       32     2769     3011     2767     3020
       64     1027     1038      839      911 L2
      128      916      918      616      649
      256      904      905      514      538
      512      899      907      475      499
     1024      712      699      345      354
     4096      323      284       92       88 RAM
    16384      316      282       73       70
    65536      314      281       65       62

          Total Elapsed Time   10.9 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.33

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     2521     3175     2490     3038
       32     1427     1451     1218     1446
       64     1133     1052      853      907
      128     1039      871      646      650
      256     1028      909      543      518
      512     1025      895      499      502
     1024      700      489      242      236
     4096      487      282       90       88
    16384      483      281       71       70
    65536      478      274       63       62

          Total Elapsed Time   11.3 seconds


 #################### BS2 Original ######################

 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8

 Android RandMem Benchmark 1.1 25-Apr-2015 12.59

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     4069     5008     4069     2174
       32     4439     5426     4069     1953
       64     3974     5682     3552     1860
      128     3721     5209     3758     1717
      256     4342     5210     3157     1204
      512     4167     5342     2845     1141
     1024     4350     5208     2606     1000
     4096     3475     5709     1938      867
    16384     4343     5120      747      400
    65536     3657     5818      533      256

          Total Elapsed Time   14.2 seconds

 #################### BS2 ARM-Intel #####################

 ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.50
          BlueStacks on 3.9 GHz Core i7

    MBytes/Second Transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16    23252    24414    19148    29593
       32    25432    27127    25432    24038
       64    21552    23674    14533     9301
      128    21702    20834    12020     8140
      256    22727    19934     9470     6513
      512    22321    17362     5953     5686
     1024    20840    18945     5691     4815
     4096    21053    16693     2291     2291
    16384    12308    10057     1067     1018
    65536    10667    10338      753      711

          Total Elapsed Time    8.3 seconds
   



MP-MFLOPS Benchmarks - MP-MFLOPSi and MP-MFLOPS2i

The benchmarks are recompilations of those in www.roylongbottom.org.uk/Android MultiThreading Benchmarks.htm. The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words), to exercise L1 cache, L2 cache and RAM. Each thread uses the same calculations but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations are carried out. The data values start at 1.0, with subsequent calculations reducing them, by an amount that depends on the number of calculations.
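
The form of calculation, and the way the data is divided into segments, can be sketched in C as below. This is an illustration under assumptions, not the benchmark source: the Coeffs structure, the function names and the way segments are split are hypothetical, and the segments are processed one after another here rather than on real threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the constants a to f in the formula. */
typedef struct { float a, b, c, d, e, f; } Coeffs;

/* Apply x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f
   to the words in the segment [start, end). */
void calc_segment(float *x, size_t start, size_t end, const Coeffs *k)
{
    assert(x != NULL && start <= end);
    for (size_t i = start; i < end; i++)
        x[i] = (x[i] + k->a) * k->b
             - (x[i] + k->c) * k->d
             + (x[i] + k->e) * k->f;
}

/* Divide n words into `parts` segments, as one thread per segment
   would. Each word is still calculated exactly once per pass. */
void run_segments(float *x, size_t n, size_t parts, const Coeffs *k)
{
    assert(parts > 0);
    for (size_t p = 0; p < parts; p++) {
        size_t start = n * p / parts;
        size_t end = n * (p + 1) / parts;
        calc_segment(x, start, end, k);
    }
}
```

As every word receives the same calculations, whatever the number of segments, the final values should not depend on the thread count, which is the basis of the consistency check.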

An example of results for MP-MFLOPSi, from the log file, is provided below, showing identical numeric results, independent of the number of threads used (as it should be). This original version became too fast for later technology, producing inconsistent MFLOPS performance ratios. Longer running versions were produced to avoid this problem, in this case MP-MFLOPS2i with 50 times more calculations, producing the expected reduction in result values. The numeric results from ARM processors are slightly different, due to rounding effects (see Short and Long below).

Examination of disassembled code, using default compile parameters, showed that Intel SIMD and ARM NEON instructions were not being produced. These could execute, for example, four linked multiply and add operations simultaneously, providing MFLOPS speeds of up to eight times CPU MHz, per core. The types of instructions used are shown below, where the Intel varieties used only one word out of four in SSE registers (Single Instruction Single Data - SISD), and the ARM code employed single word scalar registers. The latter were of the vector type, using three registers, and included instructions such as single precision floating-point multiply-accumulate (fmacs).
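
The "eight times CPU MHz" figure is four words per register multiplied by two operations (multiply and add) per word. A small C sketch of that arithmetic follows. The function name is hypothetical, and it assumes one linked multiply and add instruction is issued per clock cycle, which is an idealised upper limit and not a measured speed.

```c
#include <assert.h>

/* Theoretical peak MFLOPS, assuming one multiply-add instruction is
   issued every clock cycle (an assumption made for illustration). */
double peak_mflops(double mhz, int lanes, int ops_per_lane, int cores)
{
    assert(mhz > 0.0 && lanes > 0 && ops_per_lane > 0 && cores > 0);
    return mhz * lanes * ops_per_lane * cores;
}
```

For example, peak_mflops(1860.0, 4, 2, 1) represents one core of a 1.86 GHz CPU using four word SIMD, and peak_mflops(1860.0, 1, 2, 1) represents the same core using one word per instruction, a limit four times lower.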

The released versions were recompiled using the compile options shown below, but this made no difference to the type of code produced. The Intel compilations used more registers, which produced faster speeds at 32 operations per word. The ARM code was virtually identical, producing similar performance.


 Intel CPU Short - 5000 Repeat Passes

 ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      642     717     658    1053    1026     987
 2T     1052    1366    1016    2018    2108    2063
 4T     1752    2483     956    3817    3676    3894
 8T     1436    2217     992    3213    3428    3289
 Results x 100000, 0 indicates ERRORS
 1T    86735   98519   99984   79894   97641   99975
 2T    86735   98519   99984   79894   97641   99975
 4T    86735   98519   99984   79894   97641   99975
 8T    86735   98519   99984   79894   97641   99975

          Total Elapsed Time    3.6 seconds
 
 Intel CPU Long - 100000 Repeat Passes

 1T-8T 40392   76406   99700   35296   66012   99521

 ######################################################

 ARM CPU Short

 1T-8T 86735   98519   99984   79897   97638   99975

 ARM CPU Long

 1T-8T 40392   76406   99700   35218   66014   99520

 ######################################################

 Android.mk LOCAL_CFLAGS

 ifeq ($(TARGET_ARCH_ABI),x86)
    LOCAL_CFLAGS += -ffast-math -mtune=atom -mssse3 -mfpmath=sse
 endif
 ifeq ($(TARGET_ARCH_ABI),x86_64) 
    LOCAL_CFLAGS += -ffast-math -mtune=slm -msse4.2
 endif 
 ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
    LOCAL_ARM_NEON  := true
    LOCAL_CFLAGS    += -mfpu=neon
 endif
 ifeq ($(TARGET_ARCH_ABI),arm64-v8a)
    LOCAL_CFLAGS    += -DHAVE_NEON64=1
 endif

 ######################################################

 Intel SSE SISD Instructions - not SIMD        
 mulss   36(%esp), %xmm2      addss   %xmm1, %xmm2

 ARM Vector Instructions - not NEON 
 fmuls   s15, s15, s10        fmacs   s15, s14, s23
   



MP-MFLOPS Benchmark Results

Below are MFLOPS results, mainly for the longer running versions, including those from the original ARM compilations. The first ones are for tablet A1, with the quad core Intel Atom CPU, where results for the shorter running version are also provided, showing some slower speeds. In this case, performance from the native Intel code was up to nearly twice as fast as the ARM converted test run. In both cases, with 2 operations per word, the maximum MP gains were obtained using L2 cache based data. RAM based data was limited by memory speed, and needed only two threads to reach maximum speed. With 32 operations per word, the quad cores provided performance gains of nearly four times.

Tablet T11 had some slightly slower results on the ARM/Intel variety, with tablet T7 providing little variation. Except for RAM based data at 2 operations per word, performance gains were in line with the number of cores.

T21, with the Qualcomm Snapdragon 800, produced similar speeds using the old and ARM/Intel versions. Calculation speeds, with 1 and 2 threads, could be slower than T11, Cortex-A15, but RAM speed was much faster. The opposite applied, compared with A1 Atom, using native code.

August 2015 - Results provided for 64 bit T22, showing that the 64 bit version was just over twice as fast at 32 operations per word, and up to 3.7 times faster at 2 operations per word with cache based data. The reason is that the 64 bit compilation produced vector SIMD instructions, instead of scalars.
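
The T22 ratios can be reproduced from the single thread (1T) rows of the 32 bit and 64 bit T22 results below. The C sketch here is only a convenience for that arithmetic. The array and function names are hypothetical, and the MFLOPS values are copied from those two rows.

```c
#include <assert.h>
#include <stddef.h>

/* 1T MFLOPS from the T22 logs: 2 Ops/Word at 12.8, 128 and 12800 KB,
   then 32 Ops/Word at the same three sizes. */
static const double t22_32bit_1t[6] = { 190, 190, 184,  670,  672,  664 };
static const double t22_64bit_1t[6] = { 705, 701, 636, 1398, 1394, 1362 };

/* ratio[i] = fast[i] / slow[i] for each of the n measurements. */
void speed_ratios(const double *fast, const double *slow,
                  double *ratio, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(slow[i] > 0.0);
        ratio[i] = fast[i] / slow[i];
    }
}
```

Calling speed_ratios(t22_64bit_1t, t22_32bit_1t, ratio, 6) gives about 3.7 for the first entry (705 / 190) and about 2.1 for the fourth (1398 / 670).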


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android MP-MFLOPS2 Benchmark V2.1 04-Feb-2015 11.03

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      502     501     476     575     575     573
 2T     1012     975     921    1133    1140    1115
 4T     1571    1627     979    2238    2255    2258
 8T     1550    1890    1007    2235    2239    2217

          Total Elapsed Time  117.4 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      642     717     658    1053    1026     987
 2T     1052    1366    1016    2018    2108    2063
 4T     1752    2483     956    3817    3676    3894
 8T     1436    2217     992    3213    3428    3289

  V7 Short Version Total Elapsed Time    3.6 seconds


 ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.24

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      695     696     661    1061    1061    1055
 2T     1335    1382    1058    2088    2086    2102
 4T     1832    2635     979    3993    4125    4145
 8T     2026    2557    1007    3842    4044    4110

         Total Elapsed Time   65.8 seconds

 -- Single Thread MFLOPS No Extra Compile Options --

         704     713      675    773     779     774


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
          Dual Core CPU Measured GHz = 1.7

 Android MP-MFLOPS2 Benchmark V2.1 29-Apr-2015 10.22

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      845     817     544    1546    1539    1512
 2T     1593    1668     648    3140    3067    2977
 4T     1974    1775     645    2963    3093    2845
 8T     1935    2059     652    3108    3147    2985

          Total Elapsed Time   58.5 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 20.30

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      695     756     536    1537    1501    1476
 2T     1319    1527     645    3151    3077    3000
 4T     1604    1567     657    3035    3095    2997
 8T     1604    1639     658    3108    3125    2996
 
         Total Elapsed Time  59.1 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
             Quad Core 2150 MHz Measured

 Android MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 15.35

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      718     781     590    1214    1220    1228
 2T     1572    1583    1118    2406    2436    2442
 4T     2338    2959    1836    4867    4911    4859
 8T     3148    3266    1866    4870    4916    4888

          Total Elapsed Time   56.4 seconds

 #################### T21 ARM-Intel #################### 

 ARM/Intel MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 16.50

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      822     768     636    1232    1228    1231
 2T     1662    1637    1184    2460    2463    2446
 4T     2509    3216    1659    4519    4762    4900
 8T     2965    3193    1881    4847    4925    4880


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.17
            Compiled for 32 bit ARM v7a

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      190     190     184     670     672     664
 2T      377     378     370    1343    1345    1329
 4T      707     755     725    2657    2669    2621
 8T      722     736     714    2640    2672    2631

           Total Elapsed Time  113.0 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.24
            Compiled for 64 bit ARM v8a

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      705     701     636    1398    1394    1362
 2T     1376    1395     942    2794    2797    2757
 4T     2063    2602     962    5491    5546    5336
 8T     2474    2611     957    5367    5500    5417

           Total Elapsed Time   51.6 seconds


 ##################### T7 Original ######################

 T7,   ARM Cortex-A9 1300 MHz, Android 4.1.2, 
        Quad Core CPU Measured MHz = 1200

 Android MP-MFLOPS2 Benchmark V2.1 05-Feb-2015 11.37

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      182     156     114     598     578     572
 2T      365     321     194    1194    1163    1141
 4T      716     655     233    2367    2316    2240
 8T      717     682     233    2347    2371    2246

          Total Elapsed Time  135.5 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.44

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      188     156     116     598     578     574
 2T      365     319     197    1195    1161    1145
 4T      682     709     237    2372    2345    2249
 8T      678     731     237    2361    2381    2254

           Total Elapsed Time  135.0 seconds
  



MP-Whetstone Benchmark - MP-WHETSi

For more information on the Whetstone Benchmark, see the stand alone version, above. The multithreading version runs multiple copies of the same code, with separate variables. In this case, performance of each of the eight test functions, and the overall MWIPS rating, is invariably (nearly) proportional to the number of CPU cores available. The driving program checks that calculations on every thread produce consistent numeric results.

The gcc 4.8 based ARM/Intel version, running on the Intel Atom tablet, is rated at twice the speed of the original, due to the use of native code. The fixed point results indicate overoptimisation, but the test uses little of the overall time, this being mainly dependent on the Cos, Exp and third MFLOPS tests.

The new native ARM version, running on tablets T11 and T7, produces a much slower overall MWIPS rating, mainly due to the Exp tests, but also influenced by other slower results (some the same as above). T21 indicates slower floating point calculations.

August 2015 - Results provided for 64 bit T22, showing that, at 64 bits, the Fixpt test was clearly almost optimised out, but this makes little difference to the overall MWIPS rating, which was 2.25 times faster than that of the 32 bit benchmark.


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

   Android MP-Whetstone Benchmark V1.1 04-Feb-2015 11.39

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T   953.7  363.0  382.4  267.8  21.0  13.2   413.1  1842.4  392.3
 2T  1921.2  726.0  663.5  541.4  42.6  27.0   816.1  3662.6  793.3
 4T  3820.6 1419.2 1514.6 1081.5  84.1  54.0  1543.8  6292.4 1588.5
 8T  4003.8 1912.9 1872.4 1114.1  86.5  56.4  2053.1  8292.6 1599.7

  Overall Seconds   4.88 1T,   4.87 2T,   4.96 4T,  10.05 8T

 #################### A1 ARM-Intel ######################

    ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 17.35
 
                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T  1916.9  691.4  691.3  497.2  35.3  27.6 10209.8  2787.3 1351.8
 2T  3800.3 1377.6 1381.2  980.0  70.1  54.7 20248.0  5252.8 2748.7
 4T  7604.9 2713.2 2711.8 1977.1 140.2 110.0 33906.3  9526.5 5550.8
 8T  7798.1 3141.5 3627.2 2064.2 141.2 110.2 59590.6 12743.7 5711.5

  Overall Seconds   4.94 1T,   5.00 2T,   5.06 4T,  10.11 8T


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

   Android MP-Whetstone Benchmark V1.1 06-Sep-2013 12.49
 
                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T  1308.2  345.9  379.0  294.1  30.8  17.2  1351.4  1265.7  843.1
 2T  2886.6  782.1  782.6  614.0  80.1  34.3  2775.2  2463.7 1667.5
 4T  3086.0  998.6  788.1  610.6  79.2  44.5  3472.0  2526.4 2191.4
 8T  2930.0  788.2  843.5  616.5  80.5  35.0  2846.0  2799.1 1686.2

  Overall Seconds   3.54 1T,   3.30 2T,   6.62 4T,  13.16 8T

 #################### T11 ARM-Intel ####################

    ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.23

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T   837.2  340.1  341.7  191.2  39.1   6.2  1521.1  2532.8  629.3
 2T  1676.2  596.2  683.2  387.3  77.8  12.4  3056.9  5055.1 1263.6
 4T  1697.7  687.5  869.4  394.5  78.1  12.4  2980.7  6518.4 1258.8
 8T  1685.2  685.9  691.0  389.7  78.3  12.4  3086.3  5113.7 1262.0

  Overall Seconds   4.06 1T,   4.07 2T,   8.12 4T,  16.19 8T

 
 #################### T21 Original #####################

    T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

    Android MP-Whetstone Benchmark V1.1 06-Jul-2015 10.42

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp  Fixpt     If  Equal
                 1      2      3  MOPS  MOPS   MOPS   MOPS   MOPS

 1T  1877.1  645.2  642.6  524.1  44.0  22.3 1364.7 1572.1  898.9
 2T  3668.6 1220.2 1262.4 1021.9  85.9  43.8 2663.5 3078.4 1753.4
 4T  7426.9 2375.5 2474.7 2097.7 175.7  88.2 5052.6 6240.4 3555.0
 8T  7706.6 2692.2 2746.2 2186.9 180.1  90.3 5822.5 6902.7 3681.3

 Overall Seconds   4.44 1T,   4.62 2T,   4.64 4T,   9.00 8T

          Total Elapsed Time   24.1 seconds

 #################### T21 ARM-Intel #################### 

    ARM/Intel MP-Whetstone Benchmark V1.1 22-Jul-2015 12.02
 
                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T  1598.0  512.1  508.7  311.7  43.6  22.1  1142.9  2123.3  598.4
 2T  3161.2  960.0  996.7  614.2  86.7  43.8  2258.9  3820.9 1194.7
 4T  6348.0 1593.5 2019.5 1231.5 174.2  88.5  4471.1  8139.4 2398.3
 8T  6419.6 2058.2 2077.5 1252.6 175.0  88.7  4520.9  8875.0 2409.0

 Overall Seconds   4.88 1T,   5.00 2T,   5.05 4T,   9.92 8T

          Total Elapsed Time   29.2 seconds


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

     ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.30
            Compiled for 32 bit ARM v7a

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T   676.4  275.9  281.9  147.9  35.4   5.3   600.3   901.0  285.5
 2T  1362.5  533.8  561.7  298.0  70.9  10.8  1203.1  1838.9  574.0
 4T  2698.6  903.9 1071.7  594.4 141.2  21.5  2346.1  3305.5 1138.5
 8T  2830.1 1463.2 1393.0  614.2 152.5  21.9  3243.9  4418.3 1171.4

 Overall Seconds   4.95 1T,   4.94 2T,   5.11 4T,  10.09 8T

###################### T22 64 Bit ######################

     ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.34
            Compiled for 64 bit ARM v8a

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T  1524.8  328.6  348.8  297.6  37.3  19.9 1462579  1867.2 1238.0
 2T  3062.5  688.8  697.9  596.0  75.5  39.8 2097113  3726.7 2481.3
 4T  6085.4 1214.9 1360.5 1185.4 150.5  79.4 2449153  7055.0 4951.8
 8T  6222.4 1495.2 1545.6 1204.2 152.2  80.6 3869846  9218.8 5154.1

 Overall Seconds   4.92 1T,   4.90 2T,   5.05 4T,   9.97 8T


 ##################### T7 Original ######################

 T7,   ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

     Android MP-Whetstone Benchmark V1.0 17-Oct-2012 13.49

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T  1033.7  247.4  235.4  266.0  25.3  15.0   448.4   630.9  513.5
 2T  2058.1  456.3  473.0  532.4  50.0  30.1   898.1  1198.4 1026.6
 4T  4122.8  831.9  944.7 1064.6 100.7  60.1  1797.0  2392.2 2053.4
 8T  4163.2 1016.0  948.2 1069.5 101.8  60.9  1808.0  2414.2 2051.5

  Overall Seconds   5.28 1T,   5.34 2T,   5.42 4T,  10.81 8T

 #################### T7 ARM-Intel #####################

    ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.32

                    Using 1, 2, 4 and 8 Threads
      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                 1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

 1T   602.2  242.3  242.3  140.2  27.2   4.9   482.8  1425.2  239.1
 2T  1208.7  481.2  484.2  280.8  55.0   9.9   970.0  2869.6  478.7
 4T  2398.7  805.4  966.7  562.5 109.5  19.5  1938.2  5722.5  957.1
 8T  2429.1  974.6 1076.2  562.4 110.9  19.7  1981.5  5816.1  963.6

  Overall Seconds   4.94 1T,   4.93 2T,   5.08 4T,   9.93 8T
   



MP Dhrystone Benchmark - MP-Dhryi.apk

For further details see Dhrystone Benchmark above and, for further results, Android MultiThreading Benchmark Apps. This multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second. All threads are then run using the same pass count, with running time being extended when there are more threads than CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread, but some variables can be used by all threads. The latter is probably responsible for the failure to increase throughput when using multiple threads.
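The calibrate-then-run procedure can be sketched as below. This is not the benchmark's code: `workload` is a hypothetical stand-in for the Dhrystone loop, the doubling strategy in `calibrate` is an assumption, and the target time is shortened for the demonstration. CPython threads share one interpreter lock, so this sketch shows the control flow only and will not show multi-core gains.

```python
import threading
import time


def workload(passes):
    """Hypothetical stand-in for the Dhrystone loop (simple integer work)."""
    total = 0
    for i in range(passes):
        total += i % 7
    return total


def calibrate(target_seconds, passes=1000):
    """Double the pass count until one thread runs for target_seconds or more."""
    while True:
        start = time.perf_counter()
        workload(passes)
        if time.perf_counter() - start >= target_seconds:
            return passes
        passes *= 2


def run_threads(thread_count, passes):
    """Run the same pass count on every thread; return total passes per second."""
    threads = [threading.Thread(target=workload, args=(passes,))
               for _ in range(thread_count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start
    return thread_count * passes / elapsed


if __name__ == "__main__":
    passes = calibrate(0.05)  # the benchmark itself aims for 1 second
    for thread_count in (1, 2, 4, 8):
        rate = run_threads(thread_count, passes)
        print(f"{thread_count} threads  {rate:12.0f} passes per second")
```

Because every thread uses the calibrated pass count, elapsed time grows once there are more threads than CPUs, as in the Seconds rows of the logs.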

The new ARM/Intel version demonstrated similar speeds on the systems tested. Unlike other systems, the Intel Atom based tablet produced slower performance using multiple threads. Tests on a PC, via the BlueStacks emulator, appeared to demonstrate that native Intel instructions were being used.

T21, with the Qualcomm Snapdragon 800, sometimes crashed running this benchmark, and apparently did so every time with the ARM-Intel version. When it does run, the eight thread performance is also highly suspect.

August 2015 - Results provided for 64 bit T22, showing that the 64 bit version was much faster than the 32 bit variety.
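The VAX MIPS ratings in the logs are consistent with dividing Dhrystones per second by 1757, the conventional score of the VAX 11/780. That constant is not stated in the text here, so treat it as an assumption that happens to reproduce the logged figures. The sketch below applies it to the two T22 logs and shows the 64/32 bit ratios.

```python
# Conventional Dhrystones per second of the VAX 11/780 (assumed divisor).
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def vax_mips(dhrystones_per_second):
    """Convert a Dhrystones per second score to a VAX MIPS rating."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


# Dhrystones per second at 1, 2, 4 and 8 threads, from the T22 logs.
t22_32bit = [2481286, 4495793, 7094180, 7540038]
t22_64bit = [4476736, 7574470, 9768350, 9861922]

for threads, d32, d64 in zip((1, 2, 4, 8), t22_32bit, t22_64bit):
    print(f"{threads}T  {vax_mips(d32):6.0f}  {vax_mips(d64):6.0f}"
          f"  64/32 = {d64 / d32:.2f}")
```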


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.00

 Threads                        1        2        4        8
 Seconds                     0.96     3.27     6.83    13.79
 Dhrystones per Second    4147126  2449335  2343954  2320745
 VAX MIPS rating             2360     1394     1334     1321

 #################### A1 ARM-Intel ######################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.02

 Threads                        1        2        4        8
 Seconds                     0.96     3.44     6.88    13.80
 Dhrystones per Second    4154551  2323340  2324139  2318280
 VAX MIPS rating             2365     1322     1323     1319


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

 Android MP-Dhrystone 2 Benchmark V1.1 10-Aug-2013 09.55

 Threads                        1        2        4        8
 Seconds                     0.50     0.53     1.05     2.18
 Dhrystones per Second    3990211  7522450  7600539  7328598
 VAX MIPS rating             2271     4281     4326     4171
 
 #################### T11 ARM-Intel ####################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.22

 Threads                        1        2        4        8
 Seconds                     0.99     1.12     2.33     4.45
 Dhrystones per Second    4031981  7127449  6856521  7196710
 VAX MIPS rating             2295     4057     3902     4096


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

 Android MP-Dhrystone 2 Benchmark V1.1 06-Jul-2015 11.22

 Threads                        1        2        4        8
 Seconds                     0.64     0.83     0.94     1.23
 Dhrystones per Second    5007132  7722435 13592474 20769050
 VAX MIPS rating             2850     4395     7736    11821

          Total Elapsed Time    4.4 seconds



 #################### T21 ARM-Intel #################### 

 Failed to run


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.32
           Compiled for 32 bit ARM v7a

 Threads                        1        2        4        8
 Seconds                     0.64     0.71     0.90     1.70
 Dhrystones per Second    2481286  4495793  7094180  7540038
 VAX MIPS rating             1412     2559     4038     4291

###################### T22 64 Bit ######################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.36
            Compiled for 64 bit ARM v8a

 Threads                        1        2        4        8
 Seconds                     0.89     1.06     1.64     3.24
 Dhrystones per Second    4476736  7574470  9768350  9861922
 VAX MIPS rating             2548     4311     5560     5613

            
 ##################### T7 Original ######################

 T7,   ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

 Android MP-Dhrystone 2 Benchmark V1.0 17-Oct-2012 13.59

 Threads                        1        2        4        8
 Seconds                     0.72     0.83     1.19     2.55
 Dhrystones per Second    2782404  4829150  6740332  6271011
 VAX MIPS rating             1584     2749     3836     3569

 #################### T7 ARM-Intel #####################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.18

 Threads                        1        2        4        8
 Seconds                     0.78     0.95     1.27     2.44
 Dhrystones per Second    2572642  4214238  6280420  6565767
 VAX MIPS rating             1464     2399     3575     3737


 ################ BlueStacks Emulator ##################

            PC with 3 GHz Phenom x4, windows 7

 VAX MIPS Original            474      465      453      449
 VAX MIPS ARM/Intel          4844     4670     4623     4724
   



MP-BusSpeed Benchmark - MP-BusSpdi.apk

This is a multithreading version of the BusSpeed Benchmark above. Here, single thread performance of the A1 Atom tablet was similar to that obtained unthreaded, with the ARM/Intel version again providing no improvement. Except for calculating bus speeds, the last column is the only one of real interest, where four cores produced gains of up to 3.7 times using caches, and 1.9 times via RAM. The latter provided even better relative performance compared to ARM based systems. ARM/Intel version results are not shown for tablets T11 and T7, as they were both essentially the same as those obtained using the original MP benchmark. For further details and more results see Android MultiThreading Benchmark Apps. Some ARM/Intel results for T21 are slower than the original, but this might be due to the short running time.
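The four core gains quoted above can be checked against the RdAll column. This minimal sketch uses the 1T and 4T figures from the A1 ARM-Intel log in this section; labelling the three data sizes as L1 cache, L2 cache and RAM is my assumption about where each size resides.

```python
# RdAll MB/second at (1T, 4T), copied from the "A1 ARM-Intel" log.
rdall = {
    "L1 cache (12.3 KB)": (7026, 25729),
    "L2 cache (122.9 KB)": (5330, 19903),
    "RAM (12288 KB)": (3935, 7525),
}

# Gain from four threads relative to one thread at each data size.
gains = {level: four / one for level, (one, four) in rdall.items()}

for level, gain in gains.items():
    print(f"{level:20s} {gain:.1f} times")
```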

Results from the PC based BlueStacks emulator are also shown below, to confirm that native Intel instructions were being used in the revised benchmark.

Estimated maximum data transfer speeds, based on burst reading results (like 16 x 1018 for T21), can exceed the specification. This is caused by shared data in the L3 cache, and the way that the program is run.
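The estimate is simple arithmetic on the Inc16 column. The sketch below assumes, as the multiplier of 16 used in the logs suggests, that each Inc16 read brings in a full burst of 16 words; the function name and the decimal conversion of 14.9 GB/s to MB/second are also assumptions.

```python
def burst_estimate(inc16_mb_per_second, words_per_burst=16):
    """Estimate maximum transfer speed from an Inc16 burst reading result."""
    return inc16_mb_per_second * words_per_burst


# Inc16 at 4T, 12288 KB, from the "T21 Original" log in this section.
t21_estimate = burst_estimate(1018)

# T21 RAM specification of 14.9 GB/s, taken here as 14900 MB/second.
t21_specification = 14.9 * 1000

print(f"Estimate {t21_estimate} MB/second, "
      f"specification {t21_specification:.0f} MB/second")
```

For T21 the estimate comes out above the specified RAM bandwidth, which is the anomaly described above.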

MP-BusSpd2i.apk is a revised version for Android. Running time is longer and, rather than all threads reading data from the beginning, starting addresses are staggered. This can result in slower speeds, as there are fewer calculations in the inner loop, but the increased speed due to cached shared data appears to no longer apply, and burst results can be used to estimate maximum RAM throughput (as shown).
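One possible way of staggering the starting addresses is sketched below. The text does not say how the offsets are chosen, so the even spacing in `start_offsets` and the wrap-around in `read_order` are purely hypothetical illustrations.

```python
def start_offsets(total_words, thread_count):
    """Hypothetical staggering: spread thread start points evenly."""
    return [t * total_words // thread_count for t in range(thread_count)]


def read_order(total_words, start, step):
    """Word indices one thread visits: every `step` words from `start`,
    wrapping round at the end of the array."""
    return [(start + i) % total_words for i in range(0, total_words, step)]


# Four threads sharing a 1024 word array.
print(start_offsets(1024, 4))

# A thread starting at word 6 of an 8 word array, reading every second word.
print(read_order(8, 6, 2))
```

With distinct start points, threads are less likely to find data that another thread has just brought into a shared cache.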

August 2015 - Results included for T22 with 64 bit CPU and 64 bit Android 5.0. Just considering the Read All data, the A53 64/32 bit performance ratios for L1 cache, L2 cache and RAM averaged 2.2, 1.8 and 1.0 respectively.


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 13.02

    MB/Second Reading Data, 1, 2, 4 and 8 Threads
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3 1T   3990   4458   6123   6512   6438   6729
       2T   3894   5699   8948  10299  11800  12555
       4T   5046   7109  11952  14750  15533  23304
       8T   4533   7464  13097  16970  21674  22225
 122.9 1T   1304   1613   2291   2661   3667   5063
       2T   2568   3145   4529   5365   7440  10147
       4T   4117   4801   7963   7495   8239  18911
       8T   3130   5016   7355   8543  11648  15845
 12288 1T    190    265    601   1203   2316   3832
       2T    244    448    995   1771   3599   6575
       4T    427    584    860   1741   3439   7449
       8T    395    510    855   1613   3547   6776

          Total Elapsed Time   13.5 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.28

    MB/Second Reading Data, 1, 2, 4 and 8 Threads
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3 1T   5925   6494   6778   6979   7047   7026
       2T   3966   7029   9689  11689  12856  13654
       4T   4438   8698  16739  22057  23946  25729
       8T   4455   8619  15787  19934  22576  20804
 122.9 1T   1490   1975   2360   2802   3818   5330
       2T   2881   3798   4647   5531   7536  10546
       4T   4452   6338   5910  10217  14650  19903
       8T   4096   5075   6264   9213  12610  15821
 12288 1T    206    273    593   1198   2343   3935
       2T    276    455    842   1821   3319   6591
       4T    445    730   1401   2076   4457   7525
       8T    424    539    954   1829   3688   7064

          Total Elapsed Time   13.0 seconds

 ########## A1 New Long Version

 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.50

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   5431   6110   6780   6262   6655   7313
      2T   3550   4464   7375   9825  11777  12442
      4T   2027   4442   4399   8841  17611  23509
      8T    983   2477   5063   4433   8568  15867
122.9 1T   1499   1991   2357   2839   3818   5382
      2T   2816   3808   4708   5592   7557  10677
      4T   4316   6313   7991   9816  14335  19993
      8T   4235   5610   7917   8791  12828  19661
49152 1T    215    275    611   1183   2328   3922
      2T    276    435    787   1671   3323   6507
      4T    398    455    884   1754   3490   6971
      8T    376    511    867   1746   3512   7510

          Total Elapsed Time   48.6 seconds

 Maximum RAM Speed Estimate = 511 x 16 = 8176 MB/second


 #################### T11 ARM-Intel ####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.45

    MB/Second Reading Data, 1, 2, 4 and 8 Threads
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3 1T   2165   3591   4256   5587   5998   6109
       2T   4121   6469   9530  11381  11846  11936
       4T   4106   6438   8827   6793   9802  12080
       8T   4098   6390   9534  10141  10996  11603
 122.9 1T    464    740   1173   2395   3276   3340
       2T    579    989   1934   3994   5431   5792
       4T    579    988   1930   3873   5469   5821
       8T    580    985   1915   3999   5408   5812
 12288 1T    134    172    211    462    602   1904
       2T    269    343    387    934   1217   2685
       4T    252    231    374    768    991   2625
       8T    231    254    367    781   1104   2782

          Total Elapsed Time   12.1 seconds

 ########## T11 New Long Version

 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 17.07

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   3499   4539   5499   5505   6134   6045
      2T   3775   7202   8377  10605  10457  11319
      4T   3982   6676   7687   9326   9707  10807
      8T   2546   3643   7891   8003  10725  11097
122.9 1T    672    901   1336   2784   3274   3334
      2T    568    969   1931   3894   5427   5221
      4T    574    971   1912   3831   5256   4811
      8T    559    971   1917   3878   5387   5162
49152 1T    140    142    193    575    989   1499
      2T    221    223    342    769   1379   2355
      4T    228    223    344    783   1382   2376
      8T    223    223    342    787   1385   2352

          Total Elapsed Time   49.9 seconds

 Maximum RAM Speed Estimate = 223 x 16 = 3568 MB/second

 Initial Results

 12.3 1T    693    936   1266   2522   3264   3329
      2T    557    900   1539   3459   3317   3613
      4T    551    903   1557   2902   3475   3616


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
     L1 caches 4 x 32 KB, L2 cache shared 2048 KB

 Android MP-BusSpd v7 Benchmark V1.1 29-Jun-2015 18.37

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   2580   2206   5048   5176   5679   5989
      2T   4062   5175   9340   9868  10971  11281
      4T   4688  10324  16552  17196  21714  23708
      8T   8467   9834  16698  18183  21936  23693
122.9 1T   1152   1052   2068   3035   3927   5723
      2T   1710   1840   3094   5001   7963  11475
      4T   2047   2002   5031   9267  14698  22920
      8T   2235   2275   5223   9348  14234  21783
12288 1T    262    382    508    867   1466   2661
      2T    464    766   1049   1754   3186   5735
      4T    612   1018   1796   3149   5892   9095
      8T    575    680   1277   2308   4987   7948

          Total Elapsed Time   12.7 seconds

 #################### T21 ARM-Intel #################### 

 ARM/Intel MP-BusSpd v7 Benchmark V1.1 23-May-2015 17.05

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   1840   2073   3512   3554   4829   5243
      2T   3432   4591   7128   7651   9120   9821
      4T   4398   7855  13752  15428  18530  20235
      8T   6692   9507  13857  16110  18143  18796
122.9 1T    860    753   2011   2841   3205   5282
      2T   1505   1609   3076   5038   8089  10421
      4T   1924   1981   4299   7588  14614  20754
      8T   1909   1988   4264   7980  13884  19027
12288 1T    270    379    538    856   1626   2859
      2T    471    677   1098   1849   3304   5924
      4T    549    787   1066   1874   6274  10781
      8T    713    853   1649   2258   4664   8321

          Total Elapsed Time   13.1 seconds

 ########## T21 New Long Version

 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.39

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   2247   2616   4010   4443   4909   5614
      2T   3558   4725   7241   9048   9747  10892
      4T   6074   8303  13442  16937  18525  21068
      8T   3998   5106  14314  13615  18200  20740
122.9 1T    874   1198   2024   2935   4529   5345
      2T   1686   1702   3174   5357   7688  10545
      4T   1988   2139   4465   8171  14969  21169
      8T   1972   2139   4468   8195  15261  21132
49152 1T    292    406    516    899   1663   2929
      2T    449    541    962   1569   2851   4776
      4T    495    605   1109   2439   4161   8243
      8T    530    564   1156   2149   4172   7907

          Total Elapsed Time   48.0 seconds

 Maximum RAM Speed Estimate = 605 x 16 = 9680 MB/second


 ###################### T22 32 Bit ######################

 T22, Tab 2 A8-50, 1.3 GHz quad core 64 bit ARM Cortex-A53
  Single Channel RAM, LPDDR3 666 MHz, 5.3 GB/second

 ARM/Intel MP-BusSpd Benchmark V1.2 12-Aug-2015 16.13
           Compiled for 32 bit ARM v7a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   1849   2140   2079   2211   2270   2297
      2T   3663   4252   4294   4400   4370   4580
      4T   4630   5574   5691   5893   6015   6083
      8T   5331   5775   6033   6622   7968   8023
122.9 1T    597    621   1119   1815   2135   2237
      2T    869    943   1644   2992   3740   4412
      4T    949    951   1922   3736   6468   7779
      8T    948    978   1911   3717   6464   7542
12288 1T    123    174    344    678   1215   1840
      2T    243    310    672   1332   2383   3974
      4T    302    285    594   1282   2271   4606
      8T    279    295    654   1198   2749   4660

          Total Elapsed Time   12.8 seconds

 ########## T22 Long Version

 ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.14
           Compiled for 32 bit ARM v7a

 12.3 1T   1877   2124   2176   2266   2296   2343
      2T   3625   4198   4341   4468   4536   4613
      4T   5733   7541   8293   8830   8024   9042
      8T   2985   3829   7438   6117   8108   8923
122.9 1T    604    625   1142   1846   2150   2284
      2T    924    950   1793   3277   4270   4504
      4T    962    989   1939   3765   6798   8862
      8T    965    993   1933   3748   6651   8239
49152 1T    165    175    344    677   1285   1979
      2T    234    238    482    961   1907   3547
      4T    266    298    562   1224   2296   4478
      8T    272    275    538   1098   2149   4282

          Total Elapsed Time   48.8 seconds


 ###################### T22 64 Bit ######################

 ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.18
           Compiled for 64 bit ARM v8a

 12.3 1T   2610   2472   2586   2727   2748   5841
      2T   4404   4681   4994   5369   5420  11297
      4T   6546   8125   9105  10243  10319  20610
      8T   3380   4023   7919   7146   9871  19852
122.9 1T    604    621   1110   1872   2446   5100
      2T    919    948   1855   3433   4853  10037
      4T    961    974   1984   3924   7491  14935
      8T    963    942   1931   3915   7572  14689
49152 1T    173    177    340    692   1300   2653
      2T    266    241    479    968   1883   3724
      4T    304    277    556   1130   2126   4328
      8T    279    278    544   1138   2179   4275

          Total Elapsed Time   49.4 seconds


 #################### T7 ARM-Intel #####################

 T7,   ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.35

    MB/Second Reading Data, 1, 2, 4 and 8 Threads
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3 1T   2853   3392   3376   3511   3551   3494
       2T   2857   3389   3542   5540   5730   5595
       4T   7257  10326  10289  10997  11373  11100
       8T   6584  10325  10485  11175  11322  11189
 122.9 1T    362    379    347    546    623    978
       2T    516    530    508    726   1227   1840
       4T    598    658    548   1181   1556   2657
       8T    721    733    736   1181   1548   2653
 12288 1T     58     57     84    123    173    334
       2T    111    111    182    248    348    664
       4T     87     85    276    463    687   1290
       8T    154    107    147    429    441   1242

          Total Elapsed Time   12.7 seconds

 ########## T7 New Long Version

 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.59

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

 12.3 1T   2166   2774   3181   3307   3377   3263
      2T   3924   5188   5207   5754   5759   5805
      4T   7570  10011  10252  11165  11375  11777
      8T   3510   4786   9011   8318  11351  11544
122.9 1T    383    409    359    558    663    983
      2T    525    541    520    741   1241   1814
      4T    739    752    753   1219   1590   2776
      8T    735    741    753   1218   1607   2737
49152 1T     56     51     81    126    172    330
      2T     65     67    107    196    335    620
      4T     70     68    108    215    426    835
      8T     70     68    109    215    428    851

          Total Elapsed Time   48.2 seconds


 Maximum RAM Speed Estimate = 68 x 16 = 1088 MB/second


 ############### BlueStacks Original ###############

 Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 17.44

    MB/Second Reading Data, 1, 2, 4 and 8 Threads
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3 1T   1600   1538   1641   1706   1600   1687
       2T   1600   1641   1745   1600   1687   1638
       4T   1600   1745   1745   1567   1638   1575
       8T   1476   1641   1602   1638   1575   1596
 122.9 1T   1000    923   1477   1600   1600   1688
       2T   1000    952   1477   1600   1567   1282
       4T    872   1163   1422   1567   1602   1576
       8T   1026   1164   1477   1527   1644   1580
 12288 1T    307    403    537   1075   1396   1512
       2T    302    409    708   1075   1417   1433
       4T    307    355    614   1024   1433   1535
       8T    307    384    661   1023   1404   1512

          Total Elapsed Time   13.9 seconds

 ############### BlueStacks ARM/Intel ##############

  ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.25

    MB/Second Reading Data, 1, 2, 4 and 8 Threads
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll

  12.3 1T   9999  18461  20000  20512  19692  21942
       2T  10909  17777  19999  19692  21942  20480
       4T   9599  18461  19692  19591  20480  19692
       8T  10666  17066  19948  20480  20480  19200
 122.9 1T   1500   1476   2742   5485  11636  13128
       2T   1428   1396   2792   5585  11170  13653
       4T   1396   1428   2954   5486  10973  13654
       8T   1280   1371   2744   5909  10974  14630
 12288 1T    460    439    645    631   1105   1331
       2T    230    268    480    806   1433   2234
       4T    256    307    575   1126   2010   2764
       8T    236    390    756   1105   1911   3574

          Total Elapsed Time   14.4 seconds
  



MP-RandMem Benchmark - MP-RndMemi.apk

This is a conversion of the longer running MP-RndMem2.apk Benchmark, as the original short version produced inconsistent performance measurements. It is a multithreading variety of the RandMem Benchmark above. For further details and more results see Android MultiThreading Benchmark Apps. Log file details are provided below for the original version, which performed relatively badly on the Intel based tablet A1, and for the ARM/Intel version, with cache based speeds up to 3.6 times faster on reading tests and 1.3 times on reading/writing. The new version, running on ARM based tablets, produced similar results to those from the original, with some slower.

Compared with early ARM based devices, tablet A1 ARM/Intel tests again demonstrated superior performance from RAM based data and from L2 cache on reading, but not as well using L1 cache.

August 2015 - Results provided for 64 bit T22 with Cortex-A53 CPU. Performance is not significantly faster at 64 bits, probably because it depends on the complex indexing used.
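The reading test improvement on tablet A1 can be checked from the two A1 logs in this section. This minimal sketch covers only the SerRD and RndRD columns at the two cache based data sizes; the variable names are illustrative.

```python
# (SerRD, RndRD) MB/second at 1T, 2T, 4T and 8T for 12.29 KB, then 122.9 KB,
# copied from the "A1 Original" and "A1 ARM-Intel" logs in this section.
original = [(1337, 1337), (2637, 2657), (3535, 3484), (3195, 3088),
            (1305, 963), (2581, 1945), (3588, 3125), (3211, 2949)]
arm_intel = [(4643, 4710), (8583, 8761), (12707, 12496), (10410, 10796),
             (3733, 2408), (7259, 4781), (11726, 7656), (11673, 7100)]

# ARM/Intel speed divided by original speed for every cache based read result.
ratios = [new / old
          for old_row, new_row in zip(original, arm_intel)
          for old, new in zip(old_row, new_row)]

print(f"Reading tests: {min(ratios):.1f} to {max(ratios):.1f} times faster")
```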


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.14

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    1337    2505    1337    2509
       2T    2637    2513    2657    2521
       4T    3535    2420    3484    2454
       8T    3195    2403    3088    2406
 122.9 1T    1305    2280     963    1758
       2T    2581    2285    1945    1748
       4T    3588    2130    3125    1740
       8T    3211    2269    2949    1745
 12288 1T    1248    1962     101     215
       2T    2469    1940     191     214
       4T    3462    1954     323     214
       8T    3127    1926     318     212

          Total Elapsed Time   43.7 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.54

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    4643    3593    4710    3641
       2T    8583    3552    8761    3564
       4T   12707    3450   12496    3384
       8T   10410    3389   10796    3408
 122.9 1T    3733    2874    2408    2150
       2T    7259    2871    4781    2165
       4T   11726    2897    7656    2133
       8T   11673    2853    7100    2113
 12288 1T    3153    2087     226     238
       2T    5782    2073     327     238
       4T    6451    1997     447     236
       8T    6471    2071     446     233

          Total Elapsed Time   41.5 seconds


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz
 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.13

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    6696    4438    6594    4483
       2T   12338    3078   12263    3573
       4T   12419    2834   12166    2907
       8T   12314    2903   11991    2934
 122.9 1T    3371    2916    1639    1748
       2T    6409    1922    2052    1097
       4T    6155    1892    2027    1186
       8T    6045    2105    2015    1192
 12288 1T    1394    1048     153     133
       2T    2245     985     285     123
       4T    2277    1002     285     132
       8T    2165    1001     286     127

          Total Elapsed Time   44.0 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 12.07

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    6315    4486    6345    4484
       2T   11837    2910   11846    3112
       4T   11864    2835   11553    2858
       8T   11821    3003   11805    3198
 122.9 1T    3963    2681    1670    1704
       2T    6672    1782    2040    1125
       4T    6493    1817    2033    1218
       8T    6673    1738    2038    1303
 12288 1T    1805    1081     177     145
       2T    2543    1066     279     137
       4T    2600    1065     276     136
       8T    2662    1073     281     138

          Total Elapsed Time   43.7 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android MP-RndMem2 Benchmark V2.1 08-Jul-2015 16.33

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR

12.29 1T    5088    5325    4262    4711
      2T    9752    4902    8895    4570
      4T   17379    4653   17434    4096
      8T   19771    4698   17358    4424
122.9 1T    2714    2578    1923    2163
      2T    5614    2502    3483    2107
      4T   10859    2219    4835    1972
      8T   10654    2410    4904    1923
12288 1T    1798     952     186     204
      2T    3489     974     341     195
      4T    6515     943     563     196
      8T    6218     922     563     187

          Total Elapsed Time   42.3 seconds

 #################### T21 ARM-Intel #################### 

 ARM/Intel MP-RndMem Benchmark V1.1 09-Jul-2015 11.48

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR

12.29 1T    4186    3777    4055    3933
      2T    9324    3541    7710    3619
      4T   16594    3350   15731    3142
      8T   18117    3291   16187    3262
122.9 1T    2423    2043    1610    1683
      2T    5235    2029    3013    1641
      4T   10148    1935    4662    1565
      8T   10015    1834    4611    1474
12288 1T    1363     886     171     186
      2T    2643     845     325     187
      4T    5197     823     534     184
      8T    4801     835     542     184

          Total Elapsed Time   42.6 seconds


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.13
           Compiled for 32 bit ARM v7a

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    2894    2438    2887    2433
       2T    5665    2402    5663    2403
       4T   10922    2369   11100    2310
       8T   10065    2293   10648    2265
 122.9 1T    2681    2368     757     758
       2T    5351    2360    1398     769
       4T   10056    2308    2121     772
       8T    8838    2351    1916     742
 12288 1T    2309    1662      80      78
       2T    3986    1683     164      73
       4T    5419    1684     283      82
       8T    4658    1694     279      82

###################### T22 64 Bit ######################

 ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.15
           Compiled for 64 bit ARM v8a

 12.29 1T    4445    3109    4455    3089
       2T    8010    3100    8072    3105
       4T   15909    3057   14711    3040
       8T   14764    3036   14570    3037
 122.9 1T    3457    2888     842     876
       2T    6537    2924    1524     876
       4T   11095    2892    2119     861
       8T   11729    2916    2080     874
 12288 1T    2475    1679      81      78
       2T    4155    1713     163      73
       4T    5503    1711     285      89
       8T    4519    1717     281      89

          Total Elapsed Time   48.1 seconds


 ##################### T7 Original ######################

  T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.17

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    3120    3060    3128    3078
       2T    6098    3003    6083    3004
       4T   11354    2948   11188    2942
       8T   11403    2857   10412    2872
 122.9 1T     996     983     661     699
       2T    1868     984    1012     697
       4T    2600     982    1483     699
       8T    2534     976    1459     694
 12288 1T     335     286      91      80
       2T     640     288     113      82
       4T     892     286     130      82
       8T     925     287     127      81

          Total Elapsed Time   44.7 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.59

   MB/Second Using 1, 2, 4 and 8 Threads
   KB       SerRD SerRDWR   RndRD RndRDWR

 12.29 1T    3060    2001    2867    1904
       2T    5459    1879    5463    1867
       4T   10797    1852   10537    1856
       8T   10090    1802   10608    1813
 122.9 1T     968     823     588     547
       2T    1749     785     902     618
       4T    2716     812    1328     672
       8T    2733     810    1407     673
 12288 1T     329     274      90      82
       2T     636     272     112      82
       4T     849     271     128      82
       8T     869     271     126      81

          Total Elapsed Time   45.4 seconds
   



NEON-Linpack Benchmark - NEON-Linpacki.apk

Details of the benchmark can be found above and in android neon benchmarks.htm. The main point is that it was a complete surprise to discover that ARM NEON intrinsic functions could be converted to Intel SSE SIMD instructions, with a significant performance improvement on an Atom based tablet. On ARM CPUs, the use of NEON functions can be expected to produce similar performance from the original and ARM/Intel versions, as reflected in the results below.

August 2015 - T22 results from 32 bit and 64 bit compilations were similar, as the programs use a limited number of identical intrinsic functions.

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with a speed of 1446 MFLOPS at 32 bits.


     NEON Single Precision Floating Point MFLOPS

 ########################################################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 
  MFLOPS  Original   443.4    ARM-Intel   900.2


 ########################################################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

  MFLOPS  Original  1334.9    ARM-Intel  1411.9


 ########################################################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

  MFLOPS  Original  1250.1    ARM-Intel  1235.0


 ########################################################
 
  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

  MFLOPS  32 bit     407.1    64 bit      505.2


 ########################################################

  T7,  ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

  MFLOPS  Original   376.0    ARM-Intel   346.8


 ########################################################

  P33,  Snapdragon 810 2000 MHz, Android 5.0.2

  MFLOPS  32 bit    1446.4
   



NeonSpeed Benchmark - NeonSpeedi.apk

This benchmark carries out the same calculations as the MemSpeed Benchmark, measuring data reading speeds in megabytes per second, with functions accessing arrays of cache and RAM based data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] with single precision floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing single precision MB/second by 4 and 8 respectively, for the two tests. The first set of calculations uses normal functions, followed by some using NEON intrinsic functions. The last two columns are NEON only results. For further details and results see android neon benchmarks.htm.
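The MB/second to MFLOPS conversion can be sketched as follows. This is a minimal illustration, not code from the benchmark: the function name and dictionary are hypothetical, and the bytes-per-operation reasoning in the comments is an assumed reading of why the divisors are 4 and 8. The sample speeds are the 16 KB figures from the A1 Original results in this section.

```python
# Assumed interpretation of the divisors quoted in the text:
# x[m] = x[m] + s*y[m] reads two 4-byte words for 2 operations (4 bytes per operation)
# x[m] = x[m] + y[m]   reads two 4-byte words for 1 operation  (8 bytes per operation)
BYTES_PER_FLOP = {"multiply_add": 4, "add": 8}


def mflops_from_mbps(mb_per_second, test):
    """Convert a single precision reading speed in MB/second to MFLOPS."""
    return mb_per_second / BYTES_PER_FLOP[test]


# A1 Original, 16 KB: Float Norm column and Neon Float v=v+v column
print(mflops_from_mbps(1778, "multiply_add"))
print(mflops_from_mbps(4997, "add"))
```

The same function applies to any row of the tables, provided the column is matched to the right calculation.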

The native Intel code produced some performance gains, mainly using L1 cache based data, but speed in other areas is probably limited by data flow. The later compiler produced some slower speeds on ARM based tablet T11 and better/worse variations on T21.

August 2015 - Results provided for 64 bit T22. As with NEON-Linpack, many results from 32 bit and 64 bit compilations, via NEON intrinsic functions, were similar. With normal code, the 64 bit compilations were up to nearly four times faster than those at 32 bits.


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android NeonSpeed Benchmark V1.1 02-Feb-2015 17.09

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1778   3940   2807   5474   4997   5062
      32   1781   3576   2636   4431   4316   4291
      64   1772   3589   2639   4480   4337   4332
     128   1784   3589   2641   4423   4320   4320
     256   1766   3592   2642   4400   4347   4358
     512   1784   3585   2633   4375   4350   4355
    1024   1705   3253   2448   3760   3789   3788
    4096   1673   3021   2366   3257   3245   3237
   16384   1672   2948   2349   3062   3157   3151
   65536   1675   2967   2345   3190   3168   3168

          Total Elapsed Time   10.8 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 16.54

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1816   5996   4916   6244   6882   6880
      32   1851   4703   3985   5200   5609   5711
      64   1862   3845   3121   4174   4441   4520
     128   1841   3929   3110   4179   4411   4487
     256   1863   3932   3092   4179   4412   4493
     512   1861   3938   3090   3894   4215   4415
    1024   1784   3475   2738   3130   3223   3443
    4096   1741   2376   2649   2998   3112   3139
   16384   1774   3086   2780   3116   3140   3145
   65536   1774   2987   2547   2328   3126   3072

          Total Elapsed Time   10.1 seconds


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

 Android NeonSpeed Benchmark V1.1 09-Aug-2013 17.10

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3793   9641   4375  13023  13456  13562
      32   5777  11410   4993  11718  11365  11143
      64   4122   6692   3855   6539   6682   7210
     128   4017   6565   3849   6475   6520   6983
     256   4067   6562   3836   6459   6495   7038
     512   3900   6531   3820   6428   6490   7095
    1024   1821   2544   1774   2532   2554   2539
    4096   1141   1645   1536   1612   1615   1635
   16384   1437   1695   1490   1576   1694   1668
   65536   1424   1675   1475   1699   1687   1694

          Total Elapsed Time   11.2 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.17

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   2252   4964   3321   6602   7304   7237
      32   4202   8364   4543   8366   8553   8101
      64   3710   6096   3860   6570   6348   6182
     128   3802   5581   3874   6044   5624   5877
     256   3654   5618   3501   6154   5655   5783
     512   3597   5688   3723   6130   5812   5684
    1024   1727   2466   1659   2481   2454   2472
    4096   1479   1718   1421   1714   1713   1706
   16384   1488   1704   1435   1576   1705   1694
   65536   1477   1755   1453   1754   1759   1752

          Total Elapsed Time   10.8 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NeonSpeed Benchmark V1.1 23-Jul-2015 13.00

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   4324  13809   4498  14660  17501  18186
      32   3587   6845   2922   8073   6981   7035
      64   3347   6894   2912   8078   6964   6938
     128   3343   6651   2919   7922   6726   6999
     256   3511   6963   3002   8071   6902   6897
     512   3476   6628   3025   7827   6613   6818
    1024   3172   4627   2773   6424   4800   4806
    4096   2653   2051   2378   3613   2090   2054
   16384   2356   1891   2118   3165   1955   1962
   65536   2424   1923   2167   3368   1933   1925

          Total Elapsed Time    9.9 seconds

 #################### T21 ARM-Intel #################### 

 ARM/Intel NeonSpeed Benchmark V1.1 23-Jul-2015 13.03

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3623  16704   4623  15187  17446  16719
      32   3455   9210   2997   8723   9280   9112
      64   3336   7721   3002   8544   8469   8581
     128   3415   7664   3111   8481   7549   7638
     256   3584   7526   3087   8500   7849   7805
     512   3538   7422   3154   8266   7567   7541
    1024   3513   7227   3067   7789   7294   7261
    4096   2302   1673   2413   3107   1693   1677
   16384   2286   1616   2323   3024   1620   1617
   65536   2322   1617   2271   2505   1634   1600

          Total Elapsed Time    9.9 seconds


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.32
           Compiled for 32 bit ARM v7a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    971   3853   1807   4059   3957   4397
      32    970   3812   1800   3983   3891   4323
      64    927   3228   1605   3038   3269   3521
     128    926   3321   1681   3343   3354   3596
     256    936   3386   1693   3449   3413   3667
     512    898   2889   1578   2996   2927   3118
    1024    794   1859   1345   2057   1996   1924
    4096    794   1796   1250   1788   1813   1835
   16384    792   1773   1270   1820   1829   1864
   65536    796   1811   1289   1852   1832   1880

          Total Elapsed Time   11.3 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.37
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3054   4055   3605   4376   4911   5094
      32   2922   3787   3435   4198   4546   4682
      64   2795   3514   3259   3658   4050   4116
     128   2886   3529   3373   3924   4148   3963
     256   2883   3641   3264   3942   4193   4276
     512   2454   3165   2985   3385   3586   3542
    1024   1633   2000   1835   2043   2114   2105
    4096   1738   1893   1899   1900   1956   1955
   16384   1757   1870   1886   1802   1921   1846
   65536   1755   1875   1870   1903   1936   1937

          Total Elapsed Time   10.2 seconds


 ##################### T7 Original ######################

  T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

   Android NeonSpeed Benchmark 15-Dec-2012 14.38

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    860   2575   2325   2918   3053   3245 L1
      32    950   2551   2400   2823   2944   3131
      64    744   1396   1329   1434   1465   1496 L2
     128    713   1342   1319   1365   1392   1417
     256    714   1339   1311   1357   1377   1400
     512    708   1323   1299   1348   1358   1383
    1024    608    875    869    917    930    952
    4096    460    493    492    481    488    504 RAM
   16384    460    498    487    507    506    504
   65536    459    495    469    251    503    505

          Total Elapsed Time   11.5 seconds

#################### T7 ARM-Intel #####################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.07

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    881   2440   2501   3334   3206   3465
      32    901   1868   1705   2260   2083   2186
      64    801   1395   1365   1573   1548   1581
     128    784   1282   1278   1405   1389   1411
     256    787   1279   1285   1420   1380   1409
     512    777   1266   1267   1409   1370   1394
    1024    604    786    762    769    770    828
    4096    458    479    477    463    486    488
   16384    436    447    448    469    470    469
   65536    450    472    469    240    482    483

          Total Elapsed Time   11.5 seconds
   



NEON-MFLOPS-MP Benchmark - NEON-MFLOPS2i-MP.apk

NEON-MFLOPS-MP carries out the same calculations as MP-MFLOPS Benchmarks above, but with NEON intrinsic functions used for all calculations. For further results see android neon benchmarks.htm.

Results for the original NEON version and a sample of MP-MFLOPS are provided below. NEON produced significant performance improvements across the board, including on the Atom based tablet, via the ARM to Intel conversion layer. As might be expected using intrinsics, compilation via a later version of gcc made little difference in speed on ARM systems, but the Intel native code more than doubled performance on CPU speed limited tests.

Following the performance details are the numeric results of calculations from the fixed parameters used in the new version, for both ARM and Intel. It seems that tablet T11 has an intermittent fault, as it occasionally fails to calculate a correct answer or crashes and reboots. This now also appears to happen with the older version.

August 2015 - The T22 NEON 64 bit compilation produced a small performance gain over 32 bit results at 2 operations per word, but nearly double the speed at 32 operations per word, where the 32 bit version suffers from having fewer registers for the variables. Using one core, maximum speed was 2.77 GFLOPS, rising to 10.8 GFLOPS via four cores (best so far relative to CPU GHz). The one core speed equates to just over two floating point operations per clock cycle. This is disappointing compared with Intel processors, from the Core 2 onwards, at 6 per clock cycle out of a maximum of 8, with SSE SIMD code (see Linux results).

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, at 64 bits. Performance, with 8 threads, is up to 23.6 GFLOPS, and up to nearly 3.5 floating point operations per clock cycle, using one core.
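The operations per clock cycle figures quoted above follow from dividing MFLOPS by the clock speed in MHz. The sketch below is illustrative only: the function name is hypothetical, and it assumes the nominal clock speeds given in the result headings (1300 MHz for T22, 2000 MHz for P33), which may differ from the speed actually running during the test.

```python
def flops_per_cycle(mflops, cpu_mhz, cores=1):
    """Floating point operations per clock cycle, per core."""
    return mflops / (cpu_mhz * cores)


# Peak figures taken from the 32 Ops/Word columns of the results in this section
t22_one_core = flops_per_cycle(2774, 1300)             # T22 64 bit, 1 thread
t22_four_core = flops_per_cycle(10780, 1300, cores=4)  # T22 64 bit, 4 threads
p33_one_core = flops_per_cycle(6943, 2000)             # P33 64 bit, 1 thread

print(round(t22_one_core, 2), round(t22_four_core, 2), round(p33_one_core, 2))
```

Dividing by the number of cores shows how close the multithreaded runs come to the single core rate.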


 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android NEON-MFLOPS-MP Benchmark V1.1 07-Feb-2015 18.37

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1110    1319     878    1188    1139    1226
 2T     2470    2114     996    2406    2427    2390
 4T     3159    2211     988    4148    3487    4006
 8T     2066    2486    1003    4144    3944    4077

          Total Elapsed Time    3.6 seconds

 Not NEON
 4T     1571    1627     979    2238    2255    2258

 Android NEON-MFLOPS2-MP Benchmark V2.1 07-Feb-2015 18.38

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1796    1520    1025    1231    1228    1227
 2T     3354    2959    1047    2427    2445    2445
 4T     4627    5508     978    4690    4791    4733
 8T     3861    6307    1030    4611    4869    4742

          Total Elapsed Time   88.3 seconds


 #################### A1 ARM-Intel ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.17

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2151    1962    1064    2619    2694    2650
 2T     4421    3849    1048    5296    5463    5343
 4T     5886    6652     982    9592   10735   10362
 8T     3744    7284    1018    9085   10791    9493

          Total Elapsed Time   13.8 seconds

 ############### A1 ARM-Intel 1000 MHz #################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 16.04

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1939    1266     674    2503    2388    2351
 2T     3670    2652     679    4919    4792    4640
 4T     3102    3051     676    4688    4678    4672
 8T     3189    3425     657    4813    4869    4639

          Total Elapsed Time   19.4 seconds


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
             Dual core, Measured 1.7 GHz

 Android NEON-MFLOPS-MP Benchmark V1.1 13-Sep-2013 13.44

         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1847    1415     597    3772    4096    3545
 2T     3649    3309     664    8065    7966    7505
 4T     3670    3922     658    7753    8148    7490
 8T     5664    5570     681    8092    8355    7672
 
        Total Elapsed Time   13.0 seconds

 Not NEON
 2T     1593    1668     648    3140    3067    2977

 #################### T11 ARM-Intel ####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.07

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1965    1630     582    3792    4077    3521
 2T     3789    2690     663    8497    8133    7297
 4T     5714    4883     654    8364    8192    7554
 8T     5414    6316     673    7976    8437    6635

          Total Elapsed Time   13.0 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NEON-MFLOPS2-MP Benchmark V2.1 25-Jul-2015 18.44

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2757    2576     771    2808    2825    2800
 2T     5662    5525    1516    5631    5664    5570
 4T     6550    7846    1945   11167   11281   10939
 8T    10273   10928    1981   10851   11211   11350

          Total Elapsed Time   40.0 seconds

 Not NEON
 4T     2338    2959    1836    4867    4911    4859

 #################### T21 ARM-Intel #################### 

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 28-Jun-2015 16.32

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     3049    2857     622    2923    2874    2098
 2T     5508    4887    1009    5477    5736    4349
 4T     5643    5282    1410   11244   11601    8564
 8T     9294   11156    1681   11288   11605    8946

          Total Elapsed Time   14.0 seconds


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.35
           Compiled for 32 bit ARM v7a

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      619     613     575    1444    1446    1426
 2T     1174    1206     889    2894    2902    2839
 4T     1585    1616     901    5679    5726    5596
 8T     2075    2130     944    5400    5585    5519

          Total Elapsed Time   25.8 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.38
           Compiled for 64 bit ARM v8a

      FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      726     745     647    2766    2774    2639
 2T     1397    1402     903    5523    5552    5371
 4T     1871    1930     898   10780   10479   10439
 8T     2496    2876    1011    9736   10679    9900

          Total Elapsed Time   15.1 seconds


##################### P33 64 Bit ##################### 

 P33 Quad-core 2 GHz Qualcomm Snapdragon 810, Android 5.0.2 

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 16-Sep-2015 17.59
           Compiled for 64 bit ARM v8a

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2811    3126    1089    6943    6589    6342
 2T     2488    4114    1541   12084   10559    8809
 4T     4759    5480    2038   16516   14826   11960
 8T     4840    8985    2452   22082   23563   12461

          Total Elapsed Time    7.6 seconds


 ##################### T7 Original ######################

  T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
           Quad core,  Measured 1200 MHz

 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57

   FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      532     402     124    1135    1044     960
 2T     1255     798     213    2041    1987    1916
 4T     2441    1553     229    4185    4034    3450
 8T     1922    2403     226    3774    3996    3346

          Total Elapsed Time    4.5 seconds

 Not NEON
 4T      716     655     233    2367    2316    2240

 #################### T7 ARM-Intel #####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.24

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      657     407     132    1077    1074    1053
 2T     1265     817     222    2147    2150    2078
 4T     2024    1695     234    4214    4276    3555
 8T     2435    2495     234    4196    4100    3523

          Total Elapsed Time   39.0 seconds

 ##################### New Results #####################

       Results x 100000, 12345 indicates ERRORS

       ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1
 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759
 
 T11   44934   12345   99850   36770   79897   99759
 
       Android NEON-MFLOPS-MP Benchmark V1.1
 1T    86735   98519   99984   79897   97638   99975
 2T    86735   98519   99984   79897   97638   99975
 4T    86735   98519   99984   79897   97638   99975
 8T    86735   98519   99984   79897   97638   99975
 
       Android NEON-MFLOPS2-MP Benchmark V2.1 
 1T    40015   66980   99522   35216   54898   99234
 2T    40015   66980   99522   35216   54898   99234
 4T    40015   66980   99522   35216   54898   99234
 8T    40015   66980   99522   35216   54898   99234
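
The 12345 error marker and the requirement for identical results at every thread count can be checked mechanically. A minimal sketch, assuming each row of six values is held as a list; the function name check_results and the dictionary layout are hypothetical and not part of the benchmark. The sample rows are the ARM/Intel V2.1 results and the failing T11 row shown above.

```python
ERROR_MARKER = 12345  # value reported when a calculation fails


def check_results(rows):
    """Return (consistent, error_labels) for a dict of label -> list of results.

    consistent is True when all rows without the error marker are identical.
    """
    error_labels = [label for label, values in rows.items() if ERROR_MARKER in values]
    good_rows = {tuple(values) for values in rows.values() if ERROR_MARKER not in values}
    return len(good_rows) <= 1, error_labels


expected = [44934, 86735, 99850, 36770, 79897, 99759]
rows = {
    "1T": list(expected),
    "2T": list(expected),
    "4T": list(expected),
    "8T": list(expected),
    "T11": [44934, 12345, 99850, 36770, 79897, 99759],
}

consistent, errors = check_results(rows)
print(consistent, errors)
```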
  



NEON-Linpack-MP Benchmark - NEON-Linpacki-MP.apk

This is a multithreading version of the NEON-Linpack Benchmark. Further details and results can be found in android neon benchmarks.htm. The benchmark is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads, the programming code for zero threads being the same as the earlier example. Multithreading performance, using this standard linear equation solver, is severely degraded due to overheads, the zero thread results being the only ones of real use.
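The extent of the degradation can be seen by expressing each threaded speed as a fraction of the unthreaded one. The sketch below is illustrative only, with a hypothetical function name, and uses the A1 ARM-Intel MFLOPS figures listed later in this section.

```python
def relative_to_unthreaded(mflops_by_threads):
    """Express each speed as a fraction of the zero thread (None) speed."""
    base = mflops_by_threads[0]
    return {threads: round(speed / base, 3) for threads, speed in mflops_by_threads.items()}


# A1 ARM-Intel results: matrix size N -> {threads: MFLOPS}, 0 meaning no threads
a1_arm_intel = {
    100: {0: 971.71, 1: 37.72, 2: 36.36, 4: 39.66},
    500: {0: 1311.37, 1: 488.73, 2: 487.85, 4: 488.98},
    1000: {0: 945.97, 1: 727.85, 2: 737.95, 4: 742.34},
}

for n, row in a1_arm_intel.items():
    print(n, relative_to_unthreaded(row))
```

The fractions are smallest at N = 100, where the fixed threading overheads are largest relative to the amount of calculation.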

Performance, using native Intel compilation, is shown to be around twice as fast, except at N = 1000, which is mainly dependent on calculations from data in RAM. On ARM, the newer compilation can also be somewhat faster (or slower). T21, with the Qualcomm Snapdragon 800, obtains by far the fastest result, at unthreaded N = 500.

The program checks that the same numeric results are produced, irrespective of the number of threads used, at each matrix size. Due to rounding effects, however, the results differ slightly between ARM and Intel hardware, as shown below.

August 2015 - T22 results from 32 bit and 64 bit compilations were again similar, as the programs use a limited number of identical intrinsic functions.


    MFLOPS 0 to 4 Threads, N 100, 500, 1000

 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Threads      None        1        2        4

 N  100     452.39    21.00    23.48    17.48
 N  500     663.38   275.56    88.66   312.71
 N 1000     617.04   380.60   191.26   195.61

 #################### A1 ARM-Intel ######################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 13.58

 Threads      None        1        2        4

 N  100     971.71    37.72    36.36    39.66
 N  500    1311.37   488.73   487.85   488.98
 N 1000     945.97   727.85   737.95   742.34

       Total Elapsed Time   59.966 seconds


 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
                Measured 1.7 GHz

  Threads      None        1        2        4

  N  100    1399.82    54.86    55.31    54.66
  N  500    1154.21   434.16   434.06   436.97
  N 1000     571.26   482.57   487.25   485.80

 #################### T11 ARM-Intel ####################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.44

  Threads      None        1        2        4

  N  100    1497.90    61.13    63.13    61.87
  N  500    1399.10   491.49   489.29   494.69
  N 1000     586.14   499.00   504.97   497.49

       Total Elapsed Time   43.952 seconds


 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android Linpack NEON SP MP Benchmark 26-Jul-2015 11.46

  Threads      None        1        2        4

  N  100    1311.08    12.38    12.93    15.05
  N  500    2271.56   344.04   419.52   381.73
  N 1000     837.30   540.99   523.52   564.87

      Total Elapsed Time  143.534 seconds

 #################### T21 ARM-Intel #################### 

 ARM/Intel Linpack NEON SP MP Benchmark 26-Jul-2015 11.51

  Threads      None        1        2        4

  N  100    1308.07    14.89    11.77    11.63
  N  500    2341.17   407.96   481.02   415.12
  N 1000     901.21   551.80   566.77   564.31

       Total Elapsed Time  145.750 seconds


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

  ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.52
            Compiled for 32 bit ARM v7a

  Threads      None        1        2        4

  N  100     460.74    22.35    23.16    23.82
  N  500     480.63   336.52   339.94   303.66
  N 1000     470.02   405.86   403.01   405.98

 ###################### T22 64 Bit ######################

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.57
           Compiled for 64 bit ARM v8a

  Threads      None        1        2        4

  N  100     548.67    27.70    33.93    37.00
  N  500     470.04   285.95   297.79   301.67
  N 1000     519.02   441.84   443.47   441.91


 ##################### T7 Original ######################

  T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
            Measured 1200 MHz

 Threads      None        1        2        4

  N  100     413.47    45.95    48.22    48.34
  N  500     253.08   187.51   189.69   189.94
  N 1000     148.76   135.49   136.08   136.17

#################### T7 ARM-Intel #####################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.40

  Threads      None        1        2        4

  N  100     385.49    28.79    29.06    29.25
  N  500     272.07   184.85   183.70   183.18
  N 1000     147.09   131.92   132.44   130.05

       Total Elapsed Time   64.318 seconds


################### Numeric Results ###################

 NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1

 N              100             500            1000

 ARM
 NR            1.60            3.96           11.32
 RE  3.80277634e-05  4.72068787e-04  2.70068645e-03
 MA  1.19209290e-07  1.19209290e-07  1.19209290e-07
 X0 -1.38282776e-05  5.26905060e-05  1.62243843e-04
 XN -7.51018524e-06  3.26633453e-05 -6.65783882e-05

 Intel
 NR            1.68            3.96           11.39
 RE  4.00543213e-05  4.72545624e-04  2.71725655e-03
 MA  1.19209290e-07  1.19209290e-07  1.19209290e-07
 X0 -1.38282776e-05  5.26905060e-05  1.62243843e-04
 XN -7.51018524e-06  3.26633453e-05 -6.65783882e-05
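
The legend above can be illustrated with a small sketch. It assumes the normalisation found in common C Linpack sources, NR = RE / (n * norma * normx * MA), which is not confirmed here as the exact code of this benchmark. The names make_system, solve and checks are hypothetical, and the matrix generator is only an illustrative pseudo-random one. The logged MA value of 1.19209290e-07 is 2 to the power -23, the single precision epsilon. The sketch runs in Python double precision, so its numbers will differ from the log.

```python
import sys


def make_system(n, seed=1325):
    """Build a pseudo-random matrix A and b = row sums of A, so x should be all ones."""
    # Assumption: a simple congruential generator, used here for illustration only.
    a = []
    for _ in range(n):
        row = []
        for _ in range(n):
            seed = (3125 * seed) % 65536
            row.append((seed - 32768.0) / 16384.0)
        a.append(row)
    b = [sum(row) for row in a]
    return a, b


def solve(a, b):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies, keep the originals for the checks
    x = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f != 0.0:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                x[i] -= f * x[k]
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def checks(a, b, x):
    """Return the five logged quantities for a solved system."""
    n = len(b)
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norma = max(abs(v) for row in a for v in row)
    normx = max(abs(v) for v in x)
    eps = sys.float_info.epsilon  # double precision here, not the 2**-23 in the log
    return {
        "NR": resid / (n * norma * normx * eps),
        "RE": resid,
        "MA": eps,
        "X0": x[0] - 1.0,
        "XN": x[-1] - 1.0,
    }


if __name__ == "__main__":
    a, b = make_system(100)
    for name, value in checks(a, b, solve(a, b)).items():
        print("%s %16.8e" % (name, value))
```

Because the right hand side is the row sums of the matrix, the exact solution is a vector of ones, which is why X0 and XN are reported as differences from 1.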
   


FFT Benchmarks - fft1.apk, fft3c.apk

The benchmarks run single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running times in milliseconds. Besides Android, the benchmarks are available for Windows and Linux. Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in FFTBenchmarks.htm. Below is an example of results.


    Kindle Fire HDX 7, 2.2 GHz  Quad Core Qualcomm Snapdragon 800

       ARM/Intel FFT Benchmark 3c.0 08-Sep-2015 23.15
             Compiled for 32 bit ARM v7a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.155     0.352     1.341     0.087     0.073     0.073 
    2     0.812     0.814     0.750     0.201     0.187     0.251 
    4     1.751     1.658     1.776     0.414     0.405     0.443 
    8     3.712     1.083     1.065     0.930     0.899     0.890 
   16     2.880     3.356     2.430     2.579     2.658     2.380 
   32     6.124     6.541     5.605     5.907     6.070     5.681 
   64    13.430    12.566    12.774    13.792    13.556    13.997 
  128    30.737    27.408    27.132    33.318    33.088    33.071 
  256    64.472    63.394    64.690    73.288    72.546    72.786 
  512   153.609   150.383   156.046   155.788   156.304   163.178 
 1024   315.283   306.323   307.409   369.426   337.074   336.684 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    6.5 seconds
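
The measurement pattern described above (each FFT size run three times, times reported in milliseconds) can be sketched as follows. This is a minimal illustration only: the fft and time_fft functions are hypothetical, the transform is a plain radix-2 routine in Python rather than the benchmark's C code, and only the smallest sizes are timed to keep the run short.

```python
import cmath
import math
import time


def fft(values):
    """Iterative radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = [complex(v) for v in values]
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            data[i], data[j] = data[j], data[i]
    # Butterfly passes, doubling the transform size each time.
    size = 2
    while size <= n:
        step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1.0 + 0.0j
            for k in range(half):
                u = data[start + k]
                v = data[start + k + half] * w
                data[start + k] = u + v
                data[start + k + half] = u - v
                w *= step
        size *= 2
    return data


def time_fft(size_k, runs=3):
    """Time `runs` transforms of size_k * 1024 points; return milliseconds per run."""
    n = size_k * 1024
    signal = [math.sin(2.0 * math.pi * 8.0 * i / n) for i in range(n)]
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fft(signal)
        times.append((time.perf_counter() - start) * 1000.0)
    return times


if __name__ == "__main__":
    print(" Size K     milliseconds")
    for size_k in (1, 2, 4):
        print("%6d" % size_k + "".join("%10.3f" % t for t in time_fft(size_k)))
```

Repeating each size exposes run-to-run variance of the kind seen in the log, where the first run of a size is sometimes noticeably slower or faster than the following two.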
   


System Details




 A1      Asus MemoPad 7 ME176CEX, 1.86 GHz Intel Atom Z3745
         Screen pixels w x h 800 x 1216
         Android Build Version      4.4.2
         Processor : ARMv7 processor rev 1 (v7l)
         BogoMIPS : 1500.0
         Features : neon vfp swp half thumb fastmult edsp vfpv3
         CPU implementer : 0x69
         CPU architecture: 7
         CPU variant : 0x1
         CPU part : 0x001
         CPU revision : 1
         Hardware : placeholder
         Revision : 0001
         Linux version 3.10.20
         Mainly runs at 1.86 GHz Turbo Boost

 T7      Device Google Nexus 7, quad core CPU 1.3 GHz, 1.2 GHz > 1 core
         RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
         Screen pixels w x h 1280 x 736
         Twelve-core Nvidia GeForce ULP graphics 416 MHz
         Android Build Version      4.1.2
         Processor : ARMv7 Processor rev 9 (v7l)
         processor : 0  BogoMIPS : 1993.93
         processor : 1  BogoMIPS : 1993.93
         processor : 2  BogoMIPS : 1993.93
         processor : 3  BogoMIPS : 1993.93
         Features  : swp half thumb fastmult vfp edsp neon vfpv3 tls 
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant     : 0x2
         CPU part        : 0xc09             - Cortex-A9
         CPU revision    : 9
         Hardware        : grouper           - nVidia Tegra 3 T30L
         Revision        : 0000
         Linux version    3.1.10
         Runs at 1.2 GHz

 T11     Voyo A15, Samsung EXYNOS 5250 Dual core 2.0 GHz Cortex-A15, 
         Mali-T604 GPU, 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/s
         Screen pixels w x h 1920 x 1032 
         Android Build Version      4.2.2  - Jelly Bean
         Processor       : ARMv7 Processor rev 4 (v7l)
         processor       : 0
         BogoMIPS        : 992.87
         processor       : 1
         BogoMIPS        : 997.78
         Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4
                           idiva idivt 
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant     : 0x0
         CPU part        : 0xc0f
         CPU revision    : 4
         Hardware        : SMDK5250
         Linux version 3.4.35Ut
         Runs at 1.7 GHz

 T21     Kindle Fire HDX 7, 2.2 GHz  Quad Core Qualcomm Snapdragon 800 (Krait 400) 
         2 x 32 Bit LPDDR3-1866 Memory, 14.9 GB/s, GPU Qualcomm Adreno 330, 578 MHz
         Device Amazon KFTHWI
         Screen pixels w x h 1200 x 1803 
         Android Build Version      4.4.3
         Processor       : ARMv7 Processor rev 0 (v7l)
         processor       :  0, 1, 2, 3
         BogoMIPS        : 38.40
         Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
         CPU implementer : 0x51
         CPU architecture: 7
         CPU variant     : 0x2
         CPU part        : 0x06f
         CPU revision    : 0
         Hardware        : Qualcomm MSM8974
         Revision        : 0000
         Linux version 3.4.0-perf (gcc version 4.7)  

 T22     Lenovo Tab 2 A8-50, 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53 
         1 GB LPDDR3, GPU Mali T720  MP2
         Device LENOVO Lenovo TAB 2 A8-50F
         Screen pixels w x h 800 x 1216
         Android Build Version      5.0.2
         Processor : AArch64 Processor rev 3 (aarch64)
         processor : 0, 1, 2
         BogoMIPS  : 26.0
         Features : fp asimd aes pmull sha1 sha2 crc32
         CPU implementer : 0x41
         CPU architecture: AArch64
         CPU variant : 0x0
         CPU part : 0xd03
         CPU revision : 3
         Hardware : MT8161
         Linux version 3.10.65 

 P33     Sony Xperia Z3+ E6533, Quad-core 1.5 GHz & Quad-core 2 GHz Qualcomm
         Snapdragon 810 64-bit CPU
         Screen pixels w x h 1080 x 1776
         Android Build Version      5.0.2
         Processor : AArch64 Processor rev 1 (aarch64)
         processor : 0 to 7
         Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
         CPU implementer : 0x41
         CPU architecture: 8
         CPU variant : 0x1
         CPU part : 0xd07
         CPU revision : 1
         Hardware : Qualcomm Technologies, Inc MSM8994
         Linux version 3.10.49

 BS1     BlueStacks Emulator on 3 GHz Phenom via Windows 7
         Screen pixels w x h 1024 x 600
         Android Build Version      2.3.4

 BS2     BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
         Screen pixels w x h 1440 x 852
         Android Build Version      4.4.2 
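
The CPU implementer and CPU part fields in the listings above follow the "key : value" layout of /proc/cpuinfo on ARM Linux, and can be decoded with a small lookup. The sketch below is illustrative only: parse_cpuinfo and describe are hypothetical helpers, the tables cover just the codes that appear in this listing, and the Krait entry is taken from the T21 description rather than from an official table.

```python
# Implementer codes seen in the listings above.
IMPLEMENTERS = {0x41: "ARM", 0x51: "Qualcomm", 0x69: "Intel"}

# (implementer, part) pairs seen in the listings above.
PARTS = {
    (0x41, 0xC09): "Cortex-A9",
    (0x41, 0xC0F): "Cortex-A15",
    (0x41, 0xD03): "Cortex-A53",
    (0x41, 0xD07): "Cortex-A57",
    (0x51, 0x06F): "Krait",  # assumption: label taken from the T21 entry
}


def parse_cpuinfo(text):
    """Collect 'key : value' lines into a dict, keeping the first value per key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def describe(fields):
    """Return (vendor, core) names for the implementer and part codes."""
    # Take the first token only, so trailing annotations such as
    # '0xc09   - Cortex-A9' are ignored.
    impl = int(fields["CPU implementer"].split()[0], 16)
    part = int(fields["CPU part"].split()[0], 16)
    vendor = IMPLEMENTERS.get(impl, "unknown implementer 0x%02x" % impl)
    core = PARTS.get((impl, part), "unknown part 0x%03x" % part)
    return vendor, core


if __name__ == "__main__":
    sample = """
    CPU implementer : 0x41
    CPU architecture: AArch64
    CPU part : 0xd03
    """
    print(describe(parse_cpuinfo(sample)))
```

Unknown codes are reported rather than guessed, since the tables are deliberately limited to the devices listed here.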
   



Roy Longbottom January 2016



The Official Internet Home for my Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection