Roy Longbottom's Android Benchmark Apps


Contents

General
Java Whetstone Benchmark
Java Benchmark Operation
Java Benchmark Results
Java Numeric Results
Native Whetstone Benchmark
Native Whetstone Results
Linpack Benchmark
Linpack Results
Dhrystone 2 Benchmark
Dhrystone 2 Results
Livermore Loops Benchmark
Livermore Loops Results
MemSpeed Benchmark
BusSpeed Benchmark
RandMem Benchmark
Systems Used


General

Roy Longbottom’s PC Benchmark Collection comprises numerous FREE benchmarks and reliability testing programs, for processors, caches, memory, buses, disks, flash drives, graphics, local area networks and Internet. Original ones run under DOS and later ones under all varieties of Windows. Most have also been converted to run under Linux. Android is the natural progression from the latter, but we will have to wait to see what is possible.

Initial development with Java was via the Eclipse Integrated Development Environment for Java, using the Android Software Development Kit. In this case, programs were developed on a PC running 64-Bit Windows 7. The development environment provides a range of Android version emulators, allowing programs to be tested on what are displayed as real phones and tablets.

With Java programs being compiled at run time, they might not run very fast, compared with a pre-compiled version, using high optimisation levels. The latter for Android can be generated from C/C++ code using the Native Development Kit (NDK). To use this via Windows, a Linux-like environment, provided by Cygwin, is required. In my case, this would not install properly via 64-Bit Windows 7. So, NDK benchmarks were developed on a PC that runs Ubuntu Linux.

Source code and other required files, for all benchmarks, are available in www.roylongbottom.org.uk/Android Benchmarks.zip.



Java Whetstone Benchmark

The first of these is a Java version of the Whetstone Benchmark. The original, written in Fortran, was the first general purpose benchmark that set industry standards of computer performance. It was released in 1972, based on research by Brian Wichmann, and produced by Harold Curnow. Later updates became my responsibility. The three of us were UK Government employees. Speed was measured in terms of Million Whetstone Instructions Per Second (MWIPS). Later, in order to identify compiler over-optimisation, speeds of individual tests were shown as MOPS or MFLOPS - Millions of Operations or Floating Point Operations Per Second. In this version, test functions run for a minimum of one second, with the milliseconds for the originally defined pass count being used for the MWIPS calculation, as 10,000 / total milliseconds.
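As a worked example of the MWIPS formula, the following C sketch applies 10,000 / total milliseconds to the total of 706.860 milliseconds in the example log further down this page. The function names are hypothetical and are not taken from the benchmark source.

```c
#include <stdio.h>

/* MWIPS as described in the text: 10,000 divided by the total time,
   in milliseconds, for the originally defined pass count.
   mwips_from_millisecs is a hypothetical name used for illustration. */
double mwips_from_millisecs(double total_millisecs)
{
    return 10000.0 / total_millisecs;
}

/* Prints the rating in a layout similar to the benchmark log. */
void print_mwips(double total_millisecs)
{
    printf(" MWIPS     %8.2f        %10.3f\n",
           mwips_from_millisecs(total_millisecs), total_millisecs);
}
```

Calling mwips_from_millisecs(706.860) gives about 14.15, which agrees with the MWIPS line of the example log.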

This compilation celebrates the 40th anniversary of the benchmark. The .apk application file can be downloaded from www.roylongbottom.org.uk/Java Whetstone.apk. See also Whetstone Benchmark History and Results for performance of computers from the 1960’s to modern times, and Whetstone Benchmark Results on PCs, these including speeds using Java code via Windows and Linux.



Java Benchmark Operation

Installation: Click on the link for the apk file and download it, for the icon to appear in a Download list or in a Download folder on an SD card. In Settings, Applications, tick the option to allow installation of non-Market applications. Tap the app icon and buttons to install and run it.

On loading the app, three buttons are provided. Run executes the benchmark and this normally takes between 10 and 20 seconds. The results shown below, up to Total Elapsed Time, are then displayed. The Info button produces three more buttons to provide a summary of the benchmark, a link to this HTML document and another to the History HTML pages. If the benchmark has been run, Email sends the full results shown below to results@roylongbottom.org.uk (assuming that local Email has been set up). Note that the Email system might not display the results with a monospaced font, but the space characters appear to be there, allowing copying/pasting to a document with the right font (like Courier New 10).

Screen pixel dimensions and Android Build Version are obtained using Android Java functions with other information read from files /proc/cpuinfo and /proc/version. On tapping the Email button, an Edit Box is provided for manual input of other details of the system under test (Device details below).


 Android Java Whetstone Benchmark 17-Jan-2012 14.22

 Test        MFLOPS    MOPS   millisecs    Results

 N1 float      3.18             6.040  -1.124750137
 N2 float      3.91            34.340  -1.131330490
 N3 if                 4.29    24.140   1.000000000
 N4 fixpt              9.59    32.840  12.000000000
 N5 cos                0.42   196.100   0.499110103
 N6 float      2.90           186.000   0.999999821
 N7 equal              3.28    56.300   3.000000000
 N8 exp                0.22   171.100   0.751108646

 MWIPS        14.15           706.860

 Total Elapsed Time   14.6 seconds


 System Information

 Screen pixels w x h 600 x 1024 

 Android Build Version      2.2

 Processor : ARM926EJ-S rev 5 (v5l)
 BogoMIPS  : 797.97
 Features    : swp half thumb fastmult edsp java 
 CPU implementer     : 0x41
 CPU architecture: 5TEJ
 CPU variant     : 0x0
 CPU part     : 0x926
 CPU revision       : 5

 Hardware     : WMT
 Revision     : 0000
 Serial           : 0000000000000000

 Linux version 2.6.32.9-default (jodyfu@szmce13) 
 (gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)
 ) #100 Wed Sep 21 08:25:24 HKT 2011

 Device TTFone M013S 10.1 inch tablet, 300-800 MHz VIA 8650
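The processor lines in the System Information above have the "key : value" layout of /proc/cpuinfo. The benchmark reads these files from its Java front end. The following is only a minimal C sketch of the same idea, and split_info_line and print_info_file are hypothetical helper names, not functions from the benchmark.

```c
#include <stdio.h>
#include <string.h>

/* Splits one line of the form "Key<blanks>: value" into key and value.
   Returns 1 on success, 0 if the line contains no colon.
   Hypothetical helper, not part of the benchmark source. */
int split_info_line(const char *line, char *key, size_t key_size,
                    char *value, size_t value_size)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL || key_size == 0 || value_size == 0)
        return 0;

    /* Key: text before the colon with trailing blanks removed. */
    size_t key_len = (size_t)(colon - line);
    while (key_len > 0 &&
           (line[key_len - 1] == ' ' || line[key_len - 1] == '\t'))
        key_len--;
    if (key_len >= key_size)
        key_len = key_size - 1;
    memcpy(key, line, key_len);
    key[key_len] = '\0';

    /* Value: text after the colon, without leading blanks or newline. */
    const char *start = colon + 1;
    while (*start == ' ' || *start == '\t')
        start++;
    size_t value_len = strcspn(start, "\r\n");
    if (value_len >= value_size)
        value_len = value_size - 1;
    memcpy(value, start, value_len);
    value[value_len] = '\0';
    return 1;
}

/* Prints every key/value pair found in the named file. Returns the
   number of pairs printed, or -1 if the file cannot be opened. */
int print_info_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[512], key[64], value[448];
    int count = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (split_info_line(line, key, sizeof key, value, sizeof value)) {
            printf(" %-16s: %s\n", key, value);
            count++;
        }
    }
    fclose(fp);
    return count;
}
```

On a Linux based system, print_info_file("/proc/cpuinfo") would list entries such as Processor and BogoMIPS, where the file is readable.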
  




Java Benchmark Results

The first measurements obtained were via emulators running on a 3 GHz quad core Phenom, the benchmark only using one core, of course. They suggest a slightly slower performance using a screen with a higher pixel density and much better performance with a later Android version and/or a more modern CPU.

Compared with the ARM 926EJ CPU, the v7 has VFPv3 enhanced floating point hardware. P1 and P2 v7 Cortex-A8 processors have performance proportional to CPU MHz. The v7 Cortex-A9 CPU has dual cores but only one will be used. Performance improvements are through higher performance VFPv3 and a new out-of-order speculative issue superscalar execution pipeline.


 System  ARM   MHz Android MWIPS  ---- MFLOPS ----   COS   EXP FIXPT    IF  EQUAL
 See     CPU        Build           1     2     3   ---------- MOPS ------------

 T1    926EJ   800    2.2   14.2   3.2   3.9   2.9   0.4   0.2   9.6   4.3   3.3
 T2    v7-A9   800  2.3.4  224.0  40.8  62.7  35.4  11.2   4.9 139.5  53.1  24.4


 P1    v7-A8   600  2.3.5   83.3  11.3  18.2  13.5   2.9   1.6  55.2  40.6  17.8
 P2    v7-A8  1000    2.2  137.9  15.9  31.9  22.6   4.7   2.6  91.6  68.6  29.7
 P3    v7-A9  1000  2.3.6  286.7  53.7  84.7  46.7  14.5   5.4 183.0  69.7  33.2

 EP1   926EJ  Emul    2.2   12.7   2.6   3.9   2.2   0.4   0.2   8.3   3.1   3.0
 ET1   926EJ  Emul    2.2   11.1   2.3   3.4   2.0   0.3   0.2   7.1   2.7   2.6
 ET2   v7-A8  Emul   4.03   38.8   8.7  11.6  10.0   1.0   0.6  38.3  11.1   6.7
 
       Core2  1000  Linux  802.1 338.3 316.3 191.3  20.3  11.3 708.3 333.3 187.5
       Atom   1000  Linux  372.7 220.9 163.3 105.0   8.0   6.5 240.1  85.8  87.0

                System - T = Tablet, P = Phone, E = Emulator

The last two sets of results are for the same Java code running on Intel CPUs under Linux, adjusting the speeds to represent processors running at 1 GHz. The ARM processors appear to be catching up with the Atom on fixed point operation, and do particularly well on standard and trigonometric functions. The latter have significant impact on the overall MWIPS score (see millisecs in example results).
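The clock speed adjustments mentioned above can be sketched as a simple linear scaling. This assumes that speed is proportional to CPU MHz, which is only an approximation, and the function names are hypothetical.

```c
/* Scales a measured speed to a different clock frequency, assuming
   (as an approximation) that speed is proportional to CPU MHz. */
double scale_to_clock(double measured, double actual_mhz, double target_mhz)
{
    return measured * target_mhz / actual_mhz;
}

/* Rating per MHz, for comparing CPUs that run at different clock speeds. */
double per_mhz(double rating, double mhz)
{
    return rating / mhz;
}
```

For example, scale_to_clock(83.3, 600.0, 1000.0) gives about 138.8 for phone P1, close to the 137.9 MWIPS measured on the 1000 MHz P2, which has the same type of CPU.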




Java Numeric Results

Both emulated and real numeric results using ARMv7 are different from ARM926EJ for some floating point calculations. This is not unusual for different compilers or types of processor and is due to variations in instruction sequences or hardware rounding arrangements. It looks as though these two processors are not logically identical or program optimisation procedures are different. ARMv7 P3 has enhanced architecture that probably changes the calculated results of the last test. Results from Native Code versions are also provided.


  Test         ARM926EJ-S   P1 ARMv7-A8   P2 ARMv7-A8   P3 ARMv7-A9

  N1 float   -1.124750137  -1.124750137  -1.124750137  -1.124750137
  N2 float   -1.131330490  -1.131330490  -1.131330490  -1.131330490
  N3 if       1.000000000   1.000000000   1.000000000   1.000000000
  N4 fixpt   12.000000000  12.000000000  12.000000000  12.000000000
  N5 cos      0.499110103   0.499110132   0.499110132   0.499110132
  N6 float    0.999999821   0.999999821   0.999999821   0.999999821
  N7 equal    3.000000000   3.000000000   3.000000000   3.000000000
  N8 exp      0.751108646   0.762195110   0.762195110   0.830691695

  Native Code Versions 
                                              Fast FPU
               ARM926EJ-S   T2 ARMv7-A9    T2 ARMv7-A9          

  N5 cos       0.499109834  0.499109805    0.499109805
  N8 exp       0.751108646  0.762195110    0.830691695
  




Native Whetstone Benchmark

The second Android benchmark uses the same Java front end code as #1, producing identical output format, but headed “Android Native Whetstone Benchmark”, with the C/C++ program using Java Native Interface and saved in a jni folder (see zip file).

The C code does not produce any output in this case, except for returning results to the Java program in a string. The code is compiled into a library, through an Android.mk file, in my case via the Terminal command ~/workspace/NativeWhetstone/jni$ ~/Eclipse/android-ndk-r7/ndk-build. The Java program includes a function to load the library.

Unexpectedly, the first compilation produced slower performance than the Java version - see T2 above and T2 @5 below. The solution was a new Application.mk file in the jni folder. This has a single directive (APP_ABI := armeabi armeabi-v7a) to build two libraries, one for older ARMv5 CPUs and one for ARMv7 with vfpv3 high performance floating point units, these being automatically selected at run time. As can be seen for T2 @7 below, a remarkable performance improvement was produced.

The .apk application file can be downloaded from www.roylongbottom.org.uk/NativeWhetstone.apk. See also Whetstone Benchmark History and Results for performance of computers from the 1960’s to modern times, and Whetstone Benchmark Results on PCs, these including speeds via Windows and Linux.



Native Whetstone Benchmark Results

For these tests, the emulator was run using one CPU of a 2.4 GHz Core 2 Duo. Again, this showed a performance gain emulating v7-A, with a further improvement using instructions for a vfpv3 FPU. Unlike tablet T2, results on tablet T1, with the 926EJ CPU, show that the first compilation produces the same performance as the second, with the faster FPU options included in the .apk execution file.

Note: The programming code for the fixed point, if and equal tests produces identical results irrespective of the number of passes. A modern optimising compiler can opt to only run one pass and produce an indication of unachievable performance. The functions have been tweaked to, at least, execute some instructions in each pass. These tests use little time (see log above) with negligible effect on the overall rating but can be compared with other systems running the same .apk app. Do not compare these non-standard results with those from other compilations.
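The over-optimisation issue in the note can be illustrated with a C sketch loosely modelled on the classic Whetstone fixed point module. The values of j, k and l return to 1, 2 and 3 on every pass, so a compiler is free to run the loop body once. The tweak shown, a volatile store in each pass, is only an assumed example of forcing some work per pass and is not the change actually made in this benchmark.

```c
/* Pass-invariant form: j, k and l come back to 1, 2 and 3 on every
   pass, so an optimising compiler may execute the body only once. */
int fixpt_invariant(int passes)
{
    int j = 1, k = 2, l = 3;
    for (int i = 0; i < passes; i++) {
        j = j * (k - j) * (l - k);
        k = l * k - (l - j) * k;
        l = (l - k) * (k + j);
    }
    return j + k + l;
}

/* Assumed tweak for illustration: the volatile store has to be carried
   out on every pass, so at least one instruction per pass survives. */
int fixpt_tweaked(int passes)
{
    volatile int sink = 0;
    int j = 1, k = 2, l = 3;
    for (int i = 0; i < passes; i++) {
        j = j * (k - j) * (l - k);
        k = l * k - (l - j) * k;
        l = (l - k) * (k + j);
        sink = j + k + l;
    }
    return sink;
}
```

Both functions return 6 for any positive pass count, which is why timing alone cannot show whether all passes were really executed.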


 System   ARM  MHz Android MWIPS  ------MFLOPS-------   COS   EXP  FIXPT     IF  EQUAL
 See      CPU       Build           1      2      3     ------------MOPS--------------

 T1  @5 926EJ  800    2.2   31.2   10.2   10.2   11.4   0.6   0.3   38.8  278.4  219.4
 T1  @7 926EJ  800    2.2   30.3   10.2    9.3   11.5   0.6   0.3   39.0  293.5  220.1
 T2  @5 v7-A9  800  2.3.4  170.9   20.4   21.4   28.4   7.6   2.2   85.5  756.0  764.3
 T2  @7 v7-A9  800  2.3.4  687.4  165.4  149.9  153.4  15.9   9.3  723.1 1082.1  725.3

 EP1 @5 926EJ  Emul   2.2   20.1    7.0    6.7    9.3   0.4   0.2   30.9  218.6   98.5
 ET2 @5 v7-A8  Emul  4.03   43.7    7.2    7.0    9.3   1.1   0.6   30.8  225.1  100.9
 ET2 @7 v7-A8  Emul  4.03   96.7   37.0   32.1   36.1   1.6   1.3  121.9  238.4  216.4

       Atom    800  Linux  184.9  107.1   60.0   27.9   6.3   3.5  100.4   78.8   45.6
       Core 2  800  Linux  496.3  179.3  177.7   98.0  16.0   7.4  266.3  238.3   94.0

          System - T = Tablet, P = Phone, E = Emulator,  @7 for vfpv3 FPU

Results are also shown for Linux C compilations on two Intel processors, adjusted to assume clock speeds of 800 MHz. These show that ARM CPUs with the fast FPU, running at the same clock speed as Intel processors, can produce equal or better performance.


Linpack Benchmark

The Linpack Benchmark was produced from the "LINPACK" package of linear algebra routines. It became the primary benchmark for scientific applications, particularly under Unix, from the mid 1980's, with a slant towards supercomputer performance. The original double precision C version, used here, operates on 100x100 matrices. Performance is governed by an inner loop in function daxpy() with a linked triad dy[i] = dy[i] + da * dx[i], and is measured in Millions of Floating Point Operations Per Second (MFLOPS). This version uses a Java front end, again providing Run, Info and Email buttons, with the main C code compiled by Android Native Development Kit. Two varieties are available: Linpackv5.apk (LP5), using old, slow instructions, and Linpackv7.apk (LPK), which will use faster vfpv3 hardware, if available.
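The linked triad in daxpy() can be sketched in C as below. This is a simplified unit stride form written for illustration, not a copy of the benchmark source, and daxpy_flops is a hypothetical helper.

```c
/* Linked triad dy[i] = dy[i] + da * dx[i] over n elements, unit stride.
   Simplified sketch: the benchmark's own daxpy() may differ in detail. */
void daxpy(int n, double da, const double *dx, double *dy)
{
    if (n <= 0 || da == 0.0)
        return;
    for (int i = 0; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}

/* Floating point operations in one call: a multiply and an add
   for each element (hypothetical helper for MFLOPS calculations). */
double daxpy_flops(int n)
{
    return n > 0 ? 2.0 * n : 0.0;
}
```

MFLOPS for a timed run would then be the total operation count divided by the elapsed time in microseconds.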

The .apk application files can be downloaded from www.roylongbottom.org.uk/Linpackv5.apk and www.roylongbottom.org.uk/Linpackv7.apk. Further details of the Linpack benchmark, and results from Windows and Linux based PCs, can be found in Linpack Results.htm.

Output results provide the same System Information as shown for the Whetstone Benchmark, preceded by MFLOPS speed and numeric results, examples being shown below. In this case, calculations from both versions produce the same numeric results. These are also identical to those from Microsoft Visual C under Windows and Linux using 64-Bit GCC on PCs, with other compilers used producing differences.


 Android Linpack v7 Benchmark         Android Linpack v5 Benchmark

 Speed              101.39 MFLOPS     Speed               10.56 MFLOPS

 norm. resid                 1.7      norm. resid                 1.7
 resid            7.41628980e-14      resid            7.41628980e-14
 machep           2.22044605e-16      machep           2.22044605e-16
 x[0]-1          -1.49880108e-14      x[0]-1          -1.49880108e-14
 x[n-1]-1        -1.89848137e-14      x[n-1]-1        -1.89848137e-14




Linpack Benchmark Results

Results below again show that the compilation using the vfpv3 FPU library produces much faster speed on the tablet with the Cortex-A9 processor and the alternative library is used with the older CPU. Results scaled to represent Intel processor speeds, at 800 MHz, are also shown. This time, the Cortex-A9 MFLOPS are similar to those for the Atom CPU, but the latter is really twice as fast, as it runs at 1600 MHz.

Results from this benchmark are significantly faster than those from the apparently popular Linpack For Android, which produced 30 MFLOPS on the xTAB-70 tablet, compared with 101 MFLOPS with my benchmark. The main difference appears to be that this one uses pre-compiled C code and the other is Java based, reflecting earlier Whetstone Benchmark comparisons.


 System   ARM    MHz   Android    Linpackv5    Linpackv7 
 See                                MFLOPS       MFLOPS 

  T1    926EJ    800       2.2        5.63         5.67
  T2    v7-A9    800     2.3.4       10.56       101.39

  EP1   926EJ   Emul       2.2        4.27         4.54   
  ET2   v7-A8   Emul      4.03        4.39        12.24

         Atom    800     Linux                    94.11
         Atom    800   Windows                    87.98      
        Core 2   800     Linux                   429.33 
        Core 2   800   Windows                   438.43 

        System - T = Tablet, P = Phone, E = Emulator

Java Linpack is also available to run via Windows and Linux browsers. On running this, the 800 MHz ratings for the Atom and Core 2 CPUs are 55 and 377 MFLOPS.


Dhrystone 2 Benchmark

The Dhrystone "C" benchmark provides a measure of integer performance (no floating point instructions). It became the key standard benchmark from 1984, with the growth of Unix systems. The first version was produced by Reinhold P. Weicker in ADA and translated to "C" by Rick Richardson. Two versions are available - Dhrystone versions 1.1 and 2.1. The second version, used here, was produced to avoid over-optimisation problems encountered with version 1, but some is still possible. Because of this, optimised and non-optimised compilations are provided. Speed was originally measured in Dhrystones per second. This was later changed to VAX MIPS by dividing Dhrystones per second by 1757, the DEC VAX 11/780 result, the latter being regarded as the first 1 MIPS minicomputer.

The optimised .apk app file (DS2) can be downloaded from www.roylongbottom.org.uk/Dhrystone2.apk and the non-optimised one (DSN) from www.roylongbottom.org.uk/Dhry2Nopt.apk. Further details of the Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in Dhrystone Results.htm.

The same format Java front end, described above, is used, with the two C programs compiled using Android NDK. Examples of results are below, the Emailed version including the standard System Information.


 Dhrystone 2 Benchmark 10-Feb-2012 19.08   Dhry2 NoOpt Benchmark 14-Feb-2012 12.15

 Nanoseconds one Dhrystone run       592   Nanoseconds one Dhrystone run      1244
 Dhrystones per Second           1689546   Dhrystones per Second            804020
 VAX MIPS rating                     962   VAX MIPS rating                     458
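The three figures in each log above are related by simple arithmetic, sketched here in C. The divisor 1757 is the DEC VAX 11/780 result quoted earlier, and the function names are hypothetical.

```c
/* VAX MIPS rating: Dhrystones per second divided by 1757,
   the DEC VAX 11/780 result. */
double vax_mips(double dhrystones_per_second)
{
    return dhrystones_per_second / 1757.0;
}

/* Time for one Dhrystone run in nanoseconds. */
double nanoseconds_per_run(double dhrystones_per_second)
{
    return 1.0e9 / dhrystones_per_second;
}
```

With the optimised result of 1689546 Dhrystones per second, these give about 961.6 VAX MIPS and 591.9 nanoseconds, which round to the 962 and 592 shown in the log.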




Dhrystone 2 Benchmark Results

Unlike when using floating point, on this benchmark the Cortex-A9 CPU is less than three times faster than the 926EJ on all measurements, a ratio similar to that provided by the BogoMIPS results, shown in System Information. Measurements for Intel Atom and Core 2 CPUs are also provided for Windows (Watcom 32 Bit) and Linux (GCC 32 Bit and 64 Bit) compilations. Relative to CPU MHz, the A9 performance is similar to the Atom 32 bit compilations, the latter being faster at optimised 64 bits, probably due to more registers being available. Core 2 results show considerable variations, highlighting the danger in comparing results from different compilers. The optimised benchmark produces 1.2 Vax MIPS/MHz for the Cortex-A9. ARM themselves quote 2.5 Vax MIPS (DMIPS) per MHz for the same processor, the difference probably being just another compiler variation.


                                     Opt      No Opt
 System   ARM    MHz   Android       Vax       Vax      Bogo 
 See                                MIPS      MIPS      MIPS

  T1    926EJ    800       2.2       356       196       798
  T2    v7-A9    800     2.3.4       962       458      2036

  EP1   926EJ   Emul       2.2       227       122        
  ET2   v7-A8   Emul      4.03       286       160

 32 Bit  Atom   1666     Linux      2055      1194
 64 Bit  Atom   1666     Linux      2704      1098
 32 Bit  Atom   1666     Windows    1948       780           
 32 Bit Core 2  2400     Linux      5852      3348
 64 Bit Core 2  2400     Linux     12265      3288
 32 Bit Core 2  2400     Windows    6466      1251
 
        System - T = Tablet, P = Phone, E = Emulator




Livermore Loops Benchmark

This original main benchmark for supercomputers was first introduced in 1970, initially comprising 14 kernels of numerical application, written in Fortran. This was increased to 24 kernels in the 1980s. Performance measurements are in terms of Millions of Floating Point Operations Per Second or MFLOPS. The kernels are executed three times with different double precision data array sizes. Following are overall MFLOPS results for Cray 1, geometric mean being the official average performance.

             ---------------- MFLOPS ---------------       
CPU     MHz  Maximum Average Geomean Harmean Minimum       
Cray 1   80    82.1    22.2    11.9     6.5     1.0        
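For reference, the standard unweighted definitions of the three averages in the table headings are given below, for n loop speeds r_1 to r_n in MFLOPS. The benchmark reports weighted overall figures, so this is a sketch of the definitions only, not the exact calculation used by the program.

```latex
\[
\text{Average} = \frac{1}{n}\sum_{i=1}^{n} r_i ,
\qquad
\text{Geomean} = \Bigl(\prod_{i=1}^{n} r_i\Bigr)^{1/n} ,
\qquad
\text{Harmean} = \frac{n}{\sum_{i=1}^{n} 1/r_i}
\]
```

For positive speeds, the unweighted forms always satisfy Maximum >= Average >= Geomean >= Harmean >= Minimum, the ordering seen in the Cray 1 figures above.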

The benchmark execution file can be downloaded from www.roylongbottom.org.uk/LivermoreLoops.apk. Further details of the Livermore Loops benchmark, and results from Windows and Linux based PCs, can be found in Livermore Loops Results.htm.

The same format Java front end, described above, is used, with the C program compiled using Android NDK. An example of results is below, the Emailed version including the standard System Information.


            800 MHz ARM Cortex-A9

 Android Livermore Loops Benchmark 12-Feb-2012 21.55

  MFLOPS for 24 loops Do Span 471
   172.6   127.5   253.2   248.6    71.6   141.2
   197.6   190.4   202.3   109.2    55.2    51.2
    54.1    51.5   100.0   144.1   192.1   139.4
   130.1   105.4   111.2    63.1   136.3    56.8

 Overall Weighted MFLOPS Do Spans 471, 90, 19
 Maximum Average Geomean Harmean Minimum
   253.2   129.3   115.3   101.6    46.7

 Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

 Total Elapsed Time  11.9 seconds

So far, results of the last two calculations have been identical on all benchmark runs.



Livermore Loops Benchmark Results

System T2, with the high speed vfpv3 hardware, is again shown to be around 20 times faster than the tablet T1, on these floating point calculations. This time, T2 performance is quite respectable, compared with an Intel Atom, running at twice the clock speed.


 Sys See  ARM   MHz Android  Run Time           MFLOPS for 24 loops Do Span 471

   T1    926EJ  800    2.2    97.3 secs    5.6     6.4     6.2     6.1     4.6     4.9
                                           5.9     6.1     6.0     9.0     5.8     3.9
  Max   Average Geomean Harmean   Min      4.0     3.6     3.8     5.6     7.6     4.5
  9.9     5.6     5.4     5.2     2.4      5.7     4.3     5.2     2.5     5.7     7.4

   T2    v7-A9  800  2.3.4    11.9 secs  172.6   127.5   253.2   248.6    71.6   141.2
                                         197.6   190.4   202.3   109.2    55.2    51.2
  Max   Average Geomean Harmean   Min     54.1    51.5   100.0   144.1   192.1   139.4 
 253.2   129.3   115.3   101.6    46.7   130.1   105.4   111.2    63.1   136.3    56.8

  ET2    v7-A8  Emul  4.03   124.8 secs    5.0     4.8     5.1     4.8     4.3     4.9
                                           4.7     4.4     4.6     7.1     5.0     3.0
  Max   Average Geomean Harmean   Min      3.5     3.6     3.2     3.8     5.5     3.4
  7.2     4.3     4.2     4.1     2.3      4.7     3.2     5.1     3.5     4.3     5.4
 
  Atom 1666 MHz Linux                      Core 2 2400 MHz Linux
  Max   Average Geomean Harmean   Min      Max   Average Geomean Harmean   Min
 465.2   212.2   185.1   157.4    49.7   2384.9  1038.1   805.8   582.1   161.0




MemSpeed Benchmark

This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing double precision MB/second by 8 and 16, for the two tests, and single precision speeds by 4 and 8. Assembly listings for integer tests show that Millions of Instructions Per Second can be found by multiplying MB/second by 0.78 with 2 adds and 0.66 for the other test. Cache sizes are indicated by varying performance as memory usage changes. Download the app from www.roylongbottom.org.uk/MemSpeed.apk.

Emailed results include System Information as provided above. Results follow. System T2 appears to have 32 KB L1 cache and 128 KB L2 cache and maximum MFLOPS of 130 DP and 133 SP with Integer MIPS up to 1227. The results for T1 are 7.5 and 11 MFLOPS and 468 MIPS, with only a 16 KB L1 cache.
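The MB/second to MFLOPS conversion can be sketched in C as follows. Each pass of x[m]=x[m]+s*y[m] reads two values and carries out two floating point operations, and x[m]=x[m]+y[m] reads two values for one operation, which gives the divisors 8 and 16 for double precision and 4 and 8 for single precision. The function names are hypothetical.

```c
/* The first MemSpeed calculation, double precision version. */
void triad(double *x, const double *y, double s, int n)
{
    for (int m = 0; m < n; m++)
        x[m] = x[m] + s * y[m];
}

/* MFLOPS for x[m]=x[m]+s*y[m]: two values read (2 * bytes_per_value)
   for two operations, so one operation per bytes_per_value bytes. */
double mflops_triad(double mb_per_sec, int bytes_per_value)
{
    return mb_per_sec / bytes_per_value;
}

/* MFLOPS for x[m]=x[m]+y[m]: two values read for one operation. */
double mflops_add(double mb_per_sec, int bytes_per_value)
{
    return mb_per_sec / (2.0 * bytes_per_value);
}
```

For system T2, mflops_triad(1042.0, 8) gives about 130 and mflops_triad(533.0, 4) about 133, the maximum DP and SP figures quoted above. For T1, 60 and 44 MB/second give 7.5 and 11 MFLOPS.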

The program code used is the same as Linux Multithreading Benchmarks.htm and (nearly) MemSpd2k Results.htm. Results on an Intel Atom, for a single thread, using the multithreading benchmark, are shown below. On a per MHz basis, the Cortex-A9 performs well using L1 cache but (DDR2) RAM speeds are particularly slow.


 System T2, ARM Cortex-A9  800 MHz, Android  2.3.4

  Android MemSpeed Benchmark 17-Feb-2012 17.41

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16   1002    533   1574   1742    812   1639
      32   1042    530   1533   1717    701   1751
      64    994    461    984   1144    644    942
     128    656    396    691    696    511    673
     256    269    259    273    271    255    280
     512    249    246    244    256    244    247
    1024    249    249    244    240    253    244
    4096    246    244    247    246    242    245
   16384    253    236    252    254    241    246
   65536    254    241    253    250    252    241

          Total Elapsed Time   19.4 seconds


 System T1, ARM 926EJ  800 MHz, Android 2.2  
 
  Android MemSpeed Benchmark 17-Feb-2012 17.47

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int

      16     60     44    600     93     76    694
      32     46     38    146     60     56    146
      64     48     37    154     66     54    144
     128     48     36    155     65     53    144
     256     48     36    153     65     56    135
     512     48     38    153     65     57    142
    1024     47     37    153     65     57    142
    4096     47     37    152     67     55    142
   16384     47     37    152     70     63    138
   65536     44     37    153    106     92    142

          Total Elapsed Time   93.5 seconds


          Atom 1666 MHz, DDR2 RAM 533 MHz, Linux

      16   1892    943   1979   2759   1329   2813
      64   1647    879   1690   2334   1269   2323
   65535   1515    834   1517   2010   1208   1945




BusSpeed Benchmark

This benchmark is designed to identify reading data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read. On reading data from RAM, 64 Byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum on using 16 word increments. Potential maximum speed can be estimated by multiplying this minimum value by 8. With this burst rate, measured speed at 32 word and 16 word increments are likely to be the same. Cache sizes are indicated by varying speed as memory use changes. Note, with smallest L1 cache demands, measured speed can be low due to overheads when reading little data.
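The reading pattern described above can be sketched in C as a loop that ANDs one 4 byte word at each address increment. This is a simplified illustration with hypothetical function names. The benchmark itself uses unrolled loops, and its timing code is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* ANDs together every inc-th 4 byte word of the array and returns the
   result. If words_read is not NULL, it receives the number of words
   actually loaded. Hypothetical sketch, not the benchmark's code. */
uint32_t read_with_increment(const uint32_t *array, size_t words,
                             size_t inc, size_t *words_read)
{
    uint32_t and_all = 0xffffffffu;
    size_t count = 0;

    if (inc == 0)
        inc = 1;                      /* guard against an endless loop */
    for (size_t i = 0; i < words; i += inc) {
        and_all &= array[i];
        count++;
    }
    if (words_read != NULL)
        *words_read = count;
    return and_all;
}

/* MBytes/second based on the words actually read, at 4 bytes each. */
double mbytes_per_second(size_t words_read, double seconds)
{
    return (double)words_read * 4.0 / 1.0e6 / seconds;
}
```

A full test would call read_with_increment with increments of 32, 16, 8, 4, 2 and 1 words, timing each one over enough repeats to give a measurable run time.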

The program C source code is as used for Linux, see BusSpd2K Results.htm. This has unrolled loops with up to 64 AND statements (& array[i] to & array[i+63]). The Linux compiler for Intel CPUs translates this into 64 assembly instructions ANDing data from indexed memory locations ####. In this case, Integer MIPS approximately equals MB/second divided by 4 (see Atom results below at 16 KB Read All). The Android NDK compiler generates 64 ANDs, 64 loads and 64+ adds/moves and this reduces comparative performance ####. The results for memory data transfers also indicate that the Cortex-A9 CPU can be slower than the older ARM processor.

#### Details of RISC micro-instructions might tell a different tale.

This benchmark application can be downloaded from www.roylongbottom.org.uk/BusSpeed.apk.


  System T2, ARM Cortex-A9  800 MHz, Android  2.3.4

   Android BusSpeed Benchmark 19-Feb-2012 14.00

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16   1748   1347   2154   2331   2331   2285
      32   1038   1446   1474   1678   1735   1899
      64    407    490    508    592    489    826
     128    180    213    183    258    266    530
     256     47     42     57     83     79    132
     512     41     39     47     73     68    137
    1024     39     38     52     70     57    135
    4096     38     26     60     69     67    115
   16384     39     32     59     71     59    135
   65536     34     33     59     67     63    123

          Total Elapsed Time    6.9 seconds


             Typical variation in results

      16    403    421    503   2316   2331   2285
      32   1344   1446   1428   1658   1750   1943


  System T1, ARM 926EJ  800 MHz, Android 2.2  
 
   Android BusSpeed Benchmark 19-Feb-2012 13.47

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All

      16     96     95    199    407    426    467
      32     35     34     34     68    124    201
      64     29     29     30     58    108    174
     128     30     30     29     57    108    182
     256     29     30     30     56    107    169
     512     28     29     29     57    106    181
    1024     28     29     29     55     99    176
    4096     28     29     29     57    106    177
   16384     28     28     29     53    103    181
   65536     28     29     29     56    106    179

          Total Elapsed Time    6.3 seconds


          Atom 1666 MHz, DDR2 RAM 533 MHz, Linux

      16   5024   5502   6040   6312   6382   6412
      64    493    404    786   1485   2588   3941
   65536    136    262    521   1036   2008   3295 




RandMem Benchmark

RandMem benchmark carries out four tests at increasing data sizes to produce data transfer speeds in MBytes Per Second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower performance can be through using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches.

The benchmark uses the first four tests described in RandMem Results.htm and can be downloaded from www.roylongbottom.org.uk/RandMem.apk. The program structure is as follows, with array xi indexing via sequential or random numbers stored in the array.

        Read -       toti = toti & xi[xi[i+0]] | xi[xi[i+2]]
                                 & xi[xi[i+4]] | xi[xi[i+6]]  and &|  to i+30
        Read/write - xi[xi[i+2]] = xi[xi[i+0]]; 
                     xi[xi[i+6]] = xi[xi[i+4]];   repeated to i+30 and i+28 
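The indexing scheme can be sketched in C as follows, with the index array holding either sequential numbers or a random permutation. The loops are shortened to four terms per pass, the shuffle based on rand() is an assumption, and all function names are hypothetical, so this is an illustration of the structure rather than the benchmark's own code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Serial addressing: xi[i] = i, so xi[xi[i]] steps through memory in order. */
void fill_serial(uint32_t *xi, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        xi[i] = i;
}

/* Random addressing: shuffle the indices into a random permutation
   (Fisher-Yates shuffle using rand(), an assumed choice). */
void fill_random(uint32_t *xi, uint32_t n, unsigned int seed)
{
    fill_serial(xi, n);
    if (n < 2)
        return;
    srand(seed);
    for (uint32_t i = n - 1; i > 0; i--) {
        uint32_t j = (uint32_t)rand() % (i + 1);
        uint32_t tmp = xi[i];
        xi[i] = xi[j];
        xi[j] = tmp;
    }
}

/* Read test: alternate & and | of words fetched via the index array. */
uint32_t read_indexed(const uint32_t *xi, uint32_t n)
{
    uint32_t toti = 0xffffffffu;
    for (uint32_t i = 0; i + 6 < n; i += 8)
        toti = (toti & xi[xi[i + 0]]) | (xi[xi[i + 2]] & xi[xi[i + 4]])
             | xi[xi[i + 6]];
    return toti;
}

/* Read/write test: copy words between indirectly addressed locations. */
void read_write_indexed(uint32_t *xi, uint32_t n)
{
    for (uint32_t i = 0; i + 6 < n; i += 8) {
        xi[xi[i + 2]] = xi[xi[i + 0]];
        xi[xi[i + 6]] = xi[xi[i + 4]];
    }
}
```

The same read and read/write loops serve for both address selections. Only the contents of xi change, which is what allows the serial and random speeds to be compared directly.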

The results below show that random access performance is approximately the same as BusSpeed with address increments of 32 words, the burst reading effect. This program is again based on indexed memory addressing, where the older technology CPU can be faster than the Cortex-A9. This might be due to poor implementation of the memory bus interface on this tablet, as noted on PC tests. Atom results are provided, again showing better relative performance, particularly when using data from RAM. As with BusSpeed, and not noticed so far on the other benchmarks, measured speeds using L1 cache are sometimes slow to start with.


  System T2, ARM Cortex-A9  800 MHz, Android  2.3.4
  
  Android RandMem Benchmark 20-Feb-2012 16.45

    MBytes/Second transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16     1777     1879     1669     1809
       32     1359     1394     1185     1505
       64      799      861      621      755
      128      394      202      295      333
      256      147      146       92      104
      512      133      136       71       42
     1024      125      125       53       62
     4096      129       98       41       53
    16384      128      113       42       45
    65536      121      115       30       32

          Total Elapsed Time   11.7 seconds


  System T1, ARM 926EJ  800 MHz, Android 2.2  

  Android RandMem Benchmark 20-Feb-2012 16.51

    MBytes/Second transferring 4 Byte Words
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt

       16      841     1119      666      955
       32      222      147       83       62
       64      145      169       56       53
      128      198      181       48       57
      256      191      178       44       58
      512      196      180       27       32
     1024      189      180       22       26
     4096      193      181       19       23
    16384      195      177       19       22
    65536      186      166       19       22

          Total Elapsed Time   40.5 seconds


       Atom 1666 MHz, DDR2 RAM 533 MHz, Linux

       16     3976     5132     4100     5134
       64     3086     3215     1042     1349
    65536     2708     1290       49       74
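
As a worked illustration of how much slower random access can be, the sketch below divides the serial read speed by the random read speed for the 65536 KByte rows of the three tables above. The struct, function names and system labels are hypothetical; only the MBytes/second figures are taken from the results.

```c
#include <stdio.h>

/* Serial and random read speeds in MBytes/second,
   copied from the 65536 KByte rows of the tables above. */
struct result {
    const char *system;
    double serial_read;
    double random_read;
};

static const struct result results[] = {
    { "T2 Cortex-A9", 121.0, 30.0 },
    { "T1 ARM 926EJ", 186.0, 19.0 },
    { "Atom",        2708.0, 49.0 },
};

/* How many times slower random reading is than serial reading. */
static double slowdown(const struct result *r) {
    return r->serial_read / r->random_read;
}

/* Print one line per system. */
static void print_slowdowns(void) {
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++) {
        printf("%-14s %6.1f times slower\n",
               results[i].system, slowdown(&results[i]));
    }
}
```

From RAM, random reading comes out at roughly 4 times slower than serial reading on T2, 10 times on T1 and 55 times on the Atom, whose serial speed is far higher to begin with.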






Systems Used


 T1      Device TTFone M013S 10.1 inch tablet, 300-800 MHz VIA 8650
         Screen pixels w x h 600 x 1024
         Android Build Version      2.2
         Processor        : ARM926EJ-S rev 5 (v5l)
         BogoMIPS        : 797.97
         Features        : swp half thumb fastmult edsp java 
         CPU part        : 0x926
         Linux version 2.6.32.9

 T2      Device WayTeq xTAB-70 7 inch tablet, 800 MHz Cortex-A9
         Screen pixels w x h 600 x 800 
         Android Build Version      2.3.4
         Processor     : ARMv7 Processor rev 1 (v7l)
         BogoMIPS     : 2035.71
         Features : swp half thumb fastmult vfp edsp neon vfpv3 
         CPU part   : 0xc09                    - Cortex-A9
         Linux version 2.6.34

 P1      Device Motorola Milestone 1 CyanogenMod 7 ROM overclocked
         Screen pixels w x h 854 x 480
         Android Build Version      2.3.5
         Processor : ARMv7 Processor rev 3 (v7l)
         BogoMIPS : 598.90
         Features : swp half thumb fastmult vfp edsp neon vfpv3 
         CPU part : 0xc08                       - Cortex-A8
         Linux version 2.6.32.9

 P2      Device Samsung Galaxy S
         Screen pixels w x h 480 x 800
         Android Build Version      2.2
         Processor : ARMv7 Processor rev 2 (v7l)
         BogoMIPS : 996.00
         Features : swp half thumb fastmult vfp edsp neon vfpv3
         CPU part : 0xc08                       - Cortex-A8
         Linux version 2.6.32.9

 P3      Device Motorola Milestone 3 (XT860)
         Screen pixels w x h 960 x 540
         Android Build Version      2.3.6
         Processor : ARMv7 Processor rev 2 (v7l)
         processor : 0                          - CPU 1 of 2
         BogoMIPS : 598.90                      - too low?
         Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
         CPU part : 0xc09                       - Cortex-A9         
         Linux version 2.6.35.7

 EP1     Device Emulator 3 GHz Phenom
                      or 2.4 GHz Core 2
         Screen pixels w x h 240 x 320
         Android Build Version      2.2
         Processor  : ARM926EJ-S rev 5 (v5l)
         BogoMIPS    : 522.64
         Linux version 2.6.29

 ET1     Device Emulator 3 GHz Phenom
         Screen pixels w x h 600 x 1024
         Android Build Version      2.2
         Processor : ARM926EJ-S rev 5 (v5l)
         BogoMIPS  : 530.84
         Linux version 2.6.29

 ET2     Device Emulator 3 GHz Phenom
                      or 2.4 GHz Core 2
         Screen pixels w x h 600 x 1024
         Android Build Version      4.0.3
         Processor       : ARMv7 Processor rev 0 (v7l)
         BogoMIPS : 527.56
         Linux version 2.6.29
  




Roy Longbottom February 2012

The Official Internet Home for my Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection