Contents
Download Benchmark Apps
An option under Settings, Security may need changing to allow installation of non-Market applications.
NativeWhetstone2.apk - First standard benchmark
Dhrystone2i.apk - First integer benchmark
LinpackDP2.apk - First computational benchmark
LinpackSP2.apk - Single precision Linpack
LivermoreLoops2.apk - First supercomputer benchmark
MemSpeedi.apk - Floating Point Cache and RAM Test
BusSpeedv7i.apk - Integer Bus, Cache and RAM Test
RandMemi.apk - Random/Serial Access Cache and RAM Test
MP-MFLOPSi.apk - CPU, Cache, RAM MFLOPS Test
MP-MFLOPS2i.apk - Long Running MP-MFLOPS
MP-WHETSi.apk - Whetstone Floating and Fixed Point Tests
MP-Dhryi.apk - Dhrystone Integer Benchmark
MP-BusSpdi.apk - Multithreaded BusSpeed Benchmark
MP-RndMemi.apk - Multithreaded RandMem Benchmark
NEON-Linpacki.apk - Linpack Benchmark using ARM NEON Intrinsic Functions
NeonSpeedi.apk - NEON Memory Speed Test Using Intrinsic Functions
NEON-MFLOPS2i-MP.apk - MP-MFLOPS using ARM NEON Intrinsic Functions
NEON-Linpacki-MP.apk - Linpack MP Benchmark using NEON Intrinsic Functions
MP-BusSpd2i.apk - Long running version with staggered start
fft1.apk - Original FFT Benchmark
fft3c.apk - Optimised FFT Benchmark
All the above were produced using gcc 4.8, via Eclipse, running under Linux Ubuntu 14.04.
General
Intel Atom processors are appearing in a number of Android devices. For existing ARM apps that are compiled to native code, rather than running via Java, Android on these devices uses a compatibility layer, called Houdini, that translates ARM instructions into x86 instructions. This is known to produce poor performance, and it has raised questions about battery drain.
My existing Android benchmarks
were produced on Linux Ubuntu based PCs, using Eclipse. Many use a Java front end, with C/C++ code compiled via the Java Native Interface (JNI). These projects can be downloaded from
Android Benchmarks.zip,
Android Graphics Benchmarks.zip,
Android NEON Benchmarks.zip,
and
Android MP Benchmarks.zip.
The JNI directory contains the C/C++ code and an Application.mk file that tells the compiler which platform to produce machine code for. For the original benchmarks, the mk file specified APP_ABI := armeabi-v7a, for ARM v7 CPUs, or APP_ABI := armeabi armeabi-v7a, to also include earlier technology, with the appropriate version being selected at run time.
I was surprised to find that gcc 4.8 provided parameters to produce native Intel code, and code for other architectures. Those currently available are arm64-v8a, armeabi, armeabi-v7a, mips, mips64, x86 and x86_64. I use APP_ABI := all, so that, at least, the programs run natively on both ARM and Intel CPUs. Although the Atom is a 64 bit CPU, the currently installed Android 4.4 will not run x86_64 compilations.
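With APP_ABI := all, the package carries one native library per ABI, so it can be useful for a program to report which build is actually running (the later MemSpeed and BusSpeed logs print lines such as "Compiled for 32 bit ARM v7a"). The following is a minimal sketch, assuming GCC or Clang predefined architecture macros; `abi_name` and `describe_build` are hypothetical helpers, not taken from the benchmark sources.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: map compiler-predefined macros (GCC/Clang) to the
 * NDK ABI name of the library this code was compiled into. */
const char *abi_name(void)
{
#if defined(__aarch64__)
    return "arm64-v8a";
#elif defined(__arm__) && defined(__ARM_ARCH_7A__)
    return "armeabi-v7a";
#elif defined(__arm__)
    return "armeabi";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#elif defined(__mips64)          /* test before __mips__, which is also set */
    return "mips64";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}

/* Hypothetical helper: format a log line such as "Compiled for 64 bit x86_64".
 * Returns the snprintf result (length of the full formatted text). */
int describe_build(char *buf, size_t n)
{
    return snprintf(buf, n, "Compiled for %d bit %s",
                    (int)(sizeof(void *) * 8), abi_name());
}
```

Because the macros are resolved at compile time, each of the per-ABI libraries built from the same source reports its own name.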
Eclipse projects for the new compilations are in
Android Intel-ARM Benchmarks.zip
Initial comparisons provided are for tablets with Intel Atom, ARM Cortex-A9 and ARM Cortex-A15 CPUs, plus the BlueStacks emulator running under Windows 7 on a 3.0 GHz Phenom and under Windows 8 on a 3.7 GHz Core i7. The results are for the original ARM-only compilations and the latest with ARM and Intel native instructions.
These benchmarks should also run on 64 bit CPUs with 64 bit versions of Android. Some slight changes are being included in the programs to identify which section of the software is being used. They are being run on a Lenovo Tab 2 A8-50, 8 Inch Tablet, with a 1.3 GHz MediaTek mt8161 quad core processor (64 bit ARM Cortex-A53) and Android 5.0.2. Further details are in
Android 64 Bit Benchmarks.htm
and results are included below.
Logged Configuration
All the benchmarks were run on an Asus MeMO Pad 7 ME176CX that has a quad core Intel Atom Z3745, rated at 1.33 GHz but mainly running at the Turbo Boost speed of 1.86 GHz. All benchmarks have an option to save results via Email, and this includes details of the system used. Following are example details provided for this Asus MeMO Pad 7.
Similar details of other Android devices are in
Android Benchmarks.htm. Those provided later are a brief summary.
Intel CPU Code
Device Asus K013
Screen pixels w x h 800 x 1216
Android Build Version 4.4.2
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 55
model name : Intel(R) Atom(TM) CPU Z3745 @ 1.33GHz
stepping : 8
microcode : 0x81b
cpu MHz : 1862.000
cache size : 1024 KB
physical id : 0, siblings : 4, core id : 3, cpu cores : 4, apicid : 6, initial apicid : 6
fdiv_bug : no, f00f_bug : no, coma_bug : no, fpu : yes, fpu_exception : yes
cpuid level : 11, wp : yes
flags : fpu vme + numerous others including up to SSE4
bogomips : 2666.77
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP
PREEMPT Tue Sep 16 10:49:37 CST 2014
With ARM CPU Code
Screen pixels w x h 800 x 1216
Android Build Version 4.4.2
Processor : ARMv7 processor rev 1 (v7l)
BogoMIPS : 1500.0
Features : neon vfp swp half thumb fastmult edsp vfpv3
CPU implementer : 0x69
CPU architecture: 7
CPU variant : 0x1
CPU part : 0x001
CPU revision : 1
Hardware : placeholder
Revision : 0001
Serial : 0000000000000001
Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP
PREEMPT Tue Sep 16 10:49:37 CST 2014
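The field names above match the Linux /proc/cpuinfo format, and the final line matches /proc/version, so the logged details were probably read from those files. Assuming that, the following is a minimal sketch of reading such a file and extracting one "key : value" field; `read_text` and `cpuinfo_field` are hypothetical names, not functions from the benchmark code.

```c
#include <stdio.h>
#include <string.h>

/* Read up to n-1 bytes of a text file (e.g. /proc/cpuinfo) into buf and
 * NUL-terminate it. n must be at least 1. Returns the bytes read. */
size_t read_text(const char *path, char *buf, size_t n)
{
    size_t got = 0;
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        got = fread(buf, 1, n - 1, f);
        fclose(f);
    }
    buf[got] = '\0';
    return got;
}

/* Copy the value of the first line whose key exactly matches `key`
 * (cpuinfo pads keys with tabs/spaces before the colon) into out,
 * truncated to n-1 characters. Returns 1 if found, 0 otherwise. */
int cpuinfo_field(const char *text, const char *key, char *out, size_t n)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (*line) {
        const char *end = strchr(line, '\n');
        if (end == NULL)
            end = line + strlen(line);
        if ((size_t)(end - line) > klen && strncmp(line, key, klen) == 0) {
            const char *p = line + klen;
            while (p < end && (*p == ' ' || *p == '\t'))
                p++;                      /* padding before the colon */
            if (p < end && *p == ':') {   /* rejects "model name" for "model" */
                p++;
                while (p < end && *p == ' ')
                    p++;                  /* space after the colon */
                size_t vlen = (size_t)(end - p);
                if (vlen >= n)
                    vlen = n - 1;         /* truncate to fit the buffer */
                memcpy(out, p, vlen);
                out[vlen] = '\0';
                return 1;
            }
        }
        line = *end ? end + 1 : end;      /* next line, or stop at the end */
    }
    return 0;
}
```

Typical use would be `read_text("/proc/cpuinfo", buf, sizeof buf)` followed by `cpuinfo_field(buf, "model name", value, sizeof value)`. On ARM kernels of this period the equivalent keys are "Processor" and "Hardware", and the first match belongs to the first processor listed.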
Whetstone Benchmark - NativeWhetstone2.apk
This provides an overall rating in MWIPS, plus separate results for the eight test procedures in MFLOPS (floating point) and MOPS (functions and integer). For full details and results via Windows, Linux, Android and different programming languages, see
Whetstone Benchmark Results on PCs.
Native Intel code produced average performance gains of 1.93 times on Atom based A1. The original version was slow on the Phenom based BlueStacks Android emulator (BS1), but this was not the case with the later BlueStacks version running on the 3.7 GHz Core i7 (BS2). Both were much faster with the newer benchmark, apparently running native Intel instructions rather than converting to ARM. With the later ARM code, MWIPS was much lower on the Cortex CPUs, entirely due to the slow EXP functions test.
July 2015 - ARM/Intel version speeds are similar to the original on the ARM CPUs reported here, except for the EXP tests on T7 and T11, which have a significant impact on the overall MWIPS rating.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Results at 32 and 64 bits were not that different.
System ARM MHz Android MWIPS ------MFLOPS------- ------------MOPS--------------
See CPU Build 1 2 3 COS EXP FIXPT IF EQUAL
Original ARM Version
A1 Z3745 1866 4.4.2 1075.4 373.8 311.5 284.5 21.9 14.2 1421.1 1839.2 797.0
T7 v7-A9 1200 4.1.2 1115.0 271.3 250.7 256.4 25.8 14.6 1190.0 1797.0 1198.7
T22 v8-A53 1300 5.0.2 1433.7 348.0 319.3 308.2 36.3 19.8 1551.4 1861.9 611.0
T11 v7-A15 1700 4.2.2 1477.7 363.9 220.6 307.5 39.7 18.0 1690.5 2527.9 1127.9
T21 QU-800 2150 4.4.3 2035.1 665.7 640.0 531.6 45.2 23.1 3535.2 3180.4 2120.0
BS1 Emul Phen 3000 2.3.4 103.6 36.9 32.6 37.7 1.8 1.4 130.2 414.0 374.1
BS2 Emul i7 3700 4.4.2 844.5 428.6 351.8 343.6 14.6 10.9 1909.1 533.5 478.8
ARM/Intel 32 Bit Version
A1 Z3745 1866 4.4.2 1888.4 665.8 504.4 492.0 35.7 27.5 3191.4 3585.8 2146.7
T7 v7-A9 1200 4.1.2 731.1 273.6 253.0 252.8 28.0 5.0 1185.2 2383.4 1192.1
T11 v7-A15 1700 4.2.2 907.4 363.3 327.1 303.1 33.6 6.3 1506.9 2476.5 1122.6
T21 QU-800 2150 4.4.3 1973.8 679.6 648.4 525.6 44.7 21.9 3516.7 3147.2 1567.7
T22 v8-A53 1300 5.0.2 834.7 348.9 312.7 310.9 36.7 5.4 1556.7 1867.2 570.5
BS1 Emul Phen 3000 2.3.4 2992.3 897.2 707.4 623.6 76.3 37.8 3705.9 4423.1 2281.5
BS2 Emul i7 3700 4.4.2 5086.9 1066.7 1120.0 963.2 166.4 56.4 6300.0 11436.5 3786.9
ARM/Intel 64 Bit Version
T22 v8-A53 1300 5.0.2 1494.2 347.1 307.0 305.9 37.5 20.6 1552.2 1863.7 1239.1
Dhrystone Benchmark - Dhrystone2i.apk
The Dhrystone integer benchmark produces a performance rating in Vax MIPS (AKA DMIPS). Further details of the Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in
Dhrystone Results.htm.
The ratio MIPS/MHz is often quoted, but this depends on compiler optimisation (or over-optimisation).
The new version, with native Intel code, produces a 33% gain in performance on A1, with BlueStacks on the Phenom (BS1) 9.2 times faster. ARM Cortex speeds are somewhat slower with the new version.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation produced a significant improvement.
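The Vax MIPS rating divides the measured Dhrystones per second by 1757, the score of the VAX 11/780 reference machine that is rated at 1 MIPS, and the MIPS/MHz column is then a simple division by clock speed. A minimal sketch of the arithmetic; the function names are illustrative only.

```c
/* Vax MIPS (DMIPS): the VAX 11/780, rated at 1 MIPS, ran 1757 Dhrystones
 * per second, so the rating is a simple scaling of the measured rate. */
double vax_mips(double dhrystones_per_second)
{
    return dhrystones_per_second / 1757.0;
}

/* MIPS per MHz, as in the right hand column of the table. */
double mips_per_mhz(double mips, double mhz)
{
    return mips / mhz;
}
```

For example, A1 with the original version gives 1840 / 1866 = 0.986, shown as 0.99 in the table.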
System ARM MHz Android Vax MIPS
See MIPS /MHz
Original ARM Version
A1 Z3745 1866 4.4.2 1840 0.99
T7 v7-A9 1200 4.1.2 1610 1.34
T22 v8-A53 1300 5.0.2 1683 1.29
T11 v7-A15 1700 4.2.2 3189 1.88
T21 QU-800 2150 4.4.3 3854 1.79
BS1 Emul Phen 3000 2.3.4 484 0.16
BS2 Emul i7 3700 4.4.2 746 0.20
ARM/Intel 32 Bit Version
A1 Z3745 1866 4.4.2 2451 1.31
T7 v7-A9 1200 4.1.2 1317 1.10
T22 v8-A53 1300 5.0.2 1423 1.09
T11 v7-A15 1700 4.2.2 2551 1.50
T21 QU-800 2150 4.4.3 3319 1.54
BS1 Emul Phen 3000 2.3.4 4464 1.49
BS2 Emul i7 3700 4.4.2 8841 2.39
ARM/Intel 64 Bit Version
T22 v8-A53 1300 5.0.2 2569 1.98
Linpack Benchmark - LinpackDP2.apk, LinpackSP2.apk
The Linpack benchmark speed is measured in MFLOPS, officially for double precision floating point calculations. A version was produced using NEON intrinsic functions, which only provide single precision operation. So, for comparison purposes, an available C code option to define single precision data was used to produce a new version, and this has usually led to a higher MFLOPS speed.
Results from various hardware and software platforms can be found in
Linpack Results.htm.
Performance of the Linpack benchmark is almost entirely dependent on the calculation x[i]=x[i]+c*y[i]. Later ARM processors include vfpv4 instructions that execute fused multiply-accumulate operations, possibly doubling performance. Compiler support for these seems to have appeared in gcc 4.8. Tablet T11 has vfpv4 but T7 does not - See System Details. The result is that the T11 DP benchmark runs much faster with the recompiled code (as does T21). The native Intel code compilation, running on A1, was more than twice as fast as the original, produced by gcc 4.4. Some of the gain is due to the new compiler, still with conversion to ARM instructions, and the rest is due to native Intel code.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation increased speed by almost 2 times with double precision calculations and 2.7 times at single precision.
September 2015 - New best score from P33, with a 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with SP speed of 1277 MFLOPS at 64 bits.
BlueStacks is particularly fast running with the native Intel version.
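The calculation that dominates Linpack can be sketched as the loop below. This is a simplified version, without the stride arguments and loop unrolling of the classic daxpy routine. Whether GCC turns the multiply and add into one fused multiply-add depends on the target (for example ARM code built with -mfpu=vfpv4 or -mfpu=neon-vfpv4, or 64 bit arm64 code) and on -ffp-contract, whose GCC default outside strict ISO modes allows this contraction.

```c
#include <stddef.h>

/* Double precision Linpack inner loop: x[i] = x[i] + c*y[i].
 * Two floating point operations (multiply and add) per element; on an
 * FMA capable target the compiler may emit a single fused instruction. */
void daxpy(size_t n, double c, const double *y, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + c * y[i];
}

/* Single precision variant, as used for the LinpackSP comparison. */
void saxpy(size_t n, float c, const float *y, float *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + c * y[i];
}
```

Compiling with -S and looking for vfma (ARM vfpv4) or fmadd (arm64) instructions in the listing is one way to check whether contraction took place.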
System ARM MHz Android LinpackDP LinpackSP
See MFLOPS MFLOPS
Original ARM Version
A1 Z3745 1866 4.4.2 168.16 296.63
T7 v7-A9 1200 4.1.2 151.05 201.30
T22 v8-A53 1300 5.0.2 156.70 184.09
T11 v7-A15 1700 4.2.2 459.17 803.04
T21 QU-800 2150 4.4.3 389.52 751.95
BS1 Emul Ph 3000 2.3.4 16.61 26.53
BS2 Emul i7 3700 4.4.2 138.85 227.42
GCC 4.8 ARM Version
A1 Z3745 1866 4.4.2 282.29
ARM/Intel 32 Bit Version
A1 Z3745 1866 4.4.2 362.63 408.87
T7 v7-A9 1200 4.1.2 159.34 199.84
T22 v8-A53 1300 5.0.2 172.28 180.64
T11 v7-A15 1700 4.2.2 826.36 952.88
T21 QU-800 2150 4.4.3 629.92 790.83
BS1 Emul Ph 3000 2.3.4 1808.57 1474.70
BS2 Emul i7 3700 4.4.2 3390.95 1886.36
ARM/Intel 64 Bit Version
T22 v8-A53 1300 5.0.2 340.18 482.43
P33 QU-810 2000 5.0.2 - 1277.76
Livermore Loops Benchmark - LivermoreLoops2.apk
The Livermore Loops comprise 24 kernels of numerical application, with speeds calculated in MFLOPS. A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official average. As for the other benchmarks, details and results are provided, in this case in
Livermore Loops Results.htm.
This time, the new compiler produces some slower results on Tablet T11, while the Atom, running native code, is faster on average, and 2.56 times faster than via the Houdini ARM conversion layer. T21 MFLOPS can also differ between versions.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Here, 64 bit/32 bit geometric mean performance ratio is 1.5.
System ARM MHz Android
See Max Average Geomean Harmean Min
Original ARM Version
A1 Z3745 1866 4.4.2 535.8 201.9 172.4 146.7 48.8
T7 v7-A9 1200 4.1.2 391.9 202.1 181.3 160.9 68.1
T11 v7-A15 1700 4.2.2 1252.8 476.0 375.8 288.8 90.8
T21 QU-800 2150 4.4.3 1075.5 437.1 356.7 284.4 100.3
BS2 Emul i7 3700 4.4.2 321.7 134.4 118.1 101.8 29.3
ARM/Intel 32 Bit Version
A1 Z3745 1866 4.4.2 1031.2 480.0 429.8 378.6 154.7
T22 v8-A53 1300 5.0.2 393.4 188.3 158.3 124.6 27.1
T7 v7-A9 1200 4.1.2 396.6 207.6 175.6 136.1 26.8
T11 v7-A15 1700 4.2.2 1411.4 471.2 342.1 219.5 34.3
T21 QU-800 2150 4.4.3 1159.4 446.9 356.0 280.3 112.3
BS2 Emul i7 3700 4.4.2 5422.6 2232.1 1784.4 1372.7 350.5
ARM/Intel 64 Bit Version
T22 v8-A53 1300 5.0.2 772.2 265.9 232.5 206.3 97.8
MemSpeed Benchmark - MemSpeedi.apk
This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Millions of Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing double precision MB/second by 8 and 16, for the two tests, and single precision speeds by 4 and 8. Assembly listings for integer tests show that Millions of Instructions Per Second (MIPS) can be found by multiplying MB/second by 0.78 for the test with 2 adds and 0.66 for the other test. Cache sizes are indicated by varying performance as memory usage changes. For more details and further results see
MemSpeed in Android Benchmarks.htm.
The native ARM/Intel results, on Intel Atom based A1, averaged 44% faster with L1 cache data, 27% using L2 and 14% from RAM. Results on tablets T7, T11 and T21 showed some gains and some losses. The benefit of Intel native code is particularly demonstrated by results using the BlueStacks App Player, running on an Intel Core i7 based PC.
August 2015 - Results provided for 64 bit T22. The 64 bit compilation was nearly twice as fast as the 32 bit version with double precision floating point calculations, using cached data, and provided a 33% increase from RAM. Corresponding single precision ratios were 2.6 and 2.0 times, with integer ratios of 2.2 and 1.5.
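The MFLOPS divisors follow from the bytes read per calculation: x[m]=x[m]+s*y[m] in double precision reads two 8 byte words (16 bytes) for 2 operations, hence MB/second divided by 8, while x[m]=x[m]+y[m] reads 16 bytes for 1 operation, hence 16. Single precision halves the bytes read. A minimal sketch of one kernel and the conversions; the function names are hypothetical, and the 0.78 and 0.66 MIPS factors are those quoted above.

```c
#include <stddef.h>

/* Double precision kernel in the MemSpeed style: x[m] = x[m] + s*y[m]. */
void dp_scale_add(double *x, const double *y, double s, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + s * y[m];
}

/* MB/second to MFLOPS.
 * bytes_read: bytes read per calculation (16 double, 8 single precision)
 * flops:      operations per calculation (2 for x+s*y, 1 for x+y) */
double mbps_to_mflops(double mbps, double bytes_read, double flops)
{
    return mbps * flops / bytes_read;
}

/* Integer MB/second to MIPS, using the factors from the assembly listings:
 * 0.78 for the test with 2 adds, 0.66 for the other test. */
double mbps_to_mips(double mbps, int two_add_test)
{
    return mbps * (two_add_test ? 0.78 : 0.66);
}
```

For example, the T21 single precision L1 result of 4635 MB/second gives 4635 x 2 / 8 = 1158.75, matching the Maximum SP MFLOPS of 1159 reported for that run.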
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android MemSpeed Benchmark 1.1 01-Feb-2015 10.06
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 2773 1745 2821 5993 3274 3094 L1
32 3088 1690 2451 4849 2769 2896
64 3066 1694 2245 3883 2434 2568 L2
128 3084 1695 2261 3886 2466 2524
256 3158 1732 2285 3964 2264 2176
512 2666 1721 2295 3959 2505 2561
1024 2938 1659 2163 3567 2356 2443
4096 2775 1653 2123 3055 2307 2395 RAM
16384 2827 1659 2121 3208 2321 2411
65536 2840 1661 2112 3248 2314 2406
Total Elapsed Time 10.8 seconds
#################### A1 ARM-Intel ######################
ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 11.46
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 3287 1859 4560 9789 4688 7316
32 3233 1856 3807 6633 3990 4030
64 3304 1860 2965 4457 2996 3894
128 3303 1855 3006 4463 3113 3992
256 3306 1860 2978 4463 3093 3946
512 3307 1862 2964 4377 3097 3958
1024 3031 1778 2766 3993 2867 3472
4096 2863 1776 2692 3129 2763 3046
16384 2857 1776 2702 3063 2768 3050
65536 2865 1765 2702 3176 2782 3087
Total Elapsed Time 10.1 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2000 MHz Cortex-A15, Android 4.2.2
Measured 1700 MHz
Android MemSpeed Benchmark 1.1 09-Aug-2013 17.04
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 7296 4159 3513 9375 5453 6211 L1
32 7253 4540 3882 7364 4873 4839
64 6902 4265 3878 7026 4373 4274 L2
128 6735 4032 2480 4005 2797 3288
256 5859 3775 2192 4527 3263 3676
512 5795 3781 3568 6282 3819 3818
1024 2609 1757 1754 2607 1805 1825
4096 1614 1422 1471 1654 1342 1441 RAM
16384 1624 1412 1474 1642 1336 1443
65536 1617 1408 1479 1368 1321 1423
Total Elapsed Time 10.7 seconds
#################### T11 ARM-Intel ####################
ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 12.26
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 6540 4359 4580 10119 6292 6502
32 8185 5132 4682 8729 4622 4465
64 5770 3530 3473 5780 3447 3782
128 5311 3386 3475 5225 3441 3451
256 5667 3642 3678 5805 3643 3726
512 5047 3318 3334 4869 3303 3337
1024 2015 1469 1423 2050 1452 1386
4096 1535 1322 1342 1598 1381 1385
16384 1505 1379 1406 1584 1387 1384
65536 1509 1306 1332 1585 1387 1382
Total Elapsed Time 10.8 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Android MemSpeed Benchmark 1.1 02-Jun-2015 11.01
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 8922 4635 3566 12412 5648 3774 L1
32 5116 3542 2773 7594 4827 3657 L2
64 5174 3393 2684 5652 3757 3130
128 5286 3387 2648 5443 3758 3194
256 4937 3446 2889 7469 4624 3449
512 4941 3459 2915 7452 4566 3724
1024 4837 3449 2848 7065 4455 3722
4096 2840 2606 2343 2581 2458 2567 RAM
16384 2606 2423 2232 2395 2238 2338
65536 2653 2453 2257 2457 2312 2420
Total Elapsed Time 9.7 seconds
Maximum SP MFLOPS 1159 Integer MIPS 2802
#################### T21 ARM-Intel ####################
ARM/Intel MemSpeed Benchmark 1.1 02-Jun-2015 11.27
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 8074 4831 2603 11252 5065 3892 L1
32 5302 4138 3709 7252 4985 3693 L2
64 4801 3510 2832 5739 3684 3015
128 4502 3783 3577 5991 3914 3547
256 4907 3913 3934 6876 4280 4056
512 4686 3883 3921 6236 4215 4060
1024 4716 3808 3823 6131 4185 3942
4096 2691 2603 2679 2249 2634 2709 RAM
16384 2227 2223 2420 1798 2191 2445
65536 2099 2106 2306 1738 2040 2346
Total Elapsed Time 9.9 seconds
Maximum SP MFLOPS 1207 Integer MIPS 2898
###################### T22 32 Bit ######################
ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.16
Compiled for 32 bit ARM v7a
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 1940 971 1693 2470 1278 2084 L1
32 1879 955 1676 2378 1255 1967
64 1801 938 1615 2254 1218 1912 L2
128 1706 941 1620 2279 1224 1872
256 1818 935 1570 2291 1155 1875
512 1633 884 1451 2008 1132 1704
1024 1276 781 1181 1454 938 1324 RAM
4096 1335 808 1260 1533 1010 1386
16384 1342 813 1270 1487 1013 1419
65536 1346 809 1274 1546 1031 1252
Total Elapsed Time 11.7 seconds
###################### T22 64 Bit ######################
ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.29
Compiled for 64 bit ARM v8a
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 4092 2198 3951 5293 3611 4408
32 3753 2496 3630 4651 3300 3992
64 3407 2388 3368 3715 3023 3677
128 3496 2462 3521 4137 3139 3844
256 3535 2481 3573 4199 3322 3911
512 3054 2248 3126 3556 2548 3372
1024 1714 1704 2029 2069 1854 2099
4096 1832 1595 1841 1914 1780 1897
16384 1844 1601 1850 1925 1798 1891
65536 1859 1608 1837 1921 1795 1812
Total Elapsed Time 10.2 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 1 GB DDR3 RAM
Measured 1200 MHz
Android MemSpeed Benchmark 17-Oct-2012 20.19
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 1735 888 2456 2726 1364 2818 L1
32 1448 760 1474 1700 1039 1648
64 1318 719 1290 1468 952 1385 L2
128 1279 715 1289 1443 944 1336
256 1268 714 1279 1435 943 1313
512 1158 691 1204 1321 892 1228
1024 729 553 735 772 632 742
4096 445 392 425 442 421 439 RAM
16384 435 390 428 435 412 431
65536 445 404 393 450 432 449
Total Elapsed Time 12.2 seconds
#################### T7 ARM-Intel #####################
ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.24
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 1856 1019 2537 2913 1459 2544
32 1416 832 1327 1508 920 1345
64 1286 779 1198 1418 908 1296
128 1282 781 1195 1424 912 1305
256 1278 774 1190 1433 878 1298
512 1197 752 1122 1340 862 1216
1024 833 626 822 903 695 857
4096 463 420 456 463 440 459
16384 459 426 453 455 435 458
65536 463 430 411 462 443 452
Total Elapsed Time 11.5 seconds
#################### BS2 Original ######################
BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
Android MemSpeed Benchmark 1.1 25-Apr-2015 12.58
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 1523 1777 731 1406 1939 1163
32 1306 1641 787 1641 1939 1023
64 1524 1230 511 1422 1662 1143
128 1524 1707 787 1641 1641 948
256 1456 1670 853 1525 1708 1094
512 1527 1642 853 1642 1779 948
1024 1528 1646 853 1646 1713 1094
4096 1535 1809 853 1809 1945 1194
16384 1638 1638 819 1774 1872 1170
65536 1404 1747 819 1747 1820 1156
Total Elapsed Time 12.5 seconds
#################### BS2 ARM-Intel #####################
ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.47
Reading Speed in MBytes/Second
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
KBytes Dble Sngl Int Dble Sngl Int
16 35555 9309 14065 30476 19393 19394
32 30476 19394 14222 35555 18518 17066
64 26666 16623 17778 30476 18286 16410
128 26667 17778 17778 29092 18286 19051
256 25098 16675 16327 27354 19395 18825
512 25100 13063 12190 26666 19395 17793
1024 24631 17589 16415 24623 16415 16415
4096 24638 17783 16644 24638 17093 17783
16384 14745 12639 11000 14000 13611 12834
65536 14043 11359 12336 15490 10649 10649
Total Elapsed Time 12.6 seconds
BusSpeed Benchmark - BusSpeedv7i.apk
This benchmark is designed to identify the effects of reading data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is halved on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum when using 16 word increments (64 bytes). Potential maximum speed can be estimated by multiplying this minimum value by 16. With this burst size, measured speeds at 32 word and 16 word increments are likely to be the same. Cache sizes are indicated by varying speed as memory use changes. Note that, with the smallest L1 cache demands, measured speed can be low due to overheads when reading little data. For more details and further results see
BusSpeed in Android Benchmarks.htm.
The native code ARM/Intel version provided no real performance improvement on tablet A1, with the Atom Z3745 CPU. In ARM mode, there was also little difference on Tablets T21, T11 and T7. The main reason for these similarities is that the long sequence of identical C arithmetic statements is easy to convert for efficient processing. BlueStacks speeds on the Intel CPU were again outstanding.
August 2015 - Results provided for 64 bit T22. Reading all data, 64/32 bit comparison ratios were up to 2.0 from L1 cache, 1.5 from L2 cache and 1.25 from RAM.
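The reading pattern can be sketched as follows. This is a simplification, assuming a plain loop rather than the long sequences of identical statements the benchmark actually uses; it counts only the words actually read, so MB/second would be 4 x words read / time, which is consistent with the multiply-by-16 estimate described above. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 4 byte word every `inc` words (inc = 32, 16, 8, 4, 2, 1).
 * Returns the number of words read; the sum is passed back so the
 * compiler cannot discard the loads. */
size_t read_with_increment(const uint32_t *data, size_t words, size_t inc,
                           uint32_t *sum)
{
    uint32_t total = 0;
    size_t count = 0;
    for (size_t i = 0; i < words; i += inc) {
        total += data[i];
        count++;
    }
    *sum = total;
    return count;
}

/* With 64 byte bursts, each 16 word (64 byte) step still transfers a
 * whole burst, so the Inc16 speed x 16 estimates the potential maximum. */
double estimated_max_mbps(double inc16_mbps)
{
    return inc16_mbps * 16.0;
}
```

For A1 at 65536 KB, the Inc16 speed of 272 MB/second gives an estimate of 4352 MB/second, against 3886 MB/second measured when reading all data.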
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android BusSpeed Benchmark 1.1 v7 21-Dec-2014 16.06
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 4178 3473 6270 6713 6759 6869 L1
32 1420 1529 2252 2686 3702 5108
64 1385 1498 2276 2629 3657 5108 L2
128 1394 1542 2278 2614 3640 5092
256 1410 1576 2258 2607 3259 5110
512 1417 1574 2274 2602 3700 5119
1024 349 428 888 1431 2848 4306 RAM
4096 215 265 593 1181 2289 3891
16384 210 266 596 1181 2278 3897
65536 220 272 600 1193 2346 3886
Total Elapsed Time 5.1 seconds
#################### A1 ARM-Intel ######################
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 4845 5705 6403 6926 7094 7167 L1
32 1407 1716 2255 2646 3713 5094
64 1395 1703 2257 2689 3754 4843 L2
128 1283 1571 2108 2620 3671 5135
256 1416 1753 2288 2679 3687 5178
512 1439 1372 2251 2510 3679 5183
1024 350 409 942 1696 2792 4403
4096 213 253 564 1188 2173 3631 RAM
16384 219 259 600 1189 2330 3920
65536 218 259 599 1102 2323 3716
Total Elapsed Time 5.1 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
2 GB DDR3-1600 RAM, dual channel, 12.8 GB/sec
Android BusSpeed Benchmark 1.1 v7 09-Aug-2013 17.07
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 3193 3451 4412 5272 5389 6191 L1
32 1298 1558 1990 3478 4264 4420
64 804 928 1209 2442 3263 3426 L2
128 784 904 1175 2321 3148 3333
256 780 908 1181 2336 3142 3327
512 788 907 1165 2312 3120 3300
1024 360 387 384 803 1348 1744
4096 145 146 194 507 648 1378 RAM
16384 141 136 190 507 638 1373
65536 142 141 191 506 643 1371
Total Elapsed Time 5.3 seconds
#################### T11 ARM-Intel ####################
ARM/Intel BusSpeed Benchmark 1.1 v7 23-Apr-2015 12.15
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 2085 3208 4055 4553 5272 5758
32 1282 1811 2498 4182 4867 5163
64 600 864 1309 2974 3504 3841
128 614 892 1310 3027 3500 3826
256 614 892 1337 3050 3509 3828
512 618 888 1319 3042 3382 3811
1024 425 479 444 1244 1803 2291
4096 146 146 191 590 1050 1751
16384 141 139 186 585 1039 1725
65536 139 139 187 585 1039 1721
Total Elapsed Time 5.3 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Android BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 1382 1350 3122 4300 4938 5283 L1
32 1106 1118 2026 2637 3786 5210 L2
64 1064 1118 2058 2679 3820 5251
128 1123 1170 2081 2688 3669 4166
256 1121 1196 2109 2623 3873 3429
512 940 1127 2050 2684 3777 4795
1024 951 1124 2038 2655 3759 4950
4096 239 375 472 806 1486 2679 RAM
16384 239 370 464 806 1476 2656
65536 239 368 495 854 1537 2792
Total Elapsed Time 5.0 seconds
#################### T21 ARM-Intel ####################
ARM/Intel BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 1328 1442 2797 4291 4699 5685 L1
32 1165 1100 1933 2848 3603 5844 L2
64 1147 1055 2007 2846 3586 5890
128 1181 1136 2008 2711 3600 5878
256 1185 1126 2018 2716 3568 5873
512 1022 1026 1805 2525 3378 5611
1024 796 843 1584 2202 3088 5053
4096 199 294 431 657 1166 2409 RAM
16384 200 299 430 659 1167 2408
65536 205 301 436 668 1173 2380
Total Elapsed Time 5.2 seconds
###################### T22 32 Bit ######################
T22, ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 10.57
Compiled for 32 bit ARM v7a
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 874 932 1814 2302 2355 2263 L1
32 758 803 1309 1820 2323 2386
64 653 671 1203 1741 2206 2332 L2
128 603 620 1107 1693 2222 2351
256 574 589 1075 1711 2211 2327
512 332 372 681 1075 1863 2120
1024 137 193 371 578 1322 2129 RAM
4096 172 179 351 567 1151 2126
16384 172 178 351 504 1117 2136
65536 172 177 349 478 882 2129
Total Elapsed Time 5.3 seconds
###################### T22 64 Bit ######################
T22, ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 11.02
Compiled for 64 bit ARM v8a
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 3188 3635 3937 4327 4372 4462
32 1478 1607 2246 3382 3853 4144
64 600 622 1163 2011 2972 3585
128 558 575 1056 1889 2892 3525
256 538 550 1028 1826 2837 3260
512 371 425 813 1490 2403 3202
1024 136 196 382 728 1423 2750
4096 170 177 346 669 1340 2652
16384 169 174 341 678 1352 2663
65536 168 174 341 676 1347 2611
Total Elapsed Time 5.2 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, 1 GB DDR3 RAM
Android BusSpeed Benchmark 19-Oct-2012 17.29
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 2723 2420 3044 3364 3499 3500 L1
32 1054 1087 1061 1382 1565 2145
64 436 433 419 652 751 1160 L2
128 345 337 337 542 633 943
256 329 309 322 522 614 961
512 339 299 311 506 574 937
1024 170 168 180 269 349 629
4096 59 55 84 127 176 338 RAM
16384 56 56 83 125 173 335
65536 56 56 82 125 174 334
Total Elapsed Time 5.7 seconds
#################### T7 ARM-Intel #####################
ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.30
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 2940 3344 3625 3866 3862 3893
32 698 707 682 1071 1208 1826
64 448 477 465 726 851 1357
128 367 355 292 542 657 1070
256 334 344 341 546 651 1059
512 326 336 336 531 629 1025
1024 169 175 197 309 411 749
4096 58 58 83 131 191 395
16384 56 57 83 129 189 392
65536 56 48 82 129 187 388
Total Elapsed Time 5.6 seconds
#################### BS2 Original ######################
BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
Android BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.57
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 1428 1280 1280 1422 1333 1489
32 1428 1280 1280 1365 1706 1602
64 1066 1481 1600 1463 1463 1707
128 1666 1365 1489 1463 1463 1833
256 1429 1706 1293 1425 1466 1823
512 1333 1463 1603 1425 1468 1565
1024 1280 1463 1710 1468 1565 1730
4096 1282 1367 1475 1730 1310 1617
16384 412 943 958 1258 1398 1677
65536 449 958 1078 1304 1677 1677
Total Elapsed Time 6.8 seconds
#################### BS2 ARM-Intel #####################
ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.49
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
KBytes Words Words Words Words Words All
16 13333 12800 22222 13675 18285 14224
32 10666 10666 12190 21333 21367 21334
64 6666 6666 10666 13333 21333 21337
128 6826 6400 10240 17067 21335 18290
256 4266 5120 8533 13654 18290 20483
512 2667 2667 5335 9103 16386 20515
1024 2560 2560 5692 9105 15608 22806
4096 2673 2752 5470 9175 17126 21880
16384 741 943 2070 4404 8808 14680
65536 542 838 1572 3595 6710 11930
Total Elapsed Time 6.5 seconds
RandMem Benchmark - RandMemi.apk
The RandMem benchmark carries out four tests at increasing data sizes to produce data transfer speeds in MBytes per second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower performance can be with random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results see
RandMem in Android Benchmarks.htm.
On the A1 Atom based tablet, the native code ARM/Intel version results showed gains of around 25% on all reading tests, but little difference with the read/write tests. The same benchmark, running on Tablets T11 and T21, showed some improvement using cache based data, but comparative performance on T7 was variable.
August 2015 - Results are provided for the 64 bit T22, showing that the 32 bit and 64 bit versions were not much different overall, each being slightly faster on some tests.
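The serial and random access tests described above can be sketched in C as follows. This is a minimal illustration, not the RandMem source: it assumes that random addresses come from a pre-shuffled index table, and the function names and the Fisher-Yates shuffle are hypothetical choices. Both patterns use the same loop structure, so any speed difference comes from the memory system, where each random access may bring in a whole cache line burst of which only one 4 byte word is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Serial read: sum 32 bit words in address order. */
uint32_t read_serial(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Random read: same loop, but each address comes from an index table. */
uint32_t read_random(const uint32_t *data, const uint32_t *index, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[index[i]];
    return sum;
}

/* Random read/write: each selected word is read, modified and written back. */
void rdwrt_random(uint32_t *data, const uint32_t *index, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[index[i]] += 1;
}

/* Hypothetical helper: fill index[] with a shuffled permutation of 0..n-1
   (Fisher-Yates; rand() is used for simplicity, not statistical quality). */
void make_random_index(uint32_t *index, size_t n, unsigned seed)
{
    assert(n <= UINT32_MAX);
    for (size_t i = 0; i < n; i++)
        index[i] = (uint32_t)i;
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j <= i - 1 */
        uint32_t t = index[i - 1];
        index[i - 1] = index[j];
        index[j] = t;
    }
}
```

In this sketch the index table is itself read serially, which adds some memory traffic to the random tests. Timing each function over increasing data sizes would give MBytes per second figures of the kind listed in the logs.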
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android RandMem Benchmark 1.1 01-Feb-2015 10.12
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 3434 5064 3462 5113 L1
32 2833 4042 2652 3645
64 2837 4058 2068 2561 L2
128 2822 4041 1809 2205
256 2828 4040 1435 1755
512 2816 3997 1245 1456
1024 2578 3256 379 445
4096 2412 1946 209 268 RAM
16384 2485 2039 179 217
65536 2457 2041 140 170
Total Elapsed Time 11.8 seconds
#################### A1 ARM-Intel ######################
ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 17.27
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 4291 5626 4584 5630
32 3217 3792 3492 3783
64 3677 4253 2629 2644
128 3666 4241 2299 2289
256 3688 3930 1829 1850
512 3682 4189 1522 1592
1024 3285 3558 562 667
4096 2999 2007 272 274
16384 3019 2065 210 220
65536 2989 2068 141 186
Total Elapsed Time 8.8 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
Android RandMem Benchmark 1.1 13-Aug-2013 17.29
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 2881 2478 3388 3650 L1
32 4301 2968 3197 3249
64 3669 2511 2201 2249 L2
128 3566 2560 1571 1566
256 3557 2461 1334 1256
512 3524 2547 1136 1098
1024 1933 1144 534 513
4096 1993 1064 184 173 RAM
16384 1970 1086 141 144
65536 1973 1117 106 104
Total Elapsed Time 9.1 seconds
#################### T11 ARM-Intel ####################
ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 20.42
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 3642 3102 5464 4114
32 5462 3409 4096 3737
64 4800 2785 2028 2064
128 4308 2575 1572 1589
256 4381 2574 1332 1260
512 4311 2544 1215 1097
1024 2033 1156 513 471
4096 1891 1042 213 178
16384 2028 1032 154 139
65536 2033 1055 109 106
Total Elapsed Time 9.2 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Android RandMem Benchmark 1.1 10-Jun-2015 12.43
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 4407 4704 3995 4900
32 2611 3071 2207 2703
64 2496 2797 1821 2139
128 2080 3173 1668 1758
256 2425 3183 1439 1520
512 2359 3116 1193 1355
1024 2366 3117 368 382
4096 2293 2280 201 209
16384 2293 2237 170 175
65536 2299 2261 146 150
Total Elapsed Time 8.5 seconds
#################### T21 ARM-Intel ####################
ARM/Intel RandMem Benchmark 1.1 10-Jun-2015 12.45
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 5005 4626 4067 4863
32 3253 2994 2246 2622
64 3223 2855 1986 2072
128 2861 3128 1912 1776
256 3246 3174 1666 1523
512 3195 3111 1469 1372
1024 3190 3079 369 383
4096 3027 2381 212 213
16384 3065 2300 174 177
65536 3080 2281 150 150
Total Elapsed Time 8.6 seconds
###################### T22 32 Bit ######################
T22, ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.29
Compiled for 32 bit ARM v7a
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 2807 3606 2753 3595 L1
32 2719 3433 1429 1930
64 2615 3266 914 1166 L2
128 2592 3243 705 828
256 2570 3223 637 720
512 2367 2684 237 347
1024 2137 1855 120 163 RAM
4096 1918 1658 83 97
16384 2152 1665 74 85
65536 2104 1652 72 64
Total Elapsed Time 11.6 seconds
###################### T22 64 Bit ######################
T22, ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.32
Compiled for 64 bit ARM v8a
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 3865 3033 3798 3027
32 3622 2760 3105 2734
64 3094 2803 1011 1077
128 3074 2740 776 801
256 3050 2771 718 693
512 2420 2463 270 371
1024 1322 1853 131 164
4096 1754 1598 87 100
16384 1791 1586 75 91
65536 1856 1609 57 68
Total Elapsed Time 14.6 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
Android RandMem Benchmark 20-Oct-2012 11.14
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 2788 3041 2795 3041 L1
32 2769 3011 2767 3020
64 1027 1038 839 911 L2
128 916 918 616 649
256 904 905 514 538
512 899 907 475 499
1024 712 699 345 354
4096 323 284 92 88 RAM
16384 316 282 73 70
65536 314 281 65 62
Total Elapsed Time 10.9 seconds
#################### T7 ARM-Intel #####################
ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.33
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 2521 3175 2490 3038
32 1427 1451 1218 1446
64 1133 1052 853 907
128 1039 871 646 650
256 1028 909 543 518
512 1025 895 499 502
1024 700 489 242 236
4096 487 282 90 88
16384 483 281 71 70
65536 478 274 63 62
Total Elapsed Time 11.3 seconds
#################### BS2 Original ######################
BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
Android RandMem Benchmark 1.1 25-Apr-2015 12.59
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 4069 5008 4069 2174
32 4439 5426 4069 1953
64 3974 5682 3552 1860
128 3721 5209 3758 1717
256 4342 5210 3157 1204
512 4167 5342 2845 1141
1024 4350 5208 2606 1000
4096 3475 5709 1938 867
16384 4343 5120 747 400
65536 3657 5818 533 256
Total Elapsed Time 14.2 seconds
#################### BS2 ARM-Intel #####################
ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.50
BlueStacks on 3.9 GHz Core i7
MBytes/Second Transferring 4 Byte Words
Memory Serial....... Random.......
KBytes Read Rd/Wrt Read Rd/Wrt
16 23252 24414 19148 29593
32 25432 27127 25432 24038
64 21552 23674 14533 9301
128 21702 20834 12020 8140
256 22727 19934 9470 6513
512 22321 17362 5953 5686
1024 20840 18945 5691 4815
4096 21053 16693 2291 2291
16384 12308 10057 1067 1018
65536 10667 10338 753 711
Total Elapsed Time 8.3 seconds
MP-MFLOPS Benchmarks - MP-MFLOPSi and MP-MFLOPS2i
The benchmarks are recompilations of those in
www.roylongbottom.org.uk/Android MultiThreading Benchmarks.htm.
The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, intended for L1 cache, L2 cache and RAM: 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread carries out the same calculations, but on a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations have been carried out.
The data values start at 1.0, and subsequent calculations reduce them by an amount that depends on the number of calculations.
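The calculation and threading scheme described above can be sketched in C with POSIX threads. This is a minimal sketch, not the MP-MFLOPSi source: it assumes the 2 operations per word form x[i] = (x[i] + a) * b, an even split of the array into one contiguous segment per thread, and constants, names and pass counts chosen purely for illustration. Because every thread runs identical calculations on its own segment, the final values do not depend on the number of threads, which is what the consistency check relies on.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Work description for one thread: its own segment of the shared array. */
typedef struct {
    float *x;      /* first word of this thread's segment */
    size_t n;      /* number of words in the segment */
    int passes;    /* repeat passes over the segment */
} segment_t;

/* Assumed constants: values fall from 1.0 towards a*b/(1-b), about 0.4. */
static const float add_c = 0.00002f;
static const float mul_c = 0.99995f;

/* 2 floating point operations (one add, one multiply) per word per pass. */
static void *mflops_2ops(void *arg)
{
    segment_t *s = (segment_t *)arg;
    for (int p = 0; p < s->passes; p++)
        for (size_t i = 0; i < s->n; i++)
            s->x[i] = (s->x[i] + add_c) * mul_c;
    return NULL;
}

/* Split x[0..n) into nthreads contiguous segments and process them in
   parallel; returns 0 on success, -1 if any thread could not be started. */
int run_threads(float *x, size_t n, int nthreads, int passes)
{
    pthread_t tid[MAX_THREADS];
    segment_t seg[MAX_THREADS];
    int created = 0;
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    size_t chunk = n / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        seg[t].x = x + (size_t)t * chunk;
        /* the last thread also takes any remainder */
        seg[t].n = (t == nthreads - 1) ? n - (size_t)t * chunk : chunk;
        seg[t].passes = passes;
        if (pthread_create(&tid[t], NULL, mflops_2ops, &seg[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == nthreads ? 0 : -1;
}
```

Compile with -pthread. Timing a call would give MFLOPS as n x passes x 2 divided by the elapsed time in microseconds, and running the same data with 1 and 4 threads should leave identical values in the array.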
An example of results for MP-MFLOPSi, from the log file, is provided below, showing identical numeric results independent of the number of threads used (as it should be). This original version became too fast for later technology, producing inconsistent MFLOPS performance ratios. To avoid this problem, longer running versions were produced, in this case MP-MFLOPS2i, with 50 times more calculations, producing the expected reduction in result values. The numeric results from ARM processors are slightly different, due to rounding effects (see Short and Long below).
Examination of the disassembled code, produced using default compile parameters, showed that Intel SIMD and ARM NEON instructions were not being generated. These can execute, for example, four linked multiply and add operations simultaneously, providing MFLOPS speeds of up to eight times CPU MHz per core. The types of instruction used are shown below: the Intel code used only one of the four words in the SSE registers (Single Instruction Single Data, SISD), and the ARM code employed single word scalar registers. The latter were VFP (Vector Floating Point) instructions using three registers, such as floating-point multiply-accumulate single precision (fmacs).
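The "eight times CPU MHz" figure follows from simple arithmetic: with 4 word single precision SIMD registers, one multiply and one add per word per clock gives 4 x 2 = 8 floating point operations per cycle, while SISD code using one word gives at most 2. The helper below is a hypothetical illustration of that calculation, assuming a core that can issue one multiply and one add (or one fused multiply-add) per clock; real cores may differ.

```c
#include <assert.h>

/* Theoretical peak MFLOPS per core, assuming one multiply and one add
   (or one fused multiply-add) per clock, each on `lanes` single
   precision words. */
double peak_mflops_per_core(double mhz, int lanes)
{
    assert(lanes >= 1);
    const int flops_per_lane_per_cycle = 2;   /* multiply + add */
    return mhz * lanes * flops_per_lane_per_cycle;
}
```

For the 1.86 GHz A1 Atom this gives 14880 MFLOPS per core with 4 word SIMD, against 3720 for scalar code, compared with the best single thread results of about 1060 MFLOPS in the A1 listings.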
The released versions were recompiled using the compile options shown below, but this made no difference to the type of code generated. The Intel compilations used more registers, producing faster speeds at 32 operations per word. The ARM code was virtually identical, giving similar performance.
Intel CPU Short - 5000 Repeat Passes
ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 642 717 658 1053 1026 987
2T 1052 1366 1016 2018 2108 2063
4T 1752 2483 956 3817 3676 3894
8T 1436 2217 992 3213 3428 3289
Results x 100000, 0 indicates ERRORS
1T 86735 98519 99984 79894 97641 99975
2T 86735 98519 99984 79894 97641 99975
4T 86735 98519 99984 79894 97641 99975
8T 86735 98519 99984 79894 97641 99975
Total Elapsed Time 3.6 seconds
Intel CPU Long - 100000 Repeat Passes
1T-8T 40392 76406 99700 35296 66012 99521
######################################################
ARM CPU Short
1T-8T 86735 98519 99984 79897 97638 99975
ARM CPU Long
1T-8T 40392 76406 99700 35218 66014 99520
######################################################
Android.mk LOCAL_CFLAGS
ifeq ($(TARGET_ARCH_ABI),x86)
LOCAL_CFLAGS += -ffast-math -mtune=atom -mssse3 -mfpmath=sse
endif
ifeq ($(TARGET_ARCH_ABI),x86_64)
LOCAL_CFLAGS += -ffast-math -mtune=slm -msse4.2
endif
ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
LOCAL_ARM_NEON := true
LOCAL_CFLAGS += -mfpu=neon
endif
ifeq ($(TARGET_ARCH_ABI),arm64-v8a)
LOCAL_CFLAGS += -DHAVE_NEON64=1
endif
######################################################
Intel SSE SISD Instructions - not SIMD
mulss 36(%esp), %xmm2 addss %xmm1, %xmm2
ARM Vector Instructions - not NEON
fmuls s15, s15, s10 fmacs s15, s14, s23
MP-MFLOPS Benchmark Results
Below are MFLOPS results, mainly for the longer running versions, including those from the original ARM compilations.
The first ones are for tablet A1, with the quad core Intel Atom CPU, where results for the shorter running version are also provided, showing some slower speeds. In this case, the native Intel code was up to nearly twice as fast as the original ARM code running under conversion. In both cases, with 2 operations per word, the maximum MP gains were obtained using L2 cache based data, while RAM speed limited performance, with two threads sufficient to reach maximum speed. With 32 operations per word, the quad cores provided performance gains of nearly four times.
Tablet T11 had some slightly slower results with the ARM/Intel variety, while tablet T7 showed little variation. Except for RAM based data at 2 operations per word, performance gains were in line with the number of cores.
T21, with the Qualcomm Snapdragon 800, produced similar speeds using the original and ARM/Intel versions. Calculation speeds with 1 and 2 threads could be slower than on T11 (Cortex-A15), but RAM based speeds were much faster. The opposite applied in comparison with the A1 Atom running native code.
August 2015 - Results are provided for the 64 bit T22, showing that, with cache based data, the 64 bit version was just over twice as fast as the 32 bit one at 32 operations per word, and up to 3.7 times faster at 2 operations per word. This is because the 64 bit compilation produced vector SIMD instructions instead of scalar ones.
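To illustrate the difference between scalar and SIMD code for the 2 operations per word loop, the sketch below writes the same calculation twice in C: once as a scalar loop and once with GCC/Clang vector extensions, where each add and multiply works on four single precision words at a time. It is an illustration under stated assumptions, not the benchmark's code; whether a compiler turns the scalar loop into SIMD instructions by itself depends on the target and compile options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four single precision floats in one 128 bit register (GCC/Clang extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar (SISD) version: one word per operation. */
void calc_scalar(float *x, size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (x[i] + a) * b;
}

/* SIMD version: four words per add and per multiply.
   n is assumed to be a multiple of 4 to keep the sketch short. */
void calc_simd(float *x, size_t n, float a, float b)
{
    assert(n % 4 == 0);
    const v4sf va = {a, a, a, a};
    const v4sf vb = {b, b, b, b};
    for (size_t i = 0; i < n; i += 4) {
        v4sf v;
        memcpy(&v, &x[i], sizeof v);   /* alignment-safe load */
        v = (v + va) * vb;             /* 4 adds, then 4 multiplies */
        memcpy(&x[i], &v, sizeof v);   /* store back */
    }
}
```

Both functions perform the same rounded add and multiply per word, so they should produce identical results; the SIMD form simply issues a quarter as many arithmetic instructions.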
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android MP-MFLOPS2 Benchmark V2.1 04-Feb-2015 11.03
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 502 501 476 575 575 573
2T 1012 975 921 1133 1140 1115
4T 1571 1627 979 2238 2255 2258
8T 1550 1890 1007 2235 2239 2217
Total Elapsed Time 117.4 seconds
#################### A1 ARM-Intel ######################
ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 642 717 658 1053 1026 987
2T 1052 1366 1016 2018 2108 2063
4T 1752 2483 956 3817 3676 3894
8T 1436 2217 992 3213 3428 3289
V7 Short Version Total Elapsed Time 3.6 seconds
ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.24
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 695 696 661 1061 1061 1055
2T 1335 1382 1058 2088 2086 2102
4T 1832 2635 979 3993 4125 4145
8T 2026 2557 1007 3842 4044 4110
Total Elapsed Time 65.8 seconds
-- Single Thread MFLOPS No Extra Compile Options --
704 713 675 773 779 774
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Dual Core CPU Measured GHz = 1.7
Android MP-MFLOPS2 Benchmark V2.1 29-Apr-2015 10.22
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 845 817 544 1546 1539 1512
2T 1593 1668 648 3140 3067 2977
4T 1974 1775 645 2963 3093 2845
8T 1935 2059 652 3108 3147 2985
Total Elapsed Time 58.5 seconds
#################### T11 ARM-Intel ####################
ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 20.30
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 695 756 536 1537 1501 1476
2T 1319 1527 645 3151 3077 3000
4T 1604 1567 657 3035 3095 2997
8T 1604 1639 658 3108 3125 2996
Total Elapsed Time 59.1 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Quad Core 2150 MHz Measured
Android MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 15.35
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 718 781 590 1214 1220 1228
2T 1572 1583 1118 2406 2436 2442
4T 2338 2959 1836 4867 4911 4859
8T 3148 3266 1866 4870 4916 4888
Total Elapsed Time 56.4 seconds
#################### T21 ARM-Intel ####################
ARM/Intel MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 16.50
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 822 768 636 1232 1228 1231
2T 1662 1637 1184 2460 2463 2446
4T 2509 3216 1659 4519 4762 4900
8T 2965 3193 1881 4847 4925 4880
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.17
Compiled for 32 bit ARM v7a
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 190 190 184 670 672 664
2T 377 378 370 1343 1345 1329
4T 707 755 725 2657 2669 2621
8T 722 736 714 2640 2672 2631
Total Elapsed Time 113.0 seconds
###################### T22 64 Bit ######################
ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.24
Compiled for 64 bit ARM v8a
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 705 701 636 1398 1394 1362
2T 1376 1395 942 2794 2797 2757
4T 2063 2602 962 5491 5546 5336
8T 2474 2611 957 5367 5500 5417
Total Elapsed Time 51.6 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Quad Core CPU Measured MHz = 1200
Android MP-MFLOPS2 Benchmark V2.1 05-Feb-2015 11.37
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 182 156 114 598 578 572
2T 365 321 194 1194 1163 1141
4T 716 655 233 2367 2316 2240
8T 717 682 233 2347 2371 2246
Total Elapsed Time 135.5 seconds
#################### T7 ARM-Intel #####################
ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.44
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 188 156 116 598 578 574
2T 365 319 197 1195 1161 1145
4T 682 709 237 2372 2345 2249
8T 678 731 237 2361 2381 2254
Total Elapsed Time 135.0 seconds
MP-Whetstone Benchmark - MP-WHETSi
For more information on the Whetstone Benchmark, see the stand alone version above.
The multithreading version runs multiple copies of the same code, with separate variables. In this case, the performance of each of the eight test functions, and the overall MWIPS rating, is invariably close to proportional to the number of CPU cores available.
The driving program checks that calculations on every thread produce consistent numeric results.
The gcc 4.8 based ARM/Intel version, running on the Intel Atom tablet, is rated at twice the speed of the original, due to the use of native code. The fixed point results indicate over-optimisation, but this test accounts for little of the overall time, which mainly depends on the Cos, Exp and third MFLOPS tests.
The new native ARM version, running on tablets T11 and T7, produces a much slower overall MWIPS rating, mainly due to the Exp tests, but also influenced by other slower results (some the same as above).
T21 shows slower floating point calculations with the ARM/Intel version.
August 2015 - Results are provided for the 64 bit T22, showing that, at 64 bits, the Fixpt test was clearly almost optimised out, but this makes little difference to the overall MWIPS rating, which is 2.25 times faster than the 32 bit benchmark.
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android MP-Whetstone Benchmark V1.1 04-Feb-2015 11.39
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 953.7 363.0 382.4 267.8 21.0 13.2 413.1 1842.4 392.3
2T 1921.2 726.0 663.5 541.4 42.6 27.0 816.1 3662.6 793.3
4T 3820.6 1419.2 1514.6 1081.5 84.1 54.0 1543.8 6292.4 1588.5
8T 4003.8 1912.9 1872.4 1114.1 86.5 56.4 2053.1 8292.6 1599.7
Overall Seconds 4.88 1T, 4.87 2T, 4.96 4T, 10.05 8T
#################### A1 ARM-Intel ######################
ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 17.35
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 1916.9 691.4 691.3 497.2 35.3 27.6 10209.8 2787.3 1351.8
2T 3800.3 1377.6 1381.2 980.0 70.1 54.7 20248.0 5252.8 2748.7
4T 7604.9 2713.2 2711.8 1977.1 140.2 110.0 33906.3 9526.5 5550.8
8T 7798.1 3141.5 3627.2 2064.2 141.2 110.2 59590.6 12743.7 5711.5
Overall Seconds 4.94 1T, 5.00 2T, 5.06 4T, 10.11 8T
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
Android MP-Whetstone Benchmark V1.1 06-Sep-2013 12.49
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 1308.2 345.9 379.0 294.1 30.8 17.2 1351.4 1265.7 843.1
2T 2886.6 782.1 782.6 614.0 80.1 34.3 2775.2 2463.7 1667.5
4T 3086.0 998.6 788.1 610.6 79.2 44.5 3472.0 2526.4 2191.4
8T 2930.0 788.2 843.5 616.5 80.5 35.0 2846.0 2799.1 1686.2
Overall Seconds 3.54 1T, 3.30 2T, 6.62 4T, 13.16 8T
#################### T11 ARM-Intel ####################
ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.23
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 837.2 340.1 341.7 191.2 39.1 6.2 1521.1 2532.8 629.3
2T 1676.2 596.2 683.2 387.3 77.8 12.4 3056.9 5055.1 1263.6
4T 1697.7 687.5 869.4 394.5 78.1 12.4 2980.7 6518.4 1258.8
8T 1685.2 685.9 691.0 389.7 78.3 12.4 3086.3 5113.7 1262.0
Overall Seconds 4.06 1T, 4.07 2T, 8.12 4T, 16.19 8T
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Android MP-Whetstone Benchmark V1.1 06-Jul-2015 10.42
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 1877.1 645.2 642.6 524.1 44.0 22.3 1364.7 1572.1 898.9
2T 3668.6 1220.2 1262.4 1021.9 85.9 43.8 2663.5 3078.4 1753.4
4T 7426.9 2375.5 2474.7 2097.7 175.7 88.2 5052.6 6240.4 3555.0
8T 7706.6 2692.2 2746.2 2186.9 180.1 90.3 5822.5 6902.7 3681.3
Overall Seconds 4.44 1T, 4.62 2T, 4.64 4T, 9.00 8T
Total Elapsed Time 24.1 seconds
#################### T21 ARM-Intel ####################
ARM/Intel MP-Whetstone Benchmark V1.1 22-Jul-2015 12.02
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 1598.0 512.1 508.7 311.7 43.6 22.1 1142.9 2123.3 598.4
2T 3161.2 960.0 996.7 614.2 86.7 43.8 2258.9 3820.9 1194.7
4T 6348.0 1593.5 2019.5 1231.5 174.2 88.5 4471.1 8139.4 2398.3
8T 6419.6 2058.2 2077.5 1252.6 175.0 88.7 4520.9 8875.0 2409.0
Overall Seconds 4.88 1T, 5.00 2T, 5.05 4T, 9.92 8T
Total Elapsed Time 29.2 seconds
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.30
Compiled for 32 bit ARM v7a
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 676.4 275.9 281.9 147.9 35.4 5.3 600.3 901.0 285.5
2T 1362.5 533.8 561.7 298.0 70.9 10.8 1203.1 1838.9 574.0
4T 2698.6 903.9 1071.7 594.4 141.2 21.5 2346.1 3305.5 1138.5
8T 2830.1 1463.2 1393.0 614.2 152.5 21.9 3243.9 4418.3 1171.4
Overall Seconds 4.95 1T, 4.94 2T, 5.11 4T, 10.09 8T
###################### T22 64 Bit ######################
ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.34
Compiled for 64 bit ARM v8a
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 1524.8 328.6 348.8 297.6 37.3 19.9 1462579 1867.2 1238.0
2T 3062.5 688.8 697.9 596.0 75.5 39.8 2097113 3726.7 2481.3
4T 6085.4 1214.9 1360.5 1185.4 150.5 79.4 2449153 7055.0 4951.8
8T 6222.4 1495.2 1545.6 1204.2 152.2 80.6 3869846 9218.8 5154.1
Overall Seconds 4.92 1T, 4.90 2T, 5.05 4T, 9.97 8T
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
Android MP-Whetstone Benchmark V1.0 17-Oct-2012 13.49
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 1033.7 247.4 235.4 266.0 25.3 15.0 448.4 630.9 513.5
2T 2058.1 456.3 473.0 532.4 50.0 30.1 898.1 1198.4 1026.6
4T 4122.8 831.9 944.7 1064.6 100.7 60.1 1797.0 2392.2 2053.4
8T 4163.2 1016.0 948.2 1069.5 101.8 60.9 1808.0 2414.2 2051.5
Overall Seconds 5.28 1T, 5.34 2T, 5.42 4T, 10.81 8T
#################### T7 ARM-Intel #####################
ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.32
Using 1, 2, 4 and 8 Threads
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
1 2 3 MOPS MOPS MOPS MOPS MOPS
1T 602.2 242.3 242.3 140.2 27.2 4.9 482.8 1425.2 239.1
2T 1208.7 481.2 484.2 280.8 55.0 9.9 970.0 2869.6 478.7
4T 2398.7 805.4 966.7 562.5 109.5 19.5 1938.2 5722.5 957.1
8T 2429.1 974.6 1076.2 562.4 110.9 19.7 1981.5 5816.1 963.6
Overall Seconds 4.94 1T, 4.93 2T, 5.08 4T, 9.93 8T
MP Dhrystone Benchmark - MP-Dhryi.apk
For further details, see the Dhrystone Benchmark above and, including further results, Android MultiThreading Benchmark Apps.
This multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second. Then all threads are run using the same pass count, with running time extended when there are more threads than CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread, but some variables are shared by all threads. The latter is probably responsible for the failure to increase throughput using multiple threads.
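The single thread calibration step can be sketched as follows: the pass count is doubled until a run takes a measurable time, then scaled to give roughly the target duration. This is a minimal sketch assuming a doubling strategy and the POSIX monotonic clock; the function names and the stand-in workload are hypothetical, and the benchmark's actual calibration method may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for one Dhrystone pass. */
static volatile unsigned long sink;
static void one_pass(void)
{
    unsigned long s = 0;
    for (int i = 0; i < 1000; i++)
        s += (unsigned long)i * (unsigned long)i;
    sink = s;   /* volatile store keeps the work from being removed */
}

/* Double the pass count until a run takes at least min_seconds, then
   scale it so that one run would last about target_seconds. */
long calibrate_passes(double target_seconds, double min_seconds)
{
    assert(min_seconds > 0.0 && target_seconds >= min_seconds);
    long passes = 1;
    double elapsed;
    for (;;) {
        double start = now_seconds();
        for (long p = 0; p < passes; p++)
            one_pass();
        elapsed = now_seconds() - start;
        if (elapsed >= min_seconds)
            break;
        passes *= 2;
    }
    long scaled = (long)(passes * (target_seconds / elapsed));
    return scaled > 0 ? scaled : 1;
}
```

With a target of 1.0 second, each of the 1, 2, 4 and 8 thread runs would then use the returned count, so total running time grows once there are more threads than CPUs.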
The new ARM/Intel version demonstrated similar speeds on the systems tested. Unlike the other systems, the Intel Atom based tablet produced slower performance using multiple threads. Tests on a PC, via the BlueStacks emulator, appeared to demonstrate that native Intel instructions were being used.
T21, with the Qualcomm Snapdragon 800, sometimes crashed running this benchmark, and apparently every time with the ARM/Intel version. When it did run, the eight thread performance was also highly suspect.
August 2015 - Results are provided for the 64 bit T22, showing that the 64 bit version was much faster than the 32 bit variety.
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.00
Threads 1 2 4 8
Seconds 0.96 3.27 6.83 13.79
Dhrystones per Second 4147126 2449335 2343954 2320745
VAX MIPS rating 2360 1394 1334 1321
#################### A1 ARM-Intel ######################
ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.02
Threads 1 2 4 8
Seconds 0.96 3.44 6.88 13.80
Dhrystones per Second 4154551 2323340 2324139 2318280
VAX MIPS rating 2365 1322 1323 1319
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
Android MP-Dhrystone 2 Benchmark V1.1 10-Aug-2013 09.55
Threads 1 2 4 8
Seconds 0.50 0.53 1.05 2.18
Dhrystones per Second 3990211 7522450 7600539 7328598
VAX MIPS rating 2271 4281 4326 4171
#################### T11 ARM-Intel ####################
ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.22
Threads 1 2 4 8
Seconds 0.99 1.12 2.33 4.45
Dhrystones per Second 4031981 7127449 6856521 7196710
VAX MIPS rating 2295 4057 3902 4096
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Android MP-Dhrystone 2 Benchmark V1.1 06-Jul-2015 11.22
Threads 1 2 4 8
Seconds 0.64 0.83 0.94 1.23
Dhrystones per Second 5007132 7722435 13592474 20769050
VAX MIPS rating 2850 4395 7736 11821
Total Elapsed Time 4.4 seconds
#################### T21 ARM-Intel ####################
Failed to run
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.32
Compiled for 32 bit ARM v7a
Threads 1 2 4 8
Seconds 0.64 0.71 0.90 1.70
Dhrystones per Second 2481286 4495793 7094180 7540038
VAX MIPS rating 1412 2559 4038 4291
###################### T22 64 Bit ######################
ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.36
Compiled for 64 bit ARM v8a
Threads 1 2 4 8
Seconds 0.89 1.06 1.64 3.24
Dhrystones per Second 4476736 7574470 9768350 9861922
VAX MIPS rating 2548 4311 5560 5613
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
Android MP-Dhrystone 2 Benchmark V1.0 17-Oct-2012 13.59
Threads 1 2 4 8
Seconds 0.72 0.83 1.19 2.55
Dhrystones per Second 2782404 4829150 6740332 6271011
VAX MIPS rating 1584 2749 3836 3569
#################### T7 ARM-Intel #####################
ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.18
Threads 1 2 4 8
Seconds 0.78 0.95 1.27 2.44
Dhrystones per Second 2572642 4214238 6280420 6565767
VAX MIPS rating 1464 2399 3575 3737
################ BlueStacks Emulator ##################
PC with 3 GHz Phenom X4, Windows 7
VAX MIPS Original 474 465 453 449
VAX MIPS ARM/Intel 4844 4670 4623 4724
MP-BusSpeed Benchmark - MP-BusSpdi.apk
This is a multithreading version of the BusSpeed Benchmark above.
Here, single thread performance of the A1 Atom tablet was similar to that obtained unthreaded, with the ARM/Intel version again providing no improvement. Except for calculating bus speeds, the last column (RdAll) is the only one of real interest, where four cores produced gains of up to 3.7 times using caches, and 1.9 times via RAM. The latter represents even better relative performance than on the ARM based systems. ARM/Intel version results are not shown for tablets T11 and T7, as they were both essentially the same as those obtained using the original MP benchmark.
For further details and more results, see Android MultiThreading Benchmark Apps.
Some ARM/Intel results for T21 are slower than the original, but this might be due to the short running time.
Results from the PC based BlueStacks emulator are also shown below, to confirm that native Intel instructions were being used in the revised benchmark.
Estimated maximum data transfer speeds, based on burst reading results (such as 16 x 1018 for T21), can exceed the specification. This is caused by data shared between threads in the shared L2 cache, and by the way that the program is run.
MP-BusSpd2i.apk is a revised version for Android. Running time is longer and, rather than all threads reading data from the beginning, starting addresses are staggered.
This can result in slower speeds, as there are fewer calculations in the inner loop, but the increased speed due to cached shared data no longer appears to apply, so burst results can be used to estimate maximum RAM throughput (as shown).
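The IncN reading pattern, the staggered starting addresses and the burst based RAM estimate can be sketched in C as below. This assumes that an IncN column reads one 4 byte word in every N words, so that at Inc16 only one word of each (assumed) 64 byte burst is used; the function names and the stagger formula are hypothetical, not taken from MP-BusSpd2i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 32 bit word every `inc` words, starting at word `start` and
   wrapping around the buffer, as a thread with a staggered start might. */
uint32_t read_inc(const uint32_t *buf, size_t words, size_t inc, size_t start)
{
    assert(inc >= 1 && words % inc == 0 && start < words);
    uint32_t acc = 0;
    size_t i = start;
    for (size_t k = 0; k < words / inc; k++) {
        acc |= buf[i];        /* combine with OR so every load is used */
        i += inc;
        if (i >= words)
            i -= words;       /* wrap to the start of the buffer */
    }
    return acc;
}

/* Hypothetical stagger: spread thread starting points evenly, rounded
   down to a multiple of inc so every thread reads the same set of words. */
size_t stagger_start(size_t words, size_t inc, int thread, int nthreads)
{
    size_t start = words / (size_t)nthreads * (size_t)thread;
    return start - start % inc;
}

/* Burst based RAM estimate: at Inc16 only one 4 byte word in each
   assumed 64 byte burst is used, so bus traffic is about 16 times
   the measured speed. */
double ram_estimate_mbps(double inc16_mbps)
{
    return inc16_mbps * 16.0;
}
```

With the A1 long version's Inc16 RAM figure of 511 MB/second, ram_estimate_mbps reproduces the 8176 MB/second estimate shown in that log.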
August 2015 - Results are included for T22, with a 64 bit CPU and 64 bit Android 5.0. Considering just the Read All data, the A53 64/32 bit performance ratios for L1 cache, L2 cache and RAM averaged 2.2, 1.8 and 1.0.
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 13.02
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 3990 4458 6123 6512 6438 6729
2T 3894 5699 8948 10299 11800 12555
4T 5046 7109 11952 14750 15533 23304
8T 4533 7464 13097 16970 21674 22225
122.9 1T 1304 1613 2291 2661 3667 5063
2T 2568 3145 4529 5365 7440 10147
4T 4117 4801 7963 7495 8239 18911
8T 3130 5016 7355 8543 11648 15845
12288 1T 190 265 601 1203 2316 3832
2T 244 448 995 1771 3599 6575
4T 427 584 860 1741 3439 7449
8T 395 510 855 1613 3547 6776
Total Elapsed Time 13.5 seconds
#################### A1 ARM-Intel ######################
ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.28
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 5925 6494 6778 6979 7047 7026
2T 3966 7029 9689 11689 12856 13654
4T 4438 8698 16739 22057 23946 25729
8T 4455 8619 15787 19934 22576 20804
122.9 1T 1490 1975 2360 2802 3818 5330
2T 2881 3798 4647 5531 7536 10546
4T 4452 6338 5910 10217 14650 19903
8T 4096 5075 6264 9213 12610 15821
12288 1T 206 273 593 1198 2343 3935
2T 276 455 842 1821 3319 6591
4T 445 730 1401 2076 4457 7525
8T 424 539 954 1829 3688 7064
Total Elapsed Time 13.0 seconds
########## A1 New Long Version
ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.50
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 5431 6110 6780 6262 6655 7313
2T 3550 4464 7375 9825 11777 12442
4T 2027 4442 4399 8841 17611 23509
8T 983 2477 5063 4433 8568 15867
122.9 1T 1499 1991 2357 2839 3818 5382
2T 2816 3808 4708 5592 7557 10677
4T 4316 6313 7991 9816 14335 19993
8T 4235 5610 7917 8791 12828 19661
49152 1T 215 275 611 1183 2328 3922
2T 276 435 787 1671 3323 6507
4T 398 455 884 1754 3490 6971
8T 376 511 867 1746 3512 7510
Total Elapsed Time 48.6 seconds
Maximum RAM Speed Estimate = 511 x 16 = 8176 MB/second
#################### T11 ARM-Intel ####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.45
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 2165 3591 4256 5587 5998 6109
2T 4121 6469 9530 11381 11846 11936
4T 4106 6438 8827 6793 9802 12080
8T 4098 6390 9534 10141 10996 11603
122.9 1T 464 740 1173 2395 3276 3340
2T 579 989 1934 3994 5431 5792
4T 579 988 1930 3873 5469 5821
8T 580 985 1915 3999 5408 5812
12288 1T 134 172 211 462 602 1904
2T 269 343 387 934 1217 2685
4T 252 231 374 768 991 2625
8T 231 254 367 781 1104 2782
Total Elapsed Time 12.1 seconds
########## T11 New Long Version
ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 17.07
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 3499 4539 5499 5505 6134 6045
2T 3775 7202 8377 10605 10457 11319
4T 3982 6676 7687 9326 9707 10807
8T 2546 3643 7891 8003 10725 11097
122.9 1T 672 901 1336 2784 3274 3334
2T 568 969 1931 3894 5427 5221
4T 574 971 1912 3831 5256 4811
8T 559 971 1917 3878 5387 5162
49152 1T 140 142 193 575 989 1499
2T 221 223 342 769 1379 2355
4T 228 223 344 783 1382 2376
8T 223 223 342 787 1385 2352
Total Elapsed Time 49.9 seconds
Maximum RAM Speed Estimate = 223 x 16 = 3568 MB/second
Initial Results
12.3 1T 693 936 1266 2522 3264 3329
2T 557 900 1539 3459 3317 3613
4T 551 903 1557 2902 3475 3616
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
L1 caches 4 x 32 KB, L2 cache shared 2048 KB
Android MP-BusSpd v7 Benchmark V1.1 29-Jun-2015 18.37
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 2580 2206 5048 5176 5679 5989
2T 4062 5175 9340 9868 10971 11281
4T 4688 10324 16552 17196 21714 23708
8T 8467 9834 16698 18183 21936 23693
122.9 1T 1152 1052 2068 3035 3927 5723
2T 1710 1840 3094 5001 7963 11475
4T 2047 2002 5031 9267 14698 22920
8T 2235 2275 5223 9348 14234 21783
12288 1T 262 382 508 867 1466 2661
2T 464 766 1049 1754 3186 5735
4T 612 1018 1796 3149 5892 9095
8T 575 680 1277 2308 4987 7948
Total Elapsed Time 12.7 seconds
#################### T21 ARM-Intel ####################
ARM/Intel MP-BusSpd v7 Benchmark V1.1 23-May-2015 17.05
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 1840 2073 3512 3554 4829 5243
2T 3432 4591 7128 7651 9120 9821
4T 4398 7855 13752 15428 18530 20235
8T 6692 9507 13857 16110 18143 18796
122.9 1T 860 753 2011 2841 3205 5282
2T 1505 1609 3076 5038 8089 10421
4T 1924 1981 4299 7588 14614 20754
8T 1909 1988 4264 7980 13884 19027
12288 1T 270 379 538 856 1626 2859
2T 471 677 1098 1849 3304 5924
4T 549 787 1066 1874 6274 10781
8T 713 853 1649 2258 4664 8321
Total Elapsed Time 13.1 seconds
########## T21 New Long Version
ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.39
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 2247 2616 4010 4443 4909 5614
2T 3558 4725 7241 9048 9747 10892
4T 6074 8303 13442 16937 18525 21068
8T 3998 5106 14314 13615 18200 20740
122.9 1T 874 1198 2024 2935 4529 5345
2T 1686 1702 3174 5357 7688 10545
4T 1988 2139 4465 8171 14969 21169
8T 1972 2139 4468 8195 15261 21132
49152 1T 292 406 516 899 1663 2929
2T 449 541 962 1569 2851 4776
4T 495 605 1109 2439 4161 8243
8T 530 564 1156 2149 4172 7907
Total Elapsed Time 48.0 seconds
Maximum RAM Speed Estimate = 605 x 16 = 9680 MB/second
###################### T22 32 Bit ######################
T22, Tab 2 A8-50, 1.3 GHz quad core 64 bit ARM Cortex-A53
Single Channel RAM, LPDDR3 666 MHz, 5.3 GB/second
ARM/Intel MP-BusSpd Benchmark V1.2 12-Aug-2015 16.13
Compiled for 32 bit ARM v7a
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 1849 2140 2079 2211 2270 2297
2T 3663 4252 4294 4400 4370 4580
4T 4630 5574 5691 5893 6015 6083
8T 5331 5775 6033 6622 7968 8023
122.9 1T 597 621 1119 1815 2135 2237
2T 869 943 1644 2992 3740 4412
4T 949 951 1922 3736 6468 7779
8T 948 978 1911 3717 6464 7542
12288 1T 123 174 344 678 1215 1840
2T 243 310 672 1332 2383 3974
4T 302 285 594 1282 2271 4606
8T 279 295 654 1198 2749 4660
Total Elapsed Time 12.8 seconds
########## T22 Long Version
ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.14
Compiled for 32 bit ARM v7a
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 1877 2124 2176 2266 2296 2343
2T 3625 4198 4341 4468 4536 4613
4T 5733 7541 8293 8830 8024 9042
8T 2985 3829 7438 6117 8108 8923
122.9 1T 604 625 1142 1846 2150 2284
2T 924 950 1793 3277 4270 4504
4T 962 989 1939 3765 6798 8862
8T 965 993 1933 3748 6651 8239
49152 1T 165 175 344 677 1285 1979
2T 234 238 482 961 1907 3547
4T 266 298 562 1224 2296 4478
8T 272 275 538 1098 2149 4282
Total Elapsed Time 48.8 seconds
###################### T22 64 Bit ######################
ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.18
Compiled for 64 bit ARM v8a
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 2610 2472 2586 2727 2748 5841
2T 4404 4681 4994 5369 5420 11297
4T 6546 8125 9105 10243 10319 20610
8T 3380 4023 7919 7146 9871 19852
122.9 1T 604 621 1110 1872 2446 5100
2T 919 948 1855 3433 4853 10037
4T 961 974 1984 3924 7491 14935
8T 963 942 1931 3915 7572 14689
49152 1T 173 177 340 692 1300 2653
2T 266 241 479 968 1883 3724
4T 304 277 556 1130 2126 4328
8T 279 278 544 1138 2179 4275
Total Elapsed Time 49.4 seconds
#################### T7 ARM-Intel #####################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.35
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 2853 3392 3376 3511 3551 3494
2T 2857 3389 3542 5540 5730 5595
4T 7257 10326 10289 10997 11373 11100
8T 6584 10325 10485 11175 11322 11189
122.9 1T 362 379 347 546 623 978
2T 516 530 508 726 1227 1840
4T 598 658 548 1181 1556 2657
8T 721 733 736 1181 1548 2653
12288 1T 58 57 84 123 173 334
2T 111 111 182 248 348 664
4T 87 85 276 463 687 1290
8T 154 107 147 429 441 1242
Total Elapsed Time 12.7 seconds
########## T7 New Long Version
ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.59
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 2166 2774 3181 3307 3377 3263
2T 3924 5188 5207 5754 5759 5805
4T 7570 10011 10252 11165 11375 11777
8T 3510 4786 9011 8318 11351 11544
122.9 1T 383 409 359 558 663 983
2T 525 541 520 741 1241 1814
4T 739 752 753 1219 1590 2776
8T 735 741 753 1218 1607 2737
49152 1T 56 51 81 126 172 330
2T 65 67 107 196 335 620
4T 70 68 108 215 426 835
8T 70 68 109 215 428 851
Total Elapsed Time 48.2 seconds
Maximum RAM Speed Estimate = 68 x 16 = 1088 MB/second
############### BlueStacks Original ###############
Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 17.44
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 1600 1538 1641 1706 1600 1687
2T 1600 1641 1745 1600 1687 1638
4T 1600 1745 1745 1567 1638 1575
8T 1476 1641 1602 1638 1575 1596
122.9 1T 1000 923 1477 1600 1600 1688
2T 1000 952 1477 1600 1567 1282
4T 872 1163 1422 1567 1602 1576
8T 1026 1164 1477 1527 1644 1580
12288 1T 307 403 537 1075 1396 1512
2T 302 409 708 1075 1417 1433
4T 307 355 614 1024 1433 1535
8T 307 384 661 1023 1404 1512
Total Elapsed Time 13.9 seconds
############### BlueStacks ARM/Intel ##############
ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.25
MB/Second Reading Data, 1, 2, 4 and 8 Threads
KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll
12.3 1T 9999 18461 20000 20512 19692 21942
2T 10909 17777 19999 19692 21942 20480
4T 9599 18461 19692 19591 20480 19692
8T 10666 17066 19948 20480 20480 19200
122.9 1T 1500 1476 2742 5485 11636 13128
2T 1428 1396 2792 5585 11170 13653
4T 1396 1428 2954 5486 10973 13654
8T 1280 1371 2744 5909 10974 14630
12288 1T 460 439 645 631 1105 1331
2T 230 268 480 806 1433 2234
4T 256 307 575 1126 2010 2764
8T 236 390 756 1105 1911 3574
Total Elapsed Time 14.4 seconds
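The RAM Speed Estimate lines in the long version logs take the highest multithreaded Inc16 speed from the largest (RAM based) data size and multiply it by 16. A plausible reading, assumed here rather than stated by the benchmark, is that Inc16 reads one 4 byte word in every 16, i.e. one word per 64 byte cache line, so each word read stands for a full line transferred from RAM (processors with 32 byte lines, such as the Cortex-A9, would not fit this assumption as well). The C sketch below reproduces that arithmetic; both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: highest MB/second value in the Inc16 column of
 * the RAM sized rows, one value per thread count (1T, 2T, 4T, 8T). */
int max_inc16_speed(const int *inc16_mbps, size_t count)
{
    int best = 0;
    for (size_t i = 0; i < count; i++) {
        if (inc16_mbps[i] > best)
            best = inc16_mbps[i];
    }
    return best;
}

/* Assumes one 4 byte word is read per 64 byte line, so each byte the
 * benchmark counts represents words_per_line bytes moved from RAM. */
int ram_speed_estimate(int inc16_mbps, int words_per_line)
{
    return inc16_mbps * words_per_line;
}
```

For the A1 long version, the 49152 KB Inc16 values {275, 435, 455, 511} give a best of 511, and 511 x 16 = 8176 MB/second, matching the log.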
MP-RandMem Benchmark - MP-RndMemi.apk
This is a conversion of the longer running MP-RndMem2.apk Benchmark, as the original short version produced inconsistent performance measurements. It is a multithreading version of the RandMem Benchmark above. For further details and more results, see Android MultiThreading Benchmark Apps.
Log file details are provided below for the original version, which performed relatively badly on the Intel based tablet A1, and for the ARM/Intel version, whose cache based speeds were up to 3.6 times faster on reading tests and around 1.3 to 1.4 times faster on reading/writing.
The new version, running on ARM based tablets, produced results similar to those from the original, with some tests slower.
Compared with early ARM based devices, the tablet A1 ARM/Intel tests again demonstrated superior performance reading RAM and L2 cache based data, but were less impressive using L1 cache.
August 2015 - Results are provided for the 64 bit T22 with Cortex-A53 CPU. Performance at 64 bits is not significantly faster, probably because it depends on the complex indexing used.
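The gap between the Ser and Rnd columns comes from the access pattern: with RAM based data, A1 ARM/Intel serial reads reach 3153 to 6471 MB/second while random reads stay at 226 to 447. The sketch below is a hypothetical illustration of the four test types, assuming random addresses come from a precomputed shuffled index array. The source does not describe how MP-RandMem generates its addresses, so the function names and the shuffle are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* SerRD style: walk the array in order, which suits hardware prefetching. */
uint32_t serial_read(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* RndRD style: visit elements in the order given by an index array.
 * The extra load of idx[i] is part of the indexing cost. */
uint32_t random_read(const uint32_t *data, const uint32_t *idx, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}

/* RndRDWR style: read, modify and write back each visited element. */
void random_read_write(uint32_t *data, const uint32_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[idx[i]] += 1;
}

/* Build a permutation of 0..n-1 (Fisher-Yates shuffle driven by a simple
 * linear congruential generator) so every element is visited exactly once. */
void make_random_index(uint32_t *idx, size_t n, uint32_t seed)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (uint32_t)i;
    for (size_t i = n; i > 1; i--) {
        seed = seed * 1664525u + 1013904223u;
        size_t j = seed % i;
        uint32_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}
```

Because the index array is a permutation, serial and random passes touch the same data and compute the same sum. Only the order changes, and with it the cache and RAM behaviour.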
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.14
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 1337 2505 1337 2509
2T 2637 2513 2657 2521
4T 3535 2420 3484 2454
8T 3195 2403 3088 2406
122.9 1T 1305 2280 963 1758
2T 2581 2285 1945 1748
4T 3588 2130 3125 1740
8T 3211 2269 2949 1745
12288 1T 1248 1962 101 215
2T 2469 1940 191 214
4T 3462 1954 323 214
8T 3127 1926 318 212
Total Elapsed Time 43.7 seconds
#################### A1 ARM-Intel ######################
ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.54
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 4643 3593 4710 3641
2T 8583 3552 8761 3564
4T 12707 3450 12496 3384
8T 10410 3389 10796 3408
122.9 1T 3733 2874 2408 2150
2T 7259 2871 4781 2165
4T 11726 2897 7656 2133
8T 11673 2853 7100 2113
12288 1T 3153 2087 226 238
2T 5782 2073 327 238
4T 6451 1997 447 236
8T 6471 2071 446 233
Total Elapsed Time 41.5 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.13
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 6696 4438 6594 4483
2T 12338 3078 12263 3573
4T 12419 2834 12166 2907
8T 12314 2903 11991 2934
122.9 1T 3371 2916 1639 1748
2T 6409 1922 2052 1097
4T 6155 1892 2027 1186
8T 6045 2105 2015 1192
12288 1T 1394 1048 153 133
2T 2245 985 285 123
4T 2277 1002 285 132
8T 2165 1001 286 127
Total Elapsed Time 44.0 seconds
#################### T11 ARM-Intel ####################
ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 12.07
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 6315 4486 6345 4484
2T 11837 2910 11846 3112
4T 11864 2835 11553 2858
8T 11821 3003 11805 3198
122.9 1T 3963 2681 1670 1704
2T 6672 1782 2040 1125
4T 6493 1817 2033 1218
8T 6673 1738 2038 1303
12288 1T 1805 1081 177 145
2T 2543 1066 279 137
4T 2600 1065 276 136
8T 2662 1073 281 138
Total Elapsed Time 43.7 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
Android MP-RndMem2 Benchmark V2.1 08-Jul-2015 16.33
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 5088 5325 4262 4711
2T 9752 4902 8895 4570
4T 17379 4653 17434 4096
8T 19771 4698 17358 4424
122.9 1T 2714 2578 1923 2163
2T 5614 2502 3483 2107
4T 10859 2219 4835 1972
8T 10654 2410 4904 1923
12288 1T 1798 952 186 204
2T 3489 974 341 195
4T 6515 943 563 196
8T 6218 922 563 187
Total Elapsed Time 42.3 seconds
#################### T21 ARM-Intel ####################
ARM/Intel MP-RndMem Benchmark V1.1 09-Jul-2015 11.48
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 4186 3777 4055 3933
2T 9324 3541 7710 3619
4T 16594 3350 15731 3142
8T 18117 3291 16187 3262
122.9 1T 2423 2043 1610 1683
2T 5235 2029 3013 1641
4T 10148 1935 4662 1565
8T 10015 1834 4611 1474
12288 1T 1363 886 171 186
2T 2643 845 325 187
4T 5197 823 534 184
8T 4801 835 542 184
Total Elapsed Time 42.6 seconds
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.13
Compiled for 32 bit ARM v7a
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 2894 2438 2887 2433
2T 5665 2402 5663 2403
4T 10922 2369 11100 2310
8T 10065 2293 10648 2265
122.9 1T 2681 2368 757 758
2T 5351 2360 1398 769
4T 10056 2308 2121 772
8T 8838 2351 1916 742
12288 1T 2309 1662 80 78
2T 3986 1683 164 73
4T 5419 1684 283 82
8T 4658 1694 279 82
###################### T22 64 Bit ######################
ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.15
Compiled for 64 bit ARM v8a
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 4445 3109 4455 3089
2T 8010 3100 8072 3105
4T 15909 3057 14711 3040
8T 14764 3036 14570 3037
122.9 1T 3457 2888 842 876
2T 6537 2924 1524 876
4T 11095 2892 2119 861
8T 11729 2916 2080 874
12288 1T 2475 1679 81 78
2T 4155 1713 163 73
4T 5503 1711 285 89
8T 4519 1717 281 89
Total Elapsed Time 48.1 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.17
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 3120 3060 3128 3078
2T 6098 3003 6083 3004
4T 11354 2948 11188 2942
8T 11403 2857 10412 2872
122.9 1T 996 983 661 699
2T 1868 984 1012 697
4T 2600 982 1483 699
8T 2534 976 1459 694
12288 1T 335 286 91 80
2T 640 288 113 82
4T 892 286 130 82
8T 925 287 127 81
Total Elapsed Time 44.7 seconds
#################### T7 ARM-Intel #####################
ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.59
MB/Second Using 1, 2, 4 and 8 Threads
KB SerRD SerRDWR RndRD RndRDWR
12.29 1T 3060 2001 2867 1904
2T 5459 1879 5463 1867
4T 10797 1852 10537 1856
8T 10090 1802 10608 1813
122.9 1T 968 823 588 547
2T 1749 785 902 618
4T 2716 812 1328 672
8T 2733 810 1407 673
12288 1T 329 274 90 82
2T 636 272 112 82
4T 849 271 128 82
8T 869 271 126 81
Total Elapsed Time 45.4 seconds
NEON-Linpack Benchmark - NEON-Linpacki.apk
Details of the benchmark can be found above and in android neon benchmarks.htm.
The main point is that it was a complete surprise to discover that ARM NEON intrinsic functions could be converted to Intel SIMD SSE instructions, with significant performance improvement on an Atom based tablet.
The use of NEON functions for ARM CPUs can be anticipated to produce similar performance ratings via the original and ARM/Intel versions, as reflected in the results below.
August 2015 - T22 results from 32 bit and 64 bit compilations were similar, as the programs use a limited number of identical intrinsic functions.
September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with a speed of 1446 MFLOPS at 32 bits.
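The source does not say how the NEON to SSE conversion is carried out, but a small kernel shows why it can work: each NEON intrinsic in a loop such as x[m]=x[m]+s*y[m] has a close SSE counterpart operating on four single precision values. The sketch below is illustrative, not the benchmark's code. The function name is hypothetical; vld1q_f32, vmlaq_f32 and vst1q_f32 are standard NEON intrinsics, and _mm_loadu_ps, _mm_mul_ps, _mm_add_ps and _mm_storeu_ps are their SSE equivalents.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#elif defined(__SSE__)
#include <xmmintrin.h>
#endif

/* x[m] = x[m] + s * y[m], four floats per step where SIMD is available. */
void saxpy_simd(float *x, const float *y, float s, size_t n)
{
    size_t m = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t vs = vdupq_n_f32(s);
    for (; m + 4 <= n; m += 4) {
        float32x4_t vx = vld1q_f32(x + m);
        float32x4_t vy = vld1q_f32(y + m);
        vx = vmlaq_f32(vx, vy, vs);               /* vx + vy * vs */
        vst1q_f32(x + m, vx);
    }
#elif defined(__SSE__)
    __m128 vs = _mm_set1_ps(s);
    for (; m + 4 <= n; m += 4) {
        __m128 vx = _mm_loadu_ps(x + m);
        __m128 vy = _mm_loadu_ps(y + m);
        vx = _mm_add_ps(vx, _mm_mul_ps(vy, vs));  /* SSE: separate multiply and add */
        _mm_storeu_ps(x + m, vx);
    }
#endif
    for (; m < n; m++)                            /* remainder, or plain C fallback */
        x[m] = x[m] + s * y[m];
}
```

The one to one mapping of loads, stores and arithmetic is what makes a mechanical translation to native x86 instructions feasible, rather than interpreting ARM code instruction by instruction.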
NEON Single Precision Floating Point MFLOPS
########################################################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
MFLOPS Original 443.4 ARM-Intel 900.2
########################################################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
MFLOPS Original 1334.9 ARM-Intel 1411.9
########################################################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
MFLOPS Original 1250.1 ARM-Intel 1235.0
########################################################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
MFLOPS 32 bit 407.1 64 bit 505.2
########################################################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
MFLOPS Original 376.0 ARM-Intel 346.8
########################################################
P33, Snapdragon 810 2000 MHz, Android 5.0.2
MFLOPS 32 bit 1446.4
NeonSpeed Benchmark - NeonSpeedi.apk
This benchmark carries out the same calculations as the MemSpeed Benchmark, measuring data reading speeds in megabytes per second, with functions accessing arrays of cache and RAM based data, sized 2 x 8 KB to 2 x 32 MB. The calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] in single precision floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing single precision MB/second by 4 and by 8 respectively, for the two tests. The first set of calculations uses normal functions, followed by some using NEON Intrinsic Functions. The last two columns are NEON only results.
For further details and results see android neon benchmarks.htm.
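The divide by 4 and divide by 8 rule follows from bytes read per element: each step reads one 4 byte x[m] and one 4 byte y[m], that is 8 bytes, and performs 2 floating point operations (multiply and add) or 1 (add). The C sketch below shows the normal (non-NEON) kernels and the conversion; the function names are hypothetical, and the 8 bytes per element figure is inferred from the stated rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Float v=v+s*v column: 2 operations per element, 8 bytes read. */
void float_add_scaled(float *x, const float *y, float s, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + s * y[m];
}

/* Neon v=v+v style calculation in plain C: 1 operation per 8 bytes read. */
void float_add(float *x, const float *y, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + y[m];
}

/* Int v=v+v+s column: integer x[m] = x[m] + s + y[m]. */
void int_add_scaled(int32_t *x, const int32_t *y, int32_t s, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + s + y[m];
}

/* Convert a single precision MB/second reading to MFLOPS, assuming
 * 8 bytes (one x[m] and one y[m]) are counted per element. */
double mbps_to_mflops(double mb_per_second, int flops_per_element)
{
    const double bytes_per_element = 8.0;
    return mb_per_second * flops_per_element / bytes_per_element;
}
```

For example, the T21 ARM/Intel 16 KB NEON float result of 16704 MB/second for v=v+s*v corresponds to 16704 / 4 = 4176 MFLOPS.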
The native Intel code produced some performance gains, mainly using L1 cache based data, but speed in other areas is probably limited by data flow. The later compiler produced some slower speeds on ARM based tablet T11 and better/worse variations on T21.
August 2015 - Results are provided for the 64 bit T22. As with NEON-Linpack, many results from 32 bit and 64 bit compilations via NEON intrinsic functions were similar. With normal code, the 64 bit compilations were up to just over three times faster than those at 32 bits.
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android NeonSpeed Benchmark V1.1 02-Feb-2015 17.09
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 1778 3940 2807 5474 4997 5062
32 1781 3576 2636 4431 4316 4291
64 1772 3589 2639 4480 4337 4332
128 1784 3589 2641 4423 4320 4320
256 1766 3592 2642 4400 4347 4358
512 1784 3585 2633 4375 4350 4355
1024 1705 3253 2448 3760 3789 3788
4096 1673 3021 2366 3257 3245 3237
16384 1672 2948 2349 3062 3157 3151
65536 1675 2967 2345 3190 3168 3168
Total Elapsed Time 10.8 seconds
#################### A1 ARM-Intel ######################
ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 16.54
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 1816 5996 4916 6244 6882 6880
32 1851 4703 3985 5200 5609 5711
64 1862 3845 3121 4174 4441 4520
128 1841 3929 3110 4179 4411 4487
256 1863 3932 3092 4179 4412 4493
512 1861 3938 3090 3894 4215 4415
1024 1784 3475 2738 3130 3223 3443
4096 1741 2376 2649 2998 3112 3139
16384 1774 3086 2780 3116 3140 3145
65536 1774 2987 2547 2328 3126 3072
Total Elapsed Time 10.1 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
Android NeonSpeed Benchmark V1.1 09-Aug-2013 17.10
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 3793 9641 4375 13023 13456 13562
32 5777 11410 4993 11718 11365 11143
64 4122 6692 3855 6539 6682 7210
128 4017 6565 3849 6475 6520 6983
256 4067 6562 3836 6459 6495 7038
512 3900 6531 3820 6428 6490 7095
1024 1821 2544 1774 2532 2554 2539
4096 1141 1645 1536 1612 1615 1635
16384 1437 1695 1490 1576 1694 1668
65536 1424 1675 1475 1699 1687 1694
Total Elapsed Time 11.2 seconds
#################### T11 ARM-Intel ####################
ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.17
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 2252 4964 3321 6602 7304 7237
32 4202 8364 4543 8366 8553 8101
64 3710 6096 3860 6570 6348 6182
128 3802 5581 3874 6044 5624 5877
256 3654 5618 3501 6154 5655 5783
512 3597 5688 3723 6130 5812 5684
1024 1727 2466 1659 2481 2454 2472
4096 1479 1718 1421 1714 1713 1706
16384 1488 1704 1435 1576 1705 1694
65536 1477 1755 1453 1754 1759 1752
Total Elapsed Time 10.8 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
Android NeonSpeed Benchmark V1.1 23-Jul-2015 13.00
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 4324 13809 4498 14660 17501 18186
32 3587 6845 2922 8073 6981 7035
64 3347 6894 2912 8078 6964 6938
128 3343 6651 2919 7922 6726 6999
256 3511 6963 3002 8071 6902 6897
512 3476 6628 3025 7827 6613 6818
1024 3172 4627 2773 6424 4800 4806
4096 2653 2051 2378 3613 2090 2054
16384 2356 1891 2118 3165 1955 1962
65536 2424 1923 2167 3368 1933 1925
Total Elapsed Time 9.9 seconds
#################### T21 ARM-Intel ####################
ARM/Intel NeonSpeed Benchmark V1.1 23-Jul-2015 13.03
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 3623 16704 4623 15187 17446 16719
32 3455 9210 2997 8723 9280 9112
64 3336 7721 3002 8544 8469 8581
128 3415 7664 3111 8481 7549 7638
256 3584 7526 3087 8500 7849 7805
512 3538 7422 3154 8266 7567 7541
1024 3513 7227 3067 7789 7294 7261
4096 2302 1673 2413 3107 1693 1677
16384 2286 1616 2323 3024 1620 1617
65536 2322 1617 2271 2505 1634 1600
Total Elapsed Time 9.9 seconds
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.32
Compiled for 32 bit ARM v7a
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 971 3853 1807 4059 3957 4397
32 970 3812 1800 3983 3891 4323
64 927 3228 1605 3038 3269 3521
128 926 3321 1681 3343 3354 3596
256 936 3386 1693 3449 3413 3667
512 898 2889 1578 2996 2927 3118
1024 794 1859 1345 2057 1996 1924
4096 794 1796 1250 1788 1813 1835
16384 792 1773 1270 1820 1829 1864
65536 796 1811 1289 1852 1832 1880
Total Elapsed Time 11.3 seconds
###################### T22 64 Bit ######################
ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.37
Compiled for 64 bit ARM v8a
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 3054 4055 3605 4376 4911 5094
32 2922 3787 3435 4198 4546 4682
64 2795 3514 3259 3658 4050 4116
128 2886 3529 3373 3924 4148 3963
256 2883 3641 3264 3942 4193 4276
512 2454 3165 2985 3385 3586 3542
1024 1633 2000 1835 2043 2114 2105
4096 1738 1893 1899 1900 1956 1955
16384 1757 1870 1886 1802 1921 1846
65536 1755 1875 1870 1903 1936 1937
Total Elapsed Time 10.2 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Measured 1200 MHz
Android NeonSpeed Benchmark 15-Dec-2012 14.38
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 860 2575 2325 2918 3053 3245 L1
32 950 2551 2400 2823 2944 3131
64 744 1396 1329 1434 1465 1496 L2
128 713 1342 1319 1365 1392 1417
256 714 1339 1311 1357 1377 1400
512 708 1323 1299 1348 1358 1383
1024 608 875 869 917 930 952
4096 460 493 492 481 488 504 RAM
16384 460 498 487 507 506 504
65536 459 495 469 251 503 505
Total Elapsed Time 11.5 seconds
#################### T7 ARM-Intel #####################
ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.07
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 881 2440 2501 3334 3206 3465
32 901 1868 1705 2260 2083 2186
64 801 1395 1365 1573 1548 1581
128 784 1282 1278 1405 1389 1411
256 787 1279 1285 1420 1380 1409
512 777 1266 1267 1409 1370 1394
1024 604 786 762 769 770 828
4096 458 479 477 463 486 488
16384 436 447 448 469 470 469
65536 450 472 469 240 482 483
Total Elapsed Time 11.5 seconds
NEON-MFLOPS-MP Benchmark - NEON-MFLOPS2i-MP.apk
NEON-MFLOPS-MP carries out the same calculations as the MP-MFLOPS Benchmarks above, but with NEON intrinsic functions used for all calculations. For further results see android neon benchmarks.htm.
Results for the original NEON version and a sample of MP-MFLOPS (Not NEON) are provided below. NEON produced significant performance improvements across the board, including on the Atom based tablet via the ARM to Intel conversion layer. As might be expected using intrinsics, compilation via a later version of gcc made little difference to the speed of ARM systems, but the native Intel code more than doubled performance on CPU speed limited tests.
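The 2 Ops/Word and 32 Ops/Word columns differ in how much arithmetic is done per data word, which is why 32 operations per word stay CPU limited even with 12800 KB of data, while 2 operations per word at that size are limited by RAM. The C sketch below follows the recalled shape of the MP-MFLOPS kernels: x[i]=(x[i]+a)*b for 2 operations, and eleven such terms combined by alternating subtraction and addition for 32. It is an assumption that the benchmark's constants and exact expressions may differ, and all function names are hypothetical.

```c
#include <stddef.h>

/* 2 operations per word: one add and one multiply. */
void mflops_2ops(float *x, size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (x[i] + a) * b;
}

/* 32 operations per word: eleven (x + c[2t]) * c[2t+1] terms
 * (22 operations) combined by ten alternating subtract/add operations. */
void mflops_32ops(float *x, size_t n, const float c[22])
{
    for (size_t i = 0; i < n; i++) {
        float v = x[i];
        float r = (v + c[0]) * c[1];
        for (int t = 1; t < 11; t++) {
            float term = (v + c[2 * t]) * c[2 * t + 1];
            r = (t % 2) ? r - term : r + term;
        }
        x[i] = r;
    }
}

/* MFLOPS from words processed, operations per word, passes and time. */
double mflops_rate(size_t words, int ops_per_word, long passes, double seconds)
{
    return (double)words * ops_per_word * passes / seconds / 1e6;
}
```

The 12.8 KB column holds 3200 single precision words, so, for instance, 1000 passes at 2 operations per word in 0.01 seconds would be 640 MFLOPS.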
Following the performance details are the numeric results of calculations from the fixed parameters used in the new version, for both ARM and Intel. It seems that tablet T11 has an intermittent fault, as it occasionally fails to calculate a correct answer or causes the tablet to crash and reboot. This now also appears to happen with the older version.
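In the "Results x 100000, 12345 indicates ERRORS" table, every thread count should report exactly the same value, since each thread performs identical operations on identical data. The hypothetical check below shows one way such a table could be produced: compare each result with the single thread reference, then either scale it for display or substitute the 12345 marker. It is a sketch under that assumption, not the benchmark's own code.

```c
#define ERROR_MARKER 12345L   /* printed in place of a result that fails the check */

/* Hypothetical check: exact comparison is appropriate here because every
 * thread repeats the same instruction sequence on the same starting data. */
long checked_display_value(float result, float reference)
{
    if (result != reference)
        return ERROR_MARKER;
    return (long)((double)result * 100000.0);   /* "Results x 100000" */
}
```

A faulty calculation, such as the T11 entry at 128 KB and 2 operations per word, then shows up as 12345 instead of the expected 86735.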
August 2015 - The T22 NEON 64 bit compilation produced a small performance gain over the 32 bit results at 2 operations per word, but near double speed at 32 operations per word, where the 32 bit version suffers from having fewer registers for the variables. Using one core, maximum speed was 2.77 GFLOPS, rising to 10.8 GFLOPS via four cores (best so far relative to CPU GHz).
The one core speed equated to just over two floating point operations per clock cycle. This is disappointing compared with Intel processors, such as the Core 2 onwards, at 6 per clock cycle out of a maximum of 8 with SSE SIMD code (see Linux results).
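The operations per clock cycle figures come from dividing MFLOPS by the clock speed in MHz, and by the number of cores when several are in use. The short C helper below (a hypothetical name) reproduces the figures quoted from the T22 and P33 logs.

```c
/* Floating point operations per clock cycle per core:
 * MFLOPS divided by (MHz x cores in use). */
double flops_per_cycle(double mflops, double mhz, int cores)
{
    return mflops / (mhz * cores);
}
```

T22 at 64 bits gives 2774 / 1300 = 2.13 per cycle on one core and 10780 / (1300 x 4) = 2.07 per core on four cores. P33 gives 6943 / 2000 = 3.47 on one core, the "nearly 3.5" mentioned below.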
September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, at 64 bits. Performance with 8 threads is up to 23.6 GFLOPS and, using one core, up to nearly 3.5 floating point operations per clock cycle.
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Android NEON-MFLOPS-MP Benchmark V1.1 07-Feb-2015 18.37
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 1110 1319 878 1188 1139 1226
2T 2470 2114 996 2406 2427 2390
4T 3159 2211 988 4148 3487 4006
8T 2066 2486 1003 4144 3944 4077
Total Elapsed Time 3.6 seconds
Not NEON
4T 1571 1627 979 2238 2255 2258
Android NEON-MFLOPS2-MP Benchmark V2.1 07-Feb-2015 18.38
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 1796 1520 1025 1231 1228 1227
2T 3354 2959 1047 2427 2445 2445
4T 4627 5508 978 4690 4791 4733
8T 3861 6307 1030 4611 4869 4742
Total Elapsed Time 88.3 seconds
#################### A1 ARM-Intel ######################
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.17
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 2151 1962 1064 2619 2694 2650
2T 4421 3849 1048 5296 5463 5343
4T 5886 6652 982 9592 10735 10362
8T 3744 7284 1018 9085 10791 9493
Total Elapsed Time 13.8 seconds
############### A1 ARM-Intel 1000 MHz #################
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 16.04
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 1939 1266 674 2503 2388 2351
2T 3670 2652 679 4919 4792 4640
4T 3102 3051 676 4688 4678 4672
8T 3189 3425 657 4813 4869 4639
Total Elapsed Time 19.4 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Dual core, Measured 1.7 GHz
Android NEON-MFLOPS-MP Benchmark V1.1 13-Sep-2013 13.44
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 1847 1415 597 3772 4096 3545
2T 3649 3309 664 8065 7966 7505
4T 3670 3922 658 7753 8148 7490
8T 5664 5570 681 8092 8355 7672
Total Elapsed Time 13.0 seconds
Not NEON
2T 1593 1668 648 3140 3067 2977
#################### T11 ARM-Intel ####################
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.07
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 1965 1630 582 3792 4077 3521
2T 3789 2690 663 8497 8133 7297
4T 5714 4883 654 8364 8192 7554
8T 5414 6316 673 7976 8437 6635
Total Elapsed Time 13.0 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
Android NEON-MFLOPS2-MP Benchmark V2.1 25-Jul-2015 18.44
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 2757 2576 771 2808 2825 2800
2T 5662 5525 1516 5631 5664 5570
4T 6550 7846 1945 11167 11281 10939
8T 10273 10928 1981 10851 11211 11350
Total Elapsed Time 40.0 seconds
Not NEON
4T 2338 2959 1836 4867 4911 4859
#################### T21 ARM-Intel ####################
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 28-Jun-2015 16.32
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 3049 2857 622 2923 2874 2098
2T 5508 4887 1009 5477 5736 4349
4T 5643 5282 1410 11244 11601 8564
8T 9294 11156 1681 11288 11605 8946
Total Elapsed Time 14.0 seconds
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.35
Compiled for 32 bit ARM v7a
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 619 613 575 1444 1446 1426
2T 1174 1206 889 2894 2902 2839
4T 1585 1616 901 5679 5726 5596
8T 2075 2130 944 5400 5585 5519
Total Elapsed Time 25.8 seconds
###################### T22 64 Bit ######################
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.38
Compiled for 64 bit ARM v8a
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 726 745 647 2766 2774 2639
2T 1397 1402 903 5523 5552 5371
4T 1871 1930 898 10780 10479 10439
8T 2496 2876 1011 9736 10679 9900
Total Elapsed Time 15.1 seconds
##################### P33 64 Bit #####################
P33 Quad-core 2 GHz Qualcomm Snapdragon 810, Android 5.0.2
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 16-Sep-2015 17.59
Compiled for 64 bit ARM v8a
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 2811 3126 1089 6943 6589 6342
2T 2488 4114 1541 12084 10559 8809
4T 4759 5480 2038 16516 14826 11960
8T 4840 8985 2452 22082 23563 12461
Total Elapsed Time 7.6 seconds
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
Quad core, Measured 1200 MHz
Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 532 402 124 1135 1044 960
2T 1255 798 213 2041 1987 1916
4T 2441 1553 229 4185 4034 3450
8T 1922 2403 226 3774 3996 3346
Total Elapsed Time 4.5 seconds
Not NEON
4T 716 655 233 2367 2316 2240
#################### T7 ARM-Intel #####################
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.24
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 657 407 132 1077 1074 1053
2T 1265 817 222 2147 2150 2078
4T 2024 1695 234 4214 4276 3555
8T 2435 2495 234 4196 4100 3523
Total Elapsed Time 39.0 seconds
##################### New Results #####################
Results x 100000, 12345 indicates ERRORS
ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1
1T 44934 86735 99850 36770 79897 99759
2T 44934 86735 99850 36770 79897 99759
4T 44934 86735 99850 36770 79897 99759
8T 44934 86735 99850 36770 79897 99759
T11 44934 12345 99850 36770 79897 99759
Android NEON-MFLOPS-MP Benchmark V1.1
1T 86735 98519 99984 79897 97638 99975
2T 86735 98519 99984 79897 97638 99975
4T 86735 98519 99984 79897 97638 99975
8T 86735 98519 99984 79897 97638 99975
Android NEON-MFLOPS2-MP Benchmark V2.1
1T 40015 66980 99522 35216 54898 99234
2T 40015 66980 99522 35216 54898 99234
4T 40015 66980 99522 35216 54898 99234
8T 40015 66980 99522 35216 54898 99234
NEON-Linpack-MP Benchmark - NEON-Linpacki-MP.apk
This is a multithreading version of the NEON-Linpack Benchmark. Further details and results can be found in android neon benchmarks.htm.
The benchmark is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads, the programming code for zero threads being the same as the earlier example.
Multithreading performance, using this standard linear equation solver, is severely degraded due to overheads, the zero thread results being the only ones of real use.
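The usual Linpack operation count, 2n^3/3 for the LU factorisation plus 2n^2 for the solve, helps show why small matrices suffer most. The work is spread over n-1 elimination steps, and step k updates an (n-k-1) x (n-k-1) submatrix. The C sketch below computes these counts. It assumes, hypothetically, that a threaded version hands out work and waits for completion at every step; the source only says the slowdown is due to overheads.

```c
/* Standard Linpack operation count for an n x n system:
 * 2n^3/3 for the LU factorisation plus 2n^2 for the solve. */
double linpack_flops(int n)
{
    double dn = (double)n;
    return 2.0 * dn * dn * dn / 3.0 + 2.0 * dn * dn;
}

/* Operations in elimination step k (0 based): each element of the
 * (n-k-1) x (n-k-1) trailing submatrix gets one multiply and one add. */
double step_flops(int n, int k)
{
    double r = (double)(n - k - 1);
    return 2.0 * r * r;
}

/* MFLOPS from a measured solve time in seconds. */
double linpack_mflops(int n, double seconds)
{
    return linpack_flops(n) / seconds / 1e6;
}
```

At N = 100 the largest step is only 19602 operations, about 20 microseconds at the roughly 970 MFLOPS of unthreaded A1 ARM/Intel, and later steps are far smaller. Any fixed cost of starting or synchronising threads at each step would therefore quickly dominate, while at N = 1000 each step carries about 100 times more work.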
Performance using native Intel compilation is around twice as fast, except at N = 1000, where speed depends mainly on calculations using data in RAM. On ARM devices, the ARM/Intel version can be somewhat faster or slower than the original.
T21, with the Qualcomm Snapdragon 800, obtains the fastest results by a significant margin at unthreaded N = 500.
The program checks that the same numeric results are produced at each matrix size, irrespective of the number of threads used. However, due to rounding effects, these results differ slightly between ARM and Intel hardware, as shown below.
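A check that threaded and unthreaded runs give the same answer can be as simple as a byte-wise comparison of the solution vectors at each N. The C sketch below is an assumption about how such a check might be written, not the benchmark's code. The name same_results is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 when the threaded solution vector is
   bit-for-bit identical to the unthreaded reference of length n. */
int same_results(const float *reference, const float *threaded, int n)
{
    /* memcmp compares raw bytes, so even a last-bit rounding difference fails. */
    return memcmp(reference, threaded, (size_t)n * sizeof(float)) == 0;
}
```

An exact comparison suits runs on one device, where each thread count should execute the same arithmetic. ARM and Intel results differ in the last digits, as the numeric results show, so comparing across the two would need a tolerance instead. Note also that memcmp treats +0.0 and -0.0 as different values.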
August 2015 - T22 results from 32 bit and 64 bit compilations were again similar, because the programs use a limited number of identical intrinsic functions.
MFLOPS 0 to 4 Threads, N 100, 500, 1000
#################### A1 Original #######################
A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
Threads None 1 2 4
N 100 452.39 21.00 23.48 17.48
N 500 663.38 275.56 88.66 312.71
N 1000 617.04 380.60 191.26 195.61
#################### A1 ARM-Intel ######################
ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 13.58
Threads None 1 2 4
N 100 971.71 37.72 36.36 39.66
N 500 1311.37 488.73 487.85 488.98
N 1000 945.97 727.85 737.95 742.34
Total Elapsed Time 59.966 seconds
#################### T11 Original #####################
T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
Threads None 1 2 4
N 100 1399.82 54.86 55.31 54.66
N 500 1154.21 434.16 434.06 436.97
N 1000 571.26 482.57 487.25 485.80
#################### T11 ARM-Intel ####################
ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.44
Threads None 1 2 4
N 100 1497.90 61.13 63.13 61.87
N 500 1399.10 491.49 489.29 494.69
N 1000 586.14 499.00 504.97 497.49
Total Elapsed Time 43.952 seconds
#################### T21 Original #####################
T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
Android Linpack NEON SP MP Benchmark 26-Jul-2015 11.46
Threads None 1 2 4
N 100 1311.08 12.38 12.93 15.05
N 500 2271.56 344.04 419.52 381.73
N 1000 837.30 540.99 523.52 564.87
Total Elapsed Time 143.534 seconds
#################### T21 ARM-Intel ####################
ARM/Intel Linpack NEON SP MP Benchmark 26-Jul-2015 11.51
Threads None 1 2 4
N 100 1308.07 14.89 11.77 11.63
N 500 2341.17 407.96 481.02 415.12
N 1000 901.21 551.80 566.77 564.31
Total Elapsed Time 145.750 seconds
###################### T22 32 Bit ######################
T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2
ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.52
Compiled for 32 bit ARM v7a
Threads None 1 2 4
N 100 460.74 22.35 23.16 23.82
N 500 480.63 336.52 339.94 303.66
N 1000 470.02 405.86 403.01 405.98
###################### T22 64 Bit ######################
ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.57
Compiled for 64 bit ARM v8a
Threads None 1 2 4
N 100 548.67 27.70 33.93 37.00
N 500 470.04 285.95 297.79 301.67
N 1000 519.02 441.84 443.47 441.91
##################### T7 Original ######################
T7, ARM Cortex-A9 1300 MHz, Android 4.1.2
Measured 1200 MHz
Threads None 1 2 4
N 100 413.47 45.95 48.22 48.34
N 500 253.08 187.51 189.69 189.94
N 1000 148.76 135.49 136.08 136.17
#################### T7 ARM-Intel #####################
ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.40
Threads None 1 2 4
N 100 385.49 28.79 29.06 29.25
N 500 272.07 184.85 183.70 183.18
N 1000 147.09 131.92 132.44 130.05
Total Elapsed Time 64.318 seconds
################### Numeric Results ###################
NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1
N 100 500 1000
ARM
NR 1.60 3.96 11.32
RE 3.80277634e-05 4.72068787e-04 2.70068645e-03
MA 1.19209290e-07 1.19209290e-07 1.19209290e-07
X0 -1.38282776e-05 5.26905060e-05 1.62243843e-04
XN -7.51018524e-06 3.26633453e-05 -6.65783882e-05
Intel
NR 1.68 3.96 11.39
RE 4.00543213e-05 4.72545624e-04 2.71725655e-03
MA 1.19209290e-07 1.19209290e-07 1.19209290e-07
X0 -1.38282776e-05 5.26905060e-05 1.62243843e-04
XN -7.51018524e-06 3.26633453e-05 -6.65783882e-05
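The NR, RE and MA values follow the classic Linpack accuracy check. In that check the right-hand side b is built from the row sums of A, so the exact solution is a vector of ones, which is why X0 and XN are reported as x - 1. The C sketch below assumes this benchmark uses the same definitions, with MA taken as FLT_EPSILON (1.19209290e-07, matching the MA rows above). The names norm_resid and max_abs are hypothetical, and the matrix is assumed to be stored column-major as in the Fortran-derived code.

```c
#include <float.h>

/* Largest absolute value in an array. */
static float max_abs(const float *v, int len)
{
    float m = 0.0f;
    for (int i = 0; i < len; i++) {
        float t = v[i] < 0.0f ? -v[i] : v[i];
        if (t > m)
            m = t;
    }
    return m;
}

/* Linpack-style check, assuming:
   RE = max_i |b_i - (A x)_i|
   NR = RE / (n * max|A| * max|x| * MA), with MA = FLT_EPSILON.
   a is n x n, column-major: element (i, j) is a[i + j * n]. */
float norm_resid(int n, const float *a, const float *x, const float *b,
                 float *resid_out)
{
    float resid = 0.0f;
    for (int i = 0; i < n; i++) {
        float ax = 0.0f;
        for (int j = 0; j < n; j++)
            ax += a[i + j * n] * x[j];
        float r = b[i] - ax;
        if (r < 0.0f)
            r = -r;
        if (r > resid)
            resid = r;
    }
    if (resid_out)
        *resid_out = resid;
    return resid / ((float)n * max_abs(a, n * n) * max_abs(x, n) * FLT_EPSILON);
}
```

Because the residual is divided by N, the matrix and solution sizes and the machine epsilon, NR values from different N and from ARM and Intel can be compared on one scale.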
FFT Benchmarks - fft1.apk, fft3c.apk
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running time in milliseconds.
Besides Android, the benchmarks are available to run under Windows and Linux.
Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in FFTBenchmarks.htm.
Below is an example of results.
Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800
ARM/Intel FFT Benchmark 3c.0 08-Sep-2015 23.15
Compiled for 32 bit ARM v7a
Size Running time in milliseconds, three runs each
K Single Precision Double Precision
1 0.155 0.352 1.341 0.087 0.073 0.073
2 0.812 0.814 0.750 0.201 0.187 0.251
4 1.751 1.658 1.776 0.414 0.405 0.443
8 3.712 1.083 1.065 0.930 0.899 0.890
16 2.880 3.356 2.430 2.579 2.658 2.380
32 6.124 6.541 5.605 5.907 6.070 5.681
64 13.430 12.566 12.774 13.792 13.556 13.997
128 30.737 27.408 27.132 33.318 33.088 33.071
256 64.472 63.394 64.690 73.288 72.546 72.786
512 153.609 150.383 156.046 155.788 156.304 163.178
1024 315.283 306.323 307.409 369.426 337.074 336.684
1024 Square Check Maximum Noise Average Noise
SP 9.999520e-01 3.346482e-06 4.565234e-11
DP 1.000000e+00 1.133294e-23 1.428110e-28
Total Elapsed Time 6.5 seconds
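The three columns per precision in the example above come from repeating each FFT size three times. The C sketch below shows that timing pattern, assuming POSIX clock_gettime with CLOCK_MONOTONIC, which the Android NDK provides. The benchmark's own timer may differ. The names time_three_runs, sum_workload and struct sum_ctx are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* A workload is any function taking an opaque context pointer. */
typedef void (*workload_fn)(void *ctx);

/* Hypothetical example workload: sums an array into ctx->total. */
struct sum_ctx { const double *data; int n; double total; };

void sum_workload(void *ctx)
{
    struct sum_ctx *s = ctx;
    double t = 0.0;
    for (int i = 0; i < s->n; i++)
        t += s->data[i];
    s->total = t;
}

/* Difference between two timestamps in milliseconds. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Time the workload three times, storing each run in milliseconds. */
void time_three_runs(workload_fn fn, void *ctx, double ms_out[3])
{
    for (int run = 0; run < 3; run++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(ctx);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        ms_out[run] = elapsed_ms(&t0, &t1);
    }
}
```

The spread in the example, from 0.155 to 1.341 ms across the three 1K single precision runs, shows why a single measurement at small sizes can be misleading.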
System Details
A1 Asus MemoPad 7 ME176CEX, 1.86 GHz Intel Atom Z3745
Screen pixels w x h 800 x 1216
Android Build Version 4.4.2
Processor : ARMv7 processor rev 1 (v7l)
BogoMIPS : 1500.0
Features : neon vfp swp half thumb fastmult edsp vfpv3
CPU implementer : 0x69
CPU architecture: 7
CPU variant : 0x1
CPU part : 0x001
CPU revision : 1
Hardware : placeholder
Revision : 0001
Linux version 3.10.20
Mainly runs at 1.86 GHz Turbo Boost
T7 Device Google Nexus 7, quad core CPU 1.3 GHz, 1.2 GHz > 1 core
RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
Screen pixels w x h 1280 x 736
Twelve-core Nvidia GeForce ULP graphics 416 MHz
Android Build Version 4.1.2
Processor : ARMv7 Processor rev 9 (v7l)
processor : 0 BogoMIPS : 1993.93
processor : 1 BogoMIPS : 1993.93
processor : 2 BogoMIPS : 1993.93
processor : 3 BogoMIPS : 1993.93
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09 - Cortex-A9
CPU revision : 9
Hardware : grouper - nVidia Tegra 3 T30L
Revision : 0000
Linux version 3.1.10
Runs at 1.2 GHz
T11 Voyo A15, Samsung EXYNOS 5250 Dual core 2.0 GHz Cortex-A15,
Mali-T604 GPU, 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/s
Screen pixels w x h 1920 x 1032
Android Build Version 4.2.2 - Jelly Bean
Processor : ARMv7 Processor rev 4 (v7l)
processor : 0
BogoMIPS : 992.87
processor : 1
BogoMIPS : 997.78
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc0f
CPU revision : 4
Hardware : SMDK5250
Linux version 3.4.35Ut
Runs at 1.7 GHz
T21 Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800 (Krait 400)
2 x 32 Bit LPDDR3-1866 Memory, 14.9 GB/s, GPU Qualcomm Adreno 330, 578 MHz
Device Amazon KFTHWI
Screen pixels w x h 1200 x 1803
Android Build Version 4.4.3
Processor : ARMv7 Processor rev 0 (v7l)
processor : 0, 1, 2, 3
BogoMIPS : 38.40
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
CPU implementer : 0x51
CPU architecture: 7
CPU variant : 0x2
CPU part : 0x06f
CPU revision : 0
Hardware : Qualcomm MSM8974
Revision : 0000
Linux version 3.4.0-perf (gcc version 4.7)
T22 Lenovo Tab 2 A8-50, 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53
1 GB LPDDR3, GPU Mali T720 MP2
Device LENOVO Lenovo TAB 2 A8-50F
Screen pixels w x h 800 x 1216
Android Build Version 5.0.2
Processor : AArch64 Processor rev 3 (aarch64)
processor : 0, 1, 2
BogoMIPS : 26.0
Features : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: AArch64
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 3
Hardware : MT8161
Linux version 3.10.65
P33 Sony Xperia Z3+ E6533, Quad-core 1.5 GHz & Quad-core 2 GHz Qualcomm Snapdragon 810 64-bit CPU
Screen pixels w x h 1080 x 1776
Android Build Version 5.0.2
Processor : AArch64 Processor rev 1 (aarch64)
processor : 0 to 7
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 1
Hardware : Qualcomm Technologies, Inc MSM8994
Linux version 3.10.49
BS1 BlueStacks Emulator on 3 GHz Phenom via Windows 7
Screen pixels w x h 1024 x 600
Android Build Version 2.3.4
BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
Screen pixels w x h 1440 x 852
Android Build Version 4.4.2
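The CPU implementer and CPU part fields above are hexadecimal codes. The C sketch below decodes the ones that appear in these listings: implementer 0x41 is ARM, 0x51 Qualcomm and 0x69 Intel, and the ARM part numbers 0xc09, 0xc0f, 0xd03 and 0xd07 are the Cortex-A9, A15, A53 and A57. The A1 Atom tablet, for example, reports Intel's 0x69 code alongside an ARMv7 processor description. The function names are hypothetical. On a device, each line read from /proc/cpuinfo with fgets could be passed to parse_cpuinfo_hex.

```c
#include <stdio.h>
#include <string.h>

/* Implementer codes seen in the listings above. */
const char *implementer_name(unsigned code)
{
    switch (code) {
    case 0x41: return "ARM";
    case 0x51: return "Qualcomm";
    case 0x69: return "Intel";
    default:   return "unknown";
    }
}

/* ARM-designed core part numbers; only meaningful when implementer is 0x41. */
const char *arm_part_name(unsigned part)
{
    switch (part) {
    case 0xc09: return "Cortex-A9";
    case 0xc0f: return "Cortex-A15";
    case 0xd03: return "Cortex-A53";
    case 0xd07: return "Cortex-A57";
    default:    return "unknown";
    }
}

/* Parse a line such as "CPU part : 0xc09" when it starts with key.
   Stores the hexadecimal value in *value; returns 1 on success, else 0. */
int parse_cpuinfo_hex(const char *line, const char *key, unsigned *value)
{
    size_t klen = strlen(key);
    if (strncmp(line, key, klen) != 0)
        return 0;
    const char *colon = strchr(line + klen, ':');
    if (colon == NULL)
        return 0;
    /* %x accepts an optional 0x prefix and stops at the first non-hex character. */
    return sscanf(colon + 1, " %x", value) == 1;
}
```

Part numbers are assigned by each implementer, so they must be read together with the implementer code. T21's Qualcomm Krait core, for example, reports part 0x06f under implementer 0x51, which this ARM-only table does not cover.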
Roy Longbottom January 2016
The Official Internet Home for my Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection