
Roy Longbottom's Android NEON Benchmark Apps

For latest results see Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM, Intel and MIPS.


Contents


General
NeonSpeed Benchmark
NeonSpeed Results
NEON MP MFLOPS Benchmark
NEON MP MFLOPS Results
NEON-Linpack Benchmark
NEON Linpack Results
NEON-Linpack-MP Benchmark
Linpack-MP Thread Overheads
NEON-Linpack-MP Results
Program Code Used
Systems Used

Download Benchmark Apps

A Settings, Security option may need changing to allow installation of non-Market applications

NeonSpeed.apk - Cache, RAM MBytes/second
NEON-MFLOPS-MP.apk - CPU, Cache, RAM MFLOPS
NEON-Linpack.apk - NEON Floating Point
NEON-Linpack-MP.apk - NEON MP Floating Point

All have an option to save results via Email

Note that the apps will not install unless the more modern vfp3 hardware is available, and they should display a cannot run message if NEON hardware is not detected. For maximum and consistent performance, some units might need a CPU Mode setting to be changed (for example, under ICS: Settings, Developer Options, CPU Mode, change Normal to Performance).

Versions that will run on both ARM and Intel CPUs, in Native Mode, are available via Android Native ARM-Intel Benchmarks.htm. Results are included below. Besides running on x86 and the latest armeabi-v7a CPUs, code is included for the older armeabi and for arm64-v8a, x86-64, mips and mips64 processors, automatically selected at run time, but not yet tested. The latest additions, from August 2015, are measurements on a 64 bit ARM CPU. Details are in Android 64 Bit Benchmarks.htm. Results are also below.



General

Roy Longbottom’s PC Benchmark Collection comprises numerous FREE benchmarks and reliability testing programs, for processors, caches, memory, buses, disks, flash drives, graphics, local area networks and Internet. Original ones run under DOS and later ones under all varieties of Windows. Most have also been converted to run under Linux and now many as Android Apps.

Android Benchmarks.htm provides details, results and links to download the apps and source code. The latter are written in Java or in C with a Java front end. Further variations are available as MultiThreading Benchmarks.

These benchmarks were developed on a Linux Ubuntu PC, using the Eclipse Integrated Development Environment for Java, the Android Software Development Kit and compiled C/C++ code from the Native Development Kit (all free software). For this set of benchmarks, NEON instructions are produced by C/C++ code using special intrinsic functions. For further detail see Program Code Used. Eclipse projects for the benchmarks, with source code, can be downloaded from www.roylongbottom.org.uk/Android NEON Benchmarks.zip.

Version 1.1, January 2013 - Coloured display instead of black and white, background image, wider format for HD displays.

Results are now included for a Tablet (A1) using an Intel Atom CPU. Android versions have a compatibility layer called Houdini that maps ARM instructions into X86 instructions. This can impose a heavy overhead.

Standard Layout

Has right/left scroll to see all details. The Save button Emails results to me AND/OR whoever is desired.





NEON Cache and RAM Benchmark

This benchmark carries out the same calculations as Android MemSpeed. It measures data reading speeds in megabytes per second, with functions accessing arrays of cache and RAM based data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] in single precision floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Speed in Millions of Floating Point Operations Per Second (MFLOPS) can be calculated by dividing single precision MB/second by 4 and 8 respectively, for the two floating point tests. Cache sizes are indicated by changes in performance as memory usage increases.
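
The MB/second to MFLOPS conversion can be sketched as below. This is an illustration only, assuming that each result reads two 4 byte single precision words (x[m] and y[m]), which is consistent with the divisors of 4 and 8 quoted above. The function and constant names are hypothetical and are not taken from the benchmark source.

```python
BYTES_PER_WORD = 4    # single precision floating point word
WORDS_READ = 2        # assumption: x[m] and y[m] are both read per result


def mflops_from_mbytes(mbytes_per_sec, flops_per_result):
    """Convert a NeonSpeed MB/second figure into MFLOPS."""
    bytes_per_result = BYTES_PER_WORD * WORDS_READ
    return mbytes_per_sec * flops_per_result / bytes_per_result


# x[m] = x[m] + s*y[m]: one multiply and one add, so MB/second is divided by 4
print(mflops_from_mbytes(2575, 2))

# x[m] = x[m] + y[m]: one add, so MB/second is divided by 8
print(mflops_from_mbytes(3053, 1))
```

The two input values are the 16 KB NEON figures from the Nexus 7 example results.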

The first set of tests are run using normal compilations then via NEON intrinsic functions, with the second set only via NEON. Unlike MemSpeed, double precision floating point tests are not used, as these are not supported in NEON. Below is an example of results, with system information as included in Emailed results from all benchmarks.


    Android NeonSpeed Benchmark 15-Dec-2012 14.38

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    860   2575   2325   2918   3053   3245
      32    950   2551   2400   2823   2944   3131
      64    744   1396   1329   1434   1465   1496
     128    713   1342   1319   1365   1392   1417
     256    714   1339   1311   1357   1377   1400
     512    708   1323   1299   1348   1358   1383
    1024    608    875    869    917    930    952
    4096    460    493    492    481    488    504
   16384    460    498    487    507    506    504
   65536    459    495    469    251    503    505

          Total Elapsed Time   11.5 seconds

 System Information

 Screen pixels w x h 1280 x 736 

 Android Build Version      4.2

 Processor       : ARMv7 Processor rev 9 (v7l)
 processor       : 0 to 3
 BogoMIPS        : 1993.93

 Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls 
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant     : 0x2
 CPU part        : 0xc09
 CPU revision    : 9

 Hardware        : grouper
 Revision        : 0000
 Serial          : 0f410a0001440200

 Linux version 3.1.10-g22b4fcd (android-build@vpbs1.mtv.corp.google.com) 
 (gcc version 4.6.x-google 20120106 (prerelease) (GCC) ) #1 SMP PREEMPT
 Fri Nov 2 10:55:26 PDT 2012
   




NeonSpeed Results

Using the particular instruction sequences, NEON can provide a floating point performance gain approaching three times, using data from L1 cache, and twice via L2. Integer performance using NEON provides a lesser gain and few gains are identified using data from RAM.

The v=v+s*v calculations are the same as those that determine the Linpack Benchmark single precision score. For example, the Nexus 7 SP result was 201 MFLOPS. Maximum L1 cache results here, without/with NEON, are 238 and 644 MFLOPS (950/4 and 2575/4) but Linpack can be influenced by the lower L2 cache speeds.

As with other benchmarks, tablet T7 and T4 cache speeds are similar and, compared with P11, lower than might be expected. Results can be compared with Sngl (Normal Float) and SSE columns in SSE3DNow Benchmark Results On PCs.

August 2013 - Comparing results for the new Cortex-A15 in Tablet T11 with those for the older Cortex-A9 in Phone P11 shows that the new processor is much faster executing normal and NEON floating point instructions. Adjusting for the same CPU MHz, the A15 is at least three times faster, using cache based data. Similar ratios apply to NEON integer tests. T11 was also run with the Power Saving option On, when the CPU runs at 1000 MHz. Note that the full speed version appears to kick off at 1000 MHz.

February 2015 - Atom system A1 is much faster than Cortex-A9 based devices, taking into account CPU MHz, but the Cortex-A15 averages 85% faster from cache based data. The Atom has a 64 bit memory bus width, compared with 32 bits for ARM processors. This leads to the Atom being over 70% faster, using calculations with RAM data.

The native Intel code, on the Atom, produced some performance gains, mainly using L1 cache based data, but speed in other areas is probably limited by data flow. The later compiler produced some slower speeds on ARM based tablet T11 and better/worse variations on T21.

August 2015 - Results provided for T22 with Cortex-A53 64 bit CPU and 64 bit Android 5.0. As with NEON-Linpack, many results from 32 bit and 64 bit compilations, via NEON intrinsic functions, were similar. With normal code, the 64 bit compilations were up to nearly four times faster than those at 32 bits.

October 2015 - T7 Nexus 7, Android 5.0.2 upgrade, slightly slower but back to normal with 5.1.1, also ARM/Intel version now same as older program (not shown). T22 Android 5.0.2 to 5.1 (ARM-v8 CPU) produced performance gains of around 3%, on the 32 bit benchmark and on most cache based tests at 64 bits, with no gain using RAM data (see below).


 *****************************************************
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

Android NeonSpeed Benchmark V1.1 02-Feb-2015 17.09

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1778   3940   2807   5474   4997   5062
      32   1781   3576   2636   4431   4316   4291
      64   1772   3589   2639   4480   4337   4332
     128   1784   3589   2641   4423   4320   4320
     256   1766   3592   2642   4400   4347   4358
     512   1784   3585   2633   4375   4350   4355
    1024   1705   3253   2448   3760   3789   3788
    4096   1673   3021   2366   3257   3245   3237
   16384   1672   2948   2349   3062   3157   3151
   65536   1675   2967   2345   3190   3168   3168

          Total Elapsed Time   10.8 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 16.54

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1816   5996   4916   6244   6882   6880
      32   1851   4703   3985   5200   5609   5711
      64   1862   3845   3121   4174   4441   4520
     128   1841   3929   3110   4179   4411   4487
     256   1863   3932   3092   4179   4412   4493
     512   1861   3938   3090   3894   4215   4415
    1024   1784   3475   2738   3130   3223   3443
    4096   1741   2376   2649   2998   3112   3139
   16384   1774   3086   2780   3116   3140   3145
   65536   1774   2987   2547   2328   3126   3072

          Total Elapsed Time   10.1 seconds

  
 *****************************************************
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2

 Android NeonSpeed Benchmark V1.1 09-Aug-2013 17.10

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3793   9641   4375  13023  13456  13562
      32   5777  11410   4993  11718  11365  11143
      64   4122   6692   3855   6539   6682   7210
     128   4017   6565   3849   6475   6520   6983
     256   4067   6562   3836   6459   6495   7038
     512   3900   6531   3820   6428   6490   7095
    1024   1821   2544   1774   2532   2554   2539
    4096   1141   1645   1536   1612   1615   1635
   16384   1437   1695   1490   1576   1694   1668
   65536   1424   1675   1475   1699   1687   1694

          Total Elapsed Time   11.2 seconds

 Measured CPU MHz - 1700 (most of the time?)

  

Power Saving Mode - 1000 MHz

  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3783   8153   3196   8499   8021   8050
      32   2683   4925   2409   4789   4898   4853
      64   2438   4125   2273   3927   4117   4388
     128   2392   3866   2259   3815   3824   4076
     256   2390   3867   2249   3780   3826   4164
     512   2383   3821   2248   3788   3825   4098
    1024   1404   2073   1337   2047   1994   2075
    4096   1440   1559   1416   1561   1560   1562
   16384   1447   1535   1418   1540   1540   1541
   65536   1466   1540   1448   1161   1544   1524

          Total Elapsed Time   11.6 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.17

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   2252   4964   3321   6602   7304   7237
      32   4202   8364   4543   8366   8553   8101
      64   3710   6096   3860   6570   6348   6182
     128   3802   5581   3874   6044   5624   5877
     256   3654   5618   3501   6154   5655   5783
     512   3597   5688   3723   6130   5812   5684
    1024   1727   2466   1659   2481   2454   2472
    4096   1479   1718   1421   1714   1713   1706
   16384   1488   1704   1435   1576   1705   1694
   65536   1477   1755   1453   1754   1759   1752

          Total Elapsed Time   10.8 seconds

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NeonSpeed Benchmark V1.1 23-Jul-2015 13.00

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   4324  13809   4498  14660  17501  18186
      32   3587   6845   2922   8073   6981   7035
      64   3347   6894   2912   8078   6964   6938
     128   3343   6651   2919   7922   6726   6999
     256   3511   6963   3002   8071   6902   6897
     512   3476   6628   3025   7827   6613   6818
    1024   3172   4627   2773   6424   4800   4806
    4096   2653   2051   2378   3613   2090   2054
   16384   2356   1891   2118   3165   1955   1962
   65536   2424   1923   2167   3368   1933   1925

          Total Elapsed Time    9.9 seconds

 #################### T21 ARM-Intel ####################

 ARM/Intel NeonSpeed Benchmark V1.1 23-Jul-2015 13.03

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3623  16704   4623  15187  17446  16719
      32   3455   9210   2997   8723   9280   9112
      64   3336   7721   3002   8544   8469   8581
     128   3415   7664   3111   8481   7549   7638
     256   3584   7526   3087   8500   7849   7805
     512   3538   7422   3154   8266   7567   7541
    1024   3513   7227   3067   7789   7294   7261
    4096   2302   1673   2413   3107   1693   1677
   16384   2286   1616   2323   3024   1620   1617
   65536   2322   1617   2271   2505   1634   1600

          Total Elapsed Time    9.9 seconds

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.32
           Compiled for 32 bit ARM v7a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    971   3853   1807   4059   3957   4397
      32    970   3812   1800   3983   3891   4323
      64    927   3228   1605   3038   3269   3521
     128    926   3321   1681   3343   3354   3596
     256    936   3386   1693   3449   3413   3667
     512    898   2889   1578   2996   2927   3118
    1024    794   1859   1345   2057   1996   1924
    4096    794   1796   1250   1788   1813   1835
   16384    792   1773   1270   1820   1829   1864
   65536    796   1811   1289   1852   1832   1880

          Total Elapsed Time   11.3 seconds

 ################ T22 Android 5.1 32 Bit ################

 ARM/Intel NeonSpeed Benchmark V1.2 28-Sep-2015 21.23
           Compiled for 32 bit ARM v7a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    974   3984   1864   4066   4079   4529
      32    961   3532   1760   3682   3599   3908
      64    965   3401   1735   3526   3450   3714
     128    974   3522   1773   3661   3579   3868
     256    974   3539   1769   3679   3597   3866
     512    927   2856   1610   2899   2895   3071
    1024    818   2054   1396   2081   2063   2116
    4096    832   1845   1305   1831   1855   1886
   16384    831   1846   1317   1747   1853   1886
   65536    834   1846   1311   1879   1871   1909

          Total Elapsed Time   11.2 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.37
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3054   4055   3605   4376   4911   5094
      32   2922   3787   3435   4198   4546   4682
      64   2795   3514   3259   3658   4050   4116
     128   2886   3529   3373   3924   4148   3963
     256   2883   3641   3264   3942   4193   4276
     512   2454   3165   2985   3385   3586   3542
    1024   1633   2000   1835   2043   2114   2105
    4096   1738   1893   1899   1900   1956   1955
   16384   1757   1870   1886   1802   1921   1846
   65536   1755   1875   1870   1903   1936   1937

          Total Elapsed Time   10.2 seconds

 ################ T22 Android 5.1 64 Bit ################

 ARM/Intel NeonSpeed Benchmark V1.2 28-Sep-2015 22.26
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3143   4189   3726   4662   5100   5263
      32   2913   3717   3420   4074   4408   4530
      64   2830   3609   3341   3921   4161   4274
     128   2972   3749   3387   4060   4309   4405
     256   2985   3720   3503   4074   4340   4429
     512   2635   3184   2515   3581   3768   3836
    1024   1896   2020   2020   2092   2088   2091
    4096   1799   1852   1887   1742   1949   1951
   16384   1739   1864   1876   1777   1942   1944
   65536   1697   1824   1743   1875   1882   1897

          Total Elapsed Time   10.4 seconds

 *****************************************************
 P30 Quad Core 1.9 GHz Qualcomm Snapdragon 600, Android 4.4.2

 Android NeonSpeed Benchmark V1.1 02-Apr-2015 12.14

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3191  10541   3473  13321  15048  15180
      32   3349   4349   2238   4601   4382   4476
      64   2672   4620   1990   5798   4629   4699
     128   1990   4492   2706   5766   4702   4665
     256   2238   4408   2749   5360   4499   4605
     512   2579   4636   2857   5950   3942   4644
    1024   3213   3084   2467   3634   4244   3445
    4096   1190   1349   1120   2306   1123   1356
   16384   1254   1391   1188   1997   1316   1372
   65536   1363   1363   1276   1826   1335   1369

          Total Elapsed Time   11,5 seconds

 *****************************************************
 P11 Galaxy SIII, Quad Cortex-A9 1.4 GHz, Android 4.0.4

 Android NeonSpeed Benchmark 23-Dec-2012 14.31

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    696   2698   2874   2969   3071   3054
      32   1111   2639   2830   2893   2998   3002
      64    975   1829   1747   1898   1724   1931
     128    971   1578   1609   1785   1818   1779
     256    982   1721   1582   1790   1847   1850
     512    982   1705   1284   1777   1834   1863
    1024    916   1573   1520   1615   1735   1751
    4096    895   1181   1240   1109   1180   1175
   16384    895   1161   1210   1038   1155   1154
   65536    893   1158   1214    575   1148   1157

          Total Elapsed Time   11.9 seconds

 *****************************************************
 T7 Nexus 7 Quad 1300 MHz Cortex-A9, Android 4.1.2

 Android NeonSpeed Benchmark 15-Dec-2012 14.38

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    860   2575   2325   2918   3053   3245  L1
      32    950   2551   2400   2823   2944   3131
      64    744   1396   1329   1434   1465   1496  L2
     128    713   1342   1319   1365   1392   1417
     256    714   1339   1311   1357   1377   1400
     512    708   1323   1299   1348   1358   1383
    1024    608    875    869    917    930    952
    4096    460    493    492    481    488    504  RAM
   16384    460    498    487    507    506    504
   65536    459    495    469    251    503    505

          Total Elapsed Time   11.5 seconds

 Measured CPU MHz - 1200

 #################### T7 ARM-Intel #####################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.07

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    881   2440   2501   3334   3206   3465
      32    901   1868   1705   2260   2083   2186
      64    801   1395   1365   1573   1548   1581
     128    784   1282   1278   1405   1389   1411
     256    787   1279   1285   1420   1380   1409
     512    777   1266   1267   1409   1370   1394
    1024    604    786    762    769    770    828
    4096    458    479    477    463    486    488
   16384    436    447    448    469    470    469
   65536    450    472    469    240    482    483

          Total Elapsed Time   11.5 seconds

 *****************************************************
 T10 Samsung Galaxy Note GT-N7000 Dual core 1.4 GHz Cortex-A9

 Android NeonSpeed Benchmark V1.1 25-jun-2013 09.39

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1107   2665   2780   2905   2963   3120  L1
      32   1029   1339   1500   2027   2251   2338
      64    977   1721   1656   1698   1834   1806  L2
     128    984   1654   1568   1676   1741   1760
     256    960   1592   1530   1649   1720   1653
     512    859   1586   1297   1300   1511   1474
    1024    691    753    689    920    668    654  RAM
    4096    488    524    467    500    502    505
   16384    485    496    426    518    365    513
   65536    488    463    446    257    507    494

          Total Elapsed Time   12,0 seconds

 *****************************************************
 T4 Miumiu w17 Pro 7 inch tablet, dual 1500 MHz Cortex-A9

 Android NeonSpeed Benchmark 15-Dec-2012 16.01

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    888   2460   2274   2755   2867   3026  L1
      32    835   1864   1799   2099   2163   2255
      64    798   1454   1236   1498   1540   1582  L2
     128    775   1376   1292   1418   1455   1486
     256    766   1349   1256   1388   1425   1458
     512    547    750    720    766    801    795
    1024    388    443    438    447    456    456  RAM
    4096    359    404    391    386    410    401
   16384    288    389    396    407    414    404
   65536    366    414    396    204    418    405

          Total Elapsed Time   13.1 seconds

 Measured CPU MHz - 1200

 *****************************************************
 P24 LG Volt, Quad Core 1.2 GHz Snapdragon Cortex-A7

 Android NeonSpeed Benchmark V1.1 21-Aug-2014 02.02

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1014   1943   1518   2047   2021   2262
      32    997   1691   1372   1746   1751   1927
      64    912   1596   1317   1634   1639   1749
     128    786   1279   1115   1550   1509   1721
     256    933   1504   1270   1527   1549   1688
     512    869    729    856   1048    924   1497
    1024    624    970    904   1006    985   1086
    4096    368    482    593    456    488    506
   16384    340    455    604    475    478    494
   65536    375    506    655    246    480    527

          Total Elapsed Time   11.7 seconds

 *****************************************************
 T2 WayTeq xTAB-70 7 inch tablet, 800 MHz Cortex-A9

 Android NeonSpeed Benchmark 15-Dec-2012 15.32

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    609   1678   1427   1892   1755   1855  L1
      32    591   1394   1099   1344   1458   1529
      64    514    995    957    994   1023   1093  L2
     128    419    716    624    664    736    636
     256    247    275    261    260    281    263  RAM
     512    247    245    248    250    236    228
    1024    261    247    242    244    246    239
    4096    246    247    245    228    241    242
   16384    238    243    250    239    231    229
   65536    249    239    244    118    237    254

          Total Elapsed Time   20.1 seconds

 *****************************************************
 ET1 Device ARM Emulator 2.4 GHz Core 2 Duo

 Android NeonSpeed Benchmark 15-Dec-2012 14.33

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16     95     72    237    100     69    102
      32     95     53    247     95     69    102
      64     96     54    247     97     69    105
     128     96     53    254    100     69    100
     256     92     50    255     98     67    102
     512     96     62    389    153     74    101
    1024     96     53    255     98     68    101
    4096     76     53    253     93     48    101
   16384     83     46    249     97     70     36
   65536    101     43    209     49     66    100

          Total Elapsed Time   74.8 seconds

 *****************************************************
 BS1 BlueStacks Emulator on 3 GHz Phenom

 Android NeonSpeed Benchmark 15-Dec-2012 14.47

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    128    131    568   1828    238   2461
      32    138    144    682   1969    255   2461
      64    138    144    639   2133    262   2462
     128    102    132    568   1939    238   2462
     256    132    144    639   1971    238   2327
     512    124    142    639   1973    256   2565
    1024    136    146    640   1969    255   2136
    4096    132    142    602   1973    256   1984
   16384    118    119    578   1638    258   1831
   65536    136    152    595   1747    262   1820

          Total Elapsed Time   25.6 seconds




NEON MP MFLOPS

This is based on my MP-MFLOPS.apk benchmark program, details of which can be found in Android MultiThreading Benchmark Apps. The program is run using 1, 2, 4 and 8 threads. The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2 and 32 operations per input data word. Three data sizes are used, to exercise L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). In this case, the operation sequences are generated by NEON intrinsic functions.
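
The data sizes and the form of the calculation can be sketched as below. This is a plain Python illustration, not the benchmark code: the function names and the MFLOPS formula (words x operations per word x passes, divided by elapsed time) are assumptions. The expression as written above contains 8 arithmetic operations per word.

```python
WORD_BYTES = 4   # single precision floating point word


def data_kbytes(words):
    """Array size in KB (1000 bytes), as quoted in the results tables."""
    return words * WORD_BYTES / 1000


def kernel(x, a, b, c, d, e, f):
    """x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f.

    Three adds inside brackets, three multiplies, one subtract and
    one add make 8 operations per input word.
    """
    return [(v + a) * b - (v + c) * d + (v + e) * f for v in x]


def mflops(words, ops_per_word, passes, seconds):
    """Assumed MFLOPS calculation for one timed test."""
    return words * ops_per_word * passes / seconds / 1e6


for words in (3200, 32000, 3200000):
    print(words, "words =", data_kbytes(words), "KB")
```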

The program checks for consistent numeric results, primarily to show that all calculations are carried out in the threads. Each thread uses the same calculations but accesses different segments of the data. The same number of instructions are executed in each thread test at a particular memory size, to produce identical numeric results. Example performance and sumchecks are shown below.
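
The consistency check described above can be illustrated with the following sketch, where each thread updates its own segment of a shared array and the final values are compared across thread counts. The constants, the simplified two operation calculation and the way the check value is formed are all hypothetical. Only the idea of identical results from 1, 2, 4 and 8 threads, with 12345 flagging an error, comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor


def update_segment(data, start, end, a, b):
    """Apply x[i] = (x[i] + a) * b to one thread's segment of the array."""
    for i in range(start, end):
        data[i] = (data[i] + a) * b


def run_test(threads, words=3200, passes=10, a=0.00002, b=0.99998):
    """Run the same passes with the array split into one segment per thread."""
    data = [0.999999] * words        # every word starts with the same value
    segment = words // threads       # 3200 divides evenly by 1, 2, 4 and 8
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(passes):
            jobs = [pool.submit(update_segment, data,
                                t * segment, (t + 1) * segment, a, b)
                    for t in range(threads)]
            for job in jobs:
                job.result()         # wait for all segments before next pass
    return data


def sumcheck(data):
    """Return first word x 100000, or 12345 if any word differs (an error)."""
    if any(v != data[0] for v in data):
        return 12345
    return int(data[0] * 100000)


checks = {t: sumcheck(run_test(t)) for t in (1, 2, 4, 8)}
print(checks)
```

If a thread skipped its segment, some words would keep a different value and the check would report 12345.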


 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      532     402     124    1135    1044     960
 2T     1255     798     213    2041    1987    1916
 4T     2441    1553     229    4185    4034    3450
 8T     1922    2403     226    3774    3996    3346
 Results x 100000, 12345 indicates ERRORS
 1T    86735   98519   99984   79897   97638   99975
 2T    86735   98519   99984   79897   97638   99975
 4T    86735   98519   99984   79897   97638   99975
 8T    86735   98519   99984   79897   97638   99975

          Total Elapsed Time    4.5 seconds

 System Information - as NeonSpeed
   




NEON MP MFLOPS Results

Using 2 operations per word from 12.8 KB and 128 KB (L1 and L2 caches), performance can increase in line with the number of cores, but from 12.8 MB, is more limited by RAM speed. With 32 operations per word, performance is limited by CPU speed, with throughput increasing in line with the number of cores for all input data sizes.
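
As a rough check of these scaling statements, the sketch below computes speedup relative to one thread, using figures copied from the 12800 KB columns of the example results shown earlier. The helper name and data layout are illustrative only.

```python
# MFLOPS by thread count, 12800 KB columns of the example results
example = {
    "2 Ops/Word": {1: 124, 2: 213, 4: 229, 8: 226},
    "32 Ops/Word": {1: 960, 2: 1916, 4: 3450, 8: 3346},
}


def speedup(results, threads):
    """Throughput with the given thread count relative to one thread."""
    return results[threads] / results[1]


for name, results in example.items():
    gains = ", ".join(f"{t}T x{speedup(results, t):.2f}" for t in (2, 4, 8))
    print(f"{name}: {gains}")
```

With RAM based data, the 2 Ops/Word test gains less than a factor of two from four threads, while the 32 Ops/Word test gains more than a factor of three.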

Maximum 32 Ops/Word speeds from the original compiled version of MP-MFLOPS have been added to the results below. NEON performance gains are up to 80%. As with other multitasking benchmarks, P11 seems to be reluctant to use all cores with four threads. The first 1T test on this system was also slower than could be expected (should be >700).

August 2013 - Again, comparing the new Cortex-A15 with the Cortex-A9 shows that the former has a much better MFLOPS/MHz ratio. This benchmark was also run using the 1000 MHz Power Saving mode, where many of the results were similar to the original. The benchmark was modified to run each test ten times longer. This produced some faster speeds (see below) but CPU clock frequency measurements indicated that it was still running at 1000 MHz for a lot of the time. Behaviour was similar to the non-NEON version.

February 2015 - Results for Atom system A1 suffer from the ARM to Intel instruction mapping on the CPU speed limited tests at 32 operations per word. The native Intel code produced limited gains at two operations per word but was more than twice as fast on the CPU speed limited tests.

A revised version, with extended running time, is available via Android Long MP Benchmarks.htm.

To maintain compatibility with other versions of these tests, NEON intrinsic functions vaddq_f32, vmulq_f32 and vsubq_f32 were used. These operate on four floating point numbers at a time but are potentially not as fast as using vmlaq_f32, the linked add and multiply function (used in NeonSpeed). The 32 Ops tests also resort to repetitively loading the same constants from memory (L1 cache?), probably due to an insufficient number of registers and this might reduce data flow speed.
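
The difference between the separate add, multiply and subtract intrinsics and the linked multiply and add can be illustrated with the pure Python stand-ins below. These are not the real NEON intrinsics, only models of their four lane arithmetic, with vmlaq_f32(a, b, c) taken to compute a + b * c in each lane.

```python
def vaddq_f32(a, b):
    """Stand-in for lane-wise add of two 4-float vectors."""
    return [x + y for x, y in zip(a, b)]


def vsubq_f32(a, b):
    """Stand-in for lane-wise subtract."""
    return [x - y for x, y in zip(a, b)]


def vmulq_f32(a, b):
    """Stand-in for lane-wise multiply."""
    return [x * y for x, y in zip(a, b)]


def vmlaq_f32(a, b, c):
    """Stand-in for multiply accumulate: a + b * c in each lane."""
    return [x + y * z for x, y, z in zip(a, b, c)]


x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 0.5, 0.5, 0.5]
s = [2.0, 2.0, 2.0, 2.0]

# MP-MFLOPS style (x + a) * b: two calls for two operations per lane
two_step = vmulq_f32(vaddq_f32(x, y), s)

# NeonSpeed style x + s * y: one call for two operations per lane
one_step = vmlaq_f32(x, s, y)

print(two_step, one_step)
```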

August 2015 - T22 NEON 64 bit compilation produced a small performance gain over 32 bit results, at 2 operations per word, but near double speed at 32 operations, the 32 bit version suffering from fewer registers for the variables. Using one core, maximum speed was 2.77 GFLOPS, rising to 10.8 GFLOPS via four cores (best so far relative to CPU GHz).

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810, (Cortex-A57) and Android 5.0.2, at 64 bits. Performance, with 8 threads, is up to 23.6 GFLOPS, and up to nearly 3.5 results per clock cycle, using one core.
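
The results per clock cycle figures can be reproduced with the simple calculation sketched below. It assumes the nominal clock speeds quoted for each device, which may differ from the frequency actually in use during a test, and the function name is illustrative.

```python
def results_per_clock(mflops, mhz, cores=1):
    """Floating point results per clock cycle per core (nominal MHz assumed)."""
    return mflops / (mhz * cores)


# P33, 2 GHz, one thread, 32 Ops/Word at 12.8 KB: 6943 MFLOPS
print(round(results_per_clock(6943, 2000), 2))

# T22, 1300 MHz Cortex-A53, 64 bit: 2.77 GFLOPS on one core
print(round(results_per_clock(2770, 1300), 2))

# T22 using four cores: 10.8 GFLOPS
print(round(results_per_clock(10800, 1300, cores=4), 2))
```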

October 2015 - T7 Nexus 7, Android 5.0.2 then up to 5.1.1 upgrades produced similar speeds (not shown). Except with 8 threads using the 64 bit version, T22 Android 5.0.2 to 5.1 (ARM-v8 CPU) produced performance gains, mainly of more than 3% (see below).


 *****************************************************
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android NEON-MFLOPS-MP Benchmark V1.1 07-Feb-2015 18.37

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1110    1319     878    1188    1139    1226
 2T     2470    2114     996    2406    2427    2390
 4T     3159    2211     988    4148    3487    4006
 8T     2066    2486    1003    4144    3944    4077

          Total Elapsed Time    3.6 seconds
 
     Longer Tests

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T     1796    1520    1025    1231    1228    1227      573
 2T     3354    2959    1047    2427    2445    2445     1115
 4T     4627    5508     978    4690    4791    4733     2258
 8T     3861    6307    1030    4611    4869    4742     2217

          Total Elapsed Time   88.3 seconds

#################### A1 ARM-Intel ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.17

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T     2151    1962    1064    2619    2694    2650     1055
 2T     4421    3849    1048    5296    5463    5343     2102
 4T     5886    6652     982    9592   10735   10362     4145
 8T     3744    7284    1018    9085   10791    9493     4110

          Total Elapsed Time   13.8 seconds

  
 *****************************************************
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2

 Android NEON-MFLOPS-MP Benchmark V1.1 13-Sep-2013 13.44

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T     1878    1433     616    2556    3078    2893     1481
 2T     3672    2720     673    5789    5903    6451     2992
 4T     4833    4606     690    6578    7680    5135     3134
 8T     4019    4474     676    6607    7685    7256     2796

          Total Elapsed Time    1.9 seconds

 Measured CPU MHz - 1700 
 
        Power Saving Mode - 1000 MHz

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1907    1024     619    2484    2326    2357
 2T     3664    2734     652    4871    4769    4609
 4T     3342    3125     656    4768    4855    4482
 8T     3121    3228     667    4763    4902    4582

         Total Elapsed Time    2.4 seconds
 
        Longer Tests - 10 Times

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1847    1415     597    3772    4096    3545
 2T     3649    3309     664    8065    7966    7505
 4T     3670    3922     658    7753    8148    7490
 8T     5664    5570     681    8092    8355    7672

          Total Elapsed Time   13.0 seconds

#################### T11 ARM-Intel ####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.07

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T     1965    1630     582    3792    4077    3521     1537
 2T     3789    2690     663    8497    8133    7297     3151
 4T     5714    4883     654    8364    8192    7554     3095
 8T     5414    6316     673    7976    8437    6635     3125

          Total Elapsed Time   13.0 seconds


 *****************************************************

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NEON-MFLOPS-MP Benchmark V1.1 27-Jul-2015 11.45

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      556    1224     784    2695    2567    2872     1245
 2T     3655    3655    1361    5563    5590    5560     2426
 4T     5731    5107    1252    5357    6045    6350     4165
 8T     6757    5485    1419    7220    7846    8601     4885

          Total Elapsed Time    1.8 seconds

        Longer Tests

 Android NEON-MFLOPS2-MP Benchmark V2.1 25-Jul-2015 18.44

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2757    2576     771    2808    2825    2800
 2T     5662    5525    1516    5631    5664    5570
 4T     6550    7846    1945   11167   11281   10939
 8T    10273   10928    1981   10851   11211   11350

          Total Elapsed Time   40.0 seconds


 ##################### P33 64 Bit ##################### 

 P33 Quad-core 2 GHz Qualcomm Snapdragon 810, Android 5.0.2 

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 16-Sep-2015 17.59
           Compiled for 64 bit ARM v8a

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     2811    3126    1089    6943    6589    6342
 2T     2488    4114    1541   12084   10559    8809
 4T     4759    5480    2038   16516   14826   11960
 8T     4840    8985    2452   22082   23563   12461

          Total Elapsed Time    7.6 seconds


 #################### T21 ARM-Intel #################### 

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 28-Jun-2015 16.32

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T     3049    2857     622    2923    2874    2098     1232
 2T     5508    4887    1009    5477    5736    4349     2463
 4T     5643    5282    1410   11244   11601    8564     4900
 8T     9294   11156    1681   11288   11605    8946     4880

          Total Elapsed Time   14.0 seconds


 ###################### T22 32 Bit ######################

  T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.35
           Compiled for 32 bit ARM v7a

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      619     613     575    1444    1446    1426      672
 2T     1174    1206     889    2894    2902    2839     1345
 4T     1585    1616     901    5679    5726    5596     2669
 8T     2075    2130     944    5400    5585    5519     2672

          Total Elapsed Time   25.8 seconds

 ################ T22 Android 5.1 32 Bit ################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 28-Sep-2015 21.22
           Compiled for 32 bit ARM v7a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      638     628     593    1501    1498    1475
 2T     1256    1257     901    2957    2998    2953
 4T     1670    2246     941    5676    5837    5853
 8T     2221    2275    1019    5718    5699    5710

          Total Elapsed Time   24.9 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.38
           Compiled for 64 bit ARM v8a

      FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      726     745     647    2766    2774    2639     1398
 2T     1397    1402     903    5523    5552    5371     2797
 4T     1871    1930     898   10780   10479   10439     5546
 8T     2496    2876    1011    9736   10679    9900     5500

          Total Elapsed Time   15.1 seconds

 ################ T22 Android 5.1 64 Bit ################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 28-Sep-2015 22.26
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      786     780     702    2849    2868    2742
 2T     1496    1542     943    5693    5682    5629
 4T     1919    2065     995   10622   10687   10122
 8T     2494    2691     997   10187   10793   10123

          Total Elapsed Time   14.6 seconds

 
 *****************************************************
 T23  Dual Core 1.6 GHz Intel Atom Z2560, Android 4.2

 Android NEON-MFLOPS-MP Benchmark V1.1 11-Aug-2015 22.06

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      268     286     293     868     800     893
 2T      432     459     546    1398    1708    1354
 4T      619     678     542    1779    2183    2117
 8T      600     583     567    2108    2515    2185

          Total Elapsed Time    6.4 seconds

 
 *****************************************************
 P30 Quad Core 1.9 GHz Qualcomm Snapdragon 600, Android 4.4.2

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1237    1096     514    1885    2223    2434
 2T     3679    2304    1151    3344    2890    4159
 4T     4599    3898    1205    5475    6692    6702
 8T     3187    4877    1081    5626    5909    5805

          Total Elapsed Time    2,6 seconds

  
 *****************************************************
 P11 Galaxy SIII, Quad Cortex-A9 1.4 GHz, Android 4.0.4

  Android NEON-MFLOPS-MP Benchmark V1.0 23-Dec-2012 14.33

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      343     404     312    1194    1125    1172      675
 2T     1456     939     357    2317    2387    2151     1342
 4T     1899    1712     304    2946    3042    2828     1824
 8T     2037    2158     513    3517    3395    3420     2666

           Total Elapsed Time    4.0 seconds


 *****************************************************
 T7 Nexus 7 Quad 1300 MHz Cortex-A9, Android 4.1.2

 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      532     402     124    1135    1044     960      643
 2T     1255     798     213    2041    1987    1916     1193
 4T     2441    1553     229    4185    4034    3450     2374
 8T     1922    2403     226    3774    3996    3346     2385

          Total Elapsed Time    4.5 seconds

 Measured CPU MHz - 1200 

 #################### T7 ARM-Intel #####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.24

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
         2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      657     407     132    1077    1074    1053
 2T     1265     817     222    2147    2150    2078
 4T     2024    1695     234    4214    4276    3555
 8T     2435    2495     234    4196    4100    3523

          Total Elapsed Time   39.0 seconds


 *****************************************************
 T10 Samsung Galaxy Note GT-N7000 Dual core 1.4 GHz Cortex-A9

     FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      267     317     253    1111    1194    1111
 2T      726     580     227    1550    2313    2154
 4T     1437    1312     312    2322    2369    2171
 8T     1403    1352     330    2393    2320    2037

          Total Elapsed Time    5,0 seconds

  
 *****************************************************
 T4 Miumiu w17 Pro 7 inch tablet, dual 1500 MHz Cortex-A9

 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 17.03

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      596     390     115    1045    1057     931      588
 2T     1145     787     161    2009    1878    1858     1117
 4T     1130    1033     165    2016    2036    1902     1144
 8T     1171    1189     165    2018    2049    1879     1141

          Total Elapsed Time    5.8 seconds

 Measured CPU MHz - 1200 


 *****************************************************
 T2 WayTeq xTAB-70 7 inch tablet, 800 MHz Cortex-A9

 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 17.03

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800  Original
 MFLOPS
 1T      367     172      57     643     640     622      393
 2T      405     290      64     643     648     643      395
 4T      399     380      66     657     671     653      390
 8T      390     399      62     650     679     646      392

          Total Elapsed Time   13.2 seconds


 *****************************************************
 ET1 Device ARM Emulator 2.4 GHz Core 2 Duo

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T       24      24      24      38      38      37
 2T       24      24      24      36      37      37
 4T       24      21      24      37      38      37
 8T       24      24      24      37      37      37

        Total Elapsed Time  179.6 seconds
   




NEON-Linpack

The Linpack Benchmark was produced from the "LINPACK" package of linear algebra routines. It became the primary benchmark for scientific applications, particularly under Unix, from the mid 1980's, with a slant towards supercomputer performance. The benchmark operates on 100x100 matrices. Performance is governed by an inner loop in function daxpy() with a linked triad dy[i] = dy[i] + da * dx[i], and is measured in Millions of Floating Point Operations Per Second (MFLOPS). Various double precision versions and a single precision variety are available from Android Benchmarks.htm.

This version replaces the main daxpy calculations with NEON functions. See Program Code Used below. The code has a NEON define, alongside the UNROLL section that it replaces. Two other UNROLL areas were duplicated with NEON defines. The numeric results, as shown below, are identical to those for the non-NEON single precision version.

As indicated above, the benchmarks were recompiled to use both Intel and ARM processors. The ARM variety produced the same numeric results as the first below, but the Intel answers, now included, are different, probably a rounding complication. It was a complete surprise to discover that ARM intrinsic functions were converted to Intel SIMD SSE instructions, with significant performance improvement on an Atom based tablet (see #I below and assembly code here).


 Android NEON Linpack Benchmark 15-Jan-2013 12.24

 Speed              382.46 MFLOPS

 norm. resid                 1.6
 resid            3.80277634e-05
 machep           1.19209290e-07
 x[0]-1          -1.38282776e-05
 x[n-1]-1        -7.51018524e-06

 System Information - as NeonSpeed

########################################################

 ARM/Intel NEON Linpack Benchmark V 1.0 03-May-2015 11.50

 Speed              900.17 MFLOPS

 norm. resid                 1.7
 resid            4.00543213e-05
 machep           1.19209290e-07
 x[0]-1          -1.38282776e-05
 x[n-1]-1        -7.51018524e-06
   




NEON-Linpack Results

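Results from other versions are included below. Performance gains through using NEON functions, relative to the single precision LinpackSP version, are between 87% and 98% on the original Cortex-A9 systems (T2, T4, T7 and P11). See also Linpack Results.htm.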

August 2013 - Tablet T11, with the Cortex-A15 CPU, runs at 2.5 times the speed of an A9 processor of the same MHz (if one existed).

February 2015 - On tablet A1, with an Intel Atom, ARM code is converted via an Android compatibility layer, called Houdini, that maps ARM instructions into x86 instructions. This does not apply to Java, which is fast.

May 2015 - The new ARM/Intel version produced significantly improved performance on the Intel Atom based tablet, surprisingly using native SSE instructions, but tablets with ARM processors produced little change (see #I). Further details can be found in Android Native ARM-Intel Benchmarks.htm.

July 2015 - T21, with the 2150 MHz Qualcomm Snapdragon 800, is not quite as fast as T11, with a Cortex-A15 running at 1700 MHz.

August 2015 - T22 NEON Linpack results from 32 bit and 64 bit compilations were similar, as the programs use a limited number of identical intrinsic functions.

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with SP speed of 1277 MFLOPS at 64 bits.

October 2015 - T7 Nexus 7 upgrades to Android 5.0.2, then up to 5.1.1, produced similar speeds (not shown). On T22 (ARM-v8 CPU), going from Android 5.0.2 to 5.1, NEON version results were a little faster, averaging around 3%, as with the other Linpack benchmark speeds (see below).


 System   ARM    MHz   Android  Linpackv5  Linpackv7  LinpackSP NEONLinpack LinpackJava
 See                              MFLOPS     MFLOPS     MFLOPS     MFLOPS     MFLOPS

  T2    v7-A9    800     2.3.4     10.56     101.39     129.05     255.77      33.36
  T4    v7-A9   1500a    4.0.3     16.86     155.52     204.61     382.46      56.89
  T7    v7-A9   1300a    4.1.2     17.08     151.05     201.30     376.00      56.44
  T7 #I v7-A9   1300a    4.1.2               159.34     199.84     346.78  
  P11   v7-A9   1400     4.0.4     19.89     184.44     235.54     454.21      56.99
  P30   QU-600  1900     4.4.2                                    1027.49 
  T11   v7-A15  2000b    4.2.2     28.82     459.17     803.04    1334.90     143.06
  T11#I v7-A15  2000b    4.2.2               826.36     952.88    1411.86
  A1    Z3745   1866     4.4.2     59.39     168.16     296.63     443.42     252.49
  A1 #I Z3745   1866     4.4.2               362.63     408.87     900.17
  T21   QU-800  2150     4.4.3     35.39     389.52     751.95    1250.14     340.44
  T21#I QU-800  2150     4.4.3               629.92     790.83    1325.00
  P33   QU-810  2000     5.0.2                                    1446.42
  T22#I v8-A53  1300     5.0.2     21.44     172.28     180.64     407.08      86.09
  T22#I v8-A53  1300     5.1                 178.04     187.03     421.86      91.28

  64 Bit Version
  T22#I v8-A53  1300     5.0.2               338.00     479.69     505.12
  T22#I v8-A53  1300     5.1                 347.55     492.78     520.79


  P33   QU-810  2000     5.0.2                         1277.76 

 Measured MHz a 1200, b 1700, Z3745 = Intel Atom, QU = Qualcomm CPU, #I ARM/Intel Version
   



NEON-Linpack-MP

This version uses mainly the same C programming code as the single precision floating point NEON compilation above. It is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads. The 0 thread procedures are identical to above and MFLOPS speeds should be the same, subject to reasonable variations.

The code differences were slight changes to allow a higher level of parallelism. The initial 100x100 Linpack benchmark is only of use for measuring performance of single processor systems. The one for shared memory multiple processor systems is a 1000x1000 variety. The programming code for this is the same as 100x100, except users are allowed to use their own linear equation solver.

Unlike the NEON MP MFLOPS benchmark, that carries out the same multiply/add calculations, this program can run much slower using multiple threads. This is due to the overhead of creating and closing threads too frequently. At 100x100, around 0.67 million floating point calculations are executed in daxpy, the critical function. With the present equations, threads have to be created 99 times (unless someone can do better and change more things). At 100x100, data size is 40 KB, with L2 cache coming into play. With larger matrices, performance becomes more dependent on RAM, but multi-threading overheads have less influence.

This benchmark can execute the required functions multiple times, and the last pass is used to determine numerical results. Those displayed are for the unthreaded pass, but these are compared with threaded results to confirm that the same calculations are carried out. A message is displayed in the event of comparison failures.


  Android Linpack NEON SP MP Benchmark 31-Jan-2013 12.14

    MFLOPS 0 to 4 Threads, N 100, 500, 1000
  Threads      None        1        2        4
  N  100     413.47    45.95    48.22    48.34
  N  500     253.08   187.51   189.69   189.94
  N 1000     148.76   135.49   136.08   136.17

  NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1
  N              100             500            1000
  NR            1.60            3.96           11.32
  RE  3.80277634e-05  4.72068787e-04  2.70068645e-03
  MA  1.19209290e-07  1.19209290e-07  1.19209290e-07
  X0 -1.38282776e-05  5.26905060e-05  1.62243843e-04
  XN -7.51018524e-06  3.26633453e-05 -6.65783882e-05
 Thread
  0 - 4 Same Results    Same Results    Same Results

        Total Elapsed Time   54.196 seconds

         System Information - as NeonSpeed
   




NEON-Linpack-MP Multithreading Overheads

Below are additional timing details, in terms of microseconds per pass, for tests on Tablet T7. The number of floating point operations per pass, as specified, is 2 x (n x n x n) / 3 + 2 x (n x n) and the timings can be derived from MFLOPS, using this constant. A surprise is that the overheads are fairly constant and not influenced by the number of threads used. Thread processing is used approximately n times and microseconds per pass is sometimes proportional to n, but can increase due to using a slower higher level cache or RAM.

With these high overheads, there can be no improvements in performance using multiple cores. With faster hardware and lower overheads, the benchmark can produce gains, as shown below for the same program compiled to run on 64-Bit Linux. This uses Intel SSE instructions where four adds or multiplies can be executed simultaneously. Minimum overheads for processing threads for this ARM CPU, at N=100, are (592 - 6)/100 = 56.6 microseconds. That for the two desktop processors is around 20 microseconds.


         1.3 GHz Quad Core ARM Cortex-A9

  Threads   None        1        2        4     None        1        2        4
            ------------ MFLOPS -----------    ----- Microseconds Per Pass ----
    N
   10     145.30     0.87     0.84     0.92        6      992     1025      937
   20     224.08     2.88     2.85     3.03       27     2132     2152     2025
   40     316.61    10.64    10.38    11.21      145     4310     4417     4093
   50     350.79    17.76    16.75    17.82      252     4972     5273     4957
  100     413.47    45.95    48.22    48.34     1661    14944    14240    14205
  500     253.08   187.51   189.69   189.94   331252   447087   441949   441367
 1000     148.76   135.49   136.08   136.17  4494936  4935174  4913776  4910529

         2.4 GHz Core 2 Duo

  100    1666.02   287.94   200.82   134.17      412     2385     3419     5118
  500    1908.89  1422.59  1902.42  1507.04    43917    58930    44067    55628
 1000    1921.33  1624.31  2606.09  2306.14   348023   411662   256579   289951

         3.0 GHz Quad Core Phenom II

  100    1924.69   279.90   206.19   141.13      357     2453     3330     4865
  500    2059.73  1333.07  1510.81  1247.76    40701    62887    55489    67187
 1000    2074.59  1682.34  2314.57  2478.78   322313   397462   288895   269756
 




NEON-Linpack-MP Results

P11 is said to have a revised version of the Cortex-A9 CPU, with wider internal busses and dual channel memory. This is probably responsible for better performance at N = 500 and 1000.

August 2013 - Tablet T11, with the Cortex-A15 CPU, continues to show significant performance gains, compared with an A9 processor of the same MHz, when multiple threads are not used and the data array size is increased. As with other MP-NEON benchmarks, the program was run using the 1000 MHz Power Saving setting, confirming that the original was running at this frequency for the multithreading tests.

February 2015 - Best results, at this time, were for T15, with the Qualcomm Snapdragon S4. Atom system A1 is not that good, except memory based speed, at threadless N=1000, is better than the other ARM results. The new ARM/Intel version again demonstrated a doubling of measured MFLOPS on the Atom, using the smaller matrices.

July 2015 - T21, with the Qualcomm Snapdragon 800, obtains by far the fastest results at unthreaded N = 500.

August 2015 - T22 results from 32 bit and 64 bit compilations were again similar, as the programs use a limited number of identical intrinsic functions.

October 2015 - T7 Nexus 7 upgrades to Android 5.0.2, then up to 5.1.1, produced similar speeds (not shown). T22 Android 5.0.2 to 5.1 (ARM-v8 CPU) is shown to produce performance gains on all tests.


 -------------------------------------------------------
    MFLOPS 0 to 4 Threads, N 100, 500, 1000

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
   Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

  Threads      None        1        2        4

  N  100     452.39    21.00    23.48    17.48
  N  500     663.38   275.56    88.66   312.71
  N 1000     617.04   380.60   191.26   195.61

        Total Elapsed Time   63.747 seconds

#################### A1 ARM-Intel ######################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 13.58

  Threads      None        1        2        4

  N  100     971.71    37.72    36.36    39.66
  N  500    1311.37   488.73   487.85   488.98
  N 1000     945.97   727.85   737.95   742.34

       Total Elapsed Time   59.966 seconds


 -------------------------------------------------------
 T15 Qualcomm Snapdragon S4 2265 MHz?, Android 4.4

  Threads      None        1        2        4

  N  100    1478.99    85.77    87.02    85.55
  N  500    1426.67   730.69   726.84   731.90
  N 1000     754.87   640.59   641.40   629.42

       Total Elapsed Time   35.982 seconds

  
 -------------------------------------------------------
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2

  Threads      None        1        2        4

  N  100    1399.82    54.86    55.31    54.66
  N  500    1154.21   434.16   434.06   436.97
  N 1000     571.26   482.57   487.25   485.80

       Total Elapsed Time   46.226 seconds

    Expected CPU MHz - 1700 

  

    Power Saving Mode - 1000 MHz

  Threads      None        1        2        4

  N  100     799.73    54.98    54.72    54.82
  N  500     914.53   453.81   464.44   350.87
  N 1000     545.42   481.56   487.00   485.20

       Total Elapsed Time   49.572 seconds

#################### T11 ARM-Intel ####################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.44

  Threads      None        1        2        4

  N  100    1497.90    61.13    63.13    61.87
  N  500    1399.10   491.49   489.29   494.69
  N 1000     586.14   499.00   504.97   497.49

       Total Elapsed Time   43.952 seconds


 -------------------------------------------------------
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
     Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android Linpack NEON SP MP Benchmark 26-Jul-2015 11.46

  Threads      None        1        2        4

  N  100    1311.08    12.38    12.93    15.05
  N  500    2271.56   344.04   419.52   381.73
  N 1000     837.30   540.99   523.52   564.87

       Total Elapsed Time  143.534 seconds

#################### T21 ARM-Intel ####################

 ARM/Intel Linpack NEON SP MP Benchmark 26-Jul-2015 11.51

  Threads      None        1        2        4

  N  100    1308.07    14.89    11.77    11.63
  N  500    2341.17   407.96   481.02   415.12
  N 1000     901.21   551.80   566.77   564.31

       Total Elapsed Time  145.750 seconds


###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.52
           Compiled for 32 bit ARM v7a

  Threads      None        1        2        4

  N  100     460.74    22.35    23.16    23.82
  N  500     480.63   336.52   339.94   303.66
  N 1000     470.02   405.86   403.01   405.98

################ T22 Android 5.1 32 Bit ################

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 28-Sep-2015 21.20
           Compiled for 32 bit ARM v7a

    MFLOPS 0 to 4 Threads, N 100, 500, 1000
  Threads      None        1        2        4

  N  100     478.29    22.91    26.14    24.45
  N  500     526.25   349.09   343.33   350.01
  N 1000     488.62   420.83   416.43   415.80

###################### T22 64 Bit ######################

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.57
           Compiled for 64 bit ARM v8a

  Threads      None        1        2        4

  N  100     548.67    27.70    33.93    37.00
  N  500     470.04   285.95   297.79   301.67
  N 1000     519.02   441.84   443.47   441.91

################ T22 Android 5.1 64 Bit ################

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 28-Sep-2015 22.24
           Compiled for 64 bit ARM v8a

    MFLOPS 0 to 4 Threads, N 100, 500, 1000
  Threads      None        1        2        4

  N  100     573.90    34.43    26.00    41.28
  N  500     607.89   389.67   353.51   322.91
  N 1000     541.80   449.28   461.96   461.27


 -------------------------------------------------------
 P30 Quad Core 1.9 GHz Qualcomm Snapdragon 600, Android 4.4.2

  Threads      None        1        2        4

  N  100    1054.21    34.62    34.13    33.61
  N  500    1607.52   468.21   505.96   492.88
  N 1000     713.69   494.88   514.91   507.47

       Total Elapsed Time   61,232 seconds


 -------------------------------------------------------
 T16 Iconbit Nettab Skat RX, Quad Core Cortex-A9, 1.8 GHz

    MFLOPS 0 to 4 Threads, N 100, 500, 1000
  Threads      None        1        2        4

  N  100     536.23    48.08    49.84    49.53
  N  500     315.97   166.70   169.30   166.50
  N 1000     241.52   221.36   224.23   217.57

       Total Elapsed Time   56,409 seconds


 -------------------------------------------------------
 P11 Galaxy SIII, Quad Cortex-A9 1.4 GHz, Android 4.0.4

  Threads      None        1        2        4

  N  100     455.90    42.37    41.76    37.32
  N  500     395.16   326.43   321.82   309.55
  N 1000     355.77   322.98   323.71   322.24

       Total Elapsed Time   38.691 seconds


 -------------------------------------------------------
 T7 Nexus 7 Quad 1300 MHz Cortex-A9, Android 4.1.2

  Threads      None        1        2        4

  N  100     413.47    45.95    48.22    48.34
  N  500     253.08   187.51   189.69   189.94
  N 1000     148.76   135.49   136.08   136.17

       Total Elapsed Time   54.196 seconds

    Measured CPU MHz - 1200

#################### T7 ARM-Intel #####################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.40

  Threads      None        1        2        4

  N  100     385.49    28.79    29.06    29.25
  N  500     272.07   184.85   183.70   183.18
  N 1000     147.09   131.92   132.44   130.05

       Total Elapsed Time   64.318 seconds


 -------------------------------------------------------
 T4 Miumiu w17 Pro 7 inch tablet, dual 1500 MHz Cortex-A9

  Threads      None        1        2        4

  N  100     406.08    66.79    66.42    66.95
  N  500     153.53   120.17   120.04   121.44
  N 1000     117.31   104.05   104.48   106.27

       Total Elapsed Time   66.327 seconds

    Measured CPU MHz - 1200


 -------------------------------------------------------
 T2 WayTeq xTAB-70 7 inch tablet, 800 MHz Cortex-A9

  Threads      None        1        2        4

  N  100     251.41    28.06    28.15    28.24
  N  500      70.81    57.66    57.34    54.47
  N 1000      66.55    62.65    62.23    62.13

       Total Elapsed Time  124.115 seconds


 -------------------------------------------------------
 P18 Huawei Y300, Dual-core 1 GHz Cortex-A5

    MFLOPS 0 to 4 Threads, N 100, 500, 1000
  Threads      None        1        2        4

  N  100     239.86    40.83    40.95    40.60
  N  500     189.84   136.34   136.14   136.17
  N 1000     172.03   153.29   152.03   153.24

       Total Elapsed Time   49,811 seconds




Program Code Details

Android.mk file details needed to include NEON intrinsics are shown below, followed by the main loop in the function that uses the intrinsics to calculate x[i] = x[i] + c * y[i] (for NeonSpeed). The loop starts with vld1q_f32, which loads four single precision floating point numbers into 2 x 64 bit vector registers (four words, as with Intel SSE). The vmla vector multiply accumulate instruction executes the linked multiply and add function. The non-NEON test includes four loop increments (i to i+3) for 4 loads from each array, and increasing this for more loads made little difference in performance. With NEON, four increments means one vld1q for each array. To provide somewhat better performance, four loads are used with 16 word increments.

Slight changes to the Android.mk file are required to produce an assembly instruction listing, shown alongside the C code below. The listing shows a one for one conversion of the intrinsics, with one each of extra add, compare and branch instructions for loop control. The compiler also appears to attempt optimisation by overlapping scalar adds with vector instructions.

Numerous intrinsics are available and identified in Summary of NEON intrinsics.

The compiler used with Eclipse does not carry out automatic vectorisation, but the powerful assembly level intrinsics might mean that this is not very important. Automatic vectorisation is available in the GNU ARM toolchain package under Linux, but it does not appear possible to (easily?) convert the compiled code to run under Android. The toolchain was installed and the test functions compiled with the arm-linux-gnueabi-gcc command, shown below (-S for assembly listing). In this case, the code was compiled to operate on two words at a time, instead of four, and would probably be much slower.

Critical assembly code for ARM/Intel NEON-Linpack benchmark is shown below.


NeonSpeed

 Android.mk Normal                        Android.mk For disassembly

 LOCAL_PATH := $(call my-dir)             LOCAL_PATH := $(call my-dir)
 include $(CLEAR_VARS)                    include $(CLEAR_VARS)
 LOCAL_MODULE := neonspeedlib             LOCAL_CFLAGS := -save-temps
 ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)    ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
 LOCAL_CFLAGS += -DHAVE_NEON=1
 LOCAL_SRC_FILES = neonspeed.c.neon       LOCAL_SRC_FILES = neonspeed.c.neon
 endif                                    endif
 LOCAL_STATIC_LIBRARIES := cpufeatures    LOCAL_STATIC_LIBRARIES := cpufeatures
 include $(BUILD_SHARED_LIBRARY)          include $(BUILD_SHARED_LIBRARY)
 $(call import-module,cpufeatures)        $(call import-module,cpufeatures)


 Unrolled x[i] = x[i] + c * y[i]

 for(i=0; i < size/16; i++)               .L31:
 {                                        add      r7, r1, #16
    x41 = vld1q_f32(ptrx1);               vld1.32  {d24-d25}, [r0]
    x42 = vld1q_f32(ptrx2);               vld1.32  {d6-d7}, [r1]
    x43 = vld1q_f32(ptrx3);               vld1.32  {d30-d31}, [r7]
    x44 = vld1q_f32(ptrx4);               add      r7, r1, #32
    y41 = vld1q_f32(ptry1);               add      r4, r0, #48
    y42 = vld1q_f32(ptry2);               vld1.32  {d28-d29}, [r7]
    y43 = vld1q_f32(ptry3);               add      r5, r0, #32
    y44 = vld1q_f32(ptry4);               add      r6, r0, #16
    z41 = vmlaq_f32(x41, y41, c4);        add      r7, r1, #48
    z42 = vmlaq_f32(x42, y42, c4);        vmla.f32 q12, q3, q8
    z43 = vmlaq_f32(x43, y43, c4);        adds     r2, r2, #1
    z44 = vmlaq_f32(x44, y44, c4);        vld1.32  {d22-d23}, [r6]
    vst1q_f32(ptrx1, z41);                vld1.32  {d20-d21}, [r5]
    vst1q_f32(ptrx2, z42);                vld1.32  {d18-d19}, [r4]
    vst1q_f32(ptrx3, z43);                vld1.32  {d26-d27}, [r7]
    vst1q_f32(ptrx4, z44);                adds     r1, r1, #64
    ptrx1 = ptrx1 + 16;                   vst1.32  {d24-d25}, [r0]
    ptry1 = ptry1 + 16;                   adds     r0, r0, #64
    ptrx2 = ptrx2 + 16;                   cmp      r2, r3
    ptry2 = ptry2 + 16;                   vmla.f32 q11, q15, q8
    ptrx3 = ptrx3 + 16;                   vmla.f32 q10, q14, q8
    ptry3 = ptry3 + 16;                   vmla.f32 q9, q13, q8
    ptrx4 = ptrx4 + 16;                   vst1.32  {d22-d23}, [r6]
    ptry4 = ptry4 + 16;                   vst1.32  {d20-d21}, [r5]
 }                                        vst1.32  {d18-d19}, [r4]
                                          bne      .L31


 arm-linux-gnueabi-gcc -O3 -mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -S memspeed.c

 vld1.32  {d20}, [r3]        loading data to 1 register not 2 e.g. (d20-d21)
 vld1.32  {d17}, [r8]
 vld1.32  {d18}, [r5]
 vld1.32  {d19}, [lr]
 vmla.f32 d17, d20, d16      using single 64 bit registers not 2 e.g. q11, q15, q8
 vmla.f32 d18, d19, d16

NEON-Linpack

 Original Code                            Replaced by

 #ifdef UNROLL                            #ifdef NEON
    m = n % 4;                               float cf[4];
    if ( m != 0)                             float32x4_t x41, y41, c41, r41;
    {                                        float32_t *ptrx1 = (float32_t *)dx;
       for (i = 0; i < m; i++)               float32_t *ptry1 = (float32_t *)dy;
          dy[i] = dy[i] + da*dx[i];          float32_t *ptrc1 = (float32_t *)cf;
       if (n < 4) return;                    for (i=0; i<4; i++)
    }                                        {
    for (i = m; i < n; i = i + 4)               cf[i] = da;
    {                                        }
       dy[i] = dy[i] + da*dx[i];             m = n % 4;
       dy[i+1] = dy[i+1] + da*dx[i+1];       if ( m != 0)
       dy[i+2] = dy[i+2] + da*dx[i+2];       {
       dy[i+3] = dy[i+3] + da*dx[i+3];          for (i = 0; i < m; i++)
    }                                              dy[i] = dy[i] + da*dx[i];
                                                if (n < 4) return;
 #endif                                      }
                                             ptrx1 = ptrx1 + m;
                                             ptry1 = ptry1 + m;
                                             c41 = vld1q_f32(ptrc1);
                                             for (i = m; i < n; i=i+4)
                                             {
                                                x41 = vld1q_f32(ptrx1);
                                                y41 = vld1q_f32(ptry1);
                                                r41 = vmlaq_f32(y41, x41, c41);
                                                vst1q_f32(ptry1, r41);
                                                ptrx1 = ptrx1 + 4;
                                                ptry1 = ptry1 + 4;
                                             }
                                          #endif

Main Assembly Code Instructions ARM/Intel Version

 ARM NEON multiply + accumulate           vld1.32  {d16-d17}, [r1]
 4 words in q registers                   vmla.f32 q8, q10, q9
                                          vst1.32  {d16-d17}, [r1]

 Intel 4 words SSE multiply and add       movaps   (%edx), %xmm0
 can be linked in CPU                     mulps    %xmm2, %xmm0
 p = parallel, last s =                   addps    (%eax), %xmm0
 single precision                         movaps   %xmm0, (%eax)




Systems Used



 T2      Device WayTeq xTAB-70 7 inch tablet, 800 MHz Cortex-A9
         Screen pixels w x h 600 x 800 
         Android Build Version      2.3.4
         Processor     : ARMv7 Processor rev 1 (v7l)
         BogoMIPS     : 2035.71
         Features : swp half thumb fastmult vfp edsp neon vfpv3 
         CPU part   : 0xc09                    - Cortex-A9
         Linux version 2.6.34

 T4      Device Miumiu w17 Pro 7 inch tablet, dual 1500 MHz  Cortex-A9
         Screen pixels w x h 600 x 976 
         Android Build Version      4.0.3 - Ice Cream Sandwich
         Processor  : ARMv7 Processor rev 0 (v7l)
         processor  : 0  BogoMIPS : 2393.70
         processor  : 1  BogoMIPS : 2393.70
         Features   : swp half thumb fastmult vfp edsp neon vfpv3 
         CPU part   : 0xc09                    - Cortex-A9
         Hardware   : Amlogic Meson6 g04 customer platform
         Linux version 3.0.8
 
 T7      Device Google Nexus 7 quad core CPU 1.3 GHz, 1.2 GHz > 1 core
         RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
         Screen pixels w x h 1280 x 736
         Twelve-core Nvidia GeForce ULP graphics 416 MHz
         Android Build Version      4.1.2
         Processor : ARMv7 Processor rev 9 (v7l)
         processor : 0  BogoMIPS : 1993.93
         processor : 1  BogoMIPS : 1993.93
         processor : 2  BogoMIPS : 1993.93
         processor : 3  BogoMIPS : 1993.93
         Features  : swp half thumb fastmult vfp edsp neon vfpv3 tls 
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant     : 0x2
         CPU part        : 0xc09             - Cortex-A9
         CPU revision    : 9
         Hardware        : grouper           - nVidia Tegra 3 T30L
         Revision        : 0000
         Linux version    3.1.10

 T10     Samsung Galaxy Note GT-N7000 Dual core 1.4 GHz Cortex-A9
         Screen pixels w x h 800 x 1280
         Android Build Version      4.1.2
         Processor : ARMv7 Processor rev 1 (v7l)
         processor : 0
         BogoMIPS : 1592.52
         processor : 1
         BogoMIPS : 2786.91
         Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant : 0x2
         CPU part : 0xc09
         CPU revision : 1
         Hardware : SMDK4210
         Revision : 0008
         Linux version 3.0.31

 T11     Voyo A15, Samsung EXYNOS 5250 Dual core 2.0 GHz Cortex-A15, 
         Mali-T604 GPU, 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/s
         Screen pixels w x h 1920 x 1032 
         Android Build Version      4.2.2  - Jelly Bean
         Processor : ARMv7 Processor rev 4 (v7l)
         processor : 0
         BogoMIPS  : 992.87
         processor : 1
         BogoMIPS  : 997.78
         Features  : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant     : 0x0
         CPU part        : 0xc0f
         CPU revision    : 4
         Hardware        : SMDK5250
         Linux version 3.4.35Ut

 T15     Qualcomm Snapdragon S4 2265 MHz?
         Screen pixels w x h 1080 x 1776
         Android Build Version      4.4
         Processor : ARMv7 Processor rev 0 (v7l)
         processor : 0 to 3
         BogoMIPS : 38.40
         Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
         CPU implementer : 0x51
         CPU architecture: 7
         CPU variant : 0x2
         CPU part : 0x06f
         CPU revision : 0
         Hardware : Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree)
         Revision : 000b
         Linux version 3.4.0-

 T16     Iconbit Nettab Skat RX, Quad Core Cortex-A9, 1.8 GHz
         Screen pixels w x h 1024 x 720
         Android Build Version      4.1.1
         Processor : ARMv7 Processor rev 0 (v7l)
         processor : 0, 1, 2, 3
         BogoMIPS : 2015.34
         Features : swp half thumb fastmult vfp edsp neon vfpv3
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant : 0x3
         CPU part : 0xc09
         CPU revision : 0
         Hardware : RK30board
         Revision : 0000
         Linux version 3.0.36

 T21     Kindle Fire HDX 7, 2.2 GHz  Quad Core Qualcomm Snapdragon 800 (Krait 400) 
         2 x 32 Bit LPDDR3-1866 Memory, 14.9 GB/s, GPU Qualcomm Adreno 330, 578 MHz
         Device Amazon KFTHWI
         Screen pixels w x h 1200 x 1803 
         Android Build Version      4.4.3
         Processor       : ARMv7 Processor rev 0 (v7l)
         processor       :  0, 1, 2, 3
         BogoMIPS        : 38.40
         Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
         CPU implementer : 0x51
         CPU architecture: 7
         CPU variant     : 0x2
         CPU part        : 0x06f
         CPU revision    : 0
         Hardware        : Qualcomm MSM8974
         Revision        : 0000
         Linux version 3.4.0-perf (gcc version 4.7) 

 T22     Lenovo Tab 2 A8-50, 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53 
         1 GB LPDDR3, GPU Mali T720  MP2
         Device LENOVO Lenovo TAB 2 A8-50F
         Screen pixels w x h 800 x 1216
         Android Build Version      5.0.2
         Processor : AArch64 Processor rev 3 (aarch64)
         processor : 0, 1, 2
         BogoMIPS  : 26.0
         Features : fp asimd aes pmull sha1 sha2 crc32
         CPU implementer : 0x41
         CPU architecture: AArch64
         CPU variant : 0x0
         CPU part : 0xd03
         CPU revision : 3
         Hardware : MT8161
         Linux version 3.10.65 

 T23     Samsung Galaxy Tab 3 10.1 P5220, 1.6 GHz Dual Core Atom Z2560
         ARM Emulator Mode
         Screen pixels w x h 1280 x 800
         Android Build Version      4.2.2
         Processor : ARMv7 processor rev 1 (v7l)
         BogoMIPS : 1500
         Features : neon vfp swp half thumb fastmult edsp vfpv3
         CPU implementer : 0x69
         CPU architecture: 7
         CPU variant : 0x1
         CPU part : 0x001
         CPU revision : 1
         Hardware : placeholder
         Revision : 0001
         Linux version 3.4.34 

 P11     Samsung Galaxy SIII, Quad Core 1.4 GHz Cortex-A9
         Dual Channel DDR2 RAM
         Screen pixels w x h 720 x 1280
         Android Build Version      4.0.4
         Processor : ARMv7 Processor rev 0 (v7l)
         processor : 0  BogoMIPS : 1592.52
         processor : 1  BogoMIPS : 2786.91
         processor : 3  BogoMIPS : 398.13
         Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant : 0x3
         CPU part : 0xc09
         CPU revision : 0
         Hardware : SMDK4x12
         Revision : 000c
         Serial : 3b065f3d4df1bb2d
         Linux version 3.0.15

 P18     Huawei Y300, Dual-core 1 GHz Cortex-A5
         Screen pixels w x h 800 x 480 
         Android Build Version      4.1.1
         Processor       : ARMv7 Processor rev 1 (v7l)
         processor       : 0
         BogoMIPS        : 668.86
         processor       : 1
         BogoMIPS        : 398.13
         Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant     : 0x0
         CPU part        : 0xc05
         CPU revision    : 1
         Hardware        : MSM8x25 U8833 BOARD
         Linux version 3.4

 P24     LG Volt, Quad Core 1.2 GHz Snapdragon Cortex-A7
         Screen pixels w x h 540 x 960
         Android Build Version      4.4.2
         Processor : ARMv7 Processor rev 3 (v7l)
         processor : 0, 1, 2, 3
         BogoMIPS : 38.40
         Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
         CPU implementer : 0x41
         CPU architecture: 7
         CPU variant : 0x0
         CPU part : 0xc07
         CPU revision : 3
         Hardware : Qualcomm MSM 8226 (Flattened Device Tree)
         Revision : 0006
         Linux version 3.4.0

 P30     Galaxy S4 i9505, Quad Core 1.9 GHz Snapdragon 600
         Screen pixels w x h 1080 x 1920
         Android Build Version      4.4.2
         Processor : ARMv7 Processor rev 0 (v7l)
         processor : 0, 1, 2, 3
         BogoMIPS : 13.53
         Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4
         CPU implementer : 0x51
         CPU architecture: 7
         CPU variant : 0x1
         CPU part : 0x06f
         CPU revision : 0
         Hardware : SAMSUNG JF
         Revision : 000b
         Linux version 3.4.0 (gcc version 4.7)

 P33     Sony Xperia Z3+ E6533, Quad-core 1.5 GHz & Quad-core 2 GHz Qualcomm
         Snapdragon 810 64-bit CPU
         Screen pixels w x h 1080 x 1776
         Android Build Version      5.0.2
         Processor : AArch64 Processor rev 1 (aarch64)
         processor : 0 to 7
         Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
         CPU implementer : 0x41
         CPU architecture: 8
         CPU variant : 0x1
         CPU part : 0xd07
         CPU revision : 1
         Hardware : Qualcomm Technologies, Inc MSM8994
         Linux version 3.10.49

 A1      Asus MemoPad 7 ME176CEX, 1.86 GHz Intel Atom Z3745
         Screen pixels w x h 800 x 1216
         Android Build Version      4.4.2
         Processor : ARMv7 processor rev 1 (v7l)
         BogoMIPS : 1500.0
         Features : neon vfp swp half thumb fastmult edsp vfpv3
         CPU implementer : 0x69
         CPU architecture: 7
         CPU variant : 0x1
         CPU part : 0x001
         CPU revision : 1
         Hardware : placeholder
         Revision : 0001
         Linux version 3.10.20

 ET1     Device Emulator 2.4 GHz Core 2 Duo
         Screen pixels w x h 600 x 1024 
         Android Build Version      4.0.4
         Processor       : ARMv7 Processor rev 0 (v7l)
         BogoMIPS        : 292.45
         Features        : swp half thumb fastmult vfp edsp neon vfpv3 
         CPU implementer : 0x41
         CPU part        : 0xc08
         Linux version 2.6.29

 BS1     BlueStacks Emulator on 3 GHz Phenom
         Screen pixels w x h 1024 x 600
         Android Build Version      2.3.4
         processor       : 0
         vendor_id       : AuthenticAMD
         cpu family      : 16
         model           : 4
         model name      : AMD Phenom(tm) II X4 945 Processor
         stepping        : 2
         cpu MHz         : 3013.000
         cache size      : 512 KB
         -
         -
         bogomips        : 26686.25
         Linux version 2.6.38
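
The Features and CPU part lines in the listings above are the fields a program would inspect to decide whether NEON is present and which core it is running on. The following is a minimal sketch of that idea, assuming the text of the relevant line has already been read from /proc/cpuinfo. The function names has_feature, has_simd and core_name are hypothetical. The part numbers are those that appear in the listings, with the Cortex-A8 and Cortex-A57 names added from ARM's published part numbers and "Krait" taken from the Snapdragon entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the space-separated feature list contains token as a
   whole word, so "vfp" does not match inside "vfpv3". */
int has_feature(const char *features, const char *token)
{
    size_t len;
    const char *p = features;
    assert(features != NULL && token != NULL);
    len = strlen(token);
    if (len == 0) return 0;
    while ((p = strstr(p, token)) != NULL) {
        int start_ok = (p == features) || p[-1] == ' ' ||
                       p[-1] == '\t' || p[-1] == ':';
        int end_ok = p[len] == '\0' || p[len] == ' ' ||
                     p[len] == '\t' || p[len] == '\n';
        if (start_ok && end_ok) return 1;
        p += len;
    }
    return 0;
}

/* 32 bit kernels report "neon"; the AArch64 listings report "asimd". */
int has_simd(const char *features)
{
    return has_feature(features, "neon") || has_feature(features, "asimd");
}

/* Maps "CPU implementer" and "CPU part" values to a core name.
   Only values seen in the listings above are covered. */
const char *core_name(unsigned implementer, unsigned part)
{
    if (implementer == 0x41) {            /* ARM */
        switch (part) {
        case 0xc05: return "Cortex-A5";
        case 0xc07: return "Cortex-A7";
        case 0xc08: return "Cortex-A8";
        case 0xc09: return "Cortex-A9";
        case 0xc0f: return "Cortex-A15";
        case 0xd03: return "Cortex-A53";
        case 0xd07: return "Cortex-A57";
        default:    return "unknown";
        }
    }
    if (implementer == 0x51 && part == 0x06f)  /* Qualcomm */
        return "Krait";
    return "unknown";
}
```

The Atom based devices run in ARM emulation and report implementer 0x69 with a placeholder hardware name, so the sketch returns "unknown" for them rather than guessing.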
  




Roy Longbottom, January 2016

The Official Internet Home for my Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection