Android 9 Benchmarks and Stress Tests On 32 Bit and 64 Bit CPUs
System  CPU     MHz   Android  VAX MIPS  MIPS/MHz
32 Bit
P37     v8-A53  1500  7.0          1464      0.98
T23     v8-A72  1800  5.1.1        3560      1.98
T26     v8-A73  2000  9.0          4514      2.26
64 Bit
P42     v8-A57  2000  5.1.1        9525      4.76
P43     Ex8890  2300  7.0         13495      5.87
P44     v8-A73  2350  8.0         10188      4.33
P45     v8-A73  2000  9.0          8442      4.22
Linpack benchmark speed is measured in MFLOPS, officially for double precision floating point calculations. For fastest speed, the benchmark was also recompiled to use single precision numbers, both as normal code and via NEON instructions. Performance of this original scalar Linpack benchmark (N = 100) depends almost entirely on the calculation x[i]=x[i]+c*y[i], where changes in the CPU instructions used can have a dramatic effect. Results from various hardware and software platforms can be found in Linpack Benchmark results, with more in an earlier Android report plus the later 2018 report.
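As a minimal sketch of what the paragraph describes, the Python below shows the x[i]=x[i]+c*y[i] update (the daxpy operation) and how a solve time becomes MFLOPS and MFLOPS/MHz. It assumes the classic Linpack operation count of 2n^3/3 + 2n^2 for an n x n solve. The function names are illustrative and are not taken from the benchmark source.

```python
def daxpy(c, y, x):
    """x[i] = x[i] + c*y[i], updated in place: the loop that dominates Linpack."""
    for i in range(len(x)):
        x[i] = x[i] + c * y[i]


def linpack_mflops(n, seconds):
    """Classic Linpack count: 2n^3/3 flops to factorise plus 2n^2 to solve."""
    flops = 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2
    return flops / (seconds * 1.0e6)


def per_mhz(mflops, mhz):
    """The MFLOPS/MHz ratios used in the tables (floating point operations per clock)."""
    return round(mflops / mhz, 2)


x = [1.0, 2.0, 3.0, 4.0]
daxpy(0.5, [1.0, 1.0, 1.0, 1.0], x)
print(x)                                     # [1.5, 2.5, 3.5, 4.5]
print(round(linpack_mflops(100, 0.001), 2))  # 686.67 for a 1 ms solve
print(per_mhz(205.59, 1500))                 # 0.14 (P37 DP)
```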
Besides MFLOPS ratings, MFLOPS per MHz calculations are also shown below. The 64 bit ratios are generally better than those at 32 bits, and all are better than those for the Whetstone benchmark. Java results are typically around half the speed of the other SP and DP scores. The NEON versions include some SIMD intrinsic functions, which 64 bit compilations implement using different vector instructions.
                                 ------------- MFLOPS -------------   ------ MFLOPS/MHz ------
System  CPU     MHz   Android       DP       SP  NEON SP  Java SP     DP    SP  NEON  Java
32 Bit
P37     v8-A53  1500  7.0       205.59   224.64   478.05   112.14   0.14  0.15  0.32  0.07
T23     v8-A72  1800  5.1.1    1023.13   988.49  1661.21   227.70   0.57  0.55  0.92  0.13
T26     v8-A73  2000  9.0       927.79   914.55  1714.51   425.40   0.46  0.46  0.86  0.21
64 Bit
P43     Ex8890  2300  7.0       998.84  1177.12  1025.52   752.06   0.43  0.51  0.45  0.33
P42     v8-A57  2000  5.1.1    1163.06  1324.15  1695.86   744.02   0.58  0.66  0.85  0.37
P44     v8-A73  2350  8.0      1378.91  1384.14  2718.22   567.04   0.59  0.59  1.16  0.24
P45     v8-A73  2000  9.0      1123.38  1122.70  2149.39   464.63   0.56  0.56  1.07  0.23

Numeric Sumchecks
             32 bit + new 64 bit compilations     64 bit earlier compilation
             DP               SP, NEON SP         DP               SP, NEON SP
norm resid   1.7              1.6                 1.9              2.0
resid        7.41628980E-14   3.80277634E-05      8.46778499E-14   4.69621336E-05
machep       2.22044605E-16   1.19209290E-07      2.22044605E-16   1.19209290E-07
x[0]-1      -1.49880108E-14  -1.38282776E-05     -1.11799459E-13  -1.31130219E-05
x[n-1]-1    -1.89848137E-14  -7.51018524E-06     -9.60342916E-14  -1.30534172E-05
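The machep row in the sumchecks is the machine epsilon of each precision. The Python sketch below reproduces both tabulated values. It assumes the familiar halving loop, which is not necessarily the code the benchmark uses, and it emulates single precision by rounding each intermediate through a 4 byte struct float.

```python
import struct
import sys


def to_single(v):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def machine_epsilon(round_to=lambda v: v):
    """Halve eps until 1 + eps/2 rounds back to 1.0 in the chosen precision."""
    eps = 1.0
    while round_to(1.0 + eps / 2.0) != 1.0:
        eps /= 2.0
    return eps


print(f"{machine_epsilon():.8E}")           # 2.22044605E-16 (DP machep)
print(f"{machine_epsilon(to_single):.8E}")  # 1.19209290E-07 (SP machep)
```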
The Livermore Loops comprise 24 kernels of numerical applications, with speeds calculated in MFLOPS (double precision). A summary is also produced with maximum, minimum and various mean values, the geometric mean being the official average. Details and results from various hardware and software platforms are provided in the Livermore Loops Benchmark results report (including Windows tablet versions running on desktop PCs), with further Android based results in an Earlier Report and in the Later 2018 Report.
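The summary statistics can be sketched as follows, assuming the geometric mean is taken over all 24 kernel MFLOPS values. The helper name is hypothetical. The P37 list holds the 24 per-kernel MFLOPS values shown in the table that follows.

```python
import math


def livermore_summary(mflops):
    """Return maximum, geometric mean (the official average) and minimum MFLOPS."""
    geomean = math.exp(sum(math.log(v) for v in mflops) / len(mflops))
    return max(mflops), geomean, min(mflops)


# The 24 per-kernel MFLOPS values shown for P37 (Cortex-A53, 1500 MHz)
p37 = [236, 267, 250, 328, 229, 225, 435, 229, 391, 213, 185, 167,
       111, 113, 187, 228, 316, 316, 186, 250, 185, 138, 286, 136]
hi, gm, lo = livermore_summary(p37)
print(round(hi / 1500, 2), round(lo / 1500, 2))  # 0.29 0.07, matching Max and Min per MHz
```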
Below are MFLOPS scores for the 24 kernels and MFLOPS per MHz calculations for maximum, geometric mean and minimum values.
Linpack DP MFLOPS/MHz ratios are repeated to show the similarities, with newer technology and 64 bit working obtaining higher ratings.
                            ----------- MFLOPS, 24 Loops -------------------- MFLOPS/MHz ---------
System CPU     MHz  Android                                         Max Geomean     Min Linpack
32 Bit
P37    v8-A53  1500 7          236   267   250   328   229   225    0.29    0.14    0.07    0.14
                               435   229   391   213   185   167
                               111   113   187   228   316   316
                               186   250   185   138   286   136
T23    v8-A72  1800 5.1.1      890  1050   882  1121   244   612    0.82    0.32    0.08    0.57
                              1473   955  1221   772   597   586
                               262   337   521   636   853   984
                               272   426   521   313   683   139
T26    v8-A73  2000 9          792   957   660   861   232   697    0.81    0.29    0.08    0.46
                              1626   966  1244   753   495   440
                               310   392   483   631   848  1022
                               279   452   696   408   668   153
64 Bit
P43    Ex8890  2300 7         2180  1714  1289  1272   744   999    1.00    0.45    0.15    0.43
                              2189  2282  2071  1168   857  1501
                               395   379   606  1411  1298  1070
                               556   642  1222   351  1423   541
P42    v8-A57  2000 5.1.1     1413  1017   774   766   354   602    0.84    0.32    0.12    0.58
                              1315  1663   507   633   390  1075
                               292   426   489   845   785   833
                               366   553   666   252   672   361
P44    v8-A73  2350 8         1734  1101  1078  1064   595   856    0.96    0.42    0.18    0.59
                              2119  2178  2144  1000   591  1240
                               551   639   820   925  1118  1521
                               550   934  1130   565  1079   453
P45    v8-A73  2000 9         1414   903   880   871   492   713    0.91    0.39    0.13    0.56
                              1659  1820  1717   847   497  1014
                               275   467   663   779   721   577
                               323   798   948   745   897   375
This benchmark measures data reading speeds in MegaBytes per second, carrying out calculations on arrays of cache and RAM data sized 2 x 8 KB to 2 x 32 MB. The calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] using double and single precision (DP and SP) floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speeds can be calculated by dividing DP MB/second by 8 and 16 for the two tests, and SP speeds by 4 and 8. For more details and older results see this archived file and the earlier report, and also the later 2018 publication.
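The MB/second to MFLOPS divisors follow from the number of bytes each loop reads per floating point operation. The minimal sketch below uses hypothetical helper names and reproduces the P37 maximum figures from the MemSpeed table.

```python
# Bytes read per floating point operation in each MemSpeed loop
BYTES_PER_FLOP = {
    ("DP", "x[m]=x[m]+s*y[m]"): 8,   # 16 bytes (x and y) for a multiply and an add
    ("DP", "x[m]=x[m]+y[m]"): 16,    # 16 bytes for one add
    ("SP", "x[m]=x[m]+s*y[m]"): 4,   # 8 bytes for two operations
    ("SP", "x[m]=x[m]+y[m]"): 8,     # 8 bytes for one add
}


def memspeed_mflops(mb_per_sec, precision, test):
    """Convert a MemSpeed MB/second reading into MFLOPS."""
    return mb_per_sec / BYTES_PER_FLOP[(precision, test)]


def scaled_add(x, y, s):
    """The first MemSpeed loop, x[m] = x[m] + s*y[m], applied in place."""
    for m in range(len(x)):
        x[m] = x[m] + s * y[m]


print(memspeed_mflops(4718, "DP", "x[m]=x[m]+s*y[m]"))         # 589.75 (P37 shows 590)
print(round(memspeed_mflops(2397, "SP", "x[m]=x[m]+s*y[m]")))  # 599
```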
Maximum MFLOPS and MFLOPS/MHz calculations are included below. These tests use continuous data streaming, and those using cached data produce better performance than the Linpack SP and DP speeds. With scalar processing, SP and DP MFLOPS are generally quite similar but, with vector processing, SP can be expected to be much faster. The two new Cortex-A73 results reflect this: the 64 bit version's maximum SP MFLOPS is nearly twice that of the 32 bit system (2444 against 1268).
The 64 bit 2000 MHz Cortex-A73 based system (P45) produces the fastest RAM speeds here, up to 6387 MB/second. Other benchmark results might give a better indication of the reason.
MemSpeed MB/second
Test 1: x[m]=x[m]+s*y[m] (integers: x[m]=x[m]+s+y[m])    Test 2: x[m]=x[m]+y[m]

        ------ Test 1 -----  ------ Test 2 -----       ------ Test 1 -----  ------ Test 2 -----
KBytes   Dble   Sngl    Int   Dble   Sngl    Int      Dble   Sngl    Int   Dble   Sngl    Int

32 Bit  P37 v8-A53 1500 MHz                           T23 v8-A72 1800 MHz
    16   4718   2397   2408   5062   2829   2534      8586   4644   5402  10763   5664   6781
    32   4465   2355   2319   4659   2714   2426     10411   5890   5642  11458   6464   7245
    64   4160   2312   2231   4271   2601   2331     10438   5867   5629  11413   6326   7050
   128   4136   2300   2219   4246   2563   2182     10465   6022   5686  11471   6384   7082
   256   4037   2298   2215   4214   2572   2311     10486   6024   5705  11396   6350   7130
   512   3380   2074   1993   3426   2278   2082     10482   6057   5702  11398   6420   7162
  1024   1746   1478   1494   1756   1548   1508      7291   4955   4734   4126   5361   5686
  4096   1552   1366   1389   1565   1417   1409      5640   5433   5270   3643   5571   5686
 16384   1573   1373   1394   1583   1435   1421      5624   5550   5356   4455   5671   5793
 65536   1623   1387   1422   1642   1464   1444      5850   5531   5275   5135   5499   5806
MFLOPS    590    599                                  1311   1514
Per MHz  0.39   0.40                                  0.73   0.84

        T26 v8-A73 2000 MHz
    16   6441   4947   5062  13950   6025   6321
    32  11504   5072   5070  13997   6036   6334
    64  11513   5070   5062  14015   6033   6328
   128   9249   4946   5008   9843   5802   6192
   256   9275   5064   5064   9334   6014   6320
   512   9305   5072   5065   9325   6014   6332
  1024   6798   4699   4719   6909   5350   5552
  4096   3993   4208   4196   4461   4346   4401
 16384   3788   4209   4213   4287   4199   4249
 65536   3615   3894   4248   4188   4343   4126
MFLOPS   1439   1268
Per MHz  0.72   0.63

64 Bit  P42 v8-A57 2000 MHz                           P43 Ex8890 2300 MHz
    16  13753   8271   8801  16927   8837   9117      2050   4018   4220   4439   3472   3701
    32  12295   7621   7988  13158   7772   8011      4024   4026   4182   4280   3492   3709
    64  12746   8124   8674  12822   8496   8728      4031   4035   4112   4006   3490   3482
   128  12514   7896   8691  12662   8302   8746      4034   4035   4112   3996   3441   3419
   256  12289   8050   8598  12206   8176   8650      4029   4029   4112   3991   3411   3407
   512  12039   8057   8605  12138   8242   8350      4026   4027   4113   3987   3405   3405
  1024   9706   6146   6400   8420   6197   6529      4025   4014   4106   3983   3387   3403
  4096   3339   3213   3236   3358   3168   3175      4042   4029   4225   4208   3475   3641
 16384   2863   2826   2822   2869   2764   2771      4051   4044   4239   4203   3478   3648
 65536   2867   2825   2830   2881   2774   2771      4057   4056   4249   4256   3483   3650
MFLOPS   1719   2068                                   507   1014
Per MHz  0.86   1.03                                  0.22   0.44

        P44 v8-A73 2350 MHz                           P45 v8-A73 2000 MHz
    16  16231  11823  13239  17201  10001  10271     11512   9705  10737  12270   8062   8010
    32  16185  12080  13535  17276  10161   9756     11517   9769  11009  12128   8094   8111
    64  16408  11703  13063  17317  10014  10303     11519   9776  10971   6286   3690   3793
   128  11893  10762  11285  11549   8924   8893      9750   8744   9290   9727   7293   7224
   256  11423  11126  11441  10869   9093   9151      9325   8979   9262   9262   7433   7337
   512  11411  11263  11434  11316   9071   8882      9297   8996   9284   9262   7450   7407
  1024  10997  10777  11435  11308   9094   9062      7675   7478   7651   7688   6596   6562
  4096   5448   5260   5458   5424   5304   5246      6141   5997   6095   6157   5878   5869
 16384   5245   5322   5349   5327   5182   5087      6245   6219   6262   6256   5992   6039
 65536   4704   4749   4783   4668   4710   4585      6387   6361   6375   6349   6020   5980
MFLOPS   2051   3020                                  1440   2444
Per MHz  0.87   1.29                                  0.72   1.22
This benchmark carries out the same calculations as the MemSpeed benchmark, except that they are all in single precision, for comparison with the NEON sections, which use NEON intrinsic functions. For further details and results see the earlier Android report, and also the later 2018 publication.
In earlier reports I had not worked out why the Normal speeds could be much faster than the MemSpeed SP results. I have now rediscovered the reason. The compile parameter that enables NEON, included for the intrinsic functions, also makes normal compilation produce NEON instructions. At 64 bits the compiler generates different vector instructions, which in this case can be slower than those in the 32 bit versions.
Given a full hardware implementation with fused multiply and add, the first tests could produce a maximum of 8 single precision floating point operations per clock cycle (four vector lanes, each doing a multiply and an add). A realistic average would be more than 6 MFLOPS per MHz. Best here is 3.73.
Unlike most others, that CPU (the Exynos 8890 in P43) also showed much faster NeonSpeed than MemSpeed performance using RAM based data.
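The ceiling of 8 per cycle is simple arithmetic: a 128 bit NEON register holds four single precision values, and a fused multiply-add counts as two operations on each of them. The sketch below assumes one such vector FMA per clock, as the text implies (some cores can issue more). It compares that ceiling with P43's best NeonSpeed result, using 4 bytes read per floating point operation, as in the MFLOPS rows of the table. The function names are hypothetical.

```python
def peak_flops_per_cycle(vector_bits=128, element_bits=32, flops_per_lane=2):
    """One fused multiply-add on every lane of a single vector per clock cycle."""
    return (vector_bits // element_bits) * flops_per_lane


def mflops_per_mhz(mb_per_sec, mhz, bytes_per_flop=4):
    """v=v+s*v reads 8 bytes per multiply-add pair, i.e. 4 bytes per flop."""
    return mb_per_sec / bytes_per_flop / mhz


peak = peak_flops_per_cycle()       # 4 SP lanes x 2 = 8
best = mflops_per_mhz(34284, 2300)  # P43 NEON at 32 KB
print(peak, round(best, 2), f"{best / peak:.0%}")  # 8 3.73 47%
```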
NeonSpeed Vector Reading Speed in MBytes/Second

       Float v=v+s*v   Int v=v+v+s    Neon v=v+v      Float v=v+s*v   Int v=v+v+s    Neon v=v+v
KBytes   Norm   Neon   Norm   Neon  Float    Int        Norm   Neon   Norm   Neon  Float    Int

32 Bit  P37 v8-A53 1500 MHz                           T23 v8-A72 1800 MHz
    16   3192   4154   3979   4546   4445   5021     12874  13877  14961  15288  15874  17150
    32   2847   3638   3446   3932   3790   4162     12106  11977  11935  11963  12514  12760
    64   2817   3573   3416   3839   3721   4058     12095  11998  12248  12482  13011  13062
   128   2884   3653   3578   3905   3806   4021     12574  12472  11890  11918  12493  11404
   256   2868   3679   3618   3990   3865   4230     12445  12385  11932  11737  12505  12421
   512   2601   3137   3019   3320   3293   3480     11491  10468  11054  10495  11569  11538
  1024   1637   1723   1724   1749   1730   1760      8396   5695   8350   8299   8500   8651
  4096   1448   1506   1495   1476   1510   1519      5866   3257   5904   5877   5888   5759
 16384   1452   1502   1418   1446   1479   1482      5915   5872   5919   5769   5979   5439
 65536   1417   1474   1482   1124   1481   1495      5259   5186   4906   5185   4920   4834
MFLOPS    798   1039                                  3219   3469
Per MHz  0.53   0.69                                  1.79   1.93

        T26 v8-A73 2000 MHz
    16  12593  11968  15740  15581  13939  17826
    32  12646  12022  15795  15754  14026  17983
    64  12661  12000  15803  15657  14030  17841
   128   9839  10043   9742   9995   9297   9765
   256   9455   9552   9316   9472   8716   9292
   512   9440   9540   9308   9469   8869   9299
  1024   7760   7648   7393   7589   6427   7588
  4096   4474   4546   4536   4525   4202   4485
 16384   4494   4544   4488   4444   4092   4349
 65536   3933   4159   4260   3671   3894   4264
MFLOPS   3165   3006
Per MHz  1.58   1.50

64 Bit  P42 v8-A57 2000 MHz                           P43 Ex8890 2300 MHz
    16   4095  15470   8869  18296  20535  22364     12298  34094  14419  36314  36453  40732
    32   8197  15159   8763  17449  19000  20012     12381  34284  14422  36620  36611  41090
    64   8142  14200   8692  14906  15645  16157     12017  18659  12514  18859  18952  18959
   128   8018  14568   8702  14138  15337  15349     12280  20062  13452  20186  20290  20231
   256   8075  13967   8618  13932  14960  15169     12397  20563  13632  20580  20592  20551
   512   7715  13740   8252  13954  14651  14860     12441  20763  13716  20754  20741  20745
  1024   7990  13460   8605  13470  14847  14532     12453  20821  13748  20789  20777  20822
  4096   3193   3375   3274   3348   3451   3468      9413   9638   9490   9588   9602   9615
 16384   2847   2888   2844   2892   2969   2976      8819   9317   8731   9268   8803   8959
 65536   2890   2942   2894   2441   3021   3029      7989   8917   8773   9003   8950   9349
MFLOPS   2049   3868                                  3113   8571
Per MHz  1.02   1.93                                  1.35   3.73

        P44 v8-A73 2350 MHz                           P45 v8-A73 2000 MHz
    16  11377  18821  12629  18501  17006  18168      4268  12312   8276  13004  12735  13393
    32  11865  19063  13369  16555  16966  18592      9744  14890  11027  12746  12816  13593
    64  11797  18949  13351  14965  16899  19012      9773  14894  10988  12886  12828  13616
   128  10557  11182  11272  12228  12225  12126      8693   9199   9319  10035  10025  10050
   256  10992  10196  11446  11180  11357  11317      8890   8560   9345   9441   9413   9424
   512  11070  10433  11580  11502  11380  11459      8918   8584   9408   9429   9405   9427
  1024  10922  10467  11314  11441  11120  11374      6809   6383   7442   8558   8577   8550
  4096   5375   5294   5332   5406   5421   5404      6076   6067   6074   6128   6006   6036
 16384   4965   4800   4921   4844   4951   4893      5894   5945   5888   4099   4124   5885
 65536   4901   4936   4920   4965   4991   4921      5991   5976   6094   6135   6142   6104
MFLOPS   2966   4766                                  2443   3724
Per MHz  1.26   2.03                                  1.22   1.86
This benchmark is designed to identify data being read in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is halved on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Measured reading speed then falls from a maximum, when all data is read, to a minimum when using 16 word increments (64 bytes). Potential maximum bus speed can be estimated by multiplying this minimum value by 16.
These estimated maximum bus speeds are shown below, with additional calculations for 8 word increments to cover strange results. With older technology, these estimates can reflect actual measured MB/second, but not so for newer CPUs. Measured speeds can be faster using multithreading, as indicated in the later MP-BusSpeed results. It appears to be impossible to obtain details of cache sizes, as the technology specifications indicate a range of sizes. The L1, L2 and RAM markers below are assumptions based on the results, and they might only apply to the (Big or Little?) CPU set being used.
The Read All Max/MHz calculations indicate higher CPU speeds, with 64 bit operation somewhat faster, but there is not a large variation across different hardware. These ratios can be divided by four (bytes per word) to indicate integer speeds in MOPS per MHz, although 1 OP might comprise a load and an arithmetic instruction. For more details and further results see the old archived report and the earlier report, and also the later 2018 publication.
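Below is a minimal sketch of a BusSpeed style pass and the two derived figures. It assumes 4 byte words and a 64 byte burst. The function names and the 500 MB/second input are hypothetical. The 3.89 input is the P42 Read All Max/MHz from the table below.

```python
def strided_read(data, inc):
    """Read one word every `inc` words, as in one BusSpeed pass; return (sum, words read)."""
    total = 0
    count = 0
    for i in range(0, len(data), inc):
        total += data[i]
        count += 1
    return total, count


def bus_estimate(mb_per_sec, inc_words=16):
    """At a 16 word (64 byte) stride each 4 byte word read pulls in a whole
    64 byte burst, so the bus carries about 16 times the measured rate;
    the 8 word variant multiplies by 8."""
    return mb_per_sec * inc_words


def read_all_mops_per_mhz(max_mb_per_sec_per_mhz, bytes_per_word=4):
    """Convert a Read All MB/second per MHz ratio into integer words (OPs) per clock."""
    return max_mb_per_sec_per_mhz / bytes_per_word


data = list(range(1024))                                     # 1024 words = 4 KB
print(strided_read(data, 32)[1], strided_read(data, 16)[1])  # 32 64
print(bus_estimate(500))                                     # 8000, for a hypothetical 500 MB/s
print(round(read_all_mops_per_mhz(3.89), 2))                 # 0.97 (P42)
```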
BusSpeed MB/second

Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read          Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All          Words  Words  Words  Words  Words    All

32 Bit  P37 v8-A53 1500 MHz                              T23 v8-A72 1800 MHz
    16   3235   3885   4370   4802   4860   3502 L1      4838   5609   6155   6556   6544   6561 L1
    32   1033   1083   1802   3028   3898   3334         1015   1529   2814   4442   6082   6436
    64    758    774   1430   2470   3543   3315 L2       888   1282   2558   4232   6111   6429 L2
   128    703    713   1294   2307   3436   3325          853   1263   2483   4084   6224   6487
   256    669    678   1249   2217   3372   3282          851   1261   2500   4088   6143   6442
   512    432    459    693   1270   2369   2406          859   1269   2496   4059   6032   6464
  1024    187    179    391    783   1544   2753 RAM      367    584   1076   2036   3444   4587 RAM
  4096    182    184    357    722   1396   2559          320    520    996   1977   3782   4564
 16384    172    178    253    520   1151   2178          320    521    992   1957   3721   4584
 65536    153    160    277    665   1326   2500          318    506    993   1969   3674   5789
Max/MHz                                     2.33                                            3.65
Bus MB/s        2704   2120                                     8216   7940

        T26 v8-A73 2000 MHz
    16   3973   4791   6909   7362   7332   6928 L1
    32   6310   6871   6906   7394   7361   6939
    64   4103   4656   6252   7266   7312   6830
   128   1237   1245   2368   3920   6037   6548 L2
   256   1043   1038   2129   3497   5581   6665
   512   1037   1030   2112   3464   5654   6846
  1024    773    725   1269   2291   4093   6011 RAM
  4096    583    584   1292   2529   4681   6168
 16384    583    598   1286   2506   4721   6179
 65536    611    612   1295   2546   4737   6294
Max/MHz                                     3.47
Bus MB/s        9680  10324

64 Bit  P42 v8-A57 2000 MHz                              P43 Ex8890 2300 MHz
    16   6406   6677   7341   7515   7605   7775 L1      3011   4418   6023  10176  10181  10345 L1
    32   1280   1907   2810   5404   6619   7600         4885   4392   5002   7947   9258  10284
    64    630   1022   2053   3883   5893   7594 L2      1878   1886   3683   4943   8924  10332 L2
   128    645   1046   2059   3891   5639   7304         1927   1934   3641   5009   8942  10347
   256    638   1047   2072   3873   5552   7365         1931   1934   3660   5033   9003  10349
   512    629   1054   2073   3744   5476   7587         1943   1948   3671   5050   9032  10368
  1024    595   1000   2075   3867   5485   7129         1948   1944   3663   5056   9030  10367
  4096    258    330    654   1227   1959   3770 RAM      909    384   1385   2647   4954   9446 RAM
 16384    235    311    621   1182   2116   4008          346    362   1324   2596   5077   9663
 65536    234    311    621   1184   2128   4003          355    356   1331   2616   5053   9667
Max/MHz                                     3.89                                            4.51
Bus MB/s        4976   4968                                     5744  10620

        P44 v8-A73 2350 MHz                              P45 v8-A73 2000 MHz
    16   7438   7997   8777   9284   9272   9659 L1      3190   5170   6616   7549   7564   7903 L1
    32   8250   8784   8776   9303   9207   9729         6731   7183   7214   7567   7554   7925
    64   5320   5759   7854   8670   8983   9523         4381   4764   6423   7257   7398   7843
   128   1458   1468   2975   4856   7307   9712 L2      1200   1213   2420   4019   5948   7911 L2
   256   1239   1237   2538   4321   6523   9389         1011   1011   2181   3596   5338   7923
   512   1229   1227   2652   4362   6386   9630         1008   1007   2163   3548   5287   7924
  1024   1230   1232   2655   4367   6408   9503          773    794   1719   3137   4493   7056 RAM
  4096    825    885   1843   3556   5877   9246 RAM      603    630   1475   2861   4982   7607
 16384    851    921   1948   3615   6145   8567          610    636   1369   2774   4807   7485
 65536    766    757   1784   3327   5982   9058          579    601   1364   2702   4790   7440
Max/MHz                                     4.14                                            3.96
Bus MB/s       13424  14928                                     9896  10932
The RandMem benchmark carries out four tests at increasing data sizes to produce data transfer speeds in MBytes per second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower performance can be with random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results see the archived details and the earlier Android report, and also the later 2018 publication. PC version details are in randmem results.htm 2013.
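One way to give serial and random runs the same program structure is to drive a single loop from an index array that is either in order or shuffled. The sketch below assumes that approach. It is not necessarily how RandMem itself generates addresses, and the helper names are hypothetical.

```python
import random


def make_indices(n_words, randomise, seed=12345):
    """Serial order, or the same addresses shuffled into a random order."""
    idx = list(range(n_words))
    if randomise:
        random.Random(seed).shuffle(idx)
    return idx


def read_test(data, idx):
    total = 0
    for i in idx:          # identical loop for serial and random runs
        total += data[i]
    return total


def read_write_test(data, idx):
    for i in idx:
        data[i] += 1


data = list(range(4096))
serial, scattered = make_indices(4096, False), make_indices(4096, True)
print(read_test(data, serial) == read_test(data, scattered))  # True: same data, different order
```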
Note that, particularly for cache based data, reading, writing, serial and random speeds are often similar, probably because they all use the same complicated indexing. Performance only really suffers with random access to the larger caches and RAM.
Maximum MB/second per MHz ratios are provided along with those from BusSpeed, showing similarities. The second BusSpeed results shown are minimum MB/second from RAM at Inc16, mainly more than twice as fast as the slowest random access speed.
RandMem MB/second

Memory  ---- Serial ---- ---- Random ----     ---- Serial ---- ---- Random ----
KBytes    Read  Rd/Wrt    Read  Rd/Wrt          Read  Rd/Wrt    Read  Rd/Wrt

32 Bit  P37 v8-A53 1500 MHz                   T23 v8-A72 1800 MHz
    16    3969    4571    4346    4572        6794    9073    6787    9038
    32    3643    4291    2543    3125        6638    8171    5796    5252
    64    3347    4099    1175    1407        6119    6235    3162    3284
   128    3249    4043     834     968        6585    7878    2555    2432
   256    3153    4022     815     851        6754    8199    2207    2071
   512    2526    2548     305     384        6403    7919    2046    1927
  1024    1418     899     114     143        5103    4502     806     888
  4096    1247     825      73      81        2483    4091     254     249
 16384    1313     860      64      73        4380    4038     209     209
 65536    1399     905      63      72        5781    3508     188     200
Max/MHz   2.65    3.05                        3.77    5.04
Bus       2.33     153                        3.65     506

        T26 v8-A73 2000 MHz
    16    5491   11652    8821   11889
    32    8790   11771    8714   11050
    64    8943   11739    8940   11911
   128    7638    8107    3485    3261
   256    9192    7571    2337    2331
   512    9159    7539    2006    2029
  1024    8477    4030     712     894
  4096    8004    2371     288     315
 16384    8016    2046     232     257
 65536    7934    2068     195     243
Max/MHz   4.60    5.89
Bus       3.47     583

64 Bit  P42 v8-A57 2000 MHz                   P43 Ex8890 2300 MHz
    16    7390    9876    7418    6795        4107    7836    9589    9587
    32    7247    8828    6027    6624        9637   12322    9834   11145
    64    7171    7896    2868    2911        9118    9024    5352    4890
   128    7070    7651    2227    2209        9195    9319    4213    3390
   256    7074    7643    1832    1898        9231    9460    2991    2967
   512    7074    7645    1751    1768        9260    9530    2413    2579
  1024    7037    7278    1656    1649        9266    9559    2158    2340
  4096    3422    1652     382     374        8701    7042     643     627
 16384    3752    1509     227     219        8680    6938     377     371
 65536    3796    1561     136     134        8205    6834     314     315
Max/MHz   3.70    4.94                        4.19    5.36
Bus       3.89     311                        4.51     356

        P44 v8-A73 2350 MHz                   P45 v8-A73 2000 MHz
    16   11405   14806   11384   14394        8930    7315    8927    7287
    32   11387   14790   11402   14499        8954    7327    8955    7322
    64   11310   14272   11372   13878        8936    7320    8945    7322
   128   11308    9876    3957    4000        9042    7321    3304    3364
   256   11348    9381    2771    2781        9221    7324    2287    2414
   512   11378    9412    2355    2486        9262    7328    1962    2090
  1024   10485    8289    1840    2254        8134    6053     587     808
  4096   10292    2729     475     552        8689    4707     173     197
 16384   10339    2589     287     334        8643    4735     139     163
 65536   10316    2477     209     241        8511    4728     217     250
Max/MHz   4.85    6.30                        4.63    3.66
Bus       4.14     757                        3.96     601
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results provided are running times in milliseconds. Besides Android, the benchmarks are available to run via Windows and Linux. Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in FFTBenchmarks.htm 2012, with more in a later report. The latter includes a count of floating point operations executed for each FFT size, enabling MFLOPS to be calculated. Even more recent results are in this Android report, and also in the later 2018 publication. Particularly with the original Version 1.0 benchmark, data addressing can be mainly on a skipped sequential basis, with speed degraded by burst reading and writing, as in the RandMem benchmark tests.
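The report's own operation counts are not reproduced here, so this sketch substitutes the common 5 N log2(N) estimate for a complex radix-2 FFT of N points. Applied to the P37 Version 3c.0 single precision 1024K time from the table below, it gives about 164 MFLOPS. That is close to, but not the same as, the 168 listed, so treat the estimate as approximate.

```python
import math


def fft_mflops(k_size, milliseconds):
    """Approximate MFLOPS for an FFT of k_size * 1024 complex points.

    Uses the textbook 5 * N * log2(N) operation estimate, which is an
    assumption here: the benchmark's own counts may differ somewhat.
    """
    n = k_size * 1024
    flops = 5.0 * n * math.log2(n)
    return flops / (milliseconds * 1.0e3)   # = flops / (seconds * 1e6)


print(round(fft_mflops(1024, 638.54)))  # 164 (P37 Version 3c.0 SP, 1024K; table Min 168)
```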
Results below are average or typical measurements. Note that running times are generally more than double at twice the FFT size, but can be several times greater when a higher level cache is used.
The optimised version produces much faster FFT calculation speeds at the larger sizes, but not necessarily for the smaller ones. Calculations using floating point operation counts, mentioned above, produced the minimum (large FFTs) and maximum MFLOPS (small FFTs) shown for Version 3c.0. Considering the two new devices with 2000 MHz Cortex A73 CPUs, the 64 bit system was much faster calculating single precision FFTs but virtually the same at double precision. Other wide variations in the performance patterns make accurate comparisons virtually impossible.
FFT Single Precision and Double Precision Results in milliseconds

32 Bit      P37 v8-A53 1500 MHz   T23 v8-A72 1800 MHz   T26 v8-A73 2000 MHz
    K Size        SP        DP          SP        DP          SP        DP
Version 1.0
         1      0.20      0.22        0.04      0.04        0.11      0.12
         2      0.47      0.57        0.09      0.12        0.24      0.28
         4      1.03      1.36        0.23      0.33        0.62      0.78
         8      2.37      2.84        0.66      0.74        1.76      2.64
        16      4.38      5.18        1.46      1.69        4.49      5.00
        32      9.96     22.34        3.23      4.08        6.27      5.27
        64     31.13     90.64        8.02     23.32        8.16     19.14
       128    150.32    213.72       46.72     76.29       35.35     70.59
       256    371.04    521.71      152.47    185.97      124.72    168.43
       512    896.93   1119.74      341.39    441.27      296.97    388.28
      1024   1902.22   2419.83      779.47   1053.30      707.66    877.22
Version 3c.0
         1      0.15      0.14        0.04      0.04        0.28      0.05
         2      0.32      0.31        0.09      0.09        0.60      0.10
         4      0.73      0.74        0.21      0.26        1.38      0.22
         8      1.72      1.69        0.47      0.63        3.00      0.56
        16      4.35      4.70        1.10      1.65        5.93      1.32
        32      9.00     12.61        2.43      4.17        9.65      3.40
        64     24.01     30.28        6.61      8.99       13.05      9.98
       128     56.23     69.75       19.78     22.21       18.90     24.02
       256    126.46    161.91       41.03     50.12       40.51     54.08
       512    292.45    354.48       90.74    110.98       93.26    117.85
      1024    638.54    791.93      222.78    251.03      189.35    264.37
Min MFLOPS       168       135         480       426         183       405
Max MFLOPS       365       395        1313      1403         603      1222

64 Bit      P42 v8-A57 2000 MHz   P43 Ex8890 2300 MHz   P44 v8-A73 2350 MHz   P45 v8-A73 2000 MHz
    K Size        SP        DP          SP        DP          SP        DP          SP        DP
Version 1.0
         1      0.03      0.03        0.05      0.05        0.12      0.11        0.07      0.05
         2      0.07      0.08        0.10      0.22        0.26      0.24        0.12      0.11
         4      0.21      0.28        0.51      0.70        0.56      0.53        0.26      0.25
         8      0.54      0.62        1.42      1.65        0.79      0.96        0.59      0.74
        16      1.50      1.42        1.96      2.24        2.11      1.68        1.59      1.71
        32      2.96      3.26        4.54      5.21        2.90      2.40        3.64      3.87
        64      6.95      9.17        8.88      9.28        5.31      7.85        7.94     13.03
       128     20.65     39.42       14.19     20.53       13.43     33.55       26.75     74.03
       256     67.44    145.44       35.38     60.17       66.42    107.46      129.62    188.27
       512    249.83    380.53      100.55    157.88      198.51    258.60      359.92    402.00
      1024    709.12    850.11      280.70    384.91      456.33    566.70      735.38    868.83
Version 3c.0
         1      0.04      0.04        0.20      0.03        0.05      0.04        0.07      0.05
         2      0.09      0.10        0.43      0.08        0.11      0.09        0.15      0.11
         4      0.22      0.23        1.03      0.18        0.25      0.20        0.30      0.23
         8      0.51      0.52        2.53      0.41        0.55      0.47        0.66      0.57
        16      1.21      1.19        3.48      0.93        1.16      1.08        1.49      1.31
        32      2.62      2.76        4.22      1.99        2.41      2.30        3.14      3.20
        64      6.21      8.16        8.95      5.02        4.58      5.52        6.86      7.88
       128     15.44     22.83       17.62     11.93       10.57     14.92       16.67     20.52
       256     39.50     58.00       29.00     31.25       26.16     36.05       36.81     43.15
       512     96.11    136.55       59.94     80.62       63.11     82.84       80.02     96.30
      1024    243.94    320.00      150.64    190.89      140.31    199.10      173.62    250.99
Min MFLOPS       438       334         217       560         762       537         616       426
Max MFLOPS      1257      1269         848      1616        1173      1328         836      1113
For more information on the Whetstone benchmark see the stand-alone version, above. The multithreading version runs multiple copies of the same shared code, with separate variables. In this case, the performance of each of the eight test functions, and the overall MWIPS rating, is invariably (nearly) proportional to the number of CPU cores available. The driving program checks that calculations on every thread produce consistent numeric results. Further details and download options for earlier MP-Whets versions can be found in the original multithreading benchmarks 2013 archive and a later version of the Android report, and also in the later 2018 publication.
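The MP-Whets structure can be sketched with Python threads: one shared function, private variables per thread, and a final consistency check. Because of CPython's global interpreter lock, this sketch shows the structure only, not the speed scaling. The workload is a hypothetical stand-in, not the Whetstone tests.

```python
import math
import threading


def whetstone_like(passes):
    """Hypothetical workload: shared code, but every variable is local to the call."""
    x = 1.0
    for _ in range(passes):
        x = math.cos(x)
    return x


def run_threads(n_threads, passes):
    results = [None] * n_threads

    def worker(t):
        results[t] = whetstone_like(passes)   # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    # Driver-style check: every thread must produce the same numeric result
    if any(r != results[0] for r in results):
        raise RuntimeError("numeric results differ between threads")
    return results


print(len(run_threads(8, 10000)))  # 8
```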
The overall times for all threads to finish are included with the detailed performance ratings. Timing is calibrated using a single thread, to determine the repeat parameters used for all tests. This can vary between around 3 and 5 seconds, depending on the start-up state. Although the actual times cannot be compared across different systems, they can be used to indicate MP efficiency. Only one of the systems below (T23) has four cores, and it has by far the longest time running 8 threads. All the others have eight cores, with half of them running at a different frequency and/or being a different CPU model. This, of course, affects 8 thread performance.
Single thread performance was a little slower than the earlier single core tests. Regarding the comparable 2.0 GHz Cortex A73/A53 CPUs (T26 and P45), this time the 64 bit version was faster on most tests.
      MWIPS   MFLOPS   MFLOPS   MFLOPS      Cos      Exp    Fixpt       If    Equal
                   1        2        3     MOPS     MOPS     MOPS     MOPS     MOPS

32 Bit
P37 Cortex-A53 4 x 1.5 GHz + 4 x 1.2 GHz
1T   1138.0    370.7    375.7    185.7     31.5     20.6    582.0    897.1    494.8
2T   2291.3    630.1    590.6    373.3     64.5     41.3   1389.6   1870.3    979.2
4T   4585.5   1237.2   1206.1    740.7    129.2     83.2   2805.5   3734.6   1955.2
8T   8157.2   2261.8   1843.7   1340.3    234.8    150.5   4622.6   7014.8   3548.8
Overall Seconds 5.19 1T, 5.22 2T, 5.25 4T, 6.39 8T

T23 Cortex-A72 2 x 1.8 GHz + 2 x 1.4 GHz
1T   1904.7    519.1    513.1    315.1     67.0     25.0   1501.3   1803.6    770.3
2T   3756.8    925.9    989.5    631.4    134.6     47.9   2985.9   3499.3   1526.8
4T   5447.4   1352.9   1415.3   1018.9    200.6     65.5   4251.7   4667.4   2500.1
8T   6006.3   1425.0   1751.6    969.9    193.9     89.5   4475.5   5277.4   2520.1
Overall Seconds 4.05 1T, 4.16 2T, 6.45 4T, 11.42 8T (4 cores)

T26 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz
1T   1723.8    426.4    296.1    280.4     76.0     25.7    679.5    999.5    682.5
2T   4085.6    999.6    608.0    648.4    177.9     62.2   2300.9   2720.2   1397.8
4T   8474.9   1987.0   2075.9   1321.3    346.8    120.0   4778.1   6800.5   2802.2
8T  15162.1   3463.4   3803.0   2337.4    581.4    237.6   8604.4  11601.8   5337.4
Overall Seconds 4.17 1T, 3.68 2T, 3.57 4T, 4.29 8T

64 Bit
P42 4 x 2.0 GHz Cortex-A57 + 4 x 1.5 GHz Cortex-A53
1T   1592.2    248.8    276.3    292.1     57.2     19.1   ######   1612.6    466.2
2T   3176.7    504.0    554.9    581.0    114.3     38.1   ######   3237.9    927.2
4T   5907.9    987.4   1079.0   1156.3    199.4     75.4   ######   6391.9   1204.4
8T  11269.4   1925.3   2616.0   2414.8    317.3    149.7   ######   8187.0   2494.7
Overall Seconds 4.56 1T, 4.58 2T, 5.62 4T, 6.22 8T

P43 Exynos 8890 4 x 2.3 GHz + 4 x 1.6 GHz
1T   2854.2    845.4    945.1    563.6     81.5     33.7   ######   2082.6    970.0
2T   6502.4   1800.9   1788.2   1475.1    199.5     67.3   ######   6848.4   1928.9
4T  11393.5   2840.9   2734.4   2545.7    361.4    119.3   ######  10234.4   3420.1
8T  18984.5   5399.4   5357.5   4238.3    601.7    211.9   ######  18468.2   5141.6
Overall Seconds 2.94 1T, 2.68 2T, 3.16 4T, 4.82 8T

P44 Cortex-A73 4 x 2.35 GHz + 4 x 1.9 GHz
1T   3199.2    592.6    693.4    589.7    111.1     41.8   ######   3523.0    599.8
2T   6318.7   1415.0   1399.7   1176.2    218.4     80.8   ######   6907.3   1200.0
4T  11762.7   2736.9   2103.6   2221.3    421.6    151.4   ######  10194.2   2318.6
8T  20501.2   4675.2   4559.4   4204.6    642.4    280.7   ######  22153.2   3818.6
Overall Seconds 4.53 1T, 4.53 2T, 4.83 4T, 6.60 8T

P45 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz
1T   2554.9    550.5    530.0    451.5     89.2     35.4   ######   2930.2    458.0
2T   5042.0   1188.8   1005.0    863.7    167.7     77.4   ######   3434.9    927.2
4T  11051.2   1766.8   1655.9   1952.6    368.2    181.8   ######  10013.3   2017.6
8T  19303.2   3674.0   3906.7   3577.0    569.6    316.3   ######  19445.9   3548.6
Overall Seconds 3.29 1T, 3.33 2T, 3.17 4T, 3.96 8T

###### Impossible performance, probably over-optimisation, little effect on MWIPS
For further details see the Dhrystone Benchmark above and android multithreading benchmarks.htm 2013, which includes further results and a download option for the earlier version, plus this ARM/Intel report and the later 2018 publication.
This multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second. Then all threads are run using the same pass count, with running time extended when there are more threads than CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread, but some variables can be used by all threads. These shared variables are probably why throughput increases little when using multiple threads.
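The single thread calibration step can be sketched as below. It assumes the pass count is doubled until a run is long enough to time, then scaled to the target. The doubling and the one-tenth threshold are illustrative choices, not taken from the benchmark, and spin() is a hypothetical stand-in for the Dhrystone loop.

```python
import time


def calibrate(run, target_seconds=1.0):
    """Find a pass count giving roughly target_seconds of single thread running time.

    Doubles the count until one run takes at least a tenth of the target,
    then scales the count in proportion to the measured time.
    """
    passes = 1
    while True:
        start = time.perf_counter()
        run(passes)
        elapsed = time.perf_counter() - start
        if elapsed >= target_seconds / 10:
            return max(1, round(passes * target_seconds / elapsed))
        passes *= 2


def spin(passes):
    """Hypothetical stand-in for the Dhrystone loop."""
    total = 0
    for i in range(passes * 1000):
        total += i
    return total


passes = calibrate(spin, target_seconds=0.2)
# Every thread would then run spin(passes), so more threads than cores
# extends the overall running time instead of changing the work per thread.
print(passes >= 1)  # True
```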
Note that, at least for the latest Android 9 results, the single thread speeds are best case. Worst case for the 64 bit P45 was 4818 VAX MIPS, but multithreaded performance was similar.
VAX MIPS or DMIPS                                          ---------- Threads ----------
System  CPU                              Android       1       2       4       8    None
32 Bit Version
T23     v8-A72 1800 x4 + v8-A53 1400 x4  5.1.1      4397    6514    3940    3385    3560
P37     v8-A53 1500 x4 + v8-A53 1200 x4  7.0        1427    2639    4261    2329    1464
T26     v8-A73 2000 x4 + v8-A53 2000 x4  9.0        4690    7322    9151    4963    4514
64 Bit Version
P42     v8-A57 2000 x4 + v8-A53 1500 x4  5.1.1      6298    8393    7447    5112    9525
P43     Ex8890 2300 x4 + 1500 x4         7.0       15498    5224    5530    2789   13495
P44     v8-A73 2350 x4 + 1900 x4         8.0       10470   13396   15247    8994   10188
P45     v8-A73 2000 x4 + v8-A53 2000 x4  9.0        7729    9511   11758    7595    8442
This is a multithreading version of the above. Further details and results can be found in android neon benchmarks.htm 2013 and the 2017 Android Report.
This benchmark is not generally available with the new 4A8 compilation as overall running time had increased to more than 400 seconds, on a new phone.
This is a multithreading version of BusSpeed above, but some of the single thread results are different. The latest version arranges for threads to have staggered starting points, each reading all the data. It is clear that multiple threads are needed to demonstrate maximum throughput. For further results see the Last Version of Android Report and here, then the later 2018 publication.
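Staggered starting points can be sketched as each thread reading the whole array from its own offset and wrapping round. The offset formula, thread_id * n_words // n_threads, is an assumption for illustration, as is the function name.

```python
def staggered_order(n_words, thread_id, n_threads):
    """Word indices read by one thread: all the data, from its own starting offset."""
    start = thread_id * n_words // n_threads
    return [(start + i) % n_words for i in range(n_words)]


for t in range(4):
    print(t, staggered_order(8, t, 4))
# 0 [0, 1, 2, 3, 4, 5, 6, 7]
# 1 [2, 3, 4, 5, 6, 7, 0, 1]
# 2 [4, 5, 6, 7, 0, 1, 2, 3]
# 3 [6, 7, 0, 1, 2, 3, 4, 5]
```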
Considering just the important Read All results with cache based data, the 8 core systems generally provided appropriate performance gains when doubling threads up to four, but less so from 4 to 8 threads, partly due to some use of the alternative cores with lower MHz. Often more important, multiple threads produced gains using RAM based data. Maximum bus speed estimates are again included below, reflecting the improvement. Maximum MB/second per MHz ratios are also provided, based on the 8 thread L1 cache data transfer rate and the average MHz of the two sets of cores.
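The ratios in the table that follows can be checked with a few lines of arithmetic. This sketch divides the 8 thread L1 Read All figure by the average MHz of the two core sets. It then estimates bus speed from the 8 thread RAM results at 16 and 8 word increments. The P37 and T23 inputs come from the table; the function names are hypothetical.

```python
def mp_max_per_mhz(read_all_8t_l1, mhz_set1, mhz_set2):
    """8 thread L1 Read All MB/second divided by the mean MHz of the two core sets."""
    return read_all_8t_l1 / ((mhz_set1 + mhz_set2) / 2)


def mp_bus_estimates(inc16_8t_ram, inc8_8t_ram):
    """Bus MB/second estimates from 8 thread RAM results at 16 and 8 word increments."""
    return inc16_8t_ram * 16, inc8_8t_ram * 8


print(round(mp_max_per_mhz(20946, 1500, 1200), 2))  # 15.52 (P37)
print(round(mp_max_per_mhz(16868, 1800, 1400), 2))  # 10.54 (T23)
print(mp_bus_estimates(399, 752))                   # (6384, 6016) (P37)
```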
MB/Second Reading

32 Bit

P37 Cortex-A53 4x1.5 GHz + 4x1.2 GHz
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   2903   3715   3964   4258   4384   3335
        2T   4908   6632   7279   7975   8065   6725
        4T   7997  11780  13832  15518  16117  13141
        8T   6406   9148  18628  17698  25240  20946
   123  1T    675    666   1203   2094   3143   3239
        2T   1018   1045   1984   3668   5781   6310
        4T   1067   1110   2206   4366   8025  12191
        8T   1800   1869   3622   6938  11690  17271
 49152  1T    160    169    326    661   1300   2334
        2T    287    288    600   1175   2318   4224
        4T    430    360    739   1510   2956   5722
        8T    436    399    752   1716   4242   5817
Max/MHz                                        15.52
Bus MB/s            6384   6016

T23 Cortex A72 2x1.8 GHz + 2x1.4 GHz
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   5479   6201   6348   6508   6553   6207
        2T  10418  11580  12032  12788  13131  12006
        4T  11163  14290  16135  17280  18215  15409
        8T   5426   6522  14721  12676  19005  16868
   123  1T    864   1258   2425   3898   6002   5595
        2T    832   1198   2666   5237   9840  11988
        4T   1671   2229   4246   7968  13353  15281
        8T   1104   2304   3896   7471  12977  15508
 49152  1T    316    505    971   1939   3713   5770
        2T    313    587   1003   2036   4240   7894
        4T    418    612   1087   2296   4292   8885
        8T    420    564   1115   2290   4450   8811
Max/MHz                                        10.54
Bus MB/s            9024   8920

T26 4xCortex-A73 + 4xCortex-A53 2 GHz
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   5095   4899   6499   6473   5599   5443
        2T  10489   9734  12600  13940  13605  12618
        4T  16167  22396  23597  27230  26251  22337
        8T  11314  14923  30091  29825  37405  28674
   123  1T   1006   1036   1688   2984   4637   6373
        2T   1278   1350   2886   4893   7676   9409
        4T   1711   1874   4011   7631  13077  18501
        8T   2892   3001   6122  11060  15162  22620
 49152  1T    593    424    936   1381   2999   4936
        2T    609    597    821   2232   5088   7657
        4T   1063    693   1316   2750   5614  15683
        8T    919    895   1556   3656   6739  14075
Max/MHz                                        14.34
Bus MB/s           14320  12448

64 Bit

P42 4x2.0 GHz A57 + 4x1.5 GHz A53
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   4227   4385   3996   4604   4409   3679
        2T   5439   6648   7744   9226  10911   6936
        4T   6627   9079  11722  16045  15897  14411
        8T   5970   7777  15033  17943  28098  23425
   123  1T    537    642   1304   2086   3811   3327
        2T    718    992   1894   3719   5963   7180
        4T    667    994   1965   4015   7475  12584
        8T   1292   1797   3423   6426  12678  20602
 49152  1T    154    196    395    750   1437   2312
        2T    258    283    544   1130   2181   4130
        4T    321    425    801   1648   3023   6162
        8T    372    474    926   1774   3346   7758
Max/MHz                                        13.39
Bus MB/s            7584   7408

P43 Exynos 8890 4x2.3 GHz + 4x1.6 GHz
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   6109   9771  10070   7328   8449   7859
        2T  14592  17326  18030  19515  20019  20347
        4T  20825  26166  30411  33036  34255  32872
        8T  15221  19637  36246  30585  42923  35993
   123  1T   1562   1934   2648   4977   7451  10196
        2T   2147   2191   4426   7450  11705  19772
        4T   3045   3074   5427  10756  19892  31669
        8T   3674   3658   6668  12791  24018  34620
 49152  1T    373    324   1020   2452   3942   6888
        2T    909    733   1323   2657   5963  11235
        4T   1131    660   1181   2409   4740   8990
        8T   1205    783   1500   2912   5630  10637
Max/MHz                                        18.46
Bus MB/s           12528  12000

P44 A73 4x2.3 GHz + 4x1.9 GHz
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   8020   8748   8969   9039   9310   9150
        2T  13581  15086  16400  17698  17816  17921
        4T  23160  25585  27865  30934  31776  31235
        8T  17111  21200  41776  37305  52660  39386
   123  1T   1480   1557   3003   4831   7331   9060
        2T   2034   2001   3767   6666  10521  15441
        4T   2229   2233   4531   8891  16209  23150
        8T   3416   3514   7122  13435  24139  32032
 49152  1T    774    837   1709   3313   5014   8102
        2T   1083    977   1842   3881   7776  13265
        4T    975    823   1836   3468   6655  12640
        8T   1273   1217   2046   3774   7158  14470
Max/MHz                                        18.76
Bus MB/s           19472  16368

P45 4xCortex-A73 + 4xCortex-A53 2 GHz
    KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
  12.3  1T   5188   6941   3419   6338   6794   6021
        2T  11526  10827  12068  14013  13189  12322
        4T  16856  21215  17335  18750  21369  14970
        8T  18086  19646  44559  38915  53692  37112
   123  1T   1234   1085   2107   2267   4738   5794
        2T   1693   1431   2791   4788   8149  10871
        4T   1701   1949   2645   4971  14743  20175
        8T   2101   3088   6355  12679  25005  33216
 49152  1T    329    572    941   1856   2537   5215
        2T    694    686   1296   2492   3539  10492
        4T   1004    765   1533   4588   5799  12135
        8T   1022   1090   1919   3797   7753  14227
Max/MHz                                        18.56
Bus MB/s           17440  15352
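The Max/MHz row can be reproduced from the table itself. This Python sketch assumes each ratio is the 8 thread Read All rate with 12.3 KB data, divided by the simple average of the two core clusters' MHz. The MHz values are taken from the system headings, for example 2300 for P44. That reading was inferred by checking it against the figures, and the helper names are hypothetical, not part of the benchmark.

```python
# Hypothetical helpers reproducing the MP-BusSpeed "Max/MHz" row.
# Assumption: ratio = 8 thread RdAll MB/second (12.3 KB, L1 cache data)
#             divided by the average MHz of the two core clusters.


def average_mhz(fast_mhz: float, slow_mhz: float) -> float:
    """Simple mean of the two cluster clock speeds."""
    return (fast_mhz + slow_mhz) / 2.0


def max_mb_per_mhz(rdall_8t: float, fast_mhz: float, slow_mhz: float) -> float:
    """MB/second per MHz for the 8 thread L1 cache Read All result."""
    return rdall_8t / average_mhz(fast_mhz, slow_mhz)


# (8 thread 12.3 KB RdAll MB/s, faster cluster MHz, slower cluster MHz)
SYSTEMS = {
    "P37": (20946, 1500, 1200),
    "T23": (16868, 1800, 1400),
    "T26": (28674, 2000, 2000),
    "P42": (23425, 2000, 1500),
    "P43": (35993, 2300, 1600),
    "P44": (39386, 2300, 1900),
    "P45": (37112, 2000, 2000),
}

if __name__ == "__main__":
    for name, (rate, fast, slow) in SYSTEMS.items():
        print(f"{name}  Max/MHz {max_mb_per_mhz(rate, fast, slow):.2f}")
```

The computed ratios agree with the tabulated Max/MHz values to within rounding, which supports this reading of how they were derived.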
These are multithreading versions of the RandMem benchmark above. The latest are ARM/Intel versions of the longer running MP-RndMem2.apk, available from android long MP benchmarks.htm 2016, with further details and results in the last version of the Android report, then the later 2018 publication. The most striking feature of these MP results is the apparently constant performance at all thread counts, over the memory area covered, during the read/write tests. Although each thread starts accessing data at a staggered address, the whole data area is shared, and it seems that this leads to only one thread being active at a time, to ensure data integrity.
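To make the access pattern concrete, here is a minimal Python sketch. It assumes that every thread walks the whole shared array, serially or in a shuffled order, starting from an offset staggered by thread number. This is an assumption, not taken from the MP-RndMem source. The sketch only generates index sequences and does not model the hardware effects that appear to serialise the read/write tests.

```python
import random


def stagger(n_words: int, n_threads: int, thread_id: int) -> int:
    """Starting position for a thread: one equal share of the array per thread."""
    return thread_id * n_words // n_threads


def serial_order(n_words: int, n_threads: int, thread_id: int) -> list[int]:
    """Serial pass over the whole shared array, wrapping round from the start offset."""
    start = stagger(n_words, n_threads, thread_id)
    return [(start + i) % n_words for i in range(n_words)]


def random_order(n_words: int, n_threads: int, thread_id: int,
                 seed: int = 1) -> list[int]:
    """Random pass: one shuffled permutation shared by all threads,
    rotated so that each thread begins at its staggered position."""
    order = list(range(n_words))
    random.Random(seed).shuffle(order)
    start = stagger(n_words, n_threads, thread_id)
    return order[start:] + order[:start]


if __name__ == "__main__":
    for t in range(4):
        print(t, serial_order(16, 4, t)[:6], random_order(16, 4, t)[:6])
```

Every thread touches every word of the shared area. Read/write passes from different threads therefore inevitably update the same data, which is the situation described above.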
Most serial and random read tests produced appropriate multithreaded performance gains with cache based data, and some with RAM based data, even though random access is influenced by burst reading. Maximum serial reading speed from RAM was similar to that of the BusSpeed Read All tests, and sometimes better. There are many different performance variations across the systems. For some clarification, ratios based on the 8 thread results and the average MHz of the two sets of cores are included. The first (Max8/MHz) is MB/second per MHz for serial reading and serial read/write with L1 cache based data, as an indication of CPU processing power. The widest variations were on running the serial read/write tests. The other ratios (Min8/MHz) are for random access from RAM, the worst being on systems where slower read/write speed was also indicated by the CPU comparisons.
MB/Second Using 1, 2, 4 and 8 Threads

32 Bit

P37 Cortex-A53 4x1.5 GHz + 4x1.2 GHz
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T    3970    4455    3767    4397
        2T    7114    4123    7123    3997
        4T   13613    3869   13477    3471
        8T   16201    3398   15432    3475
   123  1T    3440    3993     872    1015
        2T    6547    3742    1603     951
        4T   12252    3245    2441     820
        8T   16505    3246    3851     836
 12288  1T    2456     865      73      77
        2T    4404     856     149      76
        4T    6965     837     267      72
        8T    8753     840     399      73
Max8/MHz      12.0     2.5
Min8/MHz                      0.30    0.05

T23 Cortex A72 2x1.8 GHz + 2x1.4 GHz
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T    6568    8722    6559    9253
        2T   12773    8941   12967    8857
        4T   14468    8473   14077    8494
        8T   14198    7681   13132    7465
   123  1T    6330    8648    2626    2565
        2T   12402    8533    2663    2469
        4T   13357    7816    4114    2344
        8T   12945    7375    4106    2306
 12288  1T    3987    4141     233     229
        2T    9317    2899     356     220
        4T    7238    2545     380     207
        8T    8134    2352     406     195
Max8/MHz       8.9     4.8
Min8/MHz                      0.25    0.12

T26 4xCortex-A73 + 4xCortex-A53 2 GHz
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T    7605    8979    6904   10530
        2T   14642    4626   16708    8019
        4T   30603    6200   30526    5179
        8T   21190    2911   21123    3318
   123  1T    7198    6858    2314    2784
        2T   13114    4571    4060    1851
        4T   20694    4741    5133    1011
        8T   27465    2879    7511    1043
 12288  1T    7709    2047     266     254
        2T    8742    1350     449     163
        4T   15630    1012     764      57
        8T   16096    1026     824      71
Max8/MHz      10.6     1.5
Min8/MHz                      0.41    0.04

64 Bit

P42 4x2.0 GHz A57 + 4x1.5 GHz A53
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T    7605    5887    4722    5837
        2T    9302    4040    9342    3644
        4T   14938    3293   14879    3233
        8T   24243    3244   21400    3422
   123  1T    4217    4920    1564    1555
        2T    8317    3227    1881     868
        4T    9488    2956    2191     831
        8T   14073    2868    3295     849
 12288  1T    2819    1630     146     144
        2T    4741    1644     325      97
        4T    6141    1589     495      81
        8T    7280    1694     565      75
Max8/MHz      13.9     1.9
Min8/MHz                      0.32    0.04

P43 Exynos 8890 4x2.3 GHz + 4x1.6 GHz
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T    9167   13423   10264   11268
        2T   19064   13606   19256   13533
        4T   34740   13553   33304   11402
        8T   38359   10711   34231   10244
   123  1T    9478    8407    4305    2846
        2T   16134    9919    5164    3082
        4T   31978    7364    7430    1804
        8T   34835    4754    9126    3612
 12288  1T    7276    7271     419     411
        2T   13304    7154     755     412
        4T   10217    7132    1158     405
        8T   11211    3309    1200     386
Max8/MHz      19.7     5.5
Min8/MHz                      0.62    0.20

P44 A73 4x2.3 GHz + 4x1.9 GHz
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T   11673   15957   11158   14887
        2T   22897   15131   22940   15423
        4T   41281   14180   39242   12910
        8T   53007   11363   39641   10187
   123  1T   10755    9601    4199    4309
        2T   15385   10061    5195    4026
        4T   23533    8370    5839    4154
        8T   31776    7267    8680    2686
 12288  1T    8962    2450     374     370
        2T   14257    2406     543     374
        4T   15952    2301     714     328
        8T   20927    2171     669     346
Max8/MHz      25.2     5.4
Min8/MHz                      0.32    0.16

P45 4xCortex-A73 + 4xCortex-A53 2 GHz
    KB       SerRD SerRDWR   RndRD RndRDWR
  12.3  1T    8274    6783    8632    7518
        2T   15351    5088   19387    4701
        4T   32836    2998   32374    3218
        8T   52255    2635   45453    2636
   123  1T    6893    6356    3013    3631
        2T   12815    4006    1979    2005
        4T   21771    2981    5368     797
        8T   33599    2552    7764     738
 12288  1T    6341    4950     175     223
        2T    9644    2569     296     205
        4T   14958    2027     654      86
        8T   20869    2774     894      97
Max8/MHz      26.1     1.3
Min8/MHz                      0.45    0.05
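The Max8/MHz and Min8/MHz rows appear to follow a simple rule, which this Python sketch applies:

- Max8/MHz divides the 8 thread 12.3 KB serial read and serial read/write rates by the average MHz of the two core clusters.
- Min8/MHz does the same for the 8 thread 12288 KB random read and random read/write rates.

The rule was inferred by checking it against the table, so treat it as an assumption. The function name is hypothetical.

```python
def ratios_8t(l1_serial, ram_random, avg_mhz):
    """Return (Max8/MHz pair, Min8/MHz pair) for one system.

    l1_serial  -- (SerRD, SerRDWR) MB/s with 12.3 KB data, 8 threads
    ram_random -- (RndRD, RndRDWR) MB/s with 12288 KB data, 8 threads
    avg_mhz    -- mean MHz of the two core clusters
    """
    max8 = tuple(rate / avg_mhz for rate in l1_serial)
    min8 = tuple(rate / avg_mhz for rate in ram_random)
    return max8, min8


# Values copied from the P37, T26 and P43 results above
EXAMPLES = {
    "P37": ((16201, 3398), (399, 73), (1500 + 1200) / 2),
    "T26": ((21190, 2911), (824, 71), 2000),
    "P43": ((38359, 10711), (1200, 386), (2300 + 1600) / 2),
}

if __name__ == "__main__":
    for name, (serial, ram, mhz) in EXAMPLES.items():
        max8, min8 = ratios_8t(serial, ram, mhz)
        print(name, " ".join(f"{r:.1f}" for r in max8),
              " ".join(f"{r:.2f}" for r in min8))
```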
The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Data sizes are limited to three, intended to use L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Further details, results and links to download the original MP-MFLOPS benchmark can be found at android multithreading benchmarks.htm 2013, with the latest ARM only MP-MFLOPS2 compilations from android long MP benchmarks.htm 2016, and in the later version of the Android Report plus the 2018 publication. The newer versions have longer running times that avoid the inconsistent speeds produced by the original.
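The calculation quoted above contains eight operations per word: three additions, three multiplications, one subtraction and one further addition. The Python sketch below is a hypothetical generalisation, not the benchmark's own code. It makes these assumptions:

- The 2 operation case is x[i] = (x[i] + a) * b.
- Each further (x[i] + c) * d term, alternately subtracted and added, contributes three more operations, so 32 operations per word corresponds to eleven terms.
- The constants are arbitrary.
- Word counts use 1 KB = 1000 bytes with 4 byte single precision words, matching the 3200 words quoted for 12.8 KB.

```python
def terms_for(ops_per_word: int) -> int:
    """Number of (x + c) * d terms for a given operation count (assumed layout)."""
    if ops_per_word < 2 or (ops_per_word - 2) % 3:
        raise ValueError("operation count must be 2 + 3k")
    return 1 + (ops_per_word - 2) // 3


def kernel(x: list[float], consts: list[tuple[float, float]]) -> None:
    """In place: x[i] = (x[i]+a)*b - (x[i]+c)*d + (x[i]+e)*f ... (alternating signs).
    Every term uses the old value of x[i]; the running total starts at 0.0
    for simplicity, the counted operations being those of the formula."""
    for i, xi in enumerate(x):
        total = 0.0
        for k, (add, mul) in enumerate(consts):
            term = (xi + add) * mul
            total = total + term if k % 2 == 0 else total - term
        x[i] = total


def words_for_kb(kb: float) -> int:
    """Single precision words in a data size, using 1 KB = 1000 bytes."""
    return round(kb * 1000 / 4)


def mflops(ops_per_word: int, words: int, passes: int, seconds: float) -> float:
    """Millions of floating point operations per second for a timed run."""
    return ops_per_word * words * passes / seconds / 1e6


if __name__ == "__main__":
    consts = [(0.5, 0.9)] + [(0.1 * k, 0.05) for k in range(1, terms_for(32))]
    data = [1.0] * words_for_kb(12.8)
    kernel(data, consts)
    print(len(data), "words,", terms_for(32), "terms, x[0] =", data[0])
```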
Each thread carries out the same calculations, but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations are carried out and can be run. Included with the results are calculations of MFLOPS per MHz. Those for one and four threads are based on the fastest MHz, with the eight thread results using the average frequency of the two sets of cores.
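The rule just described can be written down directly. In this Python sketch, up to four threads are divided by the fastest MHz and eight threads by the average of the two clusters. The function name is hypothetical. Applied to the P43 and P37 results in the table below, it reproduces the tabulated ratios.

```python
def mflops_per_mhz(mflops: float, threads: int,
                   fast_mhz: float, slow_mhz: float) -> float:
    """MFLOPS per MHz as tabulated: fastest MHz for one to four threads,
    average of the two core clusters for eight threads."""
    mhz = (fast_mhz + slow_mhz) / 2 if threads >= 8 else fast_mhz
    return mflops / mhz


if __name__ == "__main__":
    # P43, 32 ops/word, 12.8 KB: 1, 4 and 8 thread MFLOPS from the table
    for threads, result in ((1, 12117), (4, 42010), (8, 36689)):
        print(threads, f"{mflops_per_mhz(result, threads, 2300, 1600):.2f}")
```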
Regarding single core ratios, a maximum of 4 MFLOPS/MHz might be expected using NEON or 64 bit vector SIMD instructions or 8 MFLOPS/MHz where fused multiply and add is implemented, but this would only be apparent using 32 floating point operations per data word.
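These limits follow from simple arithmetic. This Python sketch makes three assumptions:

- One 128 bit vector instruction completes per clock cycle, holding four single precision values.
- A fused multiply-add counts as two operations.
- Each data word is loaded and stored once per pass, at 4 bytes each way.

It also suggests why the limits only show with 32 operations per word: at 2 operations per word, far more data has to be moved for each operation.

```python
def peak_mflops_per_mhz(lanes: int = 4, fused_multiply_add: bool = False,
                        instr_per_cycle: int = 1) -> int:
    """Operations per clock cycle, i.e. MFLOPS per MHz, for one core."""
    ops_per_lane = 2 if fused_multiply_add else 1
    return lanes * ops_per_lane * instr_per_cycle


def bytes_per_flop(ops_per_word: int, bytes_per_word: int = 4) -> float:
    """Load plus store of one single precision word, shared by its operations."""
    return 2 * bytes_per_word / ops_per_word


def mb_per_second_needed(mhz: float, ops_per_word: int, **peak_args) -> float:
    """Data rate (MB/s, 1 MB = 1e6 bytes) needed to run at the peak."""
    return mhz * peak_mflops_per_mhz(**peak_args) * bytes_per_flop(ops_per_word)


if __name__ == "__main__":
    for ops in (2, 32):
        print(ops, "ops/word needs", mb_per_second_needed(2000, ops), "MB/s at 2000 MHz")
```

At 2000 MHz, the 2 operations per word case would need 32000 MB/second for a single core. That is well above the single thread cache speeds measured by the BusSpeed tests above.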
The 64 bit tests produced faster speeds than those at 32 bits, but only one system (P43) achieved more than 4 MFLOPS/MHz using a single core. Judging by the 4 and 8 thread ratios, not many systems indicated appropriate MP performance gains.
At the end of the table are the maximum single core MFLOPS/MHz ratios for all systems, along with those from MemSpeed (single precision results above). These identify significant improvements over the MemSpeed ratios, particularly at 64 bits.
Those for the following NEON-MFLOPS-MP benchmark have also been included, where NEON only made a difference with the 32 bit compilations.
Single Precision MFLOPS

32 Bit

P37 Cortex-A53 4x1.5 GHz + 4x1.2 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T    230    228    221      891    889    875
    2T    454    448    430     1778   1772   1753
    4T    897    874    672     3530   3515   3460
    8T   1370   1279    739     5717   5718   5602
MFLOPS/MHz
    1T   0.15   0.15   0.15     0.59   0.59   0.58
    4T   0.60   0.58   0.45     2.35   2.34   2.31
    8T   1.01   0.95   0.55     4.23   4.24   4.15

T23 Cortex A72 2x1.8 GHz + 2x1.4 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   1336   1337   1335     2467   2497   2481
    2T   1820   2544   2162     4898   4986   4929
    4T   2675   2703   2066     5714   5777   5799
    8T   2595   2414   1712     6106   5746   5608
MFLOPS/MHz
    1T   0.74   0.74   0.74     1.37   1.39   1.38
    4T   1.49   1.50   1.15     3.17   3.21   3.22
    8T   1.62   1.51   1.07     3.82   3.59   3.51

T26 4xCortex-A73 + 4xCortex-A53 2 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T    970    956    786     2571   2199   2545
    2T   2233   2182   1247     4266   5071   4589
    4T   4132   3264   1259     7730   5762   6695
    8T   1943   4304   1734     8169   8110   8565
MFLOPS/MHz
    1T   0.49   0.48   0.39     1.29   1.10   1.27
    4T   2.07   1.63   0.63     3.87   2.88   3.35
    8T   0.97   2.15   0.87     4.08   4.06   4.28

64 Bit

P42 4x2.0 GHz A57 + 4x1.5 GHz A53
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   1772   1746    764     4872   4805   4656
    2T   2901   2449   1121     9014   8764   8519
    4T   4209   7472   1461    10314  10064  10072
    8T   4980   8758   1890    14707  14050  16111
MFLOPS/MHz
    1T   0.89   0.87   0.38     2.44   2.40   2.33
    4T   2.10   3.74   0.73     5.16   5.03   5.04
    8T   2.85   5.00   1.08     8.40   8.03   9.21

P43 Exynos 8890 4x2.3 GHz + 4x1.6 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   3368   3085   2744    12117  11592   3015
    2T   8917   7468   4184    20439  23380   6031
    4T   9810  12004   4017    42010  41528  10848
    8T   9345  15470   3877    36689  39021  14468
MFLOPS/MHz
    1T   1.46   1.34   1.19     5.27   5.04   1.31
    4T   4.27   5.22   1.75    18.27  18.06   4.72
    8T   4.79   7.93   1.99    18.81  20.01   7.42

P44 A73 4x2.3 GHz + 4x1.9 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   4539   3709   1367     7894   7933   7732
    2T   8500   9170   1770    15336  15388  14802
    4T  15335   7610   1896    25973  27855  23491
    8T  10395  10542   1973    29124  31385  27072
MFLOPS/MHz
    1T   1.97   1.61   0.59     3.43   3.45   3.36
    4T   6.67   3.31   0.82    11.29  12.11  10.21
    8T   4.95   5.02   0.94    13.87  14.95  12.89

P45 4xCortex-A73 + 4xCortex-A53 2 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   3658   2840   1591     5844   5890   6122
    2T   4064   7675   1683    11216  12950  11114
    4T   4939   6542   2505    25406  20311  22484
    8T   8814  12475   2488    30095  28142  30009
MFLOPS/MHz
    1T   1.83   1.42   0.80     2.92   2.95   3.06
    4T   2.47   3.27   1.25    12.70  10.16  11.24
    8T   4.41   6.24   1.24    15.05  14.07  15.00

One Core Maximum MFLOPS/MHz Comparison

              P37   T23   T26   P42   P43   P44   P45
MemSpeed     0.40  0.84  0.63  1.03  0.44  1.29  1.22
MP-MFLOPS    0.59  1.39  1.29  2.44  5.27  3.45  3.06
MP NEON      1.43  3.12  2.50  2.20  5.43  3.43  2.83
NEON-MFLOPS-MP carries out the same calculations as the MP-MFLOPS benchmark above, but with NEON intrinsic functions used for all calculations. For further results see android neon benchmarks.htm, with details and results in this version of the Android Report plus the 2018 publication.
As indicated by the MFLOPS per MHz ratios included in the comparison table at the end of the MP-MFLOPS results above, these NEON functions significantly outperformed the 32 bit compiled standard C code. They made little difference at 64 bits, where vector SIMD instructions replaced the NEON function code.
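The size of the NEON effect can be read straight from the one core comparison rows. This small Python sketch divides the MP NEON ratio by the MP-MFLOPS ratio for each system. The dictionary simply restates the published figures, and the function name is hypothetical.

```python
# system: (compiled bits, MP-MFLOPS, MP NEON) one core maximum MFLOPS/MHz
ONE_CORE_MAX = {
    "P37": (32, 0.59, 1.43),
    "T23": (32, 1.39, 3.12),
    "T26": (32, 1.29, 2.50),
    "P42": (64, 2.44, 2.20),
    "P43": (64, 5.27, 5.43),
    "P44": (64, 3.45, 3.43),
    "P45": (64, 3.06, 2.83),
}


def neon_gain(system: str) -> float:
    """How many times faster the NEON intrinsics version ran than plain C."""
    _, plain_c, neon = ONE_CORE_MAX[system]
    return neon / plain_c


if __name__ == "__main__":
    for bits in (32, 64):
        gains = {s: round(neon_gain(s), 2)
                 for s, (b, _, _) in ONE_CORE_MAX.items() if b == bits}
        print(bits, "bit:", gains)
```

The 32 bit gains come out at roughly 1.9 to 2.4 times, against roughly 0.9 to 1.03 at 64 bits.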
The MFLOPS/MHz calculations produce further confusion on performance gains due to multithreading. Observing CPU MHz details sometimes shows the frequency reducing or switching to the less efficient cores, without any apparent reason, such as temperature increases.
Single Precision MFLOPS

P37 Cortex-A53 4x1.5 GHz + 4x1.2 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T    819    765    432     2146   2123   2081
    2T   1538   1431    605     4241   4158   4080
    4T   2708   2359    727     8308   8296   7853
    8T   2960   3613    763    12688  12314  10721
MFLOPS/MHz
    1T   0.55   0.51   0.29     1.43   1.42   1.39
    4T   1.81   1.57   0.48     5.54   5.53   5.24
    8T   2.19   2.68   0.57     9.40   9.12   7.94

T23 Cortex A72 2x1.8 GHz + 2x1.4 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   2813   3375   2072     5614   5452   5556
    2T   3580   4888   2139    11090  10928   8249
    4T   6521   6702   2026    12717   9457   8833
    8T   5857   7140   2003    11882  12152   9899
MFLOPS/MHz
    1T   1.56   1.88   1.15     3.12   3.03   3.09
    4T   3.62   3.72   1.13     7.07   5.25   4.91
    8T   3.66   4.46   1.25     7.43   7.60   6.19

T26 4xCortex-A73 + 4xCortex-A53 2 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   1505   1215    676     4725   4699   4997
    2T   2281   1713    978    10008  10305   9415
    4T   3474   3762   1633    18261  17663  12419
    8T   3891  15536   1789    23878  15398  20030
MFLOPS/MHz
    1T   0.75   0.61   0.34     2.36   2.35   2.50
    4T   1.74   1.88   0.82     9.13   8.83   6.21
    8T   1.95   7.77   0.89    11.94   7.70  10.02

P42 4x2.0 GHz A57 + 4x1.5 GHz A53
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   1121    978    690     4406   4226   4147
    2T   1625   1449   1058     7584   7166   7016
    4T   2866   4020   1548    10354   9725   9481
    8T   2938   5434   1817    16603  13018  12537
MFLOPS/MHz
    1T   0.56   0.49   0.35     2.20   2.11   2.07
    4T   1.43   2.01   0.77     5.18   4.86   4.74
    8T   1.68   3.11   1.04     9.49   7.44   7.16

P43 Exynos 8890 4x2.3 GHz + 4x1.6 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   3840   3724   2221     9408  12213  12479
    2T  13142   7651   3492    22942  23924  25043
    4T  14053  16381   3950    41799  40234  38691
    8T  17176  20587   4104    40815  38050  44242
MFLOPS/MHz
    1T   1.67   1.62   0.97     4.09   5.31   5.43
    4T   6.11   7.12   1.72    18.17  17.49  16.82
    8T   8.81  10.56   2.10    20.93  19.51  22.69

P44 A73 4x2.3 GHz + 4x1.9 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   3658   4176   1301     7865   7888   7811
    2T   8498   7971   1749    15333  15402  15134
    4T  14381   5276   1935    29803  21957  22070
    8T   6871   5086   1930    26429  27159  24767
MFLOPS/MHz
    1T   1.59   1.82   0.57     3.42   3.43   3.40
    4T   6.25   2.29   0.84    12.96   9.55   9.60
    8T   3.27   2.42   0.92    12.59  12.93  11.79

P45 4xCortex-A73 + 4xCortex-A53 2 GHz
          2 Ops/Word             32 Ops/Word
    KB   12.8    128  12800     12.8    128  12800
    1T   1556   2840   1432     5668   5618   4070
    2T   2140   2806   2088    10604  12777  10769
    4T   3316   3782   2492    15143  23324  15743
    8T   5941   8057   2494    28367  28537  24423
MFLOPS/MHz
    1T   0.78   1.42   0.72     2.83   2.81   2.04
    4T   1.66   1.89   1.25     7.57  11.66   7.87
    8T   2.97   4.03   1.25    14.18  14.27  12.21
##################### P37 ###################
P37 Cortex-A53 4 x 1.5 GHz + 4 x 1.2 GHz 32 bit Android 7, GPU Adreno 405 550 MHz

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      18.49    18.74    14.45     11.73
   18000+       9.70     9.75     8.40      6.31
   36000+       4.78     4.78     4.45      3.48

Screen Pixels 1776 Wide 1080 High

##################### T23 ###################
T23 Cortex A72 2 x 1.8 GHz + 2 x 1.4 GHz 32 bit Android 5, GPU PowerVR GX6250

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      60.18    60.23    56.72     34.45
   18000+      38.36    38.59    33.22     18.15
   36000+      19.29    19.22    17.96      9.95

Screen Pixels 1200 Wide 1848 High

##################### T26 ###################
T26 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz 32 bit Android 9, GPU Mali-G72 MP3

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      50.91    51.16    39.60     33.42
   18000+      28.01    27.87    23.69     19.20
   36000+      14.37    14.45    13.20     10.19

Screen Pixels 1200 Wide 1848 High

#############################################

##################### P42 ###################
P42 4x2.0 GHz Cortex-A57 + 4x1.5 GHz Cortex-A53 64 bit Android 5, GPU Adreno 430 600 MHz

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      35.89    35.50    28.79     25.02
   18000+      19.48    19.51    17.13     12.62
   36000+       8.60     8.34     8.00      6.65

Screen Pixels 1080 Wide 1794 High

##################### P43 ###################
P43 Exynos 8890 4 x 2.3 GHz + 4 x 1.6 GHz 64 bit Android 7, Mali T880 624 MHz

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      29.91    29.99    22.36     19.40
   18000+      15.11    14.63    11.87      9.57
   36000+       6.69     6.59     5.85      4.71

Screen Pixels 1080 Wide 1920 High
##################### P44 ###################
P44 Cortex-A73 4 x 2.35 GHz + 4 x 1.9 GHz 64 bit Android 8, GPU Adreno 540 710 MHz

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      56.25    56.29    43.71     35.48
   18000+      29.22     8.18    24.94     24.89
   36000+      14.54    14.45    13.49      9.81

Screen Pixels 1080 Wide 1794 High

##################### P45 ###################
P45 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz 64 bit Android 9, GPU Mali-G72 MP3

          --------- Frames Per Second --------
Triangles  WireFrame   Shaded  Shaded+  Textured
    9000+      44.62    53.82    42.79     36.34
   18000+      27.50    29.55    25.72     21.10
   36000+      15.10    15.34    14.31     11.51

Screen Pixels 720 Wide 1339 High
##################### P37 ###################
P37 Cortex-A53 4 x 1.5 GHz + 4 x 1.2 GHz 32 bit Android 7, GPU Adreno 405 550 MHz

Test                              Frames    FPS
Display PNG Bitmap Twice             236  23.57
Plus 2 SweepGradient Circles         149  14.85
Plus 200 Random Small Circles        132  13.19
Plus 320 Long Lines                  103  10.24
Plus 4000 Random Small Circles        41   4.06

Screen pixels 1776 Wide 1080 High

##################### T23 ###################
T23 Cortex A72 2 x 1.8 GHz + 2 x 1.4 GHz 32 bit Android 5, GPU PowerVR GX6250

Test                              Frames    FPS
Display PNG Bitmap Twice             598  59.75
Plus 2 SweepGradient Circles         377  37.65
Plus 200 Random Small Circles        317  31.62
Plus 320 Long Lines                  238  23.76
Plus 4000 Random Small Circles        90   8.92

Screen pixels 1200 Wide 1848 High

##################### T26 ###################
T26 4xCortex-A73 + 4xCortex-A53, all 2 GHz 32 bit Android 9, GPU Mali-G72 MP3

Test                              Frames    FPS
Display PNG Bitmap Twice             592  59.11
Plus 2 SweepGradient Circles         398  39.78
Plus 200 Random Small Circles        398  39.74
Plus 320 Long Lines                  268  26.70
Plus 4000 Random Small Circles        81   8.09

Screen pixels 1200 Wide 1848 High

#############################################

##################### P42 ###################
P42 4x2.0 GHz Cortex-A57 + 4x1.5 GHz Cortex-A53 64 bit Android 5, GPU Adreno 430 600 MHz

Test                              Frames    FPS
Display PNG Bitmap Twice             313  31.21
Plus 2 SweepGradient Circles         164  16.39
Plus 200 Random Small Circles        148  14.76
Plus 320 Long Lines                  117  11.70
Plus 4000 Random Small Circles        48   4.75

Screen pixels 1080 Wide 1794 High

##################### P43 ###################
P43 Exynos 8890 4 x 2.3 GHz + 4 x 1.6 GHz 64 bit Android 7, Mali T880 624 MHz

Test                              Frames    FPS
Display PNG Bitmap Twice             515  51.47
Plus 2 SweepGradient Circles         368  36.73
Plus 200 Random Small Circles        352  35.11
Plus 320 Long Lines                  290  28.90
Plus 4000 Random Small Circles       118  11.80

Screen pixels 1080 Wide 1920 High
##################### P44 ###################
P44 Cortex-A73 4x2.35 GHz + 4x1.9 GHz 64 bit Android 8, GPU Adreno 540 710 MHz

Test                              Frames    FPS
Display PNG Bitmap Twice             594  59.31
Plus 2 SweepGradient Circles         588  58.74
Plus 200 Random Small Circles        504  50.33
Plus 320 Long Lines                  328  32.78
Plus 4000 Random Small Circles       112  11.14

Screen pixels 1080 Wide 1794 High

##################### P45 ###################
P45 4xCortex-A73 + 4xCortex-A53, all 2 GHz 64 bit Android 9, GPU Mali-G72 MP3

Test                              Frames    FPS
Display PNG Bitmap Twice             594  59.37
Plus 2 SweepGradient Circles         602  60.13
Plus 200 Random Small Circles        602  60.13
Plus 320 Long Lines                  393  39.23
Plus 4000 Random Small Circles       101  10.01

Screen pixels 720 Wide 1339 High
These apps are intended for use when there is a real need to measure drive speeds and should not really be tried on expensive new phones or tablets. I have not had any problems using them and have executed them on the new Android 9 T26 and P45 devices, with the same limited success reported here. P45 read only results are included below.
The programs are primarily intended for measuring performance of SD cards and internal drives, but can also be used to test USB drives. DriveSpeed carries out four tests.
Test 1 - Write and read three 8 MB files and three 16 MB files; Results given in MBytes/second
Test 2 - Write 8 MB, read can be cached in RAM; Results given in MBytes/second
Test 3 - Random write and read of 1 KB blocks from 4, 8 and 16 MB files; Results are average time in milliseconds
Test 4 - Write and read 200 files of 4 KB, 8 KB and 16 KB; Results in MB/sec, msecs/file and delete seconds.
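This is a rough Python sketch of Test 1 and Test 3 style measurements, for readers who want to try the idea elsewhere. It is not DriveSpeed's code. The file name, the 1 MB (1048576 byte) block size and the use of fsync before stopping the write clock are all assumptions. As explained below, reading straight after writing mostly measures RAM caching. DriveSpeed's read only option after a reboot avoids that, and this sketch does not attempt it.

```python
import os
import random
import tempfile
import time

MB = 1024 * 1024  # assumption: 1 MB = 1048576 bytes


def write_mb_per_sec(path: str, size_mb: int) -> float:
    """Sequential 1 MB block writes, flushed to the device with fsync before timing stops."""
    data = os.urandom(MB)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return size_mb / max(time.perf_counter() - start, 1e-9)


def read_mb_per_sec(path: str) -> float:
    """Sequential read. Straight after writing, this is likely served from RAM (cached)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(MB):
            total += len(chunk)
    return total / MB / max(time.perf_counter() - start, 1e-9)


def random_read_ms(path: str, reads: int = 100, size: int = 1024, seed: int = 0) -> float:
    """Average milliseconds per random 1 KB read, at 1 KB aligned offsets."""
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // size
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            f.seek(rng.randrange(blocks) * size)
            f.read(size)
    return (time.perf_counter() - start) * 1000 / reads


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "drivespeed.tmp")  # hypothetical file name
        print(f"write {write_mb_per_sec(path, 8):.1f} MB/s")
        print(f"read  {read_mb_per_sec(path):.1f} MB/s (probably cached)")
        print(f"random 1 KB read {random_read_ms(path):.3f} ms")
```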
The first DriveSpeed benchmark has two run buttons, RunS for an SD card and RunI for the internal drive, with the file path identified by standard functions. The external SD test worked on earlier Android tablets but failed on later Android versions. RunI ran, but provided distorted reading speeds due to data being cached in RAM. An extra button was added, providing an option to prevent large files from being deleted and a read only option to measure uncached speeds after rebooting.
DriveSpd2 requires input of the file path to use and this might be identified using a file browser app. The file path can sometimes be selected for internal drives, SD cards and USB devices but there are complications associated with permissions and caching.
Running these benchmarks can require a lot of experimentation. Lots of paths, results and explanations are in android benchmarks32.htm DriveSpeed and android benchmarks32.htm Comparison, with more in later Android report and 2018 publication.
The latest compilations have been tested on devices with 32 bit and 64 bit ARM and Intel CPUs. Following is an example of running DriveSpd1.apk on a new phone. The SD card test (RunS) would not run properly (wrong default path?), but the internal drive test could be run, although data was cached for reading. In this case, the More button was used to avoid deleting the files. After powering the phone off and on, the More button was used to select Read Only, with RunI, providing measurements of reading speeds.
########################## P42 #########################
P42, Qualcomm 810, ARM Cortex A57, 2000 MHz, Android 5.1.1

Android DriveSpeed1 Benchmark 4A8 17-Jan-2018 13.34
Internal Drive Data Cached
Compiled for 64 bit ARM v8a

                    MBytes/Second
  MB   Write1  Write2  Write3   Read1   Read2   Read3
   8    123.8   239.2   318.2  1007.9  1109.3  1154.9
  16    243.4   200.5    98.9   598.8   789.6   949.5
Cached
   8    294.5   355.9   291.8  1169.8  1228.6  1175.6

Random         Write                Read
From MB      4      8     16      4      8     16
msecs     1.25   1.56   1.13   0.00   0.00   0.00

200 Files      Write                Read           Delete
File KB      4      8     16      4      8     16    secs
MB/sec    35.9   37.3   51.3    138    237    248
msecs     0.11   0.22   0.32   0.03   0.03   0.07   0.015

No delete
Total Elapsed Time  17.6 seconds
Path Used /data/data/com.?drivespeed/files/?

READ ONLY

Android DriveSpeed1 Benchmark 4A8 17-Jan-2018 13.38
Internal Drive Read Only
Compiled for 64 bit ARM v8a

                    MBytes/Second
  MB   Write1  Write2  Write3   Read1   Read2   Read3
   8      0.0     0.0     0.0    60.8   239.0   241.3

################## P45 Android 9 ##################

Android DriveSpeed1 Benchmark 4A8 30-Mar-2020 16.05
Internal Drive Read Only
Compiled for 64 bit ARM v8a

                    MBytes/Second
  MB   Write1  Write2  Write3   Read1   Read2   Read3
   8      0.0     0.0     0.0   182.4   173.3   207.0
There are two main stress test programs that can use multiple threads to exercise (presently) all CPU cores, one using floating point instructions and the other carrying out integer arithmetic. Further detail is covered in the earlier report, Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM and Intel.pdf, with an update in a 2018 publication. The third program monitors the MHz of up to 8 cores. Each of the stress test applications has five buttons:
RunB - Run Benchmark - Runs most combinations of number of threads, data sizes and, for the FPU tests, calculations per data word. This is mainly to help decide which options to use for stress testing. The benchmark runs using fixed parameters, carrying out exactly the same number of calculations for all thread combinations and data sizes. For the FPU tests, the pass count changes according to the number of calculations per word.
RunS - Run Stress Tests - Default running time is 15 minutes, with the middle data size (intended for containment in L2 cache), 8 threads and, for the FPU tests, 32 operations per word.
False Errors - The need for continuous performance displays led to false error reports, due to multiple copies of the stress test programs running. This could occur with the original versions on rotating the device. The new version runs in forced portrait display mode, but false errors can be caused if the run button is clicked again while the tests are running. The main distinctive symptom is multiple “End Time” message displays.
SetS - Specify run time parameters for stress test - These are 1, 2, 4, 8, 16 or 32 threads, 2, 8 or 32 Operations per word for FPU tests, 12.8 or 16 KB, 128 or 160 KB, 12.8 or 16 MB for FPU or Integer tests, and running time in minutes.
Info - Test description and details - This is essentially the same as the details provided here.
Save - This offers details of the results and identified CPU hardware and Operating System for E-mail. Default addressee is the program author via results@roylongbottom.org.uk but this can be changed or additional addresses added.
Unexpected Faster Speed - Performance depends on whether the data comes from caches or RAM, with a particular effect when using the 160 or 128 KB options. Assuming 32 KB L1 caches, four threads, each using a dedicated quarter, should run at L2 cache speed. With eight or more threads, each using 20 KB or less, the tests will probably run mainly at L1 cache speed. This can also apply, to some extent, with 32 threads sharing 16 MB, where L2 cache can be the main source. See the benchmark examples below. Note that later CPUs can have L1 cache sizes of 64 KB and L2 sizes of 1024 KB.
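The effect is easy to tabulate. This Python sketch divides the data area equally between threads and compares each share with assumed cache sizes: 32 KB L1, and a 1024 KB L2 as mentioned for later CPUs. A share must be strictly smaller than a cache to count as fitting, leaving room for other data. The sketch ignores sharing of L2 between cores, so it is only a rough guide, and the function names are hypothetical.

```python
def share_kb(total_kb: float, threads: int) -> float:
    """Data each thread works on when the area is split into dedicated parts."""
    return total_kb / threads


def likely_source(per_thread_kb: float, l1_kb: float = 32, l2_kb: float = 1024) -> str:
    """Crude guess: the smallest cache that holds a thread's share with room to spare."""
    if per_thread_kb < l1_kb:
        return "L1"
    if per_thread_kb < l2_kb:
        return "L2"
    return "RAM"


if __name__ == "__main__":
    for total_kb, threads in [(160, 4), (160, 8), (128, 8), (16 * 1024, 8), (16 * 1024, 32)]:
        part = share_kb(total_kb, threads)
        print(f"{total_kb:>6} KB / {threads:>2} threads = {part:7.1f} KB each -> {likely_source(part)}")
```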
CP_MHz2 measurements are instantaneous samples taken at a constant interval in seconds (default 10), for a specified number of minutes (default 15). This program has Set, Run and Save buttons, as above.
Below are examples of short stress tests and the recorded MHz of each of the eight cores, for the periods that the stress test programs were running. Note that these are instantaneous samples, taken around every five seconds. As noted using CPU-Z, frequencies vary extremely rapidly, on two different (big/LITTLE) sets of cores with differing performance characteristics. The samples can provide more clarification on longer stress tests, but extensive confusion remains, even using one second sampling.
The stress test performance measurements are based on timing the number of operations or data blocks over each reporting period, in this case identifying real changes in performance. The programs also check that calculated results are correct or, for a specified number of calculations, consistent.
T26 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz

MP-FPU Stress Test 4A8 01-Apr-2020 10.42.30
Compiled for 32 bit ARM v7a

              Data             Ops/           Numeric
 Seconds      Size   Threads   Word  MFLOPS   Results
    14.7   12.8 KB         8     32    9565     35216
    29.8   12.8 KB         8     32    8851     35216
    45.1   12.8 KB         8     32    8690     35216
    60.8   12.8 KB         8     32    8482     35216

End Time 01-Apr-2020 10.44.20

       MHz for Core
 Secs     0     1     2     3     4     5     6     7
    0  1989  1989  1989  1989  1989  1989  1989  1989
   11  1989  1989  1989  1989  1989  1716  1508  1508
   16  1989  1989  1716  1716   793   793   793   793
   21  1014  1014  1924  1989  1326  1677  1846  1989
   27  1989  1989  1989  1716   910   910   793   793
   32  1989  1989  1989  1989   910   910   793   793
   37  1989  1989  1716  1716   793   793   793   793
   42  1989  1989  1989  1989  1989  1989  1326  1326
   48  1989  1989  1989  1989  1131  1131  1131   910
   53   793   793   793   793  1014  1014   910   910
   58  1716  1716  1417  1417   793   793   793   793
   64   910   910   793   793   793  1989  1989  1989
   69   793   793   793   793  1989  1677  1417  1417
   75   793   793   793   793   793   793   793   793

MP-Int Stress Test 4A8 01-Apr-2020 10.45.42
Compiled for 32 bit ARM v7a

              Data                      Sum       Same All
 Seconds      Size   Threads  MB/sec    Check     Threads
     8.5     16 KB         8   50143    00000000  Yes
    18.1     16 KB         8   43696    00000000  Yes
    27.9     16 KB         8   42458    00000000  Yes
    38.0     16 KB         8   41736    00000000  Yes
    48.2     16 KB         8   40772    00000000  Yes
    58.4     16 KB         8   40998    00000000  Yes
    69.4     16 KB         8   38063    FFFFFFFF  Yes

End Time 01-Apr-2020 10.47.29

       MHz for Core
 Secs     0     1     2     3     4     5     6     7
   10  1417  1417  1989  1989  1989  1989  1989  1989
   16  1989  1989  1846  1846   793   793   793   793
   21  1508  1131   793  1131   793   793  1326  1586
   27  1014  1014  1014   793   793   793  1989  1989
   32   793   793  1989  1989  1989  1989  1989  1989
   38   910   910   910   793   793  1924  1781  1989
   43   793   793  1989  1989  1989  1989  1989  1989
   48   793  1586  1989  1989  1989  1989  1989  1989
   54   793   793  1989  1989  1989  1989  1989  1989
   59   793   793   793   793  1924  1989  1989  1989
   64   793   793   793   793   793  1586  1989  1989
   70  1846  1846  1417  1417   793   793   793   793
   75  1924  1924  1924  1508   793   793   793   793
   80   793   793   793   793   793   793   793   793
As indicated earlier, these benchmarks are intended to help decide the format for a long running stress test. Besides the two systems using Android 9, results for two others are provided, one reason being to show that the floating point sumchecks are identical. These depend on the number of calculations, which differs according to data size and operations per word, but is the same at 1, 2, 4 and 8 threads within each of these groups.
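A small Python sketch shows why the sumchecks match across thread counts. It assumes, as described for MP-MFLOPS, that each thread applies the same calculation to its own segment of the data. Every word then receives exactly the same sequence of operations whatever the split, so the final values, and any sum of them, are bit for bit identical. Python threads are used only to show the partitioning and say nothing about speed. The kernel constants and starting value are hypothetical.

```python
import threading


def run_kernel(data, start, end, passes, a=0.5, b=0.9):
    """Apply x = (x + a) * b repeatedly to one segment (2 operations per word)."""
    for _ in range(passes):
        for i in range(start, end):
            data[i] = (data[i] + a) * b


def run_threads(words, threads, passes):
    """Split the data into one segment per thread and run them all."""
    data = [0.999999] * words  # hypothetical starting value
    step = words // threads
    workers = []
    for t in range(threads):
        end = words if t == threads - 1 else (t + 1) * step
        worker = threading.Thread(target=run_kernel,
                                  args=(data, t * step, end, passes))
        workers.append(worker)
        worker.start()
    for worker in workers:
        worker.join()
    return data


if __name__ == "__main__":
    reference = run_threads(3200, 1, 10)
    for n in (2, 4, 8):
        print(n, "threads, identical results:", run_threads(3200, n, 10) == reference)
```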
Performance can be somewhat different to that obtained at the start of extended stress tests. With the benchmarks, some have too little running time and later ones might be influenced by higher CPU temperatures.
Particularly when running the integer program with more threads than CPU cores, performance can be higher than expected on increasing the number of threads. For example, with 8 cores, the 160 KB L2 cache test could end up running as eight L1 cache tests at the same time, and the 16 MB RAM test could run using shared L2 cache (or dedicated L2 caches).
Comparing the two Android 9 systems, which use the same CPU, shows that 64 bit P45 appeared to be faster than 32 bit T26 on all floating point tests, by an average of 2.6 times. For the integer tests, T26 average speed was slightly faster using 1 to 4 threads, but P45 was somewhat faster using 8 to 32 threads.
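The 2.6 times average can be checked against the MFLOPS figures in the table below. This Python sketch takes the mean of the 36 individual P45/T26 speed ratios, covering four thread counts, three operation counts and three data sizes. That the average was formed this way is an assumption, although it does reproduce the 2.6. The sketch also confirms that every individual ratio exceeds one.

```python
# MFLOPS copied from the table: rows are 2, 8 and 32 ops/word at 1, 2, 4, 8 threads,
# columns are 12.8 KB, 128 KB and 12.8 MB.
T26 = [
    [1058, 828, 620], [1453, 1376, 952], [1734, 2095, 1264], [2303, 2362, 1292],
    [1951, 2143, 1880], [3373, 4154, 3308], [5884, 5820, 5440], [5767, 6427, 5491],
    [2485, 2461, 2457], [4760, 4921, 4693], [9159, 9703, 10281], [9777, 10723, 10098],
]
P45 = [
    [3068, 1983, 1576], [4823, 6229, 1995], [6594, 10712, 2338], [8878, 12392, 2317],
    [5099, 4376, 3781], [8897, 10583, 5323], [16234, 14843, 7707], [20060, 22272, 9221],
    [6182, 5634, 5031], [10534, 10584, 10278], [19017, 18406, 18220], [22002, 20674, 20920],
]


def mean_ratio(fast, slow):
    """Mean, minimum and maximum of the element by element speed ratios."""
    ratios = [f / s for frow, srow in zip(fast, slow) for f, s in zip(frow, srow)]
    return sum(ratios) / len(ratios), min(ratios), max(ratios)


if __name__ == "__main__":
    mean, low, high = mean_ratio(P45, T26)
    print(f"P45/T26 mean {mean:.2f}, range {low:.2f} to {high:.2f}")
```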
32 Bit P37 Cortex-A53 4 x 1.5 GHz + 4 x 1.2 GHz, Android 7

MP-FPU Stress Test
                        MFLOPS                     Numeric Results
 Secs  Thrd  Op/Wrd 12.8 KB  128 KB 12.8 MB   12.8 KB  128 KB 12.8 MB
  8.9    T1       2     218     216     216     40392   76406   99700
  4.4    T2       2     452     452     432     40392   76406   99700
  2.3    T4       2     876     874     765     40392   76406   99700
  1.9    T8       2    1182    1083     838     40392   76406   99700
 15.2    T1       8     506     505     510     54760   85092   99819
  7.4    T2       8    1052    1055    1032     54760   85092   99819
  3.9    T4       8    1923    2028    2021     54760   85092   99819
  3.2    T8       8    2331    2698    2403     54760   85092   99819
 35.2    T1      32     865     886     874     35218   66014   99520
 17.5    T2      32    1773    1767    1746     35218   66014   99520
  8.9    T4      32    3499    3512    3451     35218   66014   99520
  5.7    T8      32    5244    5663    5560     35218   66014   99520

MP-Int Stress Test
                  MB/second
 Secs  Thrd   16 KB  160 KB   16 MB   Sumcheck  Same All
  6.3     1    5581    5048    2547   00000000  Yes
  3.5     2   10559   10050    4372   FFFFFFFF  Yes
  2.3     4   18481   18007    5810   5A5A5A5A  Yes
  2.1     8   24289   29298    5768   AAAAAAAA  Yes
  1.9    16   27609   32680    5936   CCCCCCCC  Yes
  1.9    32   27547   34166    6038   0F0F0F0F  Yes

32 Bit T26 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz, Android 9

MP-FPU Stress Test
                        MFLOPS                     Numeric Results
 Secs  Thrd  Op/Wrd 12.8 KB  128 KB 12.8 MB   12.8 KB  128 KB 12.8 MB
  2.5    T1       2    1058     828     620     40392   76406   99700
  1.6    T2       2    1453    1376     952     40392   76406   99700
  1.2    T4       2    1734    2095    1264     40392   76406   99700
  1.1    T8       2    2303    2362    1292     40392   76406   99700
  3.9    T1       8    1951    2143    1880     54760   85092   99819
  2.2    T2       8    3373    4154    3308     54760   85092   99819
  1.4    T4       8    5884    5820    5440     54760   85092   99819
  1.4    T8       8    5767    6427    5491     54760   85092   99819
 12.5    T1      32    2485    2461    2457     35218   66014   99520
  6.5    T2      32    4760    4921    4693     35218   66014   99520
  3.2    T4      32    9159    9703   10281     35218   66014   99520
  3.1    T8      32    9777   10723   10098     35218   66014   99520

MP-Int Stress Test
                  MB/second
 Secs  Thrd   16 KB  160 KB   16 MB   Sumcheck  Same All
  2.7     1   10566    8744    8228   00000000  Yes
  1.7     2   16227   19763   10517   FFFFFFFF  Yes
  1.5     4   26317   25518    9511   5A5A5A5A  Yes
  1.5     8   27321   25989    9929   AAAAAAAA  Yes
  1.5    16   26590   28027    9486   CCCCCCCC  Yes
  1.2    32   29078   49521   11620   0F0F0F0F  Yes

64 Bit P45 4 x Cortex-A73 + 4 x Cortex-A53, all 2 GHz, Android 9

MP-FPU Stress Test
                        MFLOPS                     Numeric Results
 Secs  Thrd  Op/Wrd 12.8 KB  128 KB 12.8 MB   12.8 KB  128 KB 12.8 MB
  1.0    T1       2    3068    1983    1576     40392   76406   99700
  0.6    T2       2    4823    6229    1995     40392   76406   99700
  0.5    T4       2    6594   10712    2338     40392   76406   99700
  0.4    T8       2    8878   12392    2317     40392   76406   99700
  1.8    T1       8    5099    4376    3781     54760   85092   99819
  1.0    T2       8    8897   10583    5323     54760   85092   99819
  0.7    T4       8   16234   14843    7707     54760   85092   99819
  0.5    T8       8   20060   22272    9221     54760   85092   99819
  5.5    T1      32    6182    5634    5031     35218   66014   99520
  3.0    T2      32   10534   10584   10278     35218   66014   99520
  1.7    T4      32   19017   18406   18220     35218   66014   99520
  1.5    T8      32   22002   20674   20920     35218   66014   99520

MP-Int Stress Test
                  MB/second
 Secs  Thrd   16 KB  160 KB   16 MB   Sumcheck  Same All
  3.0     1    9293    8315    7186   00000000  Yes
  2.1     2   16315   11414    9926   FFFFFFFF  Yes
  1.3     4   24639   26650   11966   5A5A5A5A  Yes
  1.2     8   38896   42982   10658   AAAAAAAA  Yes
  1.2    16   38500   44182   10992   CCCCCCCC  Yes
  1.0    32   48766   42833   12921   0F0F0F0F  Yes

64 Bit P42, 4 x 2.0 GHz Cortex-A57 + 4 x 1.5 GHz Cortex-A53, Android 5

MP-FPU Stress Test
                        MFLOPS                     Numeric Results
 Secs  Thrd  Op/Wrd 12.8 KB  128 KB 12.8 MB   12.8 KB  128 KB 12.8 MB
  1.6    T1       2    2181    1780     756     40392   76406   99700
  1.0    T2       2    2877    3826    1131     40392   76406   99700
  0.8    T4       2    4285    6801    1351     40392   76406   99700
  0.6    T8       2    5001    8604    1656     40392   76406   99700
  2.5    T1       8    3938    3502    2342     54760   85092   99819
  1.5    T2       8    6519    5768    3928     54760   85092   99819
  1.1    T4       8    8685   10017    4941     54760   85092   99819
  0.8    T8       8   12001   12541    6496     54760   85092   99819
  6.5    T1      32    5028    4783    4548     35218   66014   99520
  4.1    T2      32    7303    7918    7382     35218   66014   99520
  3.0    T4      32   11226   10516    9991     35218   66014   99520
  2.4    T8      32   11549   14283   13633     35218   66014   99520

MP-Int Stress Test
                  MB/second
 Secs  Thrd   16 KB  160 KB   16 MB   Sumcheck  Same All
  4.0     1   10855    7833    3839   00000000  Yes
  2.8     2   14504   12047    5239   FFFFFFFF  Yes
  2.1     4   17355   16046    7261   5A5A5A5A  Yes
  1.7     8   19687   25586    8437   AAAAAAAA  Yes
  1.5    16   18690   25861   10762   CCCCCCCC  Yes
  1.4    32   19463   24682   13136   0F0F0F0F  Yes
T26 32 Bit MP-Int Stress Test, 160 KB, 8 Threads

                       MHz for Core                        Averages
 Secs     0     1     2     3     4     5     6     7    0-7   4-7
  1.0  1989  1989  1989  1989  1989  1989  1989  1989   1989  1989
  2.3  1989  1989  1989  1989  1989  1989  1989  1989   1989  1989
  3.7  1989  1989  1989  1989  1417  1417  1417  1326   1692  1394
  5.0  1989  1989  1989  1989  1014  1014  1014  1014   1502  1014
  6.4  1716  1716  1716  1716   910   910  1989  1989   1583  1450
  7.6  1989  1989  1989  1989  1989  1989  1677  1677   1911  1833
  8.8   793   793   793   793  1989  1989  1716  1716   1323  1853
 10.2  1781  1781  1989  1989  1989  1989  1989  1989   1937  1989
 11.5  1989  1989  1989  1989  1131  1131  1014  1014   1531  1073
 12.9  1716  1846  1846  1625   793   793   793   793   1276   793
 14.2  1989  1989  1989  1989  1989  1677  1677  1417   1840  1690
 15.6  1989  1989  1781  1781   793   793   793   793   1339   793
 16.9  1924  1989  1989  1989  1989  1989  1989  1677   1942  1911
 18.2  1989  1989  1989  1989  1248  1131  1131  1131   1575  1160
 19.6  1326  1716  1989  1989  1989  1677  1417  1417   1690  1625
 21.0  1989  1989  1989  1989  1014  1014   910   910   1476   962
 22.3  1716  1716  1716  1716   793   793   793  1989   1404  1092
 23.7  1989  1989  1989  1989  1625  1417  1417  1417   1729  1469
 25.0  1989  1989  1989  1989  1014  1014   793   793   1446   904
 26.4  1625  1625  1989  1989  1989  1989  1989  1989   1898  1989
 27.7  1989  1989  1989  1989  1131  1131  1014  1014   1531  1073
 29.0  1716  1781  1781  1586   793   793   793   793   1255   793
 30.4  1989  1989  1989  1989  1417  1417  1417  1248   1682  1375
 31.7  1846  1846  1781  1781   793   793   793   793   1303   793
 33.1  1326  1846  1989  1989  1989  1989  1989  1625   1843  1898
 34.5  1989  1989  1989  1989  1131  1131  1131  1131   1560  1131
 35.7  1989  1989  1781  1846   910   793   793   793   1362   822
 37.0  1326  1326  1989  1989  1989  1989  1989  1989   1823  1989
 38.4  1989  1989  1989  1989  1248  1014  1014   910   1518  1047
 39.7  1846  1781  1781  1508   793   793   793   793   1261   793
 41.1  1716  1989  1924  1989  1716  1924  1989  1989   1905  1905
 42.4  1989  1989  1989  1989  1989  1625  1131  1131   1729  1469
 43.7  1924  1924  1781  1781   793   793   793   793   1323   793
 45.1  1248  1989  1989  1989  1989  1989  1989  1989   1896  1989
 46.4  1989  1989  1989  1989  1014  1014   910   910   1476   962
 47.8  1625  1417  1417  1131   793   793  1989  1989   1394  1391
 49.2  1989  1989  1989  1989  1326  1131  1131  1131   1584  1180
 50.5  1716  1716  1716  1625   793   793   793   793   1243   793
 51.8   793   793   793   793  1989  1989  1989  1014   1269  1745
 53.3  1989  1989  1989  1989  1989  1989  1677  1326   1867  1745
 54.6  1989  1924  1924  1924   910   910   793   793   1396   852
 55.9  1417  1248  1248  1989  1989  1989  1989  1989   1732  1989
 57.3  1989  1989  1989  1989  1014  1014  1014   910   1489   988
 58.7  1625  1625  1417  1417   793   793   793  1677   1268  1014
 60.0  1989  1989  1989  1989  1326  1326  1131  1131   1609  1229
 61.4  1781  1781  1781  1781   793   793   793   793   1287   793
 62.8  1989  1989  1989  1989  1989  1989  1989  1989   1989  1989
 64.0  1131   793   793  1989  1989  1989  1989  1989   1583  1989
 65.4  1989  1989  1989  1989   910   910   910   793   1435   881
 66.7  1248   910   793   910  1924  1989  1989  1989   1469  1973
 68.2  1989  1989  1989  1989  1326  1326  1014  1014   1580  1170
 69.4  1846  1846  1677  1677   793   793   793   793   1277   793
 70.8  1846  1989  1989  1989  1989  1989  1625  1625   1880  1807
 72.2  1989  1989  1924  1924   793   793   793   793   1375   793
 73.6   793   910   793   793  1846  1677  1924  1989   1341  1859
 75.0   793   793   793   793  1989  1989  1326  1716   1274  1755
 76.4  1989  1989  1989  1989  1326  1326  1014  1014   1580  1170
 77.7  1924  1625  1625  1625   793   793   793   793   1246   793
 79.1   910  1716  1924  1989  1989  1989  1989  1989   1812  1989
 80.3  1989  1989  1989  1989  1014  1014   910   910   1476   962
 81.6  1781  1716  1781  1781   793   793   793   793   1279   793
 83.0   910  1846  1924  1989  1989  1989  1989  1989   1828  1989
 84.3  1989  1989  1989  1989   910   910   910   793   1435   881
 85.6  1326   910   793   910  1677  1924  1989  1989   1440  1895
 87.1  1846  1989  1677  1781  1989  1989  1989  1989   1906  1989
 88.4  1989  1989  1989  1989  1014  1014   910   910   1476   962
 89.7  1625  1326  1326   910   793   793   793   793   1045   793
 91.2  1989  1989  1989  1989  1625  1625  1326  1014   1693  1398
 92.6  1846  1846  1677  1677   793   793   793   793   1277   793
 94.0  1014  1014  1014  1924  1924  1989  1989  1989   1607  1973
 95.4  1989  1989  1989  1989   910   910   793   793   1420   852
 96.7  1716  1417  1417  1014   793   793   793   793   1092   793
 98.2   793  1989  1989  1989  1989  1989  1989  1989   1840  1989
 99.4  1989  1989  1989  1989  1625  1326  1131  1131   1646  1303

  Secs   MB/Sec   8 core MHz
    10    56259        13717
    20    51435        11038
    31    50476        11100
    42    48617        11516
    53    48337        11113
    64    48220        11522
    76    44352        11555
    88    44747        11074
   100    44178        10853
   112    42057        10879

  Min     42057        10853
  Max     56259        13717
  Min/Max  0.75         0.79
Maximum and average MFLOPS are shown below. The average is the harmonic mean, which suits these tests because each measurement performs a constant number of calculations with variable running times. Over the 5 minute tests, the 64 bit/32 bit maximum speed ratio was about 2.8. Because 32 bit performance declined with time, the average ratio was about 3.2. In all these measurements, eight threads were only around 20% faster than four threads.
32 Bit T26 4xCortex-A73 + 4xCortex-A53 64 Bit P45 4xCortex-A73 + 4xCortex-A53 Ops/ Numeric Ops/ Numeric Seconds Size Thrd Wrd MFLOPS Results Seconds Size Thrd Wrd MFLOPS Results 12.5 12.8 KB 4 32 8973 35216 9.2 12.8 KB 4 32 24716 35216 24.8 12.8 KB 4 32 8300 35216 18.4 12.8 KB 4 32 23373 35216 37.6 12.8 KB 4 32 8035 35216 27.1 12.8 KB 4 32 24732 35216 50.4 12.8 KB 4 32 7953 35216 35.8 12.8 KB 4 32 24664 35216 63.6 12.8 KB 4 32 7771 35216 45.2 12.8 KB 4 32 22939 35216 77.1 12.8 KB 4 32 7615 35216 53.8 12.8 KB 4 32 24941 35216 90.4 12.8 KB 4 32 7654 35216 62.4 12.8 KB 4 32 25007 35216 104.0 12.8 KB 4 32 7544 35216 71.1 12.8 KB 4 32 24758 35216 117.8 12.8 KB 4 32 7443 35216 79.8 12.8 KB 4 32 24770 35216 131.8 12.8 KB 4 32 7333 35216 88.4 12.8 KB 4 32 24871 35216 145.5 12.8 KB 4 32 7450 35216 97.2 12.8 KB 4 32 24657 35216 159.5 12.8 KB 4 32 7315 35216 106.1 12.8 KB 4 32 24095 35216 173.5 12.8 KB 4 32 7299 35216 114.7 12.8 KB 4 32 24957 35216 187.6 12.8 KB 4 32 7282 35216 123.4 12.8 KB 4 32 24710 35216 201.6 12.8 KB 4 32 7335 35216 132.6 12.8 KB 4 32 23412 35216 215.8 12.8 KB 4 32 7195 35216 141.3 12.8 KB 4 32 24820 35216 230.0 12.8 KB 4 32 7205 35216 150.0 12.8 KB 4 32 24729 35216 244.5 12.8 KB 4 32 7066 35216 158.6 12.8 KB 4 32 24795 35216 258.7 12.8 KB 4 32 7192 35216 167.5 12.8 KB 4 32 24256 35216 273.4 12.8 KB 4 32 7012 35216 176.2 12.8 KB 4 32 24749 35216 287.7 12.8 KB 4 32 7149 35216 184.9 12.8 KB 4 32 24702 35216 302.2 12.8 KB 4 32 7031 35216 193.7 12.8 KB 4 32 24610 35216 202.3 12.8 KB 4 32 24915 35216 MFLOPS 210.9 12.8 KB 4 32 24841 35216 Max Average 219.9 12.8 KB 4 32 24091 35216 229.1 12.8 KB 4 32 23283 35216 32 Bit 8973 7481 237.8 12.8 KB 4 32 24883 35216 64 Bit 25007 24383 246.4 12.8 KB 4 32 24743 35216 Ratio 2.79 3.26 255.7 12.8 KB 4 32 23252 35216 264.4 12.8 KB 4 32 24691 35216 273.7 12.8 KB 4 32 23096 35216 282.4 12.8 KB 4 32 24915 35216 291.7 12.8 KB 4 32 23033 35216 300.4 12.8 KB 4 32 24618 35216 13.1 12.8 KB 8 32 10855 35216 10.4 12.8 KB 8 32 29662 
35216 26.8 12.8 KB 8 32 9737 35216 20.1 12.8 KB 8 32 30588 35216 40.3 12.8 KB 8 32 9823 35216 29.8 12.8 KB 8 32 30486 35216 54.4 12.8 KB 8 32 9484 35216 39.5 12.8 KB 8 32 30792 35216 68.7 12.8 KB 8 32 9310 35216 49.3 12.8 KB 8 32 30285 35216 83.0 12.8 KB 8 32 9308 35216 59.2 12.8 KB 8 32 29970 35216 97.6 12.8 KB 8 32 9119 35216 69.2 12.8 KB 8 32 29612 35216 112.3 12.8 KB 8 32 9034 35216 79.3 12.8 KB 8 32 29388 35216 126.9 12.8 KB 8 32 9157 35216 90.2 12.8 KB 8 32 27391 35216 141.8 12.8 KB 8 32 8898 35216 100.4 12.8 KB 8 32 29032 35216 156.9 12.8 KB 8 32 8834 35216 110.5 12.8 KB 8 32 29421 35216 171.8 12.8 KB 8 32 8952 35216 120.8 12.8 KB 8 32 28920 35216 186.8 12.8 KB 8 32 8876 35216 130.9 12.8 KB 8 32 29456 35216 201.7 12.8 KB 8 32 8918 35216 141.0 12.8 KB 8 32 29223 35216 216.8 12.8 KB 8 32 8793 35216 151.2 12.8 KB 8 32 29265 35216 232.0 12.8 KB 8 32 8768 35216 161.3 12.8 KB 8 32 29339 35216 247.5 12.8 KB 8 32 8593 35216 171.7 12.8 KB 8 32 28577 35216 262.8 12.8 KB 8 32 8739 35216 181.9 12.8 KB 8 32 28973 35216 278.2 12.8 KB 8 32 8631 35216 192.1 12.8 KB 8 32 29094 35216 293.5 12.8 KB 8 32 8685 35216 202.6 12.8 KB 8 32 28508 35216 309.0 12.8 KB 8 32 8614 35216 212.7 12.8 KB 8 32 29185 35216 223.0 12.8 KB 8 32 28933 35216 MFLOPS 233.6 12.8 KB 8 32 27934 35216 Max Average 244.4 12.8 KB 8 32 27632 35216 255.1 12.8 KB 8 32 27642 35216 32 Bit 10855 9074 265.9 12.8 KB 8 32 27651 35216 64 Bit 30792 28814 276.8 12.8 KB 8 32 27283 35216 Ratio 2.84 3.18 287.7 12.8 KB 8 32 27056 35216 298.5 12.8 KB 8 32 27616 35216 309.6 12.8 KB 8 32 26728 35216 |
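The averages in the table above are harmonic means. The following short Python sketch shows why that average fits. It is an illustration, not the benchmark's own code. When every step performs the same number of calculations, the overall rate is total work divided by total time, and that equals the harmonic mean of the per-step rates. The helper `average_rate` is hypothetical. `statistics.harmonic_mean` comes from the Python standard library.

```python
from statistics import harmonic_mean


def average_rate(rates, work_per_step=1.0):
    """Overall rate for steps that each perform the same amount of work.

    Step i takes work_per_step / rates[i] seconds, so the overall rate is
    total work divided by total running time.
    """
    total_work = work_per_step * len(rates)
    total_time = sum(work_per_step / r for r in rates)
    return total_work / total_time


# First four 4 thread, 32 bit MFLOPS readings from the T26 5 minute test.
mflops = [8973, 8300, 8035, 7953]

overall = average_rate(mflops)
print(f"work/time average: {overall:.1f} MFLOPS")
print(f"harmonic mean:     {harmonic_mean(mflops):.1f} MFLOPS")
print(f"arithmetic mean:   {sum(mflops) / len(mflops):.1f} MFLOPS")
```

The first two figures agree. Both sit slightly below the arithmetic mean, and the gap widens as readings spread out, as they do in the declining 32 bit results.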
This program is also calibrated to run initially for 10 seconds, using multiple steps that subtract and add different data patterns. In this case, 32 bit and 64 bit performance was similar at the start, but the 32 bit code slowed down as running time increased. This time, average 8 thread gains over 4 threads were about 1.42 times at 32 bits and 1.24 times at 64 bits, which is still disappointing.
32 Bit T26 4xCortex-A73 + 4xCortex-A53 64 Bit P45 4xCortex-A73 + 4xCortex-A53 Seconds Size Thrds MB/s Sumcheck Same Seconds Size Thrds MB/s Sumcheck Same 11.1 16 KB 4 38387 00000000 Yes 9.1 16 KB 4 36268 00000000 Yes 22.8 16 KB 4 35653 00000000 Yes 17.4 16 KB 4 38292 00000000 Yes 35.1 16 KB 4 34062 00000000 Yes 25.8 16 KB 4 38200 00000000 Yes 47.5 16 KB 4 33531 00000000 Yes 34.2 16 KB 4 37753 00000000 Yes 60.2 16 KB 4 33144 00000000 Yes 42.6 16 KB 4 38438 00000000 Yes 73.0 16 KB 4 32467 00000000 Yes 51.1 16 KB 4 37645 00000000 Yes 86.5 16 KB 4 31024 FFFFFFFF Yes 59.6 16 KB 4 37276 FFFFFFFF Yes 100.0 16 KB 4 30921 FFFFFFFF Yes 68.1 16 KB 4 37528 FFFFFFFF Yes 113.9 16 KB 4 30025 FFFFFFFF Yes 76.5 16 KB 4 38388 FFFFFFFF Yes 127.9 16 KB 4 29947 FFFFFFFF Yes 84.8 16 KB 4 38286 FFFFFFFF Yes 141.9 16 KB 4 29711 FFFFFFFF Yes 93.6 16 KB 4 36337 FFFFFFFF Yes 156.2 16 KB 4 29249 FFFFFFFF Yes 102.2 16 KB 4 37335 FFFFFFFF Yes 170.8 16 KB 4 28642 5A5A5A5A Yes 110.4 16 KB 4 38694 5A5A5A5A Yes 185.4 16 KB 4 28706 5A5A5A5A Yes 119.0 16 KB 4 37372 5A5A5A5A Yes 199.8 16 KB 4 28997 5A5A5A5A Yes 127.2 16 KB 4 38736 5A5A5A5A Yes 214.4 16 KB 4 28625 5A5A5A5A Yes 135.6 16 KB 4 38422 5A5A5A5A Yes 229.2 16 KB 4 28163 5A5A5A5A Yes 144.2 16 KB 4 36793 5A5A5A5A Yes 244.0 16 KB 4 28272 5A5A5A5A Yes 152.5 16 KB 4 38660 5A5A5A5A Yes 258.9 16 KB 4 28127 AAAAAAAA Yes 161.0 16 KB 4 37591 AAAAAAAA Yes 273.9 16 KB 4 27808 AAAAAAAA Yes 169.8 16 KB 4 36227 AAAAAAAA Yes 288.9 16 KB 4 27896 AAAAAAAA Yes 178.3 16 KB 4 37770 AAAAAAAA Yes 303.9 16 KB 4 27751 AAAAAAAA Yes 187.1 16 KB 4 36262 AAAAAAAA Yes 195.4 16 KB 4 38590 AAAAAAAA Yes MFLOPS 203.7 16 KB 4 38470 AAAAAAAA Yes Max Average 212.2 16 KB 4 37757 CCCCCCCC Yes 220.5 16 KB 4 38452 CCCCCCCC Yes 32 Bit 38387 30269 228.8 16 KB 4 38275 CCCCCCCC Yes 64 Bit 38755 37639 237.1 16 KB 4 38418 CCCCCCCC Yes Ratio 1.01 1.24 246.0 16 KB 4 35916 CCCCCCCC Yes 254.6 16 KB 4 37177 CCCCCCCC Yes 262.9 16 KB 4 38755 0F0F0F0F Yes 271.2 16 KB 4 38282 0F0F0F0F Yes 279.9 16 
KB 4 36655 0F0F0F0F Yes 288.4 16 KB 4 37955 0F0F0F0F Yes 296.9 16 KB 4 37251 0F0F0F0F Yes 305.9 16 KB 4 35546 0F0F0F0F Yes 9.4 16 KB 8 56586 00000000 Yes 8.3 16 KB 8 46811 00000000 Yes 19.4 16 KB 8 52351 00000000 Yes 15.8 16 KB 8 50278 00000000 Yes 29.8 16 KB 8 50744 00000000 Yes 23.4 16 KB 8 49441 00000000 Yes 40.2 16 KB 8 50526 00000000 Yes 31.0 16 KB 8 49435 00000000 Yes 51.0 16 KB 8 48572 00000000 Yes 38.3 16 KB 8 51458 00000000 Yes 61.9 16 KB 8 47939 00000000 Yes 45.7 16 KB 8 50974 00000000 Yes 73.8 16 KB 8 44093 FFFFFFFF Yes 53.5 16 KB 8 48855 FFFFFFFF Yes 85.7 16 KB 8 43975 FFFFFFFF Yes 61.2 16 KB 8 48848 FFFFFFFF Yes 97.8 16 KB 8 43582 FFFFFFFF Yes 69.0 16 KB 8 48429 FFFFFFFF Yes 110.1 16 KB 8 42475 FFFFFFFF Yes 76.7 16 KB 8 48889 FFFFFFFF Yes 122.3 16 KB 8 43104 FFFFFFFF Yes 84.5 16 KB 8 48145 FFFFFFFF Yes 134.5 16 KB 8 42916 FFFFFFFF Yes 92.1 16 KB 8 49344 FFFFFFFF Yes 147.4 16 KB 8 40529 5A5A5A5A Yes 99.8 16 KB 8 49037 5A5A5A5A Yes 160.4 16 KB 8 40457 5A5A5A5A Yes 107.6 16 KB 8 48635 5A5A5A5A Yes 173.2 16 KB 8 41098 5A5A5A5A Yes 115.5 16 KB 8 47793 5A5A5A5A Yes 186.4 16 KB 8 39688 5A5A5A5A Yes 123.2 16 KB 8 48601 5A5A5A5A Yes 199.4 16 KB 8 40247 5A5A5A5A Yes 131.3 16 KB 8 46527 5A5A5A5A Yes 212.3 16 KB 8 40521 5A5A5A5A Yes 139.5 16 KB 8 46376 5A5A5A5A Yes 225.5 16 KB 8 39772 AAAAAAAA Yes 147.5 16 KB 8 47023 AAAAAAAA Yes 238.6 16 KB 8 40079 AAAAAAAA Yes 155.5 16 KB 8 46766 AAAAAAAA Yes 251.8 16 KB 8 39727 AAAAAAAA Yes 163.9 16 KB 8 45024 AAAAAAAA Yes 264.8 16 KB 8 40300 AAAAAAAA Yes 172.1 16 KB 8 45983 AAAAAAAA Yes 278.2 16 KB 8 39273 AAAAAAAA Yes 180.2 16 KB 8 46460 AAAAAAAA Yes 291.4 16 KB 8 39531 AAAAAAAA Yes 188.6 16 KB 8 45181 AAAAAAAA Yes 304.5 16 KB 8 40175 CCCCCCCC Yes 197.2 16 KB 8 43757 CCCCCCCC Yes 205.4 16 KB 8 45801 CCCCCCCC Yes MFLOPS 213.6 16 KB 8 45773 CCCCCCCC Yes Max Average 221.8 16 KB 8 46080 CCCCCCCC Yes 230.2 16 KB 8 45115 CCCCCCCC Yes 32 Bit 56586 43081 238.7 16 KB 8 43972 CCCCCCCC Yes 64 Bit 51458 46815 247.0 16 KB 8 45540 0F0F0F0F 
Yes Ratio 0.91 1.09 255.4 16 KB 8 44860 0F0F0F0F Yes 263.7 16 KB 8 45332 0F0F0F0F Yes 272.2 16 KB 8 44621 0F0F0F0F Yes 280.5 16 KB 8 45252 0F0F0F0F Yes 289.4 16 KB 8 42460 0F0F0F0F Yes 297.8 16 KB 8 44938 FFFFFFFF Yes 306.2 16 KB 8 44774 FFFFFFFF Yes |
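The report does not list the integer test's exact operations. The following Python sketch is a purely hypothetical illustration of the idea. It fills each thread's data with one of the patterns from the Sumcheck column and applies additions, each followed by a matching subtraction, with 32 bit wraparound. It then checks that every word has returned to the starting pattern, which gives a "Same = Yes" style result. The offsets, function names and the use of plain lists as stand-ins for threads are all assumptions.

```python
# Hypothetical illustration only: not the benchmark's actual integer code.
MASK = 0xFFFFFFFF  # keep results within 32 bit unsigned range

# Data patterns that appear in the Sumcheck column of the table.
PATTERNS = [0x00000000, 0xFFFFFFFF, 0x5A5A5A5A,
            0xAAAAAAAA, 0xCCCCCCCC, 0x0F0F0F0F]


def run_pass(words, pattern, repeats=1):
    """Add, then subtract, several offsets derived from the pattern.

    Every addition is undone by the matching subtraction, so on a correctly
    working CPU each word ends up equal to its starting value.
    """
    offsets = [pattern ^ MASK, (pattern >> 1) | 1, 0x12345678]  # assumed values
    for _ in range(repeats):
        for off in offsets:
            words = [(w + off) & MASK for w in words]
            words = [(w - off) & MASK for w in words]
    return words


def check(results, pattern):
    """Return (sumcheck as 8 hex digits, True if every word matches pattern)."""
    first = results[0][0]
    same = all(w == pattern for words in results for w in words)
    return f"{first:08X}", same


if __name__ == "__main__":
    words_per_thread = 16 * 1024 // 4  # 16 KB of 32 bit words, as in the table
    for pattern in PATTERNS:
        # Four simulated "threads", run one after another, not in parallel.
        results = [run_pass([pattern] * words_per_thread, pattern)
                   for _ in range(4)]
        sumcheck, same = check(results, pattern)
        print(sumcheck, "Yes" if same else "No")
```

A real stress test would run the passes in parallel threads and compare results between them. The point here is only that an add/subtract sequence on known patterns gives a cheap way to detect a corrupted word.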
As for earlier stress tests, these were run using fully charged batteries, but with the power connection still in place. Only performance measurements are provided, for 4 and 8 threads at both 32 and 64 bits. At the bottom are minimum/maximum ratios, along with those from the shorter tests (the "Was" row). These show some, but not excessive, deterioration.
The average 8/4 thread ratios from the shorter runs, also shown, are in the same disappointing range. The average 64/32 bit performance ratios from those runs were similar to the latest measurements at around 200 seconds.
           32 Bit                 64 Bit                 64b/32b
Threads       4       8  8T/4T       4       8  8T/4T      4      8
Seconds  MFLOPS  MFLOPS         MFLOPS  MFLOPS
     12    9673   11998          23478   30753
     24    8631   10901          23263   30616
     36    8453   10486          24817   30659
     48    8358   10311   1.24   24727   30725   1.27   2.74   2.81
     60    8225    9761          24752   30687
     73    8122   10087          24438   29110
     86    7968   10034          24727   30698
     99    7907    9695   1.23   24769   29706   1.22   3.06   3.04
    112    7805    9988          24808   27388
    125    7762    9729          24885   30011
    138    7712    9881          24207   29234
    152    7646    9700   1.27   25000   29359   1.17   3.20   2.95
    165    7538    9623          23632   29439
    179    7662    9629          24705   29192
    192    7470    9573          23200   28978
    206    7454    9558   1.27   24327   29264   1.22   3.18   3.04
    220    7534    9348          23296   28645
    234    7447    9646          24239   28814
    247    7419    9392          22652   29386
    261    7350    9377   1.27   21921   29168   1.26   3.10   3.07
    275    7369    9265          21717   28807
    289    7351    9435          22272   27864
    303    7307    9246          21182   27641
    317    7239    9285   1.27   22090   27445   1.28   2.98   3.00
    331    7215    8975          21590   26962
    346    7285    9116          21434   26644
    360    7195    9271          21632   27501
    374    7177    9130   1.26   21394   26296   1.25   2.98   2.94
    388    7188    9112          20614   27058
    403    7125    9094          21352   26994
    417    6926    8922          20620   25748
    432    7168    8828   1.27   21038   26805   1.27   2.94   2.96
    446    7034    9048          20540   25174
    461    6943    9116          20685   26581
    475    7093    9016          20308   26411
    490    7106    8978   1.28   20271   25818   1.27   2.90   2.88
    505    6856    9098          20463   26482
    519    7118    9008          20414   26344
    534    6887    8935          20207   25902
    549    6988    8840   1.29   20587   25467   1.28   2.93   2.90
    564    6914    9013          19934   22613
    578    6964    8882          20642   25742
    593    6873    8812          20294   23375
    608    6794    8859   1.29   20500   25315   1.19   2.95   2.73
    623    7004    8931          20111   25511
    638    6883    8887          19907   25244
    653    6821    8895          20298   25651
    668    6781    8999   1.30   19523   25159   1.27   2.90   2.84
    683    6743    8435          19814   25494
    698    6810    8891          20332   24850
    713    6729    8486          19107   25095
    728    7007    8816   1.27   19765   25098   1.27   2.90   2.90
    744    6547    8813          19428   25271
    758    7036    9020          19744   25248
    774    6574    8722          19831   24945
    789    6889    8928   1.31   19710   24150   1.27   2.91   2.81
    803    6953    8564          19622   24556
    819    6666    8771          19877   24458
    834    6772    8583          19857   24391
    849    6915    8834   1.27   19662   24378   1.24   2.89   2.81
    864    6812    8962          19729   24666
    879    6737    8585          19806   24382
    894    6824    8763          19639   24502
    909    6867    8754   1.29   19818   24165   1.24   2.90   2.79
Min/Max    0.68    0.70           0.76    0.74
Was        0.78    0.79   1.21    0.92    0.87   1.18   3.26   3.18
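The report does not state how the 8T/4T and 64b/32b ratios are calculated. One method that matches is the ratio of the summed readings in each group of four rows, which covers roughly one minute. The following Python sketch takes that as an assumption, not the author's documented method. It reproduces the first two groups of ratios in the table from the table's own MFLOPS readings. The function names are hypothetical.

```python
def ratio(numerator, denominator):
    """Ratio of summed readings, equivalent to the ratio of their means."""
    return sum(numerator) / sum(denominator)


def group_ratios(b32_4t, b32_8t, b64_4t, b64_8t):
    """Return (32 bit 8T/4T, 64 bit 8T/4T, 64b/32b 4T, 64b/32b 8T)."""
    return tuple(round(r, 2) for r in (
        ratio(b32_8t, b32_4t),
        ratio(b64_8t, b64_4t),
        ratio(b64_4t, b32_4t),
        ratio(b64_8t, b32_8t),
    ))


# MFLOPS readings at 12, 24, 36 and 48 seconds.
first = group_ratios([9673, 8631, 8453, 8358], [11998, 10901, 10486, 10311],
                     [23478, 23263, 24817, 24727], [30753, 30616, 30659, 30725])
# MFLOPS readings at 60, 73, 86 and 99 seconds.
second = group_ratios([8225, 8122, 7968, 7907], [9761, 10087, 10034, 9695],
                      [24752, 24438, 24727, 24769], [30687, 29110, 30698, 29706])
print(first)   # (1.24, 1.27, 2.74, 2.81)
print(second)  # (1.23, 1.22, 3.06, 3.04)
```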
These again are only performance measurements using fully charged batteries. As expected, 64 bit performance gains are not as high as in the floating point tests. Eight threads provided a more significant gain over four threads, but it was still disappointing.
The 64 bit, 8 thread, 15 minute test was repeated in the same environment on the earlier P42 phone, which has 4 x 2.0 GHz Cortex-A57 + 4 x 1.5 GHz Cortex-A53 CPUs. In this case, maximum speed was 24219 MB/second and minimum 12433 MB/second. That is a min/max ratio of 51%, much worse than the 83% recorded here for P45.
           32 Bit                 64 Bit                 64b/32b
Threads       4       8  8T/4T       4       8  8T/4T      4      8
Seconds  MB/sec  MB/sec         MB/sec  MB/sec
     12   35498   56835          38343   50723
     26   31975   52318          38617   48180
     39   30847   50381          36574   51090
     54   30156   45773   1.60   38169   51145   1.33   1.18   0.98
     68   29705   45081          38439   50287
     82   29550   43697          38433   50673
     98   27963   43543          36752   51006
    113   27482   41777   1.52   37412   47669   1.32   1.32   1.15
    128   27661   41643          38405   49947
    144   27674   41269          38439   49539
    159   27360   40749          38610   49384
    176   26356   39789   1.50   34741   46989   1.30   1.38   1.20
    191   27348   40892          38290   49556
    207   26993   39517          36865   48913
    223   26677   40002          38671   49120
    239   26790   38869   1.48   37877   49455   1.30   1.41   1.24
    255   26566   33442          37242   49503
    271   26471   34318          36072   48806
    287   26440   38901          35958   47341
    303   26494   39611   1.38   35679   49101   1.34   1.37   1.33
    320   25974   38553          36110   49442
    336   26543   38942          35746   48671
    352   26389   38155          34409   49235
    368   26226   38257   1.46   36108   48623   1.38   1.35   1.27
    384   26317   37875          34499   48423
    400   26480   38588          30353   48819
    417   26200   38239          34861   48738
    433   26254   38018   1.45   34600   47805   1.44   1.28   1.27
    449   26255   38903          34256   46875
    465   26208   38962          35292   46740
    481   26258   38284          34893   47445
    498   26167   38259   1.47   34719   46890   1.35   1.33   1.22
    514   25858   37884          33866   46965
    533   23217   38019          34356   45736
    550   24371   37623          32383   46840
    567   25258   36721   1.52   33537   46554   1.39   1.36   1.24
    583   25789   37684          33215   46210
    600   26176   37491          34410   45718
    616   25932   38622          33338   46194
    633   25611   37864   1.47   32229   46098   1.38   1.29   1.21
    649   25957   37598          34166   46212
    666   25971   38009          33255   45406
    682   26127   37391          33324   47250
    698   26118   38164   1.45   32876   47173   1.39   1.28   1.23
    715   26126   38310          33004   46926
    731   26207   39458          35513   45323
    747   25713   41501          35981   45854
    763   26466   40280   1.53   35559   45652   1.31   1.34   1.15
    780   25539   40787          35726   42698
    797   25816   38650          33283   45292
    814   24917   38249          32865   45149
    830   26226   38002   1.52   33781   44259   1.31   1.32   1.14
    847   25703   38614          33289   45062
    863   25855   37284          33523   43999
    880   24581   36487          33151   44642
    897   25525   38179   1.48   31746   44768   1.36   1.30   1.19
Min/Max    0.65    0.59           0.78    0.83
Was        0.72    0.69   1.42    0.92    0.83   1.24   1.24   1.09
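The min/max ratios quoted for P42 and P45 are the slowest reading divided by the fastest. The following Python sketch applies that calculation to the P45 64 bit, 8 thread MB/sec column above. For P42, only the quoted minimum and maximum are available. The helper name is hypothetical.

```python
def min_max_ratio(readings):
    """Slowest reading divided by the fastest: 1.0 means no slowdown."""
    return min(readings) / max(readings)


# P45, 64 bit, 8 threads: MB/sec readings from the 15 minute table.
p45_64bit_8t = [
    50723, 48180, 51090, 51145, 50287, 50673, 51006, 47669,
    49947, 49539, 49384, 46989, 49556, 48913, 49120, 49455,
    49503, 48806, 47341, 49101, 49442, 48671, 49235, 48623,
    48423, 48819, 48738, 47805, 46875, 46740, 47445, 46890,
    46965, 45736, 46840, 46554, 46210, 45718, 46194, 46098,
    46212, 45406, 47250, 47173, 46926, 45323, 45854, 45652,
    42698, 45292, 45149, 44259, 45062, 43999, 44642, 44768,
]
p45 = min_max_ratio(p45_64bit_8t)
p42 = 12433 / 24219  # only the minimum and maximum are quoted for P42
print(f"P45 {p45:.0%}, P42 {p42:.0%}")  # P45 83%, P42 51%
```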
T23 Amazon Fire HD 10, 2 x 1.8 GHz Cortex A72 + 2 x 1.4 GHz Cortex A53, GPU PowerVR GX6250
Device Amazon KFSUWI
Screen pixels w x h 1200 x 1848
Android Build Version 5.1.1
Hardware : MT8173
processor : 0, 1
model name : AArch64 Processor rev 0 (aarch64)
BogoMIPS : 26.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 2
processor : 2, 5
model name : AArch64 Processor rev 0 (aarch64)
BogoMIPS : 26.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 0

T26 Kindle Fire HD 10, four Cortex-A73 and four Cortex-A53, all 2 GHz, GPU Mali-G72 MP3
Device Amazon KFMAWI
Screen pixels w x h 1200 x 1848
Android Build Version 9 32 bit
Hardware : MT8183
processor : 0 to 7
model name : ARMv8 Processor rev 2 (v8l)
BogoMIPS : 26.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd09
CPU revision : 2

P37 Lenovo Moto G4, Snapdragon 617, Octa-core Cortex-A53, 28 nm, Cores 4x1.5 GHz + 4x1.2 GHz, 2 GB RAM 933 MHz, GPU Adreno 405 550 MHz
Device Motorola Moto G (4)
Screen pixels w x h 1080 x 1776
Android Build Version 6.0.1
CPU part : 0xd03
CPU revision : 4
Hardware : Qualcomm Technologies, Inc MSM8952
Revision : 82a0
Processor : ARMv7 Processor rev 4 (v7l)
Device : athene_13mp
Radio : EMEA
MSM Hardware : MSM8952
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 5, 6, 7
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.00
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
Linux version 3.10.84-g061c37c
P37 Later Android Build Version 7.0
Linux version 3.10.84-g478d03a

P42 LG G Flex2, Qualcomm 810 Octa-core 4x1.5 GHz Cortex-A53 & 4x2.0 GHz Cortex-A57, 20nm, Dual Channel RAM 25.6 GB/s, Adreno 430 Graphics @ 600 MHz, L1/L2 32KB/2MB
Device LGE LG-H955
Screen pixels w x h 1080 x 1794
Android Build Version 5.1.1
Processor : AArch64 Processor rev 1 (aarch64)
processor : 0 to 7
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 1
Hardware : Qualcomm Technologies, Inc MSM8994
Revision : 0008
Linux version 3.10.49 - 64 Bit

P43 Samsung Galaxy S7 edge, Exynos 8890 (2.3 GHz Quad + 1.6 GHz Quad), 14nm, Quad Channel RAM 29.8 GB/s, Mali T880 Graphics @ 624 MHz, L1 32KB, L2 1MB
Device Samsung SM-G935F
Screen pixels w x h 1080 x 1920
Android Build Version 7.0
processor : 0 to 3
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 4 to 7
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 1
Linux version 3.18.14

P44 Google Pixel 2, Qualcomm Snapdragon 835 MSM8998, Kryo 280, 10nm, Customized Cortex-A73 4 x 2350 + 4 x 1900 MHz, Adreno 540 710 MHz
Device Google Pixel 2
Screen pixels w x h 1080 x 1794
Android Build Version 8.1.0
processor : 0 to 3
BogoMIPS : 38.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0xa
CPU part : 0x801
CPU revision : 4
processor : 4 to 7
BogoMIPS : 38.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0xa
CPU part : 0x800
CPU revision : 1

P45 Motorola One Macro, four Cortex-A73 and four Cortex-A53, all 2 GHz, MT6771 Helio P70 chipset, 12nm, GPU Mali-G72 MP3
Screen pixels w x h 720 x 1339
Android Build Version 9 64 bit
processor : 0 to 7
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4