Android 12 and 13 Benchmarks and Cortex-X2 CPU With Low MP Efficiency


Contents


Summary Introduction Configurations
Whetstone Benchmark Dhrystone Benchmark Linpack Benchmark
Livermore Loops Benchmark MemSpeed Benchmark NeonSpeed Benchmark
BusSpeed Benchmark RandMem Benchmark FFT Benchmarks
MP-Whetstone Benchmark MP-Dhrystone Benchmark MP-BusSpeed Benchmark
MP-RandMem Benchmark MP-MFLOPS Benchmark NEON-MFLOPS-MP Benchmark
Java OpenGL Benchmark Java Drawing Benchmark Java Whetstone Benchmark
Java Linpack Benchmark DriveSpeed Benchmark CPU Stress Tests
Integer Stress Benchmark Floating Point Stress Benchmark Integer Stress Tests
Floating Point Stress Tests More Integer Stress Tests More Floating Point Stress Tests


Summary

The main purpose of this report was to confirm that the benchmarks and stress testing programs continued to run successfully on the later versions of Android, which they did. Again, the systems tested had eight-core ARM big.LITTLE CPU configurations, one of which was also run under earlier versions of Android, effectively producing the same levels of performance. Configurations comprised:
System 1 - 2 x Cortex-A76 at 2.05 GHz, 6 x Cortex-A55  at 2.00 GHz                             
System 2 - 2 x Cortex-A76 at 2.00 GHz, 6 x Cortex-A55  at 1.80 GHz                             
System 3 - 2 x Cortex-A75 at 2.00 GHz, 6 x Cortex-A55  at 2.00 GHz                             
System 4 - 1 x Cortex-X2  at 2.80 GHz, 3 x Cortex-A710 at 2.52 GHz, 4 x Cortex-A510 at 1.82 GHz
Particular attention is given to comparing the performance of the newer Cortex-X2 processor with the Cortex-A76, and to multiprocessor efficiency and battery/mains power effects.

10 Single Core Benchmarks

These and most other programs are normally run on mains power, to avoid reductions in clock speed as the battery discharges, and are expected to use the fastest CPU core. The first set were the old Classic Benchmarks, comprising Whetstone, Dhrystone, Linpack and Livermore Loops. For these, performance is expected to be proportional to CPU MHz, with some variation due to architectural changes. System 4/System 2 performance gains are included, with a minimum expectation of 1.4 times (2.8/2.0 GHz). Ratings of overall performance were 1.50, 2.40, 2.37# and 2.15 times over the four benchmarks. This includes a revised calculation for Benchmark 3# (Linpack), after one run produced a lower Cortex-X2 gain of 1.35 times, showing it to be slower than when running on battery. Benchmark 4 was the last one that appeared to run on the X2 at full speed on mains power (see recorded dates/times).
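As a rough check on how much of each Classic Benchmark gain comes from architecture rather than clock speed, the headline System 4/System 2 ratios can be divided by the 1.4 times clock ratio. This is a minimal Python sketch that assumes performance scales linearly with MHz, which the text treats only as a minimum expectation; the ratios are those quoted above.

```python
# Clock-normalised System 4 / System 2 gains for the four Classic Benchmarks.
# Assumption: a pure clock-speed gain would be 2.8 / 2.0 GHz = 1.4 times.

SYSTEM4_GHZ = 2.8   # Cortex-X2
SYSTEM2_GHZ = 2.0   # Cortex-A76 (System 2)

measured_gains = {
    "Whetstone": 1.50,
    "Dhrystone": 2.40,
    "Linpack": 2.37,
    "Livermore Loops": 2.15,
}

def per_clock_gain(measured, fast_ghz=SYSTEM4_GHZ, slow_ghz=SYSTEM2_GHZ):
    """Gain remaining after dividing out the CPU clock ratio."""
    return measured / (fast_ghz / slow_ghz)

for name, gain in measured_gains.items():
    print(f"{name:16s} measured {gain:.2f}  per clock {per_clock_gain(gain):.2f}")
```

A per-clock value near 1.0, as for Whetstone, suggests the gain is mostly due to MHz; larger values point to architectural improvements.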

6 Single Core Memory Benchmarks - Each of these carries out four to six computing variations using 10 or 11 different memory levels, covering all caches and RAM. Initial runs, on mains power, produced low System 4/System 2 performance ratios, some being less than 1.0. When they were later run on battery power, most of the 300+ ratios were greater than 1.4 times, with the highest gains of up to 10 times using data in the Cortex-X2 L3 cache.

5 Multithreading Benchmarks - These were run using 1, 2, 4 and 8 threads, two of them being included to demonstrate program code that is unsuitable for using multiple CPU cores. The report again includes more than 300 System 4/2 comparisons, with minimum, average and maximum ratios of 1.03, 2.21 and 13.3 times, ignoring one at 0.89.

Calculations of multithreading efficiency are also provided, with examples below. PC results are for an Intel Core i5 CPU (using 8 threads on a 4 core processor), from Cray 1 Supercomputer Performance Comparisons With Home Computers Phones and Tablets.htm, the ratios demonstrating what might be expected. Here, 8 PC threads produced performance more than 7 times that using 1 thread, for the relatively simple code in the Whetstone Benchmark. The Android systems were less efficient, with gains of around 5 times using 8 real cores. Android MP-MFLOPS 8 core gains were particularly low, at less than 3 times, with an unbalanced Cortex-X2 ratio of only 1.31 times using 2 cores.

        Whetstone               MP-MFLOPS                                  
Threads     PC SSE  S2 A76   S4 X2  PC SSE  S2 A76   S4 X2                         
   1         1.00    1.00    1.00    1.00    1.00    1.00                        
   2         1.99    2.13    1.87    1.97    1.92    1.31                        
   4         3.61    3.47    3.57    2.59    2.07    2.11                        
   8         7.02    5.14    4.64    3.59    2.71    2.21                        

   Max MFLOPS   8773    4869    7993  119460   34024   70401                           
   PC 4.15 GHz AVX512                 325915                                           
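Multithreading efficiency can be expressed as the speed-up over one thread divided by the number of threads, 1.0 (100%) being perfect scaling. A minimal Python sketch using the Whetstone columns of the table above; note that the PC figures are for 8 threads on 4 cores, so they are not a like-for-like per-core comparison.

```python
# Speed-up and per-thread efficiency from the Whetstone ratios above
# (each system's 1-thread result = 1.00).

whetstone_speedup = {
    "PC SSE": {1: 1.00, 2: 1.99, 4: 3.61, 8: 7.02},
    "S2 A76": {1: 1.00, 2: 2.13, 4: 3.47, 8: 5.14},
    "S4 X2":  {1: 1.00, 2: 1.87, 4: 3.57, 8: 4.64},
}

def efficiency(speedup, threads):
    """Fraction of ideal linear scaling achieved: speed-up / threads."""
    return speedup / threads

for system, results in whetstone_speedup.items():
    row = "  ".join(f"{t}T {efficiency(s, t):5.0%}" for t, s in results.items())
    print(f"{system:7s} {row}")
```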

MP-MFLOPS had been run earlier on power, with the latest run on battery providing faster performance, as mentioned above. NEON-MFLOPS-MP, the same program but using NEON SIMD functions, was run next on power, immediately followed by a run on battery. This time, performance on power and on battery was similar. Other applicable results are from MP-BusSpeed and MP-RandMem, only run on battery, where average System 4/2 random access performance gains were up to 10.6 times, using System 4's large L3 cache. System 4 provided widely variable gains on other test functions.

Java and Graphics Benchmarks

These comprise OpenGL and Drawing programs, then Java versions of Whetstone and Linpack benchmarks. The main point is that they all ran successfully but some of the results are questionable.

CPU Stress Tests

There are two main CPU stress testing programs, one executing floating point instructions and the other integer arithmetic, that can use up to 32 threads. Parameters are provided to specify testing time, number of threads and memory size, the floating point program also having a parameter for calculation complexity.

Benchmarks are provided, using all options, to help in determining stress testing parameters, with each thread using a different segment of the data and carrying out repetitive calculations. As with other programs, performance varies depending on the current environment, including battery or mains power and temperature.
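The idea of each thread working repeatedly on its own segment of the data can be sketched as follows. The helper names and the even split are assumptions, since the stress programs' actual partitioning is not described here. In CPython the threads also share the global interpreter lock, so this shows the division of work rather than any parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def thread_segments(n_words, n_threads):
    """Hypothetical helper: contiguous (start, end) pairs covering range(n_words)."""
    base, extra = divmod(n_words, n_threads)
    segments, start = [], 0
    for t in range(n_threads):
        # The first `extra` threads take one additional word each.
        length = base + (1 if t < extra else 0)
        segments.append((start, start + length))
        start += length
    return segments

def repeat_sum(data, start, end, passes=3):
    """Hypothetical per-thread work: repeated 32-bit wrapping sums of one segment."""
    total = 0
    for _ in range(passes):
        for i in range(start, end):
            total = (total + data[i]) & 0xFFFFFFFF
    return total

data = list(range(1000))
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(repeat_sum, data, s, e)
               for s, e in thread_segments(len(data), 4)]
    results = [f.result() for f in futures]

print(thread_segments(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(results)
```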

The Integer Benchmark uses up to 32 threads where, with 8 CPU cores, similar performance is expected using 16 and 32 threads. However, there are unexpectedly faster speeds where data used by threads can be transferred from lower level caches. The System 4 test demonstrated the usual performance gains over System 2 but, this time, the earlier run on power was faster than the later one on battery.

The Floating Point Benchmark uses up to 8 threads but executes increasing computation levels. System 4/2 comparisons demonstrated the former’s inferior performance gains using 2 threads and inexplicable Battery/Power ratios between 0.53 and 1.60.

Integer Stress Tests were run for 15 minutes using 8 threads on the three older systems, including samples of CPU MHz measurements of the 8 cores. There, the 6 cores classified as LITTLE mainly appeared to run at constant frequencies, with variations on the big ones. Runs using mains and battery power are included, producing similar performance, with variations in LITTLE CPU MHz on the latter. Tests were run on System 4 using 8, 4, 2 and 1 threads, with mains power connected during the third one. On this phone, the MHz program failed to operate properly and the stress test timed out without manual intervention (see More Integer Stress Tests).

Floating Point Stress Tests - These were also run for 15 minutes using 8 threads and 32 calculations per word, demonstrating the same pattern of frequency changes. This time, System 2 showed all cores running at maximum MHz throughout the 15 minute test. The first System 4 run was on battery, with performance reducing by 36% over the testing time, probably due to battery discharge. The immediately following test, on power, started at an increased speed, this time reducing by 31%. Later came tests using 4, 2 and 1 threads, all suffering from performance degradation, the last including a long timeout that led to increased speed.

More Integer and Floating Point Stress Tests - Based on some of the above results and later ones, minimum, average and maximum performance ratios are provided for all four systems using 1, 2, 4 and 8 threads. Following are sample highlights of multithreading efficiency and measured performance using 2 and 8 threads on Systems 2 and 4. Performance MOPS are averages of MFLOPS for floating point, and measured MBytes per second divided by four for integers. Although System 4 is shown to be faster on all measurements, it has the lowest thread efficiency at all thread levels, but only for floating point calculations. Efficiency using 4 and 8 threads was particularly low in all cases.

            Integer       Floating Point    
Threads  S2 A76  S4  X2   S2 A76  S4  X2    
   1       1.0     1.0      1.0     1.0     
   2       1.7     1.9      1.9     1.2     
   4       2.3     2.5      2.4     1.8     
   8       3.3     3.8      2.6     2.1     

MOPS 2T    7677   10027    24896   36783    
MOPS 8T   13840   26147    37249   65709    
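Integer MOPS in the table are measured MBytes per second divided by four, so converting back is a simple factor. This sketch assumes, as the divisor suggests, 4-byte (32-bit) integer words; the helper names are hypothetical.

```python
# Convert between integer stress-test MOPS and MBytes/second.
# Assumption: the divide-by-four corresponds to 4-byte (32-bit) integers.

BYTES_PER_INT32 = 4

def int_mops_from_mbytes(mbytes_per_sec):
    return mbytes_per_sec / BYTES_PER_INT32

def mbytes_from_int_mops(mops):
    return mops * BYTES_PER_INT32

# 8-thread integer results from the table above
s2_8t, s4_8t = 13840, 26147
print(mbytes_from_int_mops(s2_8t), mbytes_from_int_mops(s4_8t))  # implied MB/s
print(f"System 4/System 2 at 8 threads: {s4_8t / s2_8t:.2f}")
```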

Performance Running On Battery Or Power

Performance is likely to reduce at increased CPU temperatures and as the battery becomes discharged. These test results could not identify the best option for maximum performance using the Cortex-X2 based phone. Perhaps there are additional heating effects running certain programs whilst the battery is being charged at a high rate. Maybe all comparative power and battery tests should initially run with fully charged batteries, after a cooling off period.



Introduction

The results of the programs used are not intended to be used to produce an artificial single number rating. The aim is to identify a wide range of performance attributes that indicate strengths and weaknesses and ongoing hardware and software compatibility.

In 2018, I published android benchmarks.htm, with background and details of the small change required for my benchmarks to run under Android 8, and with appropriate references and links to earlier programs and results. Later, I repeated the tests, covered in Android 9 benchmarks.htm. In 2021 the programs were run again and reported in Android 10 and 11 Benchmarks and ARM bigLITTLE Architecture Issues.htm.

The documents at the first two of the above links provide the options to independently download and install all the programs used, and also include detailed descriptions that are not provided here.

This 2023 report covers the latest releases of Android available at the time. These can complicate on-line installation and the easiest way was to download the files to a PC and copy them onto an appropriate SD card or into the device’s memory.

With the original benchmarks, the only way I could find to report computer readable results, in the standard monospaced format, was by Email via the Save button. There, I was the default receiver, but this could be changed. Now various sharing options are offered, the same Email procedure applying on selecting the Gmail icon. My preference is the Google Drive option, allowing me to access the files on my PCs.

The programs provide the following range of activities, the actual testing functions being mainly produced using the same C code as my Windows, Linux and Raspberry Pi benchmarks.

CPU Benchmarks - The first set are the Classic Benchmarks that were the original 1970s to 1980s programs that set standards of performance for computers, comprising Whetstone, Dhrystone, Linpack and Livermore Loops.

Memory Benchmarks - Next are programs that measure performance with data from caches and RAM. MemSpeed (including NeonSpeed variant), BusSpeed and RandMem all use the same range of data sizes between 4 KB and 64 MB. Then there is a Fast Fourier Transform benchmark with multiple data sizes.

MultiThreading Benchmarks - These all measure performance using 1, 2, 4 and 8 threads. The first are MP-Whetstone, MP-Dhrystone and MP-Linpack. The next batch use memory sized 12.8 KB, 128 KB and mainly 12.8 MB, comprising MP-MFLOPS (including NEON-MFLOPS MP), MP-BusSpeed and MP-RandMem.

Java Benchmarks - These comprise Java versions of the Whetstone and Linpack benchmarks, a graphics one using drawing functions and another using OpenGL.

DriveSpeed Benchmarks - For measuring main drive speeds.

CPU Stress Testing Programs - These have variable parameters to run MP benchmarks for extended periods, for identifying overheating and discharging battery performance issues.

Run Time Procedures - Initially, as usual, the benchmarks were run on mains power, to avoid the slower performance induced at discharged battery levels. Single core programs are expected to run on the fastest processor core in a big.LITTLE arrangement.



Configurations

Many ARM processors have options for different sizes of L1, L2 and L3 caches and whether shared by multiple processor cores. It is often difficult to discover the sizes in a particular device. However, memory benchmark results can provide an indication.
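As a hedged illustration of how memory benchmark results can hint at cache sizes, the sketch below flags the data sizes at which measured speed drops sharply relative to the previous, smaller size. The function name and the 30% threshold are assumptions chosen for illustration; real results also vary with clock changes and shared caches, so any boundary found is only an indication.

```python
# Flag likely cache boundaries: a large fall in speed between one data size
# and the next suggests the data no longer fits in a given cache level.
# The 30% threshold is an arbitrary assumption, not a benchmark parameter.

def speed_drops(results, threshold=0.30):
    """results: (size_kb, mbytes_per_sec) pairs in increasing size order.
    Returns (previous_size_kb, size_kb, fractional_drop) for each large drop."""
    drops = []
    for (prev_kb, prev_speed), (kb, speed) in zip(results, results[1:]):
        drop = (prev_speed - speed) / prev_speed
        if drop > threshold:
            drops.append((prev_kb, kb, drop))
    return drops
```

A boundary reported as (prev_kb, kb) means the cache size probably lies between those two data sizes.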

 CPUID From Benchmarks                     From CPU-Z or Searches

 System 1 Android 11                       Lenovo Tab P11 Plus
 Screen pixels w x h 1200 x 1928           SOC MediaTek Helio G90 12nm
 Android Build Version      11             2x 2.05 GHz ARM Cortex-A76 and 6x 2.0 GHz ARM Cortex-A55
                                           Has L3 cache
                                           GPU Mali-G76 MC4 720 MHz
processor	: 5
BogoMIPS	: 26.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0xd05
CPU revision	: 0

processor	: 6
BogoMIPS	: 26.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0b
CPU revision	: 0

 System 2 Android 12
 Device Motorola moto g(50)                SOC Snapdragon 750 8 nm, claim based on ARM Cortex-A76 and ARM Cortex-A55
 Screen pixels w x h 720 x 1339            CPUs - 2 x 2.0 GHz Kryo 480 and 6 x 1.8 GHz Kryo 460 
 Android Build Version      12             Both Kryo caches L1 64 KB, L2 512 KB, L3 2 MB shared
                                           GPU Adreno 619 450 MHz

Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x51
CPU architecture: 8
CPU variant	: 0xd
CPU part	: 0x805
CPU revision	: 14

processor	: 5
BogoMIPS	: 38.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x51
CPU architecture: 8
CPU variant	: 0xd
CPU part	: 0x805
CPU revision	: 14

processor	: 7
BogoMIPS	: 38.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x51
CPU architecture: 8
CPU variant	: 0x8
CPU part	: 0x804
CPU revision	: 14


 System 3 Android 13
 Device Samsung SM-X200                    SOC  Tiger T618 12 nm
 Screen pixels w x h 1920 x 1128           2x 2.0 GHz Cortex-A75 & 6x 2.0 GHz Cortex-A55 
 Android Build Version      13             Caches L1 16 KB, L2 256 KB, L3 ?
                                           GPU  Mali G52 MP2 850 MHz

Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0xd05
CPU revision	: 0

processor	: 6
BogoMIPS	: 52.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0a
CPU revision	: 1
  


Armv9 CPU Phone Configuration

This new phone's CPU is based on the Arm®v9.0-A architecture. As can be seen here, the program functions used identify a completely different variety of features, but provide limited information about the technology used. CPU-Z also provided limited information, and numerous searches did not help in finding more.

 CPUID From Benchmarks                  From CPU-Z or Searches

 System 4 Android 13                    Samsung S22
 Device Samsung SM-S901B                1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex-A510, 3x 2.52 GHz Cortex-A710
 Screen pixels w x h 1080 x 2009        SOC Exynos 2200 4nm
                                        Caches L1 64 KB, L2 between 512 & 1024 KB, L3 between 512 KB and 8 MB
                                        GPU Xclipse 920

 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc 
 dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes 
 svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 bti

CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd48
CPU revision : 0

CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 6
BogoMIPS : 51.20
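The CPU implementer and CPU part values listed above can be decoded into core names. This Python sketch is an illustrative helper, not part of the benchmarks: the table holds only IDs relevant to this report (0x41 is Arm, 0x51 is Qualcomm), named as in the Linux kernel's arm64 cputype definitions, plus 0xd46 (Cortex-A510) for the System 4 LITTLE cores, which are not shown above. The function names are hypothetical.

```python
# Decode "CPU implementer" / "CPU part" pairs from /proc/cpuinfo style text
# and test for a named entry on the "Features" line.

CPU_PARTS = {
    (0x41, 0xD05): "Cortex-A55",
    (0x41, 0xD0A): "Cortex-A75",
    (0x41, 0xD0B): "Cortex-A76",
    (0x41, 0xD46): "Cortex-A510",
    (0x41, 0xD47): "Cortex-A710",
    (0x41, 0xD48): "Cortex-X2",
    (0x51, 0x804): "Kryo 4XX Gold",
    (0x51, 0x805): "Kryo 4XX Silver",
}

def decode_cpuinfo(text):
    """Return a core name for each CPU part line, using the preceding implementer."""
    cores, implementer = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)   # accepts " 0x41"
        elif key == "CPU part" and implementer is not None:
            part = int(value, 16)
            cores.append(CPU_PARTS.get((implementer, part), f"unknown 0x{part:x}"))
    return cores

def has_feature(text, feature):
    """True if `feature` (e.g. "sve2") appears on the first Features line."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return feature in value.split()
    return False
```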
  

Maximum CPU Speed Summary

System 1 - 2 x Cortex-A76 at 2.05 GHz, 6 x Cortex-A55  at 2.00 GHz
System 2 - 2 x Cortex-A76 at 2.00 GHz, 6 x Cortex-A55  at 1.80 GHz
System 3 - 2 x Cortex-A75 at 2.00 GHz, 6 x Cortex-A55  at 2.00 GHz
System 4 - 1 x Cortex-X2  at 2.80 GHz, 3 x Cortex-A710 at 2.52 GHz, 4 x Cortex-A510 at 1.82 GHz
The following single threaded CPU benchmarks are expected to run on the fastest CPU core, as are the MP multithreading programs when using a single thread.



Whetstone Benchmark - NativeWhetstone2.apk

This benchmark carries out both single precision floating point and integer calculations, the overall MWIPS rating being mainly dependent on the former. Systems 1 and 2 are shown to provide the same performance characteristics, with the former slightly faster, as expected from its higher CPU MHz. System 3 was somewhat slower, due to its older CPU technology. All produced the same numeric results. For System 4, the simple calculations used are completely unsuitable for vector processing, relative performance mainly being proportional to CPU MHz, in this case 2.8 GHz versus 2.0 GHz. The overall gain of 1.5 times was influenced by tests using functions such as COS and EXP.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM/Intel Native Whetstone Benchmark 4A8 04-Feb-2023 13.11
           Compiled for 64 bit ARM v8a

 Test        MFLOPS    MOPS   millisecs    Results

 N1 float   1087.84              0.018   -1.124750137
 N2 float    846.07              0.159   -1.131330490
 N3 if               3066.65     0.034    1.000000000
 N4 fixpt            5109.38     0.062   12.000000000
 N5 cos               147.35     0.565    0.499109805
 N6 float    816.02              0.661    0.999999821
 N7 equal            2043.99     0.090    3.000000000
 N8 exp                76.12     0.489    0.935364604

 MWIPS      4815.37              2.077

 Total Elapsed Time   18.3 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel Native Whetstone Benchmark 4A8 05-Feb-2023 10.04
 
 Test        MFLOPS    MOPS   millisecs    Results

 N1 float   1068.88              0.018   -1.124750137
 N2 float    886.76              0.152   -1.131330490
 N3 if               2991.53     0.035    1.000000000
 N4 fixpt            5013.41     0.063   12.000000000
 N5 cos               141.39     0.588    0.499109805
 N6 float    801.74              0.673    0.999999821
 N7 equal            2004.78     0.092    3.000000000
 N8 exp                70.97     0.524    0.935364604

 MWIPS      4663.10              2.144

 Total Elapsed Time   16.2 seconds

 System 3 Android 13 2.0 GHz ARM Cortex-A75

 ARM/Intel Native Whetstone Benchmark 4A8 04-Feb-2023 15.31

 Test        MFLOPS    MOPS   millisecs    Results

 N1 float    819.73              0.023   -1.124750137
 N2 float    665.33              0.202   -1.131330490
 N3 if               2997.37     0.035    1.000000000
 N4 fixpt            3331.87     0.095   12.000000000
 N5 cos               130.91     0.636    0.499109805
 N6 float    666.54              0.809    0.999999821
 N7 equal            1332.93     0.139    3.000000000
 N8 exp                63.31     0.588    0.935364604

 MWIPS      3959.52              2.526

 Total Elapsed Time   15.6 seconds

 System 4 Android 13 1x 2.80 GHz Cortex-X2

 ARM/Intel Native Whetstone Benchmark 4A8 20-Apr-2023 20.18

 Test        MFLOPS    MOPS   millisecs    Results   System 4/System 2

 N1 float   1491.65              0.013   -1.124750137     1.40
 N2 float   1231.55              0.109   -1.131330490     1.39
 N3 if               3598.79     0.029    1.000000000     1.20
 N4 fixpt            6992.04     0.045   12.000000000     1.39
 N5 cos               246.11     0.338    0.499109805     1.74
 N6 float   1118.73              0.482    0.999999821     1.40
 N7 equal            2796.29     0.066    3.000000000     1.39
 N8 exp               106.54     0.349    0.935364604     1.50

 MWIPS      6986.66              1.431                    1.50

 Total Elapsed Time   16.4 seconds 
  


Dhrystone Benchmark - Dhrystone2i.apk

The Dhrystone integer benchmark produces a performance rating in VAX MIPS (AKA DMIPS). Results from two runs are provided for the first three systems, to demonstrate variance in measured MIPS speeds. These are generally in line with performance expectations, but a single run can give a false impression. The program checks for correct numeric results.

With this benchmark often being used to identify the performance of ARM CPUs, the designers may have added hardware tweaks to increase the rating, here to 12 MIPS per MHz on System 4, roughly twice as high as the other systems shown. The program does not appear to be suitable for vector operation. In 2015, the rating was around 2 MIPS/MHz with my 64 bit program and 4 on high end Intel CPUs.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM/Intel Dhrystone 2 Benchmark 4A8 05-Feb-2023 14.25

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          40 
 Dhrystones per Second            24826887 
 VAX MIPS rating                     14130 

 ARM/Intel Dhrystone 2 Benchmark 4A8 05-Feb-2023 16.32

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          40 
 Dhrystones per Second            24821062 
 VAX MIPS rating                     14127 

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel Dhrystone 2 Benchmark 4A8 05-Feb-2023 14.35

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          40 
 Dhrystones per Second            24750676 
 VAX MIPS rating                     14087 

 ARM/Intel Dhrystone 2 Benchmark 4A8 05-Feb-2023 16.38

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          40 
 Dhrystones per Second            24841761 
 VAX MIPS rating                     14139 

 System 3 Android 13 2.0 GHz ARM Cortex-A75

 ARM/Intel Dhrystone 2 Benchmark 4A8 05-Feb-2023 14.20

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          47 
 Dhrystones per Second            21287928 
 VAX MIPS rating                     12116 

 ARM/Intel Dhrystone 2 Benchmark 4A8 05-Feb-2023 16.22

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          47 
 Dhrystones per Second            21373535 
 VAX MIPS rating                     12165 

 System 4 Android 13 1x 2.80 GHz Cortex-X2

ARM/Intel Dhrystone 2 Benchmark 4A8 20-Apr-2023 20.26

           Compiled for 64 bit ARM v8a

Nanoseconds one Dhrystone run          17 System 4/System 2
Dhrystones per Second            59677446 
VAX MIPS rating                     33966         2.40
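The VAX MIPS figures above can be reproduced from Dhrystones per second by dividing by 1757, the Dhrystone rate of the VAX 11/780 that is taken as 1 MIPS (for example, 24826887 / 1757 gives 14130 for System 1). A short Python sketch of that arithmetic, including the MIPS per MHz figure quoted in the text; the function names are illustrative only.

```python
# Dhrystone 2 derived ratings from "Dhrystones per Second".

VAX_11_780_DHRYSTONES_PER_SEC = 1757   # defined as 1 VAX MIPS

def vax_mips(dhrystones_per_sec):
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC

def ns_per_run(dhrystones_per_sec):
    return 1e9 / dhrystones_per_sec

def mips_per_mhz(dhrystones_per_sec, mhz):
    return vax_mips(dhrystones_per_sec) / mhz

# System 2 (second run) and System 4 results from above
s2 = vax_mips(24841761)
s4 = vax_mips(59677446)
print(f"S4/S2 {s4 / s2:.2f}, S4 MIPS/MHz {mips_per_mhz(59677446, 2800):.2f}")
```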
   



Linpack Tests - LinpackDP2.apk, LinpackSP2.apk, NEON-Linpacki.apk

The Linpack benchmark speed is measured in MFLOPS. Three versions are provided, the original using double precision floating point calculations, then one with single precision arithmetic, and the third via NEON SIMD single precision intrinsic functions. Results for this benchmark code should not be compared with those from the High Performance Linpack (HPL) benchmark. Again, the first two systems produced similar performance, with the third much slower. Single precision calculations were somewhat faster than those using double precision, producing different numeric sumchecks, yet consistent across all platforms. NEON functions led to at least a doubling of measured MFLOPS, with the same single precision sumchecks.

System 4 - This is the first indication of possible heating issues when running in the preferred mains power mode, where the third test appeared to be slower than expected. Note that a number of other benchmarks were run between the last two tests, also indicating slow performance. This benchmark can be compiled to use vector processing, but is limited to two floating point operations per word, similar to MemSpeed and part of the MFLOPS benchmarks.

The later System 4 gains over System 2 were all greater than two times, with the NEON test achieving nearly 9.5 GFLOPS, or 3.38 MFLOPS per MHz.
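The two floating point operations per word limit can be seen in the daxpy loop, y[i] = y[i] + a * x[i], where the classic Linpack code spends most of its time: each element needs one multiply and one add. The Python below only illustrates that operation count and the MFLOPS per MHz arithmetic; it is not the benchmark's C or NEON code, and the helper names are hypothetical.

```python
# daxpy: y = y + a*x, two floating point operations per array element.

def daxpy(a, x, y):
    """In-place y += a * x; returns the floating point operation count."""
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    for i in range(len(y)):
        y[i] += a * x[i]   # one multiply + one add
    return 2 * len(y)

def mflops(flop_count, seconds):
    return flop_count / seconds / 1e6

# System 4 NEON result from the second run, per MHz at 2800 MHz
print(f"{9466.57 / 2800:.2f} MFLOPS per MHz")
```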

 System 1 Android 11 2.05 GHz ARM Cortex-A76

  ARM/Intel DP Linpack Benchmark      ARM/Intel SP Linpack Benchmark      ARM NEON Linpack Benchmark
       4A8 06-Feb-2023 12.19               4A8 06-Feb-2023 12.20             4A8  06-Feb-2023 13.38
   Compiled for 64 bit ARM v8a         Compiled for 64 bit ARM v8a         Compiled for 64 bit ARM v8a

 Speed            2047.81 MFLOPS     Speed            2186.84 MFLOPS     Speed            4705.52 MFLOPS
 norm. resid                 1.7     norm. resid                 1.6     norm. resid                 1.6
 resid            7.41628980e-14     resid            3.80277634e-05     resid            3.80277634e-05
 machep           2.22044605e-16     machep           1.19209290e-07     machep           1.19209290e-07
 x[0]-1          -1.49880108e-14     x[0]-1          -1.38282776e-05     x[0]-1          -1.38282776e-05
 x[n-1]-1        -1.89848137e-14     x[n-1]-1        -7.51018524e-06     x[n-1]-1        -7.51018524e-06

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

  ARM/Intel DP Linpack Benchmark      ARM/Intel SP Linpack Benchmark      ARM NEON Linpack Benchmark
       4A8 06-Feb-2023 14.59               4A8 06-Feb-2023 15.11               4A8 06-Feb-2023 15.13
   Compiled for 64 bit ARM v8a         Compiled for 64 bit ARM v8a         Compiled for 64 bit ARM v8a

 Speed            2027.77 MFLOPS     Speed            2150.02 MFLOPS     Speed            4614.88 MFLOPS
 norm. resid                 1.7     norm. resid                 1.6     norm. resid                 1.6
 resid            7.41628980e-14     resid            3.80277634e-05     resid            3.80277634e-05
 machep           2.22044605e-16     machep           1.19209290e-07     machep           1.19209290e-07
 x[0]-1          -1.49880108e-14     x[0]-1          -1.38282776e-05     x[0]-1          -1.38282776e-05
 x[n-1]-1        -1.89848137e-14     x[n-1]-1        -7.51018524e-06     x[n-1]-1        -7.51018524e-06

 System 3 Android 13 2.0 GHz ARM Cortex-A75

  ARM/Intel DP Linpack Benchmark      ARM/Intel SP Linpack Benchmark       ARM NEON Linpack Benchmark
      4A8 06-Feb-2023 15.44                4A8 06-Feb-2023 15.45               4A8 06-Feb-2023 15.47
   Compiled for 64 bit ARM v8a         Compiled for 64 bit ARM v8a         Compiled for 64 bit ARM v8a

 Speed            1474.16 MFLOPS     Speed            1664.41 MFLOPS     Speed            3294.97 MFLOPS
 norm. resid                 1.7     norm. resid                 1.6     norm. resid                 1.6
 resid            7.41628980e-14     resid            3.80277634e-05     resid            3.80277634e-05
 machep           2.22044605e-16     machep           1.19209290e-07     machep           1.19209290e-07
 x[0]-1          -1.49880108e-14     x[0]-1          -1.38282776e-05     x[0]-1          -1.38282776e-05
 x[n-1]-1        -1.89848137e-14     x[n-1]-1        -7.51018524e-06     x[n-1]-1        -7.51018524e-06

 System 4 Android 13 1x 2.80 GHz Cortex-X2 Power then Battery

 ARM/Intel DP Linpack Benchmark      ARM/Intel SP Linpack Benchmark        ARM NEON Linpack Benchmark 
      4A8 20-Apr-2023 20.28               4A8 20-Apr-2023 20.30                4A8 20-Apr-2023 20.45        ##
   Compiled for 64 bit ARM v8a        Compiled for 64 bit ARM v8a          Compiled for 64 bit ARM v8a
		
Speed            4834.32 MFLOPS      Speed            4965.85 MFLOPS     Speed            6246.93 MFLOPS
norm. resid                 1.7      norm. resid                 1.6     norm. resid                 1.6
resid            7.41628980e-14      resid            3.80277634e-05     resid            3.80277634e-05
machep           2.22044605e-16      machep           1.19209290e-07     machep           1.19209290e-07
x[0]-1          -1.49880108e-14      x[0]-1          -1.38282776e-05     x[0]-1          -1.38282776e-05
x[n-1]-1        -1.89848137e-14	     x[n-1]-1        -7.51018524e-06     x[n-1]-1        -7.51018524e-06
                                                                                   After Memory Benchmarks  ##
System 4/System 2 MFLOPS   2.38                                 2.31                           SLOW 1.35

2 ARM/Intel DP Linpack Benchmark      ARM/Intel SP Linpack Benchmark       ARM NEON Linpack Benchmark 
      4A8 23-Apr-2023 14.23               4A8 23-Apr-2023 14.21                 4A8 23-Apr-2023 14.19
   Compiled for 64 bit ARM v8a        Compiled for 64 bit ARM v8a           Compiled for 64 bit ARM v8a
		
Speed            4826.04 MFLOPS      Speed            5083.03 MFLOPS     Speed            9466.57 MFLOPS
norm. resid                 1.7      norm. resid                 1.6     norm. resid                 1.6
resid            7.41628980e-14      resid            3.80277634e-05     resid            3.80277634e-05
machep           2.22044605e-16      machep           1.19209290e-07     machep           1.19209290e-07
x[0]-1          -1.49880108e-14      x[0]-1          -1.38282776e-05     x[0]-1          -1.38282776e-05
x[n-1]-1        -1.89848137e-14      x[n-1]-1        -7.51018524e-06     x[n-1]-1        -7.51018524e-06

System 4/System 2 MFLOPS   2.38                                 2.36                                2.05
  



Livermore Loops Benchmark - LivermoreLoops2.apk

The Livermore Loops comprise 24 kernels of numerical applications with speeds calculated in MFLOPS (double precision). A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official average. They are repeated three times at different array dimension spans.

Below are MFLOPS scores for the 24 kernels, at one data span, and overall ratings of Maximum, Average, Geometric mean, Harmonic mean and Minimum MFLOPS. Again, System 1's slightly higher CPU MHz gave it a lead over System 2, with System 3 far behind. Results are also provided for System 3 from a second run on power and one on battery at 45% charge, all indicating the same performance.
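The summary ratings can be illustrated with Python's statistics module. This is a simplified sketch: the benchmark's own Overall Weighted figures combine three spans (471, 90, 19) with weights not described here, so applying it to one span will not reproduce the printed Average, Geomean or Harmean exactly.

```python
from statistics import fmean, geometric_mean, harmonic_mean

def summarise(mflops):
    """Unweighted summary of per-loop MFLOPS values."""
    return {
        "Maximum": max(mflops),
        "Average": fmean(mflops),           # arithmetic mean
        "Geomean": geometric_mean(mflops),  # the official Livermore average
        "Harmean": harmonic_mean(mflops),
        "Minimum": min(mflops),
    }

# System 2, Do Span 471 only (24 loops), from the results below
s2_span471 = [
    2558.0, 1853.8, 1592.7, 1636.9, 774.7, 1402.6,
    2553.6, 2942.2, 2730.5, 1869.7, 968.9, 2086.2,
    516.0, 745.9, 1362.3, 1525.4, 1508.5, 2594.3,
    700.3, 1894.8, 1736.3, 1221.3, 1521.9, 658.0,
]

for name, value in summarise(s2_span471).items():
    print(f"{name:8s} {value:8.1f}")
```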

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM/Intel Livermore Loops Benchmark 4A8 06-Feb-2023 12.22
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  2603.8  1889.6  1644.0  1670.3   790.6  1433.2
  2606.3  3006.5  2780.7  1905.8   941.0  2110.0
   524.5   756.1  1414.9  1560.5  1533.0  2645.4
   715.3  1930.0  1766.2  1300.3  1554.1   672.2

 Overall Weighted MFLOPS Do Spans 471, 90, 19
 Maximum Average Geomean Harmean Minimum
  3007.5  1651.3  1495.8  1335.2   524.5

 Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

 Total Elapsed Time    8.8 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel Livermore Loops Benchmark 4A8 06-Feb-2023 15.15
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  2558.0  1853.8  1592.7  1636.9   774.7  1402.6
  2553.6  2942.2  2730.5  1869.7   968.9  2086.2
   516.0   745.9  1362.3  1525.4  1508.5  2594.3
   700.3  1894.8  1736.3  1221.3  1521.9   658.0

 Overall Weighted MFLOPS Do Spans 471, 90, 19
 Maximum Average Geomean Harmean Minimum
  2942.2  1619.1  1466.2  1308.7   516.0

 Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

 Total Elapsed Time    8.8 seconds

 System 3 Android 13 2.0 GHz ARM Cortex-A75

 ARM/Intel Livermore Loops Benchmark 4A8 06-Feb-2023 15.48
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  2138.1  1346.2  1329.3  1308.0   668.8   929.1
  2183.1  2718.9  2443.1  1380.8   667.8  1375.9
   410.7   534.2   961.6  1003.3  1241.0  1755.8
   429.5  1328.1  1256.7   958.1  1234.5   440.7

 Overall Weighted MFLOPS Do Spans 471, 90, 19
 Maximum Average Geomean Harmean Minimum
  2718.9  1258.8  1111.2   964.9   371.3

 Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

 Total Elapsed Time    9.0 seconds

 System 3 Rerun

  2137.7  1344.8  1329.4  1307.4   668.3   934.9
  2182.5  2719.7  2443.9  1379.3   668.5  1376.4
   412.5   533.2   961.2  1012.2  1241.8  1755.9
   429.6  1328.9  1255.6   958.0  1234.6   440.7

 System 3 Battery 45%

  2137.8  1338.8  1329.3  1307.8   668.6   920.2
  2181.5  2717.3  2443.8  1380.2   668.5  1380.1
   413.1   535.0   961.2  1010.0  1235.2  1756.1
   429.7  1328.5  1256.2   957.9  1233.8   440.7
  

Continued Below


Livermore Loops Armv9 CPU Phone

This benchmark was first run before temperature increases led to noticeable performance deterioration. The maximum speed of an individual loop was then nearly 7 GFLOPS, 2.33 times faster than the older System 2, with a maximum gain of 3.63 times on another loop. Rerunning the benchmark on battery (cooler) indicated slightly faster performance.
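The ratio tables that follow divide each System 4 result by the corresponding System 2 result. A minimal Python sketch, using only the first six loops copied from the System 2 listing above and the Test 1 listing below, reproduces the first row of the Test 1/System 2 table and checks it against the 1.4 times gain expected from MHz alone.

```python
# First six loops at Do Span 471, copied from the System 2 and System 4 Test 1 listings
SYSTEM_2 = [2558.0, 1853.8, 1592.7, 1636.9, 774.7, 1402.6]
SYSTEM_4_TEST_1 = [6669.7, 4873.3, 2659.1, 3066.6, 1131.5, 2339.6]

CLOCK_RATIO = 2.80 / 2.00  # gain expected from CPU MHz alone

def gains(new, old):
    """Per-loop speed ratios, rounded to two decimals as in the report's tables."""
    return [round(n / o, 2) for n, o in zip(new, old)]

ratios = gains(SYSTEM_4_TEST_1, SYSTEM_2)
print(ratios)  # [2.61, 2.63, 1.67, 1.87, 1.46, 1.67]
print(all(r > CLOCK_RATIO for r in ratios))  # True
```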

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1 On Power

ARM/Intel Livermore Loops Benchmark 4A8 20-Apr-2023 20.32
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  6669.7  4873.3  2659.1  3066.6  1131.5  2339.6
  6444.7  6866.3  6740.1  4898.4  1372.2  6161.1
  1871.2  1695.0  3828.5  3432.4  2452.0  6094.3
   927.5  2690.4  2831.2  3429.4  2301.9  1363.1

Overall Weighted MFLOPS Do Spans 471, 90, 19
Maximum Average Geomean Harmean Minimum
  6867.1  3634.5  3136.9  2652.1   927.5

Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

Total Elapsed Time    9.6 seconds

Test 2 On Battery

ARM/Intel Livermore Loops Benchmark 4A8 30-Apr-2023 13.40
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  6827.0  4835.3  2747.5  3172.3  1136.1  2343.6
  6520.9  6984.3  6718.6  4888.5  1375.9  6192.2
  1928.1  1750.2  3963.8  3588.4  2550.2  6333.6
   962.4  2699.2  2932.5  3547.5  2304.3  1361.5

Overall Weighted MFLOPS Do Spans 471, 90, 19
Maximum Average Geomean Harmean Minimum
  7032.9  3662.7  3158.9  2667.6   929.0

Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

Total Elapsed Time    9.3 seconds


Test 1/System 2

  MFLOPS for 24 loops Do Span 471
    2.61    2.63    1.67    1.87    1.46    1.67
    2.52    2.33    2.47    2.62    1.42    2.95
    3.63    2.27    2.81    2.25    1.63    2.35
    1.32    1.42    1.63    2.81    1.51    2.07

  Maximum Average Geomean Harmean Minimum
    2.33    2.24    2.14    2.03    1.80

Test 2/System 2

  MFLOPS for 24 loops Do Span 471
    2.67    2.61    1.73    1.94    1.47    1.67
    2.55    2.37    2.46    2.61    1.42    2.97
    3.74    2.35    2.91    2.35    1.69    2.44
    1.37    1.42    1.69    2.90    1.51    2.07

  Maximum Average Geomean Harmean Minimum
    2.39    2.26    2.15    2.04    1.80
  


MemSpeed Benchmark - MemSpeedi.apk

This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision (DP and SP) floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing DP MB/second by 8 and 16, for the two tests, and SP speeds by 4 and 8.
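The divisors quoted above follow from counting the bytes read and the operations performed per array element. A small Python sketch, assuming both x[m] and y[m] are read for every element (the reading implied by the 8/16 and 4/8 divisors), reproduces System 1's Max MFLOPS figures from its MB/second results. The function and dictionary names are illustrative only.

```python
WORD_BYTES = {"DP": 8, "SP": 4}

def mflops(mb_per_second, precision, multiply_add):
    """Convert a MemSpeed reading speed to MFLOPS.

    Both tests read two words per element, x[m] and y[m].
    x[m]=x[m]+s*y[m] performs two operations (multiply and add);
    x[m]=x[m]+y[m] performs one (add).
    """
    bytes_per_element = 2 * WORD_BYTES[precision]
    flops_per_element = 2 if multiply_add else 1
    return mb_per_second * flops_per_element / bytes_per_element

# System 1, x[m]=x[m]+s*y[m] columns: DP at 32 KB, SP at 16 KB
print(round(mflops(14377, "DP", True)))  # 1797
print(round(mflops(12749, "SP", True)))  # 3187
```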

The results clearly demonstrate differences in CPU, RAM and cache speeds, in double and single precision floating point performance and in cache sizes, showing that a single overall performance rating would not be valid.

A calculated single precision MFLOPS rating greater than CPU MHz, or a double precision rating greater than half that, indicates that SIMD instructions are being executed. For reasons unknown, the older Cortex-A75 achieved the best L1 cache based double precision MFLOPS.

This and later benchmarks demonstrate that System 3 RAM speeds are much slower than those for the other two.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM/Intel MemSpeed Benchmark 4A8 07-Feb-2023 10.21
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  14368  12749  13579  25806  13430  13114 L1
      32  14377  12612  13629  25300  13078  12931
      64  14315  12442  13534  26042  12740  12967
     128  13677  12190  13147  21466  12434  12616 L2
     256  13537  12097  13036  21231  12311  12491
     512  13432  12018  12831  20618  12261  12454
    1024  13230  11924  12791  18379  12173  12401 L3
    4096  11013  10328  10937  10390  10612  10386 
   16384   9371   9342   9406   8997   9282   9084 RAM
   65536   8799   8846   8878   8636   8801   8665
Max MFLOPS 1797   3187    

          Total Elapsed Time   12.2 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel MemSpeed Benchmark 4A8 07-Feb-2023 10.26
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  14059  12474  13286  26090  13109  12806 L1
      32  14045  12320  13326  26087  13023  12843
      64  14061  12187  13323  25871  12544  12729
     128  13455  11979  12990  21318  12189  12418 L2
     256  13100  11827  12715  20903  12119  12290
     512  13309  11892  12791  21008  12129  12291
    1024  13295  11932  12788  21078  11992  12281 L3 2 MB
    4096   9419   9354   9522   8907   9251   6848 RAM
   16384   7912   7797   7883   6614   7549   7320
   65536   7722   7788   7530   7333   7467   7255
Max MFLOPS 1757   3119
  
        Total Elapsed Time   11.8 seconds


 System 3 Android 13  2.0 GHz ARM Cortex-A75

 ARM/Intel MemSpeed Benchmark 4A8 07-Feb-2023 21.49
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  19342  12941  14154  18768  10836  10799 L1
      32  19432  12942  14187  18798  10784  10970
      64  19430  12940  14184  18651  10803  10971
     128   9987   9084   9830  10006   9040   9114 L2
     256  10341   9551  10274  10461  10125  10120
     512  10239   9563  10283  10398  10030  10021
    1024   9249   8657   9109   9267   8923   8959 L3
    4096   4942   4881   4926   4879   4917   4888 RAM
   16384   4577   4511   4565   4522   4532   4542
   65536   4408   4509   4523   4527   4512   4510
Max MFLOPS 2429   3236

          Total Elapsed Time   10.1 seconds
  

Continued Below


MemSpeed Armv9 CPU Phone

This was the first benchmark, run on 20th April, where some performance comparisons were lower than those expected from the CPU MHz difference, reflecting the penalty imposed by running continuously on the power connection. Later, running on battery, performance gains were between 1.49 and 4.22 times, the larger L3 cache being responsible for the latter. This time, a repeat of Test 2 on power obtained the same performance (details not shown here).
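Which results fell short of the MHz based expectation can be picked out mechanically. The Python sketch below uses the L1 cache rows of the Test 1/System 2 table that follows; the column labels are hypothetical shorthand for the six table columns (s*y for x[m]=x[m]+s*y[m], s+y for the Int+ test and y for x[m]=x[m]+y[m]).

```python
CLOCK_RATIO = 2.80 / 2.00  # System 4 Cortex-X2 MHz / System 2 Cortex-A76 MHz

COLUMNS = ["s*y Dble", "s*y Sngl", "s+y Int", "y Dble", "y Sngl", "y Int"]

# Test 1 (On Power)/System 2 ratios for the L1 cache rows, copied from the table below
TEST_1_L1 = {
    16: [1.30, 1.31, 1.02, 1.34, 1.64, 1.68],
    32: [1.09, 1.10, 1.02, 1.34, 1.66, 1.69],
    64: [1.08, 1.11, 1.02, 1.35, 1.69, 1.69],
}

def below_clock_ratio(ratios, labels=COLUMNS, clock_ratio=CLOCK_RATIO):
    """Return the columns whose gain falls short of the CPU MHz ratio."""
    return [label for label, r in zip(labels, ratios) if r < clock_ratio]

for kbytes, ratios in TEST_1_L1.items():
    print(kbytes, below_clock_ratio(ratios))
```

In every L1 row the first four columns fall below 1.4, while the single and integer x[m]=x[m]+y[m] tests exceed it even on power.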

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1 On Power

ARM/Intel MemSpeed Benchmark 4A8 20-Apr-2023 20.40
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  18273  16318  13593  34975  21486  21577 L1
      32  15278  13607  13606  34968  21565  21690
      64  15230  13584  13562  34953  21214  21543
     128  15301  13604  13578  34717  21359  21555 L2
     256  15244  13599  13599  34859  21152  21389
     512  15311  13611  13610  34911  21257  21269
    1024  15236  13590  13529  34630  21168  21299
    4096  15269  13588  13570  34599  21601  21495 L3
   16384  15075  13472  13449  21727  18962  19053 RAM
   65536  13210  13468  13460  18029  16851  14148
Max MFLOPS 2284   4080

          Total Elapsed Time   11.3 seconds

Test 2 On Battery

ARM/Intel MemSpeed Benchmark 4A8 23-Apr-2023 13.52
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  22292  19857  19860  51064  31512  31522 L1
      32  22342  19872  19842  51115  31999  32111
      64  22229  19706  19782  51115  31400  31663
     128  22300  19864  19858  50730  31237  31454 L2
     256  22298  19875  19844  50906  31585  31959
     512  22265  19873  19859  50290  30853  31149
    1024  22346  19865  19872  49249  29985  30510
    4096  21319  18952  19300  43691  28347  28916 L3
   16384  19239  17066  15105  19805  19700  20244 RAM
   65536  16165  15122  15114  17565  17043  17009
Max MFLOPS 2793   4968

          Total Elapsed Time   10.4 seconds

Test 1/System 2

  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16   1.30   1.31   1.02   1.34   1.64   1.68
      32   1.09   1.10   1.02   1.34   1.66   1.69
      64   1.08   1.11   1.02   1.35   1.69   1.69
     128   1.14   1.14   1.05   1.63   1.75   1.74
     256   1.16   1.15   1.07   1.67   1.75   1.74
     512   1.15   1.14   1.06   1.66   1.75   1.73
    1024   1.15   1.14   1.06   1.64   1.77   1.73
    4096   1.62   1.45   1.43   3.88   2.33   3.14
   16384   1.91   1.73   1.71   3.29   2.51   2.60
   65536   1.71   1.73   1.79   2.46   2.26   1.95

Test 2/System 2

  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16   1.59   1.59   1.49   1.96   2.40   2.46
      32   1.59   1.61   1.49   1.96   2.46   2.50
      64   1.58   1.62   1.48   1.98   2.50   2.49
     128   1.66   1.66   1.53   2.38   2.56   2.53
     256   1.70   1.68   1.56   2.44   2.61   2.60
     512   1.67   1.67   1.55   2.39   2.54   2.53
    1024   1.68   1.66   1.55   2.34   2.50   2.48
    4096   2.26   2.03   2.03   4.91   3.06   4.22 L3 vs RAM
   16384   2.43   2.19   1.92   2.99   2.61   2.77 
   65536   2.09   1.94   2.01   2.40   2.28   2.34
  



NeonSpeed Benchmark - NeonSpeedi.apk

This benchmark carries out the same calculations as the MemSpeed Benchmark, except that all floating point is single precision, as applicable to the NEON calculations, which are carried out using NEON intrinsic functions. Using these SIMD instructions, four results per clock cycle are possible, or 8 GFLOPS at 2 GHz, rising to 16 GFLOPS with fused multiply and add instructions, as possible in the first two columns. Here we have a maximum of nearly 10 GFLOPS, but more than 12 GFLOPS is demonstrated later under the MP-MFLOPS Benchmark, using compiled code on a single CPU core.
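The peak figures quoted above are simple products of clock speed, SIMD lanes and operations per lane. A Python sketch, assuming a single NEON unit delivering four single precision results per clock as stated, reproduces them and expresses System 1's best measured result as a fraction of its theoretical fused multiply-add peak.

```python
def peak_gflops(clock_ghz, lanes=4, fma=False):
    """Theoretical single-core peak, assuming `lanes` single precision results per clock.

    A fused multiply-add is counted as two floating point operations per lane.
    """
    return clock_ghz * lanes * (2 if fma else 1)

print(peak_gflops(2.0))            # 8.0
print(peak_gflops(2.0, fma=True))  # 16.0

# System 1's best NeonSpeed result (9899 MFLOPS at 2.05 GHz) as a fraction of its FMA peak
print(f"{9.899 / peak_gflops(2.05, fma=True):.0%}")
```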

NEON integer operations per second were slightly higher than those for floating point, and the integer instructions executed per second would be somewhat higher again, due to the inclusion of load, store and branch instructions.

With NEON operation, the much slower performance of System 3's older processor is clearly shown.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM NeonSpeed Benchmark 4A8 08-Feb-2023 10.50
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16  13068  39594  13739  43318  54907  54817 L1
      32  13074  39493  13764  43255  46180  45660
      64  13065  39273  13749  43106  45044  43823
     128  12888  28829  13632  29341  29244  29271 L2
     256  12647  26631  13425  26850  26852  26837
     512  12629  22447  13434  22401  22417  22393
    1024  12465  18418  13194  18358  18375  18341 L3
    4096  11104  10324  11518  10239   9853  10056
   16384   9022   8691   9324   8638   8589   8648 RAM
   65536   8898   8365   8936   8322   8374   8312
Max MFLOPS 3269   9899
  
          Total Elapsed Time   11.0 seconds

 System 2 Android 12  2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM NeonSpeed Benchmark 4A8 08-Feb-2023 11.26
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16  12829  38832  13490  42520  53871  53927 L1
      32  12827  38786  13499  42635  53916  53880
      64  12804  38518  13479  42122  43667  43600
     128  12599  28491  13330  28704  28805  28773 L2
     256  12488  27960  13172  28234  28509  28465
     512  12547  27304  13238  27373  27753  27759
    1024  12499  23922  13222  24250  24376  25347 L3
    4096   9494   8896  10109   9242   9403   9242 RAM
   16384   7968   7476   8194   7719   7735   7642
   65536   7892   7274   7914   6716   7229   7226
Max MFLOPS 3207   9708

          Total Elapsed Time   10.6 seconds

 System 3 Android 13  2.0 GHz ARM Cortex-A75

 ARM NeonSpeed Benchmark 4A8 08-Feb-2023 12.22
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16  12933  21026  14176  21588  20680  20761 L1
      32  12685  20668  13506  21296  20824  20824
      64  12540  20612  13405  21227  20822  20844
     128   9358  10086  10182  10055  10007  10016 L2
     256   9843  10438  10550  10388  10379  10383
     512   9827  10359  10414  10335  10270  10324
    1024   8380   8886   8706   8902   8986   9011 L3
    4096   4467   4561   4363   4576   4591   4596 RAM
   16384   4656   4736   4674   4613   4741   4759
   65536   4387   4601   4514   4588   4588   4588
Max MFLOPS 3233   5257

          Total Elapsed Time   10.3 seconds
   

Continued Below


NeonSpeed Armv9 CPU Phone

The first On Power test was run after MemSpeed and other memory benchmarks, with some integer results slower than those from the older System 2. Rerunning on battery provided gains similar to MemSpeed, with integer performance gains around the MHz ratio but floating point improvements of up to 6.31 times with L3 cache based data. Comparing these two results, as shown below, performance was reduced by up to 32% during the hotter On Power run (Battery 47% faster).
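The two percentages quoted above describe the same Battery/Power ratio from opposite directions. A short Python sketch shows that a battery run 47% faster corresponds to the On Power run being about 32% slower.

```python
def speedup_percent(battery_over_power):
    """How much faster the cooler battery run was, as a percentage."""
    return (battery_over_power - 1.0) * 100.0

def slowdown_percent(battery_over_power):
    """How much slower the hotter On Power run was, as a percentage."""
    return (1.0 - 1.0 / battery_over_power) * 100.0

for ratio in (1.46, 1.47):
    print(f"ratio {ratio}: battery {speedup_percent(ratio):.0f}% faster, "
          f"power {slowdown_percent(ratio):.1f}% slower")
```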

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1 On Power

 ARM NeonSpeed Benchmark 4A8 20-Apr-2023 20.47
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16  16313  85505  13260  72262  73469  73365 L1
      32  13567  71006  12891  72240  73527  73528
      64  13591  61035  12889  65553  62412  60633
     128  13599  45930  12889  45743  45572  45718 L2
     256  13606  46165  12891  46201  46187  46215
     512  13595  45389  12878  45385  45550  45544
    1024  13603  45930  12886  45922  45797  45865
    4096  13595  38351  12878  38425  38827  38993 L3
   16384  13482  22725  12767  22666  22942  22846 RAM
   65536  13367  15431  12790  17360  18269  18185
Max MFLOPS 4078  21376   

          Total Elapsed Time   10.3 seconds

Test 2 On Battery

 ARM NeonSpeed Benchmark 4A8 23-Apr-2023 13.55
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16  19862 102403  18573 102534 103684 103639 L1
      32  19381 100863  18167 101897 103666 103409
      64  19051  85761  18163  91701  85459  88190
     128  19187  64767  18183  64770  64783  64820 L2
     256  19199  64334  18184  65047  65140  65178
     512  19185  63656  18192  64717  65401  65100
    1024  19181  62057  18172  63202  62816  62338
    4096  19153  56099  18067  56160  56082  55613 L3
   16384  17795  24262  16849  24127  24352  23700 RAM
   65536  15837  18834  15683  18968  19080  19083
Max MFLOPS 4966  25601 

          Total Elapsed Time   10.4 seconds

Test 1/System 2

  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16   1.27   2.20   0.98   1.70   1.36   1.36
      32   1.06   1.83   0.95   1.69   1.36   1.36
      64   1.06   1.58   0.96   1.56   1.43   1.39
     128   1.08   1.61   0.97   1.59   1.58   1.59
     256   1.09   1.65   0.98   1.64   1.62   1.62
     512   1.08   1.66   0.97   1.66   1.64   1.64
    1024   1.09   1.92   0.97   1.89   1.88   1.81
    4096   1.43   4.31   1.27   4.16   4.13   4.22
   16384   1.69   3.04   1.56   2.94   2.97   2.99
   65536   1.69   2.12   1.62   2.58   2.53   2.52

Test 2/System 2

  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16   1.55   2.64   1.38   2.41   1.92   1.92
      32   1.51   2.60   1.35   2.39   1.92   1.92
      64   1.49   2.23   1.35   2.18   1.96   2.02
     128   1.52   2.27   1.36   2.26   2.25   2.25
     256   1.54   2.30   1.38   2.30   2.28   2.29
     512   1.53   2.33   1.37   2.36   2.36   2.35
    1024   1.53   2.59   1.37   2.61   2.58   2.46
    4096   2.02   6.31   1.79   6.08   5.96   6.02 L3 vs RAM
   16384   2.23   3.25   2.06   3.13   3.15   3.10
   65536   2.01   2.59   1.98   2.82   2.64   2.64

Battery/Power Best Case

     512   1.41   1.40   1.41   1.43   1.44   1.43
    1024   1.41   1.35   1.41   1.38   1.37   1.36
    4096   1.41   1.46   1.40   1.46   1.44   1.43
  


BusSpeed Benchmark - BusSpeedv7i.apk

This benchmark is designed to identify data being read in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is halved on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum when using 16 word increments (64 bytes). Potential maximum bus speed can be estimated by multiplying the Inc16 value by 16. Then, for each halving of the increment, a near doubling of MB/second could be expected. Burst reading is also indicated on some cache based data transfers.
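The access pattern described above can be modelled in a few lines of Python. This is an illustrative model, not the benchmark's code: it lists the words read for each increment, counts the 64 byte bursts that must be transferred, and applies the Inc16 times 16 estimate to System 2's 64 MB RAM result. All names are hypothetical.

```python
WORD_BYTES = 4
BURST_BYTES = 64  # typical RAM burst size, as stated above

def word_offsets(n_words, inc_words):
    """Byte offsets of the 4 byte words read with a given increment (Inc1 = Read All)."""
    return [i * WORD_BYTES for i in range(0, n_words, inc_words)]

def bursts_touched(offsets):
    """Number of distinct 64 byte bursts that must be transferred."""
    return len({offset // BURST_BYTES for offset in offsets})

def estimated_bus_mb_per_second(inc16_mb_per_second):
    """Inc16 uses one word from each burst, so the bus carries 16 times the measured data."""
    return inc16_mb_per_second * (BURST_BYTES // WORD_BYTES)

# 4 KB of data (1024 words): words used versus bursts transferred
for inc in (32, 16, 8, 4, 2, 1):
    offsets = word_offsets(1024, inc)
    print(f"Inc{inc:<2d} words {len(offsets):4d} bursts {bursts_touched(offsets)}")

# System 2, 64 MB RAM row, Inc16 = 857 MB/second
print(estimated_bus_mb_per_second(857))  # 13712
```

From Inc16 downwards every burst is transferred while the number of words used doubles at each step, which is why a near doubling of MB/second is expected until the CPU, rather than the bus, becomes the limit.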

The near constant Read All performance indicates a CPU speed limitation, influenced by the calculations involved, where RAM data transfer speeds from Inc2 to Read All do not approach doubling on Systems 1 and 2. This effect also disguises System 3’s slower RAM.

See MP-BusSpeed results, indicating that access by multiple cores is necessary to obtain maximum memory throughput, where adequate CPU performance is provided.
 
 System 1 Android 11  2.05 GHz ARM Cortex-A76

 ARM/Intel BusSpeed Benchmark 4A8 08-Feb-2023 10.52
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   3887   5358   7637   8100   8113   8111 L1
      32   7697   7796   7836   8102   8103   8111
      64   6288   6426   7983   8114   8118   8111
     128   2017   3596   6107   8099   8104   8108 L2
     256   1646   2526   4675   7276   8065   8094
     512    863   1304   2723   5462   8104   8101
    1024    791   1128   2277   4449   7705   7907 L3
    4096    608    996   1965   3548   7123   7894
   16384    558    886   1791   3198   6659   7945 RAM
   65536    548    873   1768   3199   6494   7957

          Total Elapsed Time    5.0 seconds

 System 2 Android 12  2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel BusSpeed Benchmark 4A8 08-Feb-2023 11.31
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   6809   6976   7643   7939   7952   7942 L1
      32   7561   7650   7685   7951   7958   7952
      64   6197   6285   7820   7959   7964   7946
     128   1977   3555   5903   7894   7925   7938 L2
     256   1526   2513   4872   7650   7913   7945
     512   1022   1838   3661   7276   5696   6919
    1024    910   1560   3071   5808   7796   6611 L3
    4096    648    992   2132   4132   7393   7440 RAM
   16384    586    877   1792   3650   6820   7898
   65536    570    857   1763   3501   6647   7896

          Total Elapsed Time    5.2 seconds

 System 3 Android 13  2.0 GHz ARM Cortex-A75

 ARM/Intel BusSpeed Benchmark 4A8 08-Feb-2023 12.24
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   6671   6851   7497   7964   7981   7983 L1
      32   7330   7498   7498   7979   7980   7990
      64   2827   2565   5606   7463   7836   7953
     128   1566   1426   2322   4300   6046   7990 L2
     256   1213    991   2076   3945   5492   7983
     512    604    625   1851   3750   5444   7974
    1024    616    588   1726   3202   4796   7103 L3
    4096    579    522   1228   2419   4788   7448 RAM
   16384    541    537   1135   2230   4545   7510
   65536    496    520   1145   2292   4582   7528

          Total Elapsed Time    4.9 seconds
  

Continued Below


BusSpeed Armv9 CPU Phone

This integer benchmark was run immediately after MemSpeed, when the On Power results were often slower than those for the older phone. The On Battery results were often faster in proportion to the MHz ratio of 1.4. The Battery/Power comparison indicates that many of the On Battery measurements were around 50% faster.

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1 On Power

ARM/Intel BusSpeed Benchmark 4A8 20-Apr-2023 20.42
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   7774   8000   8823   9146   9166   7630 L1
      32   7262   7357   7375   7641   7648   7635
      64   6110   7378   7575   7644   7654   7633
     128   3745   3985   7557   7653   7653   7635 L2
     256   3742   3917   7567   7648   7654   7633
     512   3785   4060   7419   7652   7654   7597
    1024   3727   4073   6810   7647   7654   7626
    4096   3246   2934   5918   7611   7641   7625 L3
   16384   1803   1692   3441   6450   7556   7572 RAM
   65536   1485   1535   3175   6175   7495   7544

          Total Elapsed Time    5.1 seconds

Test 2 On Battery

ARM/Intel BusSpeed Benchmark 4A8 23-Apr-2023 14.03
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   9518   9771  10736  11133  11157  11145 L1
      32  10614  10747  10799  11160  11174  11153
      64   8911  10778  11062  11163  11167  11155
     128   5472   5824  11046  11169  11182  11152 L2
     256   5504   5782  11121  11174  11179  11155
     512   5544   5911  11065  11181  11172  11146
    1024   5479   6056  10871  11177  11178  11150
    4096   4731   4097   8153  11145  11145  11146 L3
   16384   2432   2023   4103   7354  10873  11063 RAM
   65536   1484   1712   3572   6648  10627  11050

          Total Elapsed Time    5.0 seconds

Test 1/System 2

  KBytes  Inc32  Inc16   Inc8   Inc4   Inc2    All
      16   1.14   1.15   1.15   1.15   1.15   0.96
      32   0.96   0.96   0.96   0.96   0.96   0.96
      64   0.99   1.17   0.97   0.96   0.96   0.96
     128   1.89   1.12   1.28   0.97   0.97   0.96
     256   2.45   1.56   1.55   1.00   0.97   0.96
     512   3.70   2.21   2.03   1.05   1.34   1.10
    1024   4.10   2.61   2.22   1.32   0.98   1.15
    4096   5.01   2.96   2.78   1.84   1.03   1.02
   16384   3.08   1.93   1.92   1.77   1.11   0.96
   65536   2.61   1.79   1.80   1.76   1.13   0.96

Test 2/System 2

  KBytes  Inc32  Inc16   Inc8   Inc4   Inc2    All
      16   1.40   1.40   1.40   1.40   1.40   1.40
      32   1.40   1.40   1.41   1.40   1.40   1.40
      64   1.44   1.71   1.41   1.40   1.40   1.40
     128   2.77   1.64   1.87   1.41   1.41   1.40
     256   3.61   2.30   2.28   1.46   1.41   1.40
     512   5.42   3.22   3.02   1.54   1.96   1.61
    1024   6.02   3.88   3.54   1.92   1.43   1.69
    4096   7.30   4.13   3.82   2.70   1.51   1.50
   16384   4.15   2.31   2.29   2.01   1.59   1.40
   65536   2.60   2.00   2.03   1.90   1.60   1.40

Battery/Power Best Case

     512   1.46   1.46   1.49   1.46   1.46   1.47
    1024   1.47   1.49   1.60   1.46   1.46   1.46
    4096   1.46   1.40   1.38   1.46   1.46   1.46
  


RandMem Benchmark - RandMemi.apk

The RandMem benchmark carries out four tests, comprising serial and random address selection, each with read and read/write operations, using the same program structure. The data read points to the next address, with no arithmetic calculations. The main purpose is to demonstrate how much slower random access can be. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of the preceding caches.

This benchmark demonstrates the best and worst data transfer speeds from RAM when running a single program. Best is serial reading, which has minimum CPU instruction execution time and uses all the data read in each burst. Worst is random access, with a low probability of reading data from the same burst.
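The serial and random cases can be modelled as chains in which each element holds the index of the next one to read. The Python sketch below is illustrative only and is not the benchmark's implementation; in particular, building the random chain with Sattolo's algorithm (a single cycle, so every element is visited) is an assumption. It also counts how often consecutive reads fall in the same 64 byte burst of 16 four-byte words.

```python
import random

WORDS_PER_BURST = 16  # 64 byte burst / 4 byte words

def serial_chain(n):
    """Each element holds the index of the next: 0 -> 1 -> ... -> n-1 -> 0."""
    return [(i + 1) % n for i in range(n)]

def random_chain(n, seed=1):
    """A single random cycle through all n elements (Sattolo's algorithm)."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i guarantees one cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain

def follow(chain, steps):
    """Read `steps` elements, using each value read as the next index."""
    index, visited = 0, []
    for _ in range(steps):
        visited.append(index)
        index = chain[index]
    return visited

def same_burst_fraction(visited):
    """Fraction of consecutive reads that fall in the same burst."""
    pairs = list(zip(visited, visited[1:]))
    same = sum(1 for a, b in pairs if a // WORDS_PER_BURST == b // WORDS_PER_BURST)
    return same / len(pairs)

n = 1600
print(f"serial {same_burst_fraction(follow(serial_chain(n), n)):.3f}")
print(f"random {same_burst_fraction(follow(random_chain(n), n)):.3f}")
```

Serial reading uses 15 of every 16 consecutive reads from a burst already fetched, while random reading rarely does, which is the effect behind the large gap between the Serial and Random columns for RAM sized data.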

Some of System 3’s results were noticeably slower than those in the other memory benchmarks.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM/Intel RandMem Benchmark 4A8 08-Feb-2023 10.53
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16     8659    13607    14309    13669 L1
       32    14800    15595    14275    13640
       64    14693    15357    14261    13579
      128    12719    13268     8758     7856 L2
      256    12616    13060     4867     5225
      512    12746    13177     2816     3274
     1024    12251    12337     1416     1908 L3
     4096    11763     7213      664      717
    16384    11472     6327      556      597 RAM
    65536    11481     5996      526      565

          Total Elapsed Time    8.1 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel RandMem Benchmark 4A8 08-Feb-2023 11.37
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16    14413    15265    14036    13429 L1
       32    14467    15309    14068    13413
       64    14558    15147    14022    13378
      128    12462    13066     6195     6645 L2
      256    12480    13083     4764     4853
      512    10959    12560     1962     2452
     1024    10617    12740     1195     1534 L3
     4096    12067     6824      534      538 RAM
    16384    12051     6031      409      415
    65536    12002     5763      349      364

          Total Elapsed Time    8.6 seconds

 System 3 Android 13 2.0 GHz ARM Cortex-A75

 ARM/Intel RandMem Benchmark 4A8 08-Feb-2023 12.25
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16    12972    15051    12798    12393 L1
       32    13116    15184    12788    13243
       64    12814    15150    11406    12759
      128     8668     8727     2588     3199 L2
      256     8078     7972     2279     2567
      512     8017     7301     1555     1779
     1024     7165     6442     1056     1268 L3
     4096     7481     3425      484      410 RAM
    16384     7453     3262      343      273
    65536     7080     3014      333      292

          Total Elapsed Time    8.5 seconds
   

Continued Below


RandMem Armv9 CPU Phone

Although run during the hot period, the On Power test still produced better performance than the older phone, particularly in the random access tests, helped by the larger L3 cache. This advantage was greater still during the later battery tests, where the maximum gain was 10.72 times. This time, the maximum On Battery speed gains over On Power were around 30%, with little difference for data in RAM.

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1 On Power

ARM/Intel RandMem Benchmark 4A8 20-Apr-2023 20.43
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16    26053    25057    23723    18443 L1
       32    23084    22915    22289    18268
       64    21887    22732    21187    16691
      128    20287    21627    13268    10698 L2
      256    20283    21661    10263     9161
      512    20217    21467     8842     8383
     1024    20015    21326     7138     7354
     4096    20218    20853     3323     4499 L3
    16384    19874    12556     1568     1962 RAM
    65536    19649    11471      983     1328

          Total Elapsed Time    7.9 seconds

Test 2 On Battery

ARM/Intel RandMem Benchmark 4A8 23-Apr-2023 14.00
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16    31747    30518    30144    24456 L1
       32    30682    30415    29525    24245
       64    29039    30172    28299    22411
      128    26821    28695    17049    14193 L2
      256    26980    28762    13155    11756
      512    25989    27680    11462    10935
     1024    25887    27358     9344     9597
     4096    25894    25909     4078     5770 L3
    16384    23440    13046     1647     1987 RAM
    65536    22756    11750     1023     1372

Test 1/System 2

   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16     1.81     1.64     1.69     1.37
       32     1.60     1.50     1.58     1.36
       64     1.50     1.50     1.51     1.25
      128     1.63     1.66     2.14     1.61
      256     1.63     1.66     2.15     1.89
      512     1.84     1.71     4.51     3.42
     1024     1.89     1.67     5.97     4.79
     4096     1.68     3.06     6.22     8.36 L3
    16384     1.65     2.08     3.83     4.73 RAM
    65536     1.64     1.99     2.82     3.65

Test 2/System 2

   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16     2.20     2.00     2.15     1.82
       32     2.12     1.99     2.10     1.81
       64     1.99     1.99     2.02     1.68
      128     2.15     2.20     2.75     2.14
      256     2.16     2.20     2.76     2.42
      512     2.37     2.20     5.84     4.46
     1024     2.44     2.15     7.82     6.26
     4096     2.15     3.80     7.64    10.72 L3
    16384     1.95     2.16     4.03     4.79 RAM
    65536     1.90     2.04     2.93     3.77

Battery/Power Best and Worst Case

     1024     1.29     1.28     1.31     1.31
     4096     1.28     1.24     1.23     1.28
    16384     1.18     1.04     1.05     1.01
    65536     1.16     1.02     1.04     1.03
  
FFT Benchmarks next or Go To Start


FFT Benchmarks - fft1.apk, fft3c.apk

The benchmarks run code for single and double precision Fast Fourier Transforms of sizes 1024 to 1048576 (1K to 1024K), with running times reported in milliseconds. Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Memory use increases with FFT size, eventually requiring data from RAM, and data is often accessed on a skipped sequential basis, leading to burst reading effects. The change to using a different cache level or RAM is indicated where execution time more than doubles on doubling the FFT size.
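
The "more than doubles" rule can be made slightly more precise. FFT work grows roughly as n log2 n, so doubling n alone should raise the time by a little over 2. The minimal sketch below divides each measured doubling ratio by that expected ratio and flags steps that exceed a threshold. Both the cost model and the 1.2 threshold are assumptions for illustration and are not part of the benchmark. The data is the first of the three System 1 FFT1 single precision runs from this section.

```python
import math

# System 1 FFT1 single precision times in ms (first of the three runs),
# copied from the System 1 FFT1 table in this section.
SIZES_K = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
SYS1_SP_MS = [0.047, 0.092, 0.197, 0.434, 1.196, 3.331,
              7.407, 14.196, 43.757, 121.602, 310.438]


def doubling_excess(sizes_k, times_ms):
    """For each doubling step, measured time ratio divided by the n*log2(n) ratio."""
    result = []
    for i in range(1, len(sizes_k)):
        n_prev = sizes_k[i - 1] * 1024
        n_cur = sizes_k[i] * 1024
        measured = times_ms[i] / times_ms[i - 1]
        # Assumed cost model: FFT work grows as n * log2(n).
        expected = (n_cur * math.log2(n_cur)) / (n_prev * math.log2(n_prev))
        result.append((sizes_k[i], measured / expected))
    return result


def memory_steps(sizes_k, times_ms, threshold=1.2):
    """Sizes where time grew faster than the arithmetic alone suggests.

    The 1.2 threshold is an arbitrary, hypothetical cut-off.
    """
    return [k for k, excess in doubling_excess(sizes_k, times_ms) if excess > threshold]


for size_k, excess in doubling_excess(SIZES_K, SYS1_SP_MS):
    print(f"{size_k:5d}K  {excess:.2f}")
print("Possible cache/RAM transitions at:", memory_steps(SIZES_K, SYS1_SP_MS))
```

For this data, the sketch flags 16K, 32K, 256K, 512K and 1024K. Those are the points where the larger transforms appear to move out of a faster level of the memory hierarchy.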

Here, on executing FFT1, System 2 is shown to be faster than System 1. This test was repeated later and showed System 1 slightly faster, as expected. As with all these first tests, the benchmarks were run with power connected, and the reason for the difference is unknown. This demonstrates the danger of assessing performance from a single benchmark run.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

 ARM/Intel FFT Benchmark 1 4A8 08-Feb-2023 10.55
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.047     0.044     0.042     0.044     0.044     0.042 
    2     0.092     0.091     0.091     0.092     0.091     0.091 
    4     0.197     0.197     0.196     0.204     0.202     0.203 
    8     0.434     0.429     0.429     0.573     0.461     0.302 
   16     1.196     1.199     1.183     1.395     1.428     1.265 
   32     3.331     3.275     3.271     4.362     4.296     4.123 
   64     7.407     7.325     6.456     8.545     8.260     7.313 
  128    14.196    13.447    12.777    24.470    24.741    23.636 
  256    43.757    43.396    43.050    66.080    65.481    65.891 
  512   121.602   121.637   121.264   157.855   157.641   157.182 
 1024   310.438   309.197   303.803   369.157   364.380   362.249 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    4.3 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel FFT Benchmark 1 4A8 08-Feb-2023 11.40
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.037     0.030     0.030     0.031     0.031     0.030 
    2     0.065     0.064     0.064     0.065     0.065     0.064 
    4     0.140     0.139     0.139     0.144     0.143     0.143 
    8     0.306     0.303     0.303     0.420     0.411     0.410 
   16     0.697     0.668     0.666     1.002     0.875     0.836 
   32     1.740     1.744     1.707     2.158     2.112     2.090 
   64     4.656     4.247     4.453     5.826     5.675     6.420 
  128    17.591    12.325    11.902    23.000    23.823    22.929 
  256    45.956    47.550    46.355    64.257    63.979    63.376 
  512   120.193   120.099   124.833   156.133   155.517   156.019 
 1024   295.659   334.325   304.642   361.975   360.212   361.947 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    4.1 seconds

 System 3 Android 13 2.0 GHz ARM Cortex-A75

 ARM/Intel FFT Benchmark 1 4A8 08-Feb-2023 12.26
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.034     0.030     0.030     0.026     0.025     0.025 
    2     0.065     0.065     0.065     0.055     0.055     0.054 
    4     0.141     0.142     0.139     0.154     0.152     0.154 
    8     0.329     0.337     0.335     0.440     0.442     0.454 
   16     0.872     0.895     0.877     1.054     1.071     1.089 
   32     2.182     2.168     2.146     2.729     2.840     2.793 
   64     5.401     5.475     5.492     9.277     9.631     9.695 
  128    16.977    17.529    17.099    39.834    43.928    43.814 
  256    85.865    82.130    81.941   112.404   108.405   110.697 
  512   215.935   221.886   219.700   258.905   259.124   258.621 
 1024   506.663   504.806   500.864   604.900   598.287   595.695 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    6.5 seconds

Continued Below


FFT Benchmark 1 Armv9 CPU Phone

The FFTs access data on a skipped sequential basis, which has the same sort of impact on burst reading as random access. This affects the larger FFTs, where the large L3 cache leads to performance gains over the older phone of more than four times. The ups and downs in the other performance comparisons are difficult to explain. The smaller FFTs are more dependent on data transfer speed where, in this case, some On Power measurements indicate faster speeds.
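
The Average Comparisons figures for this benchmark appear to be ratios of mean running times over the three runs. Each ratio divides the older (or On Power) mean time by the newer (or battery) mean time, so values above 1 mean System 4, or battery operation, was faster. This reading is inferred from the numbers, because it reproduces the 256K single precision entries. The minimal sketch below uses the 256K single precision times from this section.

```python
def mean(values):
    return sum(values) / len(values)


def gain(reference_ms, compared_ms):
    """Mean reference time over mean compared time; above 1 means 'compared' was faster."""
    return mean(reference_ms) / mean(compared_ms)


# 256K single precision FFT1 times (ms), three runs each, from this section's tables.
system2 = [45.956, 47.550, 46.355]
system4_power = [11.612, 11.316, 11.384]
system4_battery = [9.222, 9.279, 9.224]

print(f"Test 1/Old 2  {gain(system2, system4_power):.2f}")         # 4.08
print(f"Battery/Power {gain(system4_power, system4_battery):.2f}")  # 1.24
```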

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1

ARM/Intel FFT Benchmark 1 4A8 20-Apr-2023 20.58
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.028     0.024     0.022     0.024     0.023     0.022 
    2     0.050     0.049     0.049     0.050     0.049     0.048 
    4     0.108     0.130     0.099     0.103     0.102     0.102 
    8     0.224     0.223     0.223     0.404     0.372     0.365 
   16     0.803     0.782     0.792     0.827     0.698     0.696 
   32     1.394     1.428     1.402     1.313     1.343     1.211 
   64     2.364     2.368     2.373     2.606     2.441     2.213 
  128     4.666     4.417     4.580     5.713     5.632     5.501 
  256    11.612    11.316    11.384    14.595    13.892    14.434 
  512    27.517    26.152    25.995    38.339    41.675    41.686 
 1024    79.904    78.725    78.795   105.524   105.813   107.723 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    1.3 seconds

Test 2 Battery

ARM/Intel FFT Benchmark 1 4A8 23-Apr-2023 14.06
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.027     0.023     0.022     0.024     0.023     0.022 
    2     0.050     0.178     0.049     0.050     0.049     0.049 
    4     0.108     0.107     0.107     0.112     0.111     0.111 
    8     0.245     0.245     0.242     0.400     0.412     0.397 
   16     0.850     0.857     0.865     0.950     0.892     0.694 
   32     1.524     1.404     1.417     1.391     1.259     1.212 
   64     2.543     2.188     2.174     2.316     2.287     2.183 
  128     4.584     4.687     4.464     4.886     4.555     4.635 
  256     9.222     9.279     9.224    10.926    10.972    10.583 
  512    22.076    21.046    21.753    33.690    31.855    33.518 
 1024    59.946    61.047    60.812    89.821    90.799    90.701 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    1.1 seconds

 Average Comparisons

           Test 1/Old 2        Test 2/Old 2        Battery/Power

            SP        DP        SP        DP        SP        DP
    1      1.31      1.32      1.35      1.32      1.03      1.00
    2      1.30      1.37      0.70      1.29      0.53      0.94
    4      1.24      1.15      1.30      1.08      1.05      0.94
    8      1.36      1.18      1.25      1.06      0.92      0.90
   16      0.85      1.49      0.79      1.42      0.92      0.95
   32      1.23      2.18      1.19      2.28      0.97      1.04
   64      1.88      3.64      1.93      4.20      1.03      1.16
  128      3.06      4.37      3.04      5.61      0.99      1.28
  256      4.08      4.00      5.04      5.01      1.24      1.25
  512      4.58      3.52      5.63      4.19      1.23      1.19
 1024      3.94      3.40      5.14      4.00      1.31      1.18
  
Second FFT Benchmark Results below


FFT fft3c.apk Results

With all these performance measurements, selecting a representative sample is difficult, but averaging the three running times at least helps. An example comparing those for this benchmark with the earlier one (including the revised System 1 times) shows that the earlier FFT1 was faster on the small FFTs. The average FFT3c/FFT1 running time ratios were between 0.76 and 3.11, with a 1.53 average. Taking the total running time of one of each FFT size produced ratios of 2.79 for single precision and 2.47 for double precision. For FFT3c, the total running time of one of each indicated that System 1 was 2.6% faster than System 2 and 43% faster than System 3.
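
The "total running time of one of each" comparison can be checked with a minimal sketch. It assumes that "one of each" means the first of the three runs at every size, for both precisions. That is an assumption, although it reproduces the 2.6% System 1/System 2 figure. The times are copied from the System 1 and System 2 FFT3c tables in this section.

```python
# First run (of three) at each size, 1K to 1024K, in ms, from the FFT3c tables.
SYSTEM1_SP = [0.035, 0.066, 0.141, 0.307, 0.702, 1.545, 3.423, 8.240, 19.756, 43.903, 94.409]
SYSTEM1_DP = [0.030, 0.063, 0.136, 0.360, 0.790, 1.754, 4.380, 11.155, 26.542, 60.771, 145.439]
SYSTEM2_SP = [0.044, 0.069, 0.162, 0.347, 0.841, 1.795, 3.586, 8.411, 19.554, 47.427, 107.446]
SYSTEM2_DP = [0.031, 0.061, 0.135, 0.317, 0.826, 2.089, 4.646, 10.902, 25.088, 56.174, 145.147]


def total_ms(*columns):
    """Total running time of one FFT of each size, summed over the given columns."""
    return sum(sum(column) for column in columns)


t1 = total_ms(SYSTEM1_SP, SYSTEM1_DP)
t2 = total_ms(SYSTEM2_SP, SYSTEM2_DP)
print(f"System 1 {t1:.3f} ms, System 2 {t2:.3f} ms")
print(f"System 1 faster by {(t2 / t1 - 1) * 100:.1f}%")  # 2.6%
```

Because the totals are dominated by the largest sizes, this measure mainly reflects RAM and L3 behaviour rather than the small, cache-resident FFTs.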


 System 1 Android 11 2.05 GHz ARM Cortex-A76
 
 ARM/Intel FFT Benchmark 3c 4A8 08-Feb-2023 10.56
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.035     0.029     0.028     0.030     0.028     0.028 
    2     0.066     0.062     0.061     0.063     0.059     0.062 
    4     0.141     0.132     0.134     0.136     0.136     0.134 
    8     0.307     0.290     0.290     0.360     0.350     0.338 
   16     0.702     0.676     0.675     0.790     0.766     0.790 
   32     1.545     1.476     1.472     1.754     1.766     1.783 
   64     3.423     3.333     3.367     4.380     4.278     4.231 
  128     8.240     8.024     8.108    11.155    10.916    10.553 
  256    19.756    19.283    19.493    26.542    26.701    26.368 
  512    43.903    43.320    43.422    60.771    61.454    60.828 
 1024    94.409    93.012    93.336   145.439   142.632   144.625 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    2.1 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

 ARM/Intel FFT Benchmark 3c 4A8 08-Feb-2023 11.42
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.044     0.030     0.030     0.031     0.028     0.028 
    2     0.069     0.063     0.063     0.061     0.060     0.060 
    4     0.162     0.135     0.135     0.135     0.133     0.133 
    8     0.347     0.301     0.298     0.317     0.314     0.337 
   16     0.841     0.722     0.908     0.826     1.134     0.840 
   32     1.795     1.753     1.652     2.089     2.047     1.987 
   64     3.586     3.422     3.732     4.646     4.674     4.701 
  128     8.411     8.138     7.877    10.902    10.906    10.933 
  256    19.554    20.523    19.439    25.088    24.605    26.126 
  512    47.427    44.633    44.105    56.174    63.102    62.016 
 1024   107.446   102.961   101.591   145.147   141.521   141.941 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    2.1 seconds

 System 3 Android 13 2.0 GHz ARM Cortex-A75

 ARM/Intel FFT Benchmark 3c 4A8 08-Feb-2023 12.27
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.054     0.035     0.034     0.035     0.032     0.032 
    2     0.076     0.073     0.073     0.073     0.070     0.070 
    4     0.165     0.157     0.161     0.169     0.167     0.165 
    8     0.381     0.353     0.360     0.391     0.382     0.379 
   16     0.856     0.823     0.836     0.991     0.966     0.983 
   32     1.861     1.852     1.899     2.365     2.297     2.317 
   64     4.402     4.224     4.266     6.097     5.913     6.111 
  128    10.802    10.491    10.793    15.843    15.477    15.512 
  256    26.539    25.950    26.473    37.175    37.135    37.191 
  512    58.571    57.610    56.704    88.722    90.241    88.155 
 1024   125.591   124.655   126.555   217.677   222.146   221.802 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28
  
Continued Below


FFT Benchmark 3C Armv9 CPU Phone

With these, the smaller sizes are more dependent on processing speed, and the most noticeable feature is the larger performance gains for the double precision versions compared with the single precision ones. Many of the power and battery results were similar.

The running time of this benchmark is now less than one second, with some measured FFT times at the microsecond level. These may be limited by timer resolution, which calls the validity of comparisons into question.
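
A minimal sketch shows the scale of the timer resolution problem. It assumes a hypothetical timer tick of 1 microsecond, since the benchmark's actual timer resolution is not stated here. With that tick, the 0.011 ms System 4 double precision 1K time could be out by roughly 9%. The last line prints the resolution Python reports for its own high-resolution clock on whatever host runs the sketch, not the benchmark's timer.

```python
import time


def worst_case_error(measured_ms, resolution_ms):
    """Relative error if the measured interval can be off by one timer tick."""
    return resolution_ms / measured_ms


# 1K double precision FFT3c time on System 4 (0.011 ms) with a hypothetical
# 1 microsecond (0.001 ms) timer tick.
print(f"{worst_case_error(0.011, 0.001) * 100:.1f}%")  # 9.1%

# Resolution Python reports for its own high-resolution clock on this host.
print(time.get_clock_info("perf_counter").resolution, "seconds")
```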

 System 4 Android 13 1x 2.80 GHz Cortex-X2

Test 1 Power

ARM/Intel FFT Benchmark 3c 4A8 20-Apr-2023 20.59
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.039     0.026     0.025     0.013     0.011     0.011 
    2     0.061     0.054     0.054     0.025     0.050     0.023 
    4     0.128     0.115     0.115     0.053     0.051     0.051 
    8     0.303     0.254     0.253     0.124     0.121     0.120 
   16     0.641     0.607     0.606     0.296     0.284     0.285 
   32     1.345     1.339     1.042     0.627     0.609     0.611 
   64     2.434     2.049     1.824     1.360     1.406     1.322 
  128     3.597     3.419     3.412     2.985     2.890     2.957 
  256     6.718     6.180     6.077     7.266     7.216     7.083 
  512    13.537    12.908    12.913    17.726    19.994    20.027 
 1024    31.804    30.518    30.398    46.998    44.458    44.174 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    0.7 seconds

Test 2 Battery

ARM/Intel FFT Benchmark 3c 4A8 23-Apr-2023 14.10
           Compiled for 64 bit ARM v8a

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.051     0.026     0.025     0.013     0.011     0.011 
    2     0.061     0.054     0.053     0.025     0.023     0.023 
    4     0.139     0.115     0.115     0.053     0.051     0.051 
    8     0.276     0.257     0.254     0.123     0.121     0.141 
   16     0.646     0.607     0.604     0.295     0.284     0.284 
   32     1.366     0.985     0.979     0.632     0.619     0.618 
   64     2.240     2.054     1.869     1.394     1.333     1.328 
  128     3.824     3.569     2.988     3.016     2.914     2.882 
  256     6.934     6.319     6.096     7.240     7.160     7.170 
  512    13.635    13.227    13.144    17.729    17.652    17.596 
 1024    30.851    30.087    29.866    40.093    38.497    38.396 

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    0.7 seconds

 Average Comparisons

           Test 1/Old 2        Test 2/Old 2        Battery/Power

            SP        DP        SP        DP        SP        DP
    1      1.16      2.02      1.02      2.53      0.88      1.25
    2      1.15      2.30      1.16      2.58      1.01      1.12
    4      1.21      2.63      1.17      2.54      0.97      0.96
    8      1.17      3.06      1.20      3.02      1.03      0.99
   16      1.33      3.29      1.33      3.27      1.00      0.99
   32      1.40      3.39      1.56      3.40      1.12      1.00
   64      1.70      3.62      1.74      3.63      1.02      1.00
  128      2.34      3.57      2.35      3.57      1.00      1.00
  256      3.14      3.24      3.08      3.45      0.98      1.06
  512      3.46      3.15      3.40      3.59      0.98      1.14
 1024      3.36      3.16      3.44      3.66      1.02      1.16


  

MP-Whetstone Benchmark next or Go To Start


MP-Whetstone Benchmark - MP-WHETSi.apk

For more information on the Whetstone Benchmark, see the stand-alone version above. The multithreading version runs multiple copies of the same shared code, each with separate variables.

Before comparing results, note that the high Fixpt MOPS are impossible to achieve. The compiler has found that some of the code can be omitted without changing the calculated result. However, the time for this function has little effect on the overall MWIPS rating.

With mixed-MHz CPU cores and big.LITTLE architectures, comparisons become more complex, with each system showing superior performance in specific areas.

For this benchmark, overall seconds depend on calibrations and should not be compared between systems. Ideally, though, the time on each system would remain constant up to 8 threads using 8 CPU cores. Based on overall MWIPS ratings, throughput relative to one thread was around twice using 2 threads, about 3.4 times at 4 threads, and between 5.1 and 6.2 times with 8 threads.
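
A minimal sketch of these throughput ratios, using the MWIPS column of the three results below. The "gain per thread" figure is added here for illustration. A value of 100% would mean perfect scaling on identical cores, so lower values are expected at 4 and 8 threads once the slower LITTLE cores are brought in.

```python
THREADS = [1, 2, 4, 8]
MWIPS = {
    "System 1": [3886.0, 7695.3, 12943.7, 24326.3],
    "System 2": [4064.7, 8660.6, 14117.1, 20887.7],
    "System 3": [3856.7, 7716.2, 13246.4, 20674.2],
}


def scaling(mwips):
    """(threads, throughput gain over 1 thread, gain per thread) for each row."""
    base = mwips[0]
    return [(t, m / base, m / base / t) for t, m in zip(THREADS, mwips)]


for name, values in MWIPS.items():
    cells = "  ".join(f"{t}T {g:.2f}x ({e:.0%})" for t, g, e in scaling(values))
    print(f"{name}: {cells}")
```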

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

    ARM/Intel MP-Whetstone Benchmark 4A8 08-Feb-2023 16.56
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads
     MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

1T  3886.0  680.6  663.1  734.2 119.3  62.5 23326.1  1976.9  741.3
2T  7695.3 1541.7 1409.7 1456.9 240.9 115.2 98493.7  4205.8 1474.0
4T 12943.7 2547.7 2495.0 2575.8 365.2 220.6 148870.7 8186.7 2268.0
8T 24326.3 4564.2 4353.4 4700.6 695.7 435.2 323353.9 22743.2 4101.4

 Overall Seconds   2.91 1T,   2.91 2T,   3.93 4T,   4.83 8T

 All calculations produced consistent numeric results

          Total Elapsed Time   14.9 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

    ARM/Intel MP-Whetstone Benchmark 4A8 08-Feb-2023 17.22
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads
     MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

1T  4064.7  957.0  728.2  738.0 129.4  64.6 18308.6  2444.3  751.1
2T  8660.6 1757.5 1505.7 1596.2 270.0 142.0 85717.2  5241.1 1505.1
4T 14117.1 3461.0 3322.1 2696.8 439.0 239.9 140592.8 11249.6 2471.6
8T 20887.7 4732.1 4868.8 4176.3 518.0 386.2 309958.3 19432.5 3457.2

 Overall Seconds   2.74 1T,   2.67 2T,   3.98 4T,   4.57 8T

 All calculations produced consistent numeric results

          Total Elapsed Time   14.3 seconds


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

    ARM/Intel MP-Whetstone Benchmark 4A8 08-Feb-2023 15.43
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads
     MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp   Fixpt      If  Equal
                1      2      3  MOPS  MOPS    MOPS    MOPS   MOPS

1T  3856.7  819.3  818.8  666.1 130.1  63.4 50817.9  2984.8  562.2
2T  7716.2 1637.2 1636.4 1332.3 260.3 126.9 112199.9 5982.7 1124.6
4T 13246.4 2792.2 2730.5 2385.4 421.9 230.4 192831.0 11651.4 1966.1
8T 20674.2 4431.3 4528.9 3840.0 596.8 390.2 289064.4 21237.2 3009.2

 Overall Seconds   4.99 1T,   4.99 2T,   6.67 4T,   8.09 8T

 All calculations produced consistent numeric results

          Total Elapsed Time   25.7 seconds
  

Continued Below


MP-Whetstone Benchmark Armv9 CPU Phone

As indicated by Overall Seconds, on the earlier devices with two fast cores the time was effectively the same using 1 and 2 threads. The System 4 CPU has only one fast core, so running time was 26% longer using two threads.

Some of the test function running times were, again, in the microsecond range, possibly distorting comparisons. The single core Whetstone benchmark obtained an overall speed rating of 1.50 times that of the older phone used for comparison. Here, the gain was between 1.50 and 1.75 times, depending on the thread count.
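
The 1.50 to 1.75 range comes from the MWIPS column of the Test1/System 2 rows below, with System 4 divided by System 2 at each thread count. A minimal sketch:

```python
SYSTEM2_MWIPS = [4064.7, 8660.6, 14117.1, 20887.7]   # 1, 2, 4, 8 threads
SYSTEM4_MWIPS = [6937.5, 12987.7, 24739.0, 32198.9]

ratios = [round(s4 / s2, 2) for s4, s2 in zip(SYSTEM4_MWIPS, SYSTEM2_MWIPS)]
print(ratios)                      # [1.71, 1.5, 1.75, 1.54]
print(min(ratios), max(ratios))    # 1.5 1.75
```

The dip at 2 threads fits the single fast core. The second thread lands on a slower Cortex-A710 core, whereas System 2 has two Cortex-A76 cores.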

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

 Test 1 Battery

     ARM/Intel MP-Whetstone Benchmark 4A8 23-Apr-2023 14.30
           Compiled for 64 bit ARM v8a
 
                   Using 1, 2, 4 and 8 Threads
     MWIPS MFLOPS MFLOPS MFLOPS    Cos   Exp    Fixpt      If  Equal
                1      2      3   MOPS  MOPS     MOPS    MOPS   MOPS

1T  6937.5 1419.7 1389.3 1092.2  236.7 103.3  83276.2  3985.3 1923.4
2T 12987.7 2695.4 2562.8 2087.1  449.1 197.9 125540.0  7418.4 3212.5
4T 24739.0 5315.1 5214.5 4090.7  835.9 384.5 244974.6 14825.7 5227.7
8T 32198.9 7676.4 7993.3 5510.1 1035.3 510.5 331293.6 25184.5 5594.2

 Overall Seconds   4.20 1T,   5.29 2T,   6.00 4T,   9.04 8T

 All calculations produced consistent numeric results

          Total Elapsed Time   25.1 seconds

Test1/System 2

1T    1.71   1.48   1.91   1.48   1.83  1.60     4.55    1.63   2.56
2T    1.50   1.53   1.70   1.31   1.66  1.39     1.46    1.42   2.13
4T    1.75   1.54   1.57   1.52   1.90  1.60     1.74    1.32   2.12
8T    1.54   1.62   1.64   1.32   2.00  1.32     1.07    1.30   1.62
 

MP-Dhrystone Benchmark next or Go To Start


MP Dhrystone Benchmark - MP-Dhryi.apk

This benchmark does not provide reasonable increases in measured performance using multiple cores, probably because many of the variables used are shared by all threads. Results using one thread are only slightly slower than those from the single core version, indicating that threading overheads were not excessive. The lack of improvement using multiple cores probably invalidates comparisons between the systems. Even so, the System 4/System 2 comparison indicated gains of between 2.0 and 2.45 times.

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 08-Feb-2023 16.58
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.80     2.03     5.47    14.00
 Dhrystones per Second   25133472 19708774 14614211 11428905
 VAX MIPS rating            14305    11217     8318     6505

 Internal pass count correct all threads

          Total Elapsed Time   22.7 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 08-Feb-2023 17.24
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.84     2.24     6.23    14.31
 Dhrystones per Second   23687920 17834612 12843313 11183452
 VAX MIPS rating            13482    10151     7310     6365

 Internal pass count correct all threads

          Total Elapsed Time   24.1 seconds

 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 08-Feb-2023 15.45
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.75     1.97     4.98    12.88
 Dhrystones per Second   21326073 16280555 12851505  9937004
 VAX MIPS rating            12138     9266     7314     5656

 Internal pass count correct all threads

          Total Elapsed Time   21.3 seconds

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

ARM/Intel MP-Dhrystone 2 Benchmark 4A8 23-Apr-2023 14.32
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads

Threads                        1        2        4        8
Seconds                     0.69     2.01     5.08    14.28
Dhrystones per Second   57735505 39843345 31467495 22401220
VAX MIPS rating            32860    22677    17910    12750

Internal pass count correct all threads

          Total Elapsed Time   22.6 seconds

System 4/System 2           2.44     2.23     2.45     2.00
  

NEON-Linpack-MP Benchmark - NEON-Linpacki-MP.apk

This is a multithreading version of the Linpack benchmark above. Further details and results can be found in android neon benchmarks.htm and android benchmarks.htm.

This benchmark is not generally available with the new 4A8 compilation because, on a new phone, its overall running time had increased to more than 400 seconds.

MP-BusSpeed Benchmark next or Go To Start


MP-BusSpeed Benchmark - MP-BusSpd2i.apk

This is a multithreading version of BusSpeed above. As with the other memory benchmarks, it is restricted to three memory sizes that were originally representative of using L1 cache, L2 cache and RAM. To avoid caching effects on RAM-based data, this version gives the threads staggered starting points, with each thread reading all the data.
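
A small sketch illustrates the staggered start. Equal spacing of the start points is a hypothetical choice, as the benchmark's actual offsets are not given here. Each thread still reads every word. At any moment, though, the threads are working in different parts of the buffer, so one thread is presumably less likely to find data that another has just brought into a shared cache.

```python
def read_order(n_words, thread_id, n_threads):
    """Word indices one thread reads: every word, from a staggered start, wrapping round.

    Equal spacing of start points is a hypothetical choice for illustration.
    """
    start = (thread_id * n_words) // n_threads
    return [(start + i) % n_words for i in range(n_words)]


orders = [read_order(8, t, 4) for t in range(4)]
for t, order in enumerate(orders):
    print(f"thread {t}: {order}")
```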

Considering Read All, the performance of all three systems was virtually the same for cache-based data, using the simple integer arithmetic involved. RAM speeds on Systems 1 and 2 were quite similar, with System 3 far behind, possibly due to dual channel versus single channel operation.

Estimated bus speeds, calculated as 16 times the Inc16 results, were similar to the measured Read All MB/second when more than one thread was used.
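
The factor of 16 is presumably there because Inc16 uses only one 4-byte word in every 16. If the memory system transfers whole 64-byte lines, the actual traffic is about 16 times the reported MB/second. The 64-byte line and 4-byte word sizes are assumptions in this minimal sketch, which reproduces the annotated 13776 for System 2 and the 47488 for System 4 from the tables in this section.

```python
WORDS_PER_LINE = 16  # assumed: 64-byte transfers of 4-byte words


def estimated_bus_mb_per_s(inc16_mb_per_s):
    """Scale Inc16 MB/second (one word used per line) up to the full line traffic."""
    return inc16_mb_per_s * WORDS_PER_LINE


print(estimated_bus_mb_per_s(861))   # System 2, RAM, 8 threads -> 13776
print(estimated_bus_mb_per_s(2968))  # System 4, RAM, 8 threads -> 47488
```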

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

 ARM/Intel MP-BusSpd2 Benchmark 4A8 08-Feb-2023 17.01
           Compiled for 64 bit ARM v8a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
 12.3 1T   7329   7124   7451   7334   7341   7101 L1
      2T  10325  13362  14290   7684   8059   7832 <<< Later 14130
      4T  17070  19398  25187  24043  27212  20101
      8T  14174  17228  36750  29288  41665  29522
122.9 1T   1878   2887   4854   7296   7368   6407 L2
      2T   1863   3247   6737   7374  13119   7689
      4T   3830   6261   9539  14764  17344  15561
      8T   5462   8906  16427  25436  32650  29293
49152 1T    404    569   1155   2233   4053   4376 RAM
      2T    409    777   1583   3176   6429   9715
      4T    564    942   1821   3646   7426  11040
      8T    598    970   1950   3715   7974  15460
 No Errors Found
          Total Elapsed Time   58.4 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

 ARM/Intel MP-BusSpd2 Benchmark 4A8 08-Feb-2023 17.26
           Compiled for 64 bit ARM v8a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
 12.3 1T   7161   7297   7497   7588   7702   7460 L1
      2T   8249  12429  13881  13746  15061  15482
      4T   7947  10882  15414  19060  22373  19375
      8T  12283  11971  29090  27379  39212  26439
122.9 1T   1992   3367   6029   7489   7375   7503 L2
      2T   3907   7106  11767  14529  15642  15813
      4T   4709   7833  12544  18015  19659  19260
      8T   4742   8651  15108  25444  37308  32776
49152 1T    528    789   1730   3469   6325   7353 RAM
      2T    726    988   1832   3623   7074  13999  Calculated
      4T    719    882   1762   3321   6886  13740   Bus Speed
      8T    681    861   1800   3451   7147  13906     13776
 No Errors Found
          Total Elapsed Time   52.9 seconds


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

 ARM/Intel MP-BusSpd2 Benchmark 4A8 08-Feb-2023 15.47
           Compiled for 64 bit ARM v8a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
 12.3 1T   7116   7549   7746   7936   7963   7976 L1
      2T  12590  13817  14578  15785  15865  15924
      4T  19944  23807  26173  27694  28498  20714
      8T  16635  16726  35602  29673  43358  32010
122.9 1T   1232   1142   2415   4406   5734   7975 L2
      2T   2718   3123   5270   8813  11478  15947
      4T   3100   4607   7739  13599  18013  20644
      8T   3189   6323   9391  19850  27135  30640
49152 1T    547    540   1116   2269   4488   7518 RAM
      2T    581    580   1140   2289   4582   9156
      4T    642    625   1691   3324   8091   9188  
      8T    601    687   1586   3099   5079   9027
 No Errors Found
          Total Elapsed Time   48.8 seconds
  

Continued Below


MP-BusSpeed Benchmark Armv9 CPU Phone

This was run on battery, when maximum speeds were expected. For the main measurements, reading all data, the performance improvement with the integer functions was mainly proportional to the CPU MHz ratio using one thread, but increased with multiple threads. The maximum Read All gain was for RAM data transfers using all CPU cores, an increase of 2.47 times, with an estimated bus speed of 47.5 GB per second (2.968 x 16).

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

ARM/Intel MP-BusSpd2 Benchmark 4A8 23-Apr-2023 14.34
           Compiled for 64 bit ARM v8a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads
  KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll
 12.3 1T   9766  10568  10655  10683  10312  10724 L1
      2T  16006  17611  19131  19771  19436  19690
      4T  29633  30846  35796  35823  37133  37949
      8T  17413  18447  42972  39292  53233  52517
122.9 1T   4904   5381   8001   9509   9478   9553 L2
      2T   8182   8579  15623  18945  19070  19051
      4T  15433  15194  26980  34383  31191  35705
      8T  14336  15505  27156  35831  39276  47641
49152 1T   1158   1163   2593   5707  10124  10218 RAM
      2T   2580   2145   4723   9139  16890  18311  Calculated 
      4T   4236   3485   7626  12461  21916  30342   Bus Speed
      8T   2821   2968   6508  10792  21131  34406     47488
 No Errors Found
          Total Elapsed Time   50.9 seconds

 System 4 / System 2 

 12.3 1T   1.36   1.45   1.42   1.41   1.34   1.44
      2T   1.94   1.42   1.38   1.44   1.29   1.27
      4T   3.73   2.83   2.32   1.88   1.66   1.96
      8T   1.42   1.54   1.48   1.44   1.36   1.99
122.9 1T   2.46   1.60   1.33   1.27   1.29   1.27
      2T   2.09   1.21   1.33   1.30   1.22   1.20
      4T   3.28   1.94   2.15   1.91   1.59   1.85
      8T   3.02   1.79   1.80   1.41   1.05   1.45
49152 1T   2.19   1.47   1.50   1.65   1.60   1.39
      2T   3.55   2.17   2.58   2.52   2.39   1.31
      4T   5.89   3.95   4.33   3.75   3.18   2.21
      8T   4.14   3.45   3.62   3.13   2.96   2.47
  
MP-RandMem Benchmark next or Go To Start


MP-RandMem Benchmark - MP-RndMemi.apk

This is a multithreading version of RandMem above. The most striking feature of these MP results is the near constant performance at all thread counts during the read/write tests, for each memory size covered. This is probably because write back involves accessing RAM.

This program simply reads (or writes) data that supplies the next location to access. This lack of arithmetic calculations apparently provides faster data transfer speeds than BusSpeed.
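
A minimal sketch of the "data supplies the next location" idea for the random tests follows. It builds a single randomly linked chain with Sattolo's algorithm, which is a swapped-in technique, because the benchmark's own index generation is not described here. Each load must complete before the next address is known, and there is no regular stride for hardware to predict. That is why random access is so sensitive to cache and RAM latency.

```python
import random


def make_chain(n, seed=1):
    """Single random cycle over n slots: chain[i] holds the next index to visit.

    Built with Sattolo's algorithm (choosing j < i yields one cycle through every slot).
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, start=0):
    """Follow the chain from start until it returns there; return the visit order."""
    visited = [start]
    nxt = chain[start]
    while nxt != start:
        visited.append(nxt)
        nxt = chain[nxt]
    return visited


order = chase(make_chain(1000))
print(len(order), order[:5])
```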

Repeating the benchmark on System 1 continued to produce variable performance on RndRDWR tests using RAM.

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

 ARM/Intel MP-RndMem Benchmark 4A8 08-Feb-2023 17.04
           Compiled for 64 bit ARM v8a

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR
12.29 1T   15672   16244   15166   13508 L1
      2T   14435   10708   21438    9174
      4T   35744    8391   34088    7762
      8T   52284    8129   32321    7232
122.9 1T   11052   11762    7956    7209 L2
      2T   17349    9400   14378    5457
      4T   30743    7405   18898    5343
      8T   44553    6837   21266    4174
12288 1T   11287    6549     407     424 RAM
      2T    9081    4458     641     223
      4T   14381    3463     539      64
      8T   16627    2564    1061     121
 No Errors Found
          Total Elapsed Time   47.9 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR
12.29 1T   15277   15160   13995   13764 L1
      2T   27401   14764   27575   13529
      4T   30145   14883   29903   13394
      8T   43856   14293   33190   13297
122.9 1T   12005   13509    7296    7303 L2
      2T   25241   12840   14676    7336
      4T   30128   12674   15276    7226
      8T   46484   11959   18064    7166
12288 1T   11371    6158     437     429 RAM
      2T   15348    5818     471     402
      4T   14136    5793     499     404
      8T   17555    5276     597     392
 No Errors Found
          Total Elapsed Time   47.2 seconds


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

 ARM/Intel MP-RndMem Benchmark 4A8 08-Feb-2023 15.49
           Compiled for 64 bit ARM v8a

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR
12.29 1T   13840   15739   13328   13741 L1
      2T   25791   15710   25075   13919
      4T   34426   15334   33819   13779
      8T   50511   15029   38275   13788
122.9 1T    8965    9269    2727    3397 L2
      2T   16943    9249    6348    3391
      4T   24738    9152    8399    3410
      8T   42321    9190   12827    3402
12288 1T    7704    3364     510     358 RAM
      2T    9140    3371     550     334
      4T   15521    3367     574     358
      8T   14550    3358     747     358
 No Errors Found
          Total Elapsed Time   42.6 seconds
  



MP-RandMem Benchmark Armv9 CPU Phone

Cache based measurements indicated gains over the older phone of between 1.03 and 2.94 times. The largest gains were for random access of RAM based data, between 4.53 and 10.56 times, influenced by the larger L3 cache.

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

 ARM/Intel MP-RndMem Benchmark 4A8 23-Apr-2023 14.38
           Compiled for 64 bit ARM v8a

 Battery

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR
12.29 1T   30966   18582   17061   14184
      2T   29434   17173   29474   14222
      4T   58965   25538   88024   22464
      8T   91141   23590   67089   21167
122.9 1T   26009   19920   12525   10496
      2T   39542   23049   23892   13454
      4T   71554   23106   39923   12058
      8T   75854   20575   42745    9824
12288 1T   23597   12335    1980    2921
      2T   33194   11639    3260    2735
      4T   44727   10552    5269    2372
      8T   50346    9798    5297    1920
No Errors Found

System 4 / System 2 

 KB         SerRD SerRDWR   RndRD RndRDWR
12.29 1T     2.03    1.23    1.22    1.03
      2T     1.07    1.16    1.07    1.05
      4T     1.96    1.72    2.94    1.68
      8T     2.08    1.65    2.02    1.59
122.9 1T     2.17    1.47    1.72    1.44
      2T     1.57    1.80    1.63    1.83
      4T     2.38    1.82    2.61    1.67
      8T     1.63    1.72    2.37    1.37
12288 1T     2.08    2.00    4.53    6.81
      2T     2.16    2.00    6.92    6.80
      4T     3.16    1.82   10.56    5.87
      8T     2.87    1.86    8.87    4.90
 


MP-MFLOPS Benchmark - MP-MFLOPS2i.apk

The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Data sizes are limited to three, intended to use L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread carries out the same calculations but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations have been carried out.
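The threading scheme described above can be sketched with POSIX threads. This is a minimal sketch, not the benchmark's code: it applies the 8-operation expression quoted above (the benchmark's own 2 and 32 operation variants are shorter and longer expressions not reproduced here), the constants A to F and the names run_segment and mflops_mp are hypothetical, and it needs compiling with -pthread on older C libraries.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical constants: B - D + F is just below 1, so values in x stay
   bounded however many passes are run. */
static const float A = 0.000020f, B = 0.999980f, C = 0.000011f,
                   D = 1.000011f, E = 0.000012f, F = 0.999992f;

typedef struct { float *x; size_t n; int passes; } segment_t;

/* One thread's work: repeat the quoted expression over its own segment. */
static void *run_segment(void *arg) {
    segment_t *s = arg;
    for (int p = 0; p < s->passes; p++)
        for (size_t i = 0; i < s->n; i++)
            s->x[i] = (s->x[i] + A) * B - (s->x[i] + C) * D + (s->x[i] + E) * F;
    return NULL;
}

/* Split x[0..n) into nthreads contiguous segments (1 to 8), run each on
   its own thread and wait for all. Returns 0 on success, -1 on error. */
static int mflops_mp(float *x, size_t n, int nthreads, int passes) {
    pthread_t tid[8];
    segment_t seg[8];
    int created = 0, rc = 0;
    if (nthreads < 1 || nthreads > 8) return -1;
    size_t chunk = n / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        seg[t].x = x + (size_t)t * chunk;
        seg[t].n = (t == nthreads - 1) ? n - (size_t)t * chunk : chunk;
        seg[t].passes = passes;
        if (pthread_create(&tid[t], NULL, run_segment, &seg[t]) != 0) { rc = -1; break; }
        created++;
    }
    for (int t = 0; t < created; t++) pthread_join(tid[t], NULL);
    return rc;
}
```

Because every element is computed independently by the same function, the results do not depend on how many threads are used, which is the property the identical sumchecks at 1, 2, 4 and 8 threads demonstrate.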

As indicated earlier, using SIMD with 128 bit registers and linked (fused) multiply and add, up to eight single precision floating point operations could be expected per clock cycle, or 16 GFLOPS per core at 2 GHz. The first two systems, with Cortex-A76 CPUs, appear to have a reasonable implementation of SIMD, achieving over 12 GFLOPS at 32 operations per word, with System 3 far behind. All show acceptable improvements using two cores, but performance gains then become disappointing using four cores with these big.LITTLE CPU architectures.
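The peak figure quoted above is simple arithmetic, shown here as a small C helper. The function name is hypothetical, and it assumes, as the text does, one 128-bit fused multiply-add issued per clock cycle, with each fused multiply-add counted as two floating point operations.

```c
/* Notional peak single precision MFLOPS for one core:
   (32-bit lanes per SIMD register) x 2 (FMA = multiply + add)
   x FMA instructions per cycle x clock MHz. */
static double peak_sp_mflops(int simd_bits, int fma_per_cycle, double mhz) {
    int lanes = simd_bits / 32;                          /* 4 floats in 128 bits */
    double ops_per_cycle = lanes * 2.0 * fma_per_cycle;  /* 8 for one 128-bit FMA */
    return ops_per_cycle * mhz;
}
```

For example, peak_sp_mflops(128, 1, 2000.0) gives 16000, so System 2's best single thread result of 12506 MFLOPS is about 78% of this notional peak.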

Note that all systems obtained the same sumchecks of numeric calculations at all levels of threading.

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

 ARM/Intel MP-MFLOPS2 Benchmark 4A8 08-Feb-2023 17.06
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     5378    5545    3318   12106   12306   11395
 2T    10988   10354    3174   22955   23278   12780
 4T     9979   10692    2591   25718   25633   24694
 8T    13285   14803    2433   30061   31648   28941
 Results x 100000, 0 indicates ERRORS
 1T    40392   76406   99700   35218   66014   99520
 2T    40392   76406   99700   35218   66014   99520
 4T    40392   76406   99700   35218   66014   99520
 8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time    8.1 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

 ARM/Intel MP-MFLOPS2 Benchmark 4A8 08-Feb-2023 17.31
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     6819    6238    2804   12506   12537   12441
 2T     8797    9307    2946   22427   24126   22731
 4T     9364    9132    2554   25008   26004   25345
 8T    10985   13262    2398   33664   34024   32553
 Results x 100000, 0 indicates ERRORS
 1T    40392   76406   99700   35218   66014   99520
 2T    40392   76406   99700   35218   66014   99520
 4T    40392   76406   99700   35218   66014   99520
 8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time    7.5 seconds


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

 ARM/Intel MP-MFLOPS2 Benchmark 4A8 08-Feb-2023 15.52
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     5825    4972    1567    7724    7327    7052
 2T    11131   11772    1673   14574   15183   14065
 4T    11598   13049    1775   17670   17991   17216
 8T    13773   15038    1748   23906   24232   22806
 Results x 100000, 0 indicates ERRORS
 1T    40392   76406   99700   35218   66014   99520
 2T    40392   76406   99700   35218   66014   99520
 4T    40392   76406   99700   35218   66014   99520
 8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time   11.5 seconds
  



MP-MFLOPS Benchmark Armv9 CPU Phone

Based on the older CPUs, 8 single precision floating point operations per clock cycle could be expected, leading to 22.4 GFLOPS at 2.8 GHz. Measured results indicate a maximum of 31.8 GFLOPS using 1 core, or 11.36 operations per clock cycle, when perhaps 6.5 could be expected with the particular code used. It seems that extended SIMD operation has been applied to existing SIMD vector instructions.
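The operations per cycle figures follow from dividing measured MFLOPS by clock MHz. Using the best single thread 32 Ops/Word results (31790 MFLOPS for System 4 on battery and 12506 MFLOPS for System 2), the arithmetic is:

```latex
\[
\begin{aligned}
\text{notional peak} &= 4_{\,\text{lanes}} \times 2_{\,\text{FMA}} = 8\ \text{ops/cycle}
  \;\Rightarrow\; 8 \times 2.8\ \text{GHz} = 22.4\ \text{GFLOPS} \\
\text{System 4, 1 core} &= \frac{31.8\ \text{GFLOPS}}{2.8\ \text{GHz}} \approx 11.36\ \text{ops/cycle} \\
\text{System 2, 1 core} &= \frac{12.5\ \text{GFLOPS}}{2.0\ \text{GHz}} \approx 6.25\ \text{ops/cycle}
\end{aligned}
\]
```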

Again, running times of individual tests could be too short to provide accurate performance estimates and comparisons. But it is clear that speeds more than twice those of the older phone can be achieved. On power, heating effects indicate possible reductions in performance of more than 25%.

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

Test 1 Power

ARM/Intel MP-MFLOPS2 Benchmark 4A8 20-Apr-2023 20.48
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
KB     12.8     128   12800    12.8     128   12800
MFLOPS
1T    15822   13964    5340   24338   21850   21784
2T    17818   22599    5582   30511   30811   29994
4T    28770   25695   14235   48935   51359   48815
8T    44099   36862   25214   66160   71096   74910
Results x 100000, 0 indicates ERRORS
1T    40392   76406   99700   35218   66014   99520
2T    40392   76406   99700   35218   66014   99520
4T    40392   76406   99700   35218   66014   99520
8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time    4.0 seconds

Test 2 Battery

ARM/Intel MP-MFLOPS2 Benchmark 4A8 23-Apr-2023 14.13
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
KB     12.8     128   12800    12.8     128   12800
MFLOPS
1T    15285   14091    6830   31790   30388   30516
2T    21438   19857    8629   40890   41320   41764
4T    38093   25569   14322   64398   66969   64473
8T    40847   39072   31887   66206   68989   70401
Results x 100000, 0 indicates ERRORS
1T    40392   76406   99700   35218   66014   99520
2T    40392   76406   99700   35218   66014   99520
4T    40392   76406   99700   35218   66014   99520
8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time    3.2 seconds

Test1/System 2

1T     2.32    2.24    1.90    1.95    1.74    1.75
2T     2.03    2.43    1.89    1.36    1.28    1.32
4T     3.07    2.81    5.57    1.96    1.98    1.93
8T     4.01    2.78   10.51    1.97    2.09    2.30

Test2/System 2

1T     2.24    2.26    2.44    2.54    2.42    2.45
2T     2.44    2.13    2.93    1.82    1.71    1.84
4T     4.07    2.80    5.61    2.58    2.58    2.54
8T     3.72    2.95   13.30    1.97    2.03    2.16

Battery/Power

1T     0.97    1.01    1.28    1.31    1.39    1.40
2T     1.20    0.88    1.55    1.34    1.34    1.39
4T     1.32    1.00    1.01    1.32    1.30    1.32
8T     0.93    1.06    1.26    1.00    0.97    0.94
  



NEON-MFLOPS-MP Benchmark - NEON-MFLOPS2i-MP.apk

This benchmark carries out the same calculations as MP-MFLOPS but uses hand-coded NEON intrinsic functions. Measured maximum performance was essentially the same. In both cases, performance at 2 operations per word can vary significantly, being more dependent on data flow than on processing speed.
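What hand-coded NEON intrinsics look like for this kind of calculation can be sketched as follows, processing four floats per instruction. This is an assumption-laden illustration, not the benchmark's code: the function name kernel8 is hypothetical, it uses the 8-operation expression quoted for MP-MFLOPS, and it uses the non-fused vmlaq_f32 and vmlsq_f32 intrinsics where the real program may use fused forms. A plain C path is compiled when __ARM_NEON is not defined, so the sketch also builds on other CPUs.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* x[i] = (x[i]+a)*b - (x[i]+c)*d + (x[i]+e)*f, four floats at a time with
   NEON intrinsics when available, otherwise entirely in scalar C. */
static void kernel8(float *x, size_t n, float a, float b, float c,
                    float d, float e, float f) {
    size_t i = 0;
#if defined(__ARM_NEON)
    float32x4_t va = vdupq_n_f32(a), vb = vdupq_n_f32(b), vc = vdupq_n_f32(c);
    float32x4_t vd = vdupq_n_f32(d), ve = vdupq_n_f32(e), vf = vdupq_n_f32(f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(x + i);
        float32x4_t r = vmulq_f32(vaddq_f32(v, va), vb);  /* (x+a)*b         */
        r = vmlsq_f32(r, vaddq_f32(v, vc), vd);           /* r - (x+c)*d     */
        r = vmlaq_f32(r, vaddq_f32(v, ve), vf);           /* r + (x+e)*f     */
        vst1q_f32(x + i, r);
    }
#endif
    for (; i < n; i++)   /* remaining elements, or all of them without NEON */
        x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f;
}
```

Differences in operation order or in fused versus separate multiply and add can change the last bits of the results, one possible reason why the sumchecks of two programs doing the same calculations might not match.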

All systems produced identical sumchecks, these being different from those of MP-MFLOPS, probably due to differences in the initial run time calibration or in the SIMD code.

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

 ARM NEON-MFLOPS2-MP Benchmark 4A8 08-Feb-2023 17.07
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     7929    7999    3322   13136   13104   13090
 2T    14163   13998    3171   25686   25710   25825
 4T    15732   15495    3008   27646   27012   24837
 8T     9105   12776    2439   29803   28991   27127
 Results x 100000, 12345 indicates ERRORS
 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    3.6 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

 ARM NEON-MFLOPS2-MP Benchmark 4A8 08-Feb-2023 17.33
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     4396    4753    2555   12669   11585   11782
 2T     4661    6779    2894   22112   21236   21738
 4T     7706    6001    2561   23015   26865   24635
 8T     7286    7062    2397   35348   31644   29849
 Results x 100000, 12345 indicates ERRORS
 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    4.1 seconds


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     5486    5040    1706    7138    7167    7176
 2T    11637   11560    1787   14195   14325   14398
 4T    10948   10623    1853   17213   17304   17096
 8T    12279   11952    1846   23173   23078   23495
 Results x 100000, 12345 indicates ERRORS
 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    5.9 seconds
  



NEON-MFLOPS-MP Benchmark Armv9 CPU Phone

The On Battery and On Power tests were carried out consecutively, when the phone was not particularly warm. Subject to inaccuracies due to short running times, it can be assumed from the calculations below that performance was the same. Performance was also generally around twice that of the older phone, ranging from 1.78 to 2.71 times at 32 Ops/Word.

Comparing NEON-MFLOPS-MP with MP-MFLOPS indicates that performance was similar at 32 Ops/Word but the latter could be faster at 2 Ops/Word.

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

 Test 1 Battery

ARM NEON-MFLOPS2-MP Benchmark 4A8 23-Apr-2023 14.16
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
KB     12.8     128   12800    12.8     128   12800
MFLOPS
1T     8810    9275    5476   29395   28331   30117
2T    16769   11592    8731   39449   40333   40440
4T     6822   17552   12335   62263   59255   59455
8T    25900   24135   18693   66554   64566   65969
Results x 100000, 12345 indicates ERRORS
1T    44934   86735   99850   36770   79897   99759
2T    44934   86735   99850   36770   79897   99759
4T    44934   86735   99850   36770   79897   99759
8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    1.8 seconds

Test 2 Power

ARM NEON-MFLOPS2-MP Benchmark 4A8 23-Apr-2023 14.17
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
KB     12.8     128   12800    12.8     128   12800
MFLOPS
1T     9327    9188    5500   28636   28474   29788
2T    18024   18392    8359   38319   39596   39531
4T    31653   20778   10451   61957   64741   61611
8T    24930   22931   18816   56111   59569   66356
Results x 100000, 12345 indicates ERRORS
1T    44934   86735   99850   36770   79897   99759
2T    44934   86735   99850   36770   79897   99759
4T    44934   86735   99850   36770   79897   99759
8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    1.8 seconds

Test2/Test1 - Power/Battery

1T     1.06    0.99    1.00    0.97    1.01    0.99
2T     1.07    1.59    0.96    0.97    0.98    0.98
4T     4.64    1.18    0.85    1.00    1.09    1.04
8T     0.96    0.95    1.01    0.84    0.92    1.01

Test1/System2

1T     2.00    1.95    2.14    2.32    2.45    2.56
2T     3.60    1.71    3.02    1.78    1.90    1.86
4T     0.89    2.92    4.82    2.71    2.21    2.41
8T     3.55    3.42    7.80    1.88    2.04    2.21

Battery NEON/Normal MFLOPS

1T     0.58    0.66    0.80    0.92    0.93    0.99
2T     0.78    0.58    1.01    0.96    0.98    0.97
4T     0.18    0.69    0.86    0.97    0.88    0.92
8T     0.63    0.62    0.59    1.01    0.94    0.94


 



OpenGL Benchmark - JavaOpenGL1.apk

Necessary for early Android devices, the benchmark does not rely on complex visual scenes or mathematical functions. The objective is to generate moderate to excessive loading via multiple simple objects. It uses all Java code, with OpenGL ES GL10 statements, to measure graphics performance in Frames Per Second (FPS). Four tests draw a background of 50 cubes, first as wireframes, then colour shaded. The third test views the cubes in and out of a tunnel with slotted sides and roof, also containing rotating plates. The last test adds textures to the cubes and plates. The 50 cubes are redrawn 15, 30 and 60 times, with randomised positions, colours and rotational settings. With 6 x 2 triangles per cube, minimum triangles per frame for the three sets of tests are 9000, 18000 and 36000.
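The triangle counts above, and the drawing rates implied by a measured frame rate, can be checked with a couple of lines of C. The function names are hypothetical; the "+" after each count in the result tables is read here as marking these as minimums, since the tunnel and plates add further triangles.

```c
/* Minimum triangles per frame: 6 faces x 2 triangles per cube, times the
   number of cubes, times how often the set of cubes is redrawn. */
static long min_triangles_per_frame(int cubes, int redraws) {
    const long triangles_per_cube = 6 * 2;
    return cubes * triangles_per_cube * redraws;
}

/* Triangles drawn per second at a measured frame rate. */
static double triangles_per_second(long triangles_per_frame, double fps) {
    return triangles_per_frame * fps;
}
```

For example, 36000 triangles at System 2's 33.92 FPS is about 1.22 million triangles per second, against about 91 thousand at System 4's 2.53 FPS.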

Systems 1 and 3 do not appear to offer a refresh rate faster than 60 Hz, so their maximum performance cannot be demonstrated. The System 2 default is much higher, providing up to nearly 90 FPS, but a 60 Hz refresh rate was also set to enable comparisons. These results still show significantly superior performance at 18000 and 36000 triangles. On the other hand, it should be borne in mind that System 2 has fewer than half the number of pixels to deal with (1339 x 720, about 0.96 million, against 1200 x 1928, about 2.31 million, on System 1).

  System 1 Android 11 2.05 GHz ARM Cortex-A76
     Graphics Mali-76 MC4, refresh 60 Hz

 Android Java OpenGL Benchmark 4A8 09-Feb-2023 10.56

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured
 
   9000+      59.50    60.06    59.37    49.99
  18000+      44.03    44.23    38.75    30.12
  36000+      22.78    23.19    21.54    16.32

      Screen Pixels 1200 Wide 1928 High

      Total Elapsed Time  120.4 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)
 Graphics 660 MHz Adreno 619, default refresh rate

 Android Java OpenGL Benchmark 4A8 09-Feb-2023 11.29

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured
 
   9000+      88.58    86.98    89.82    76.51
  18000+      63.02    63.01    55.57    45.03
  36000+      33.92    33.76    31.49    25.04

      Screen Pixels 1339 Wide 720 High

      Total Elapsed Time  120.5 seconds

 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)
 Graphics 660 MHz Adreno 619, refresh 60 Hz

 Android Java OpenGL Benchmark 4A8 09-Feb-2023 19.14

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured
 
   9000+      50.43    47.05    53.55    56.48
  18000+      59.00    59.39    54.26    44.57
  36000+      33.35    33.50    31.14    25.02

      Screen Pixels 1339 Wide 720 High

      Total Elapsed Time  120.5 seconds



 System 3 Android 13 2.0 GHz ARM Cortex-A75
     Graphics Mali-62, refresh 60 Hz

 Android Java OpenGL Benchmark 4A8 09-Feb-2023 15.07

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured
 
   9000+      37.88    59.82    54.16    41.19
  18000+      26.59    35.84    31.73    28.13
  36000+      16.46    20.42    19.35    15.65

      Screen Pixels 1200 Wide 1848 High

      Total Elapsed Time  120.6 seconds

 



Armv9 CPU Phone

All these results were exceptionally slow. Perhaps the old version of OpenGL ES used to produce the benchmark is no longer well supported.

 System 4 Android 13 1x 2.80 GHz Cortex-X2
          Graphics Xclipse 920

Power

Android Java OpenGL Benchmark 4A8 20-Apr-2023 21.02

           --------- Frames Per Second --------
Triangles WireFrame   Shaded  Shaded+ Textured

   9000+      24.12    24.24    14.92    16.06
  18000+       8.46     8.46     6.11     6.67
  36000+       2.53     2.47     2.07     2.32

      Screen Pixels 1080 Wide 2009 High

      Total Elapsed Time  121.9 seconds

Battery

Android Java OpenGL Benchmark 4A8 20-Apr-2023 21.05

           --------- Frames Per Second --------
Triangles WireFrame   Shaded  Shaded+ Textured

   9000+      24.01    24.20    14.83    15.71
  18000+       8.41     8.37     6.06     6.63
  36000+       2.52     2.45     2.06     2.31

      Screen Pixels 1080 Wide 2009 High

      Total Elapsed Time  122.1 seconds

Battery Later

Android Java OpenGL Benchmark 4A8 23-Apr-2023 14.49

           --------- Frames Per Second --------
Triangles WireFrame   Shaded  Shaded+ Textured

   9000+      33.77    31.61    18.93    18.33
  18000+       9.08     8.81     6.25     6.60
  36000+       2.53     2.46     2.07     2.32

      Screen Pixels 1080 Wide 2009 High

      Total Elapsed Time  121.9 seconds
  



Java Drawing Benchmark - JavaDraw.apk

This all-Java benchmark uses small to rather excessive numbers of simple objects to measure drawing performance, again via Frames Per Second (FPS). Five 10 second tests draw on a background of continuously changing colour shades.

  • Test 1 loads a PNG file twice, the bitmaps moving for each frame, side to side or circling.
  • Plus Test 2 generates 2 SweepGradient multi-coloured circles moving around.
  • Plus Test 3 draws 200 random small circles in the middle of the screen.
  • Plus Test 4 draws 80 lines from the centre of each side to the opposite side, with changing colours.
  • Plus Test 5 draws the same small random circles as Test 3, but 4000 of them, filling the screen.

As with the OpenGL benchmark, these results depend on the available refresh rates and screen pixel content. In this case, System 2 was the only one allowed to run free of VSYNC, which limits the maximum refresh rate to 60 FPS. But, as shown, 60 FPS can be selected in Settings, showing that it was then slower than System 1.

 System 1 Android 11 2.05 GHz ARM Cortex-A76
     Graphics Mali-76 MC4, refresh 60 Hz

 Android Java Drawing Benchmark 4A809-Feb-2023 11.04

 Test                            Frames     FPS

 Display PNG Bitmap Twice          599    59.88
 Plus 2 SweepGradient Circles      601    60.03
 Plus 200 Random Small Circles     601    60.03
 Plus 320 Long Lines               518    51.75
 Plus 4000 Random Small Circles    217    21.68

      Screen pixels 1200 Wide 1928 High

      Total Elapsed Time   50.1 seconds


System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)
   Graphics 660 MHz Adreno 619, default refresh rate

 Android Java Drawing Benchmark 4A809-Feb-2023 11.25

 Test                            Frames     FPS

 Display PNG Bitmap Twice          879    87.81
 Plus 2 SweepGradient Circles      893    89.22
 Plus 200 Random Small Circles     844    84.37
 Plus 320 Long Lines               202    20.11
 Plus 4000 Random Small Circles    136    13.55

      Screen pixels 1339 Wide 720 High

      Total Elapsed Time   50.2 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)
  Graphics 660 MHz Adreno 619, refresh 60 Hz

 Android Java Drawing Benchmark 4A809-Feb-2023 19.18

 Test                            Frames     FPS

 Display PNG Bitmap Twice          497    49.48
 Plus 2 SweepGradient Circles      476    47.47
 Plus 200 Random Small Circles     516    51.55
 Plus 320 Long Lines               209    20.85
 Plus 4000 Random Small Circles    139    13.90

      Screen pixels 1339 Wide 720 High

      Total Elapsed Time   50.2 seconds


  System 3 Android 13 2.0 GHz ARM Cortex-A75
    Graphics Mali-62, refresh 60 Hz

 Android Java Drawing Benchmark 4A809-Feb-2023 15.12

 Test                            Frames     FPS

 Display PNG Bitmap Twice          596    59.58
 Plus 2 SweepGradient Circles      600    59.98
 Plus 200 Random Small Circles     407    40.63
 Plus 320 Long Lines               106    10.54
 Plus 4000 Random Small Circles     74     7.33

      Screen pixels 1920 Wide 1128 High

      Total Elapsed Time   50.2 seconds
  



Armv9 CPU Phone

System 4 also does not impose VSYNC. The first three results indicate that graphics speed was around 35% faster than System 2 at its default refresh rate, with the CPU speed dependent last two tests 70% to 74% faster.

System 4 Android 13 2.80 GHz Cortex-X2
          Graphics Xclipse 920

Battery

Android Java Drawing Benchmark 4A830-Apr-2023 13.48

Test                            Frames     FPS

Display PNG Bitmap Twice         1187   118.61
Plus 2 SweepGradient Circles     1194   119.30
Plus 200 Random Small Circles    1162   116.19
Plus 320 Long Lines               343    34.21
Plus 4000 Random Small Circles    236    23.51

      Screen pixels 1080 Wide 2009 High

      Total Elapsed Time   50.1 seconds

  


Java Whetstone Benchmark - Java Whetstone.apk

Java performed quite well on all systems, at around half the speed of the optimised compiled C version above. Compared with System 2, some System 4 speeds were slower than expected from the MHz ratio of 1.4. As before, higher gains were observed on tests using functions such as COS and EXP.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

Android Java Whetstone Benchmark 4A8 02-Mar-2023 17.13

Test        MFLOPS    MOPS   millisecs    Results 

N1 float    620.56             0.031  -1.124750137
N2 float    571.43             0.235  -1.131330490
N3 if              1014.71     0.102   1.000000000
N4 fixpt           2881.98     0.109  12.000000000
N5 cos              139.13     0.598   0.499110132
N6 float    274.09             1.968   0.999999821
N7 equal            630.29     0.293   3.000000000
N8 exp               72.73     0.512   0.935364604

MWIPS      2598.66             3.848

Total Elapsed Time   13.5 seconds


 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

Android Java Whetstone Benchmark 4A8 02-Mar-2023 17.26

Test        MFLOPS    MOPS   millisecs    Results 

N1 float    605.30             0.032  -1.124750137
N2 float    559.53             0.240  -1.131330490
N3 if               993.28     0.104   1.000000000
N4 fixpt           2720.21     0.116  12.000000000
N5 cos              134.19     0.620   0.499110132
N6 float    270.51             1.994   0.999999821
N7 equal            405.80     0.455   3.000000000
N8 exp               68.38     0.544   0.935364604

MWIPS      2435.86             4.105

Total Elapsed Time   14.6 seconds


 System 3 Android 13 2.0 GHz ARM Cortex-A75

Android Java Whetstone Benchmark 4A8 02-Mar-2023 17.33

Test        MFLOPS    MOPS   millisecs    Results 

N1 float    385.54             0.050  -1.124750137
N2 float    359.17             0.374  -1.131330490
N3 if              1000.00     0.104   1.000000000
N4 fixpt           1913.73     0.165  12.000000000
N5 cos              125.02     0.666   0.499110132
N6 float    184.60             2.922   0.999999821
N7 equal            310.33     0.596   3.000000000
N8 exp               59.71     0.623   0.935364604

MWIPS      1818.81             5.498

 System 4 Android 13 1x 2.80 GHz Cortex-X2

 Battery

Android Java Whetstone Benchmark 4A8 30-Apr-2023 13.44

Test        MFLOPS    MOPS   millisecs    Results  System 4/System 2

N1 float    798.00             0.024  -1.124750137     1.32
N2 float    736.04             0.183  -1.131330490     1.32
N3 if              1352.94     0.077   1.000000000     1.36
N4 fixpt           4186.05     0.075  12.000000000     1.54
N5 cos              227.32     0.366   0.499110132     1.69
N6 float    367.44             1.468   0.999999821     1.36
N7 equal            835.44     0.221   3.000000000     2.06
N8 exp              101.20     0.368   0.935364604     1.48

MWIPS      3595.56             2.781                   1.48

Total Elapsed Time   15.8 seconds
  




Java Linpack Benchmark - LinpackJava.apk

The Java version carries out double precision floating point calculations. Performance is much slower than the C results, but the C sumcheck values are the same as those here, showing that identical arithmetic calculations were executed. System 4 speed was 2.65 times that of the older System 2.
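The calculation behind the Linpack results, solving a dense system of linear equations whose exact answer is known, can be sketched in C. This is a simplified illustration under assumptions, not the benchmark's code: the names lin_matgen, lin_solve and lin_max_error are hypothetical, the matrix order of 100 is assumed, the generator follows the style of the classic Linpack matgen routine, and the elimination is written row by row without the BLAS calls of dgefa/dgesl, so the printed errors would not match the logged x[0]-1 values exactly.

```c
#include <math.h>

#define LN 100   /* matrix order (assumed for this sketch) */

/* Pseudo-random matrix in the style of Linpack matgen, with b set to the
   row sums so that the exact solution is x[i] = 1 for every i. */
static void lin_matgen(double a[LN][LN], double b[LN]) {
    int init = 1325;
    for (int j = 0; j < LN; j++)
        for (int i = 0; i < LN; i++) {
            init = 3125 * init % 65536;
            a[i][j] = (init - 32768.0) / 16384.0;
        }
    for (int i = 0; i < LN; i++) {
        b[i] = 0.0;
        for (int j = 0; j < LN; j++) b[i] += a[i][j];
    }
}

/* Gaussian elimination with partial pivoting, then back substitution.
   Overwrites a; on return b holds x. Returns -1 on a zero pivot. */
static int lin_solve(double a[LN][LN], double b[LN]) {
    for (int k = 0; k < LN; k++) {
        int p = k;                                  /* choose pivot row */
        for (int i = k + 1; i < LN; i++)
            if (fabs(a[i][k]) > fabs(a[p][k])) p = i;
        if (a[p][k] == 0.0) return -1;
        if (p != k) {                               /* swap rows k and p */
            for (int j = 0; j < LN; j++) {
                double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < LN; i++) {          /* eliminate below pivot */
            double m = a[i][k] / a[k][k];
            for (int j = k; j < LN; j++) a[i][j] -= m * a[k][j];
            b[i] -= m * b[k];
        }
    }
    for (int i = LN - 1; i >= 0; i--) {             /* back substitution */
        double s = b[i];
        for (int j = i + 1; j < LN; j++) s -= a[i][j] * b[j];
        b[i] = s / a[i][i];
    }
    return 0;
}

/* Largest |x[i] - 1|, similar in purpose to the x[0]-1 and x[n-1]-1 lines. */
static double lin_max_error(const double x[LN]) {
    double e = 0.0;
    for (int i = 0; i < LN; i++)
        if (fabs(x[i] - 1.0) > e) e = fabs(x[i] - 1.0);
    return e;
}
```

Counting the floating point operations in the elimination and dividing by the elapsed time would give an MFLOPS rating, while the small error from lin_max_error plays the same role as the sumcheck values in confirming that the arithmetic was carried out correctly.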

 System 1 Android 11 2.05 GHz ARM Cortex-A76

Android Java Linpack Benchmark 4A8 03-Mar-2023 10.52

Speed              920.22 MFLOPS

norm. resid                1.67
resid            7.41628980e-14
machep           2.22044605e-16
x[0]-1          -1.49880108e-14
x[n-1]-1        -1.89848137e-14


 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

Android Java Linpack Benchmark 4A8 03-Mar-2023 10.49

Speed              884.88 MFLOPS

norm. resid                1.67
resid            7.41628980e-14
machep           2.22044605e-16
x[0]-1          -1.49880108e-14
x[n-1]-1        -1.89848137e-14


 System 3 Android 13 2.0 GHz ARM Cortex-A75

Android Java Linpack Benchmark 4A8 03-Mar-2023 10.56

Speed              645.24 MFLOPS

norm. resid                1.67
resid            7.41628980e-14
machep           2.22044605e-16
x[0]-1          -1.49880108e-14
x[n-1]-1        -1.89848137e-14

 System 4 Android 13 1x 2.80 GHz Cortex-X2

 Battery

Android Java Linpack Benchmark 4A8 30-Apr-2023 13.46

Speed             2346.11 MFLOPS

norm. resid                1.67
resid            7.41628980e-14
machep           2.22044605e-16
x[0]-1          -1.49880108e-14
x[n-1]-1        -1.89848137e-14

System 4/System 2 MFLOPS   2.65  
  



DriveSpeed Benchmarks - DriveSpd1.apk

DriveSpeed carries out four tests.

Test 1 - Write and read three 8 and 16 MB files; Results given in MBytes/second
Test 2 - Write three 8 MB files, with reading that can be cached in RAM; Results given in MBytes/second
Test 3 - Random write and read 1 KB from 4 to 16 MB; Results are average time in milliseconds
Test 4 - Write and read 200 files 4 KB to 16 KB; Results in MB/sec, msecs/file and delete seconds.
Buttons - RunS SD Card (not used now), RunI Main Drive, More > Don't Delete, Read Only or Both, and Save. See below.
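A sequential write-then-read measurement of the kind used in Test 1 can be sketched in portable C. This is a minimal sketch under assumptions, not DriveSpd1.apk code: the function names, the 1 MB block size and the read-back verification are illustrative choices. No fsync is called, so written data may only reach the operating system cache, and a guard stops a very short (cached) transfer from producing a zero time, the problem behind readings that register as 0.00.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall clock time in seconds (C11 timespec_get). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Write mb megabytes to path in 1 MB blocks, read them back and compare.
   Stores write and read speeds in MB/second; returns 0 on success, -1 on
   any I/O or verification failure. */
static int file_speed(const char *path, int mb, double *write_mbs, double *read_mbs) {
    const size_t block = (size_t)1 << 20;
    unsigned char *buf = malloc(block), *chk = malloc(block);
    FILE *f = NULL;
    double t0, t1, t2;
    int rc = -1;
    if (!buf || !chk) goto done;
    for (size_t i = 0; i < block; i++) buf[i] = (unsigned char)(i * 31 + 7);

    t0 = now_seconds();
    if (!(f = fopen(path, "wb"))) goto done;
    for (int k = 0; k < mb; k++)
        if (fwrite(buf, 1, block, f) != block) goto done;
    if (fclose(f) != 0) { f = NULL; goto done; }
    f = NULL;
    t1 = now_seconds();

    if (!(f = fopen(path, "rb"))) goto done;
    for (int k = 0; k < mb; k++)
        if (fread(chk, 1, block, f) != block || memcmp(buf, chk, block) != 0) goto done;
    fclose(f);
    f = NULL;
    t2 = now_seconds();

    /* Avoid dividing by zero when a cached transfer is too fast to time. */
    *write_mbs = mb / (t1 - t0 > 1e-9 ? t1 - t0 : 1e-9);
    *read_mbs  = mb / (t2 - t1 > 1e-9 ? t2 - t1 : 1e-9);
    rc = 0;
done:
    if (f) fclose(f);
    free(buf);
    free(chk);
    return rc;
}
```

Running it three times in succession would mirror the Write1 to Write3 and Read1 to Read3 columns, with later reads more likely to be served from RAM.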

As can be seen, there were wide variations in measured performance, making it difficult to declare a winner, but System 3 appears to have a greater number of lowest scores. Random read times were too short to register at the resolution used, all appearing as 0.00 milliseconds.

This was not run on System 4.

 System 1 Android 11 2.05 GHz ARM Cortex-A76

Android DriveSpeed1 Benchmark 4A8 05-Mar-2023 10.30
          Internal Drive Data Cached
           Compiled for 64 bit ARM v8a

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8    1249.5 1264.4 1293.2 2927.6 2978.2 3162.4
  16    1272.8 1314.8 1335.7 2970.1 3168.8 3539.9
Cached
   8     871.2  455.3 1264.1 2847.8 3026.6 3206.2

Random      Write                Read
From MB     4      8     16      4      8     16
msecs    0.16   0.16   0.19   0.00   0.00   0.00

200 Files   Write                Read            Delete 
File KB     4      8     16      4      8     16   secs 
MB/sec  16.70  35.42  60.87 126.61 245.65 344.08  
msecs    0.25   0.23   0.27   0.03   0.03   0.05  0.027
No delete

          Total Elapsed Time   16.4 seconds

   Path Used /data/user/0/com.drivespeed/files/

Android DriveSpeed1 Benchmark 4A8 05-Mar-2023 10.37
          Internal Drive Read Only

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8       0.0    0.0    0.0  420.3  396.7  420.9


 System 2 Android 12 2.0 GHz Snapdragon 750 (Cortex-A76)

Android DriveSpeed1 Benchmark 4A8 05-Mar-2023 10.43
          Internal Drive Data Cached
           Compiled for 64 bit ARM v8a

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8    1661.5 1649.2 1831.3 1993.7 2369.6 2969.9
  16    1669.1 1530.7 1117.6 2125.8 2612.2 2167.9
Cached
   8    1070.1 1557.8 1790.7 2124.0 2607.2 3217.0

Random      Write                Read
From MB     4      8     16      4      8     16
msecs    0.22   0.43   0.47   0.00   0.00   0.00

200 Files   Write                Read            Delete 
File KB     4      8     16      4      8     16   secs 
MB/sec  44.73  83.50  70.39 388.90 455.49 435.65  
msecs    0.09   0.10   0.23   0.01   0.02   0.04  0.011
No delete

          Total Elapsed Time   16.3 seconds

   Path Used /data/user/0/com.drivespeed/files/



 
 System 2

Android DriveSpeed1 Benchmark 4A8 05-Mar-2023 10.45
          Internal Drive Read Only

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8       0.0    0.0    0.0  338.4  425.1  393.9


 System 3 Android 13 2.0 GHz ARM Cortex-A75

Android DriveSpeed1 Benchmark 4A8 05-Mar-2023 10.51
          Internal Drive Data Cached
           Compiled for 64 bit ARM v8a

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8     849.8 1095.8 1478.3 2370.3 2270.8 2500.7
  16    1519.2 1351.3 1234.3 1760.5 1853.6 1810.0
Cached
   8    1612.1 1493.0 1262.3 2056.7 2007.7 1926.3

Random      Write                Read
From MB     4      8     16      4      8     16
msecs    0.36   0.37   0.36   0.00   0.00   0.00

200 Files   Write                Read            Delete 
File KB     4      8     16      4      8     16   secs 
MB/sec  66.03 178.95 323.02 519.97 837.43 1283.88
msecs    0.06   0.05   0.05   0.01   0.01   0.01  0.006
No delete

          Total Elapsed Time   16.5 seconds

   Path Used /data/user/0/com.drivespeed/files/

Android DriveSpeed1 Benchmark 4A8 05-Mar-2023 10.59
          Internal Drive Read Only

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8       0.0    0.0    0.0  169.3  167.5  192.1

System 3 SD Card Option

Using RunS produces results on the latest versions of Android, but does not access the SD card. Following is an example of the start of a log after selecting this button without the SD card inserted; the result is the same with the card in place, so the program is using a different file path on the internal drive. Writing speeds were much slower than via RunI but, using the Read Only procedures, reading performance was the same.

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8      60.0   62.5   62.3 1211.5 1140.5 1106.5
  16      65.5   67.5   56.5 1256.2 1756.7 2147.9

  Path Used /storage/emulated/0/
   



CPU Stress Tests - MP-FPU-Stress.apk, MP-Int-Stress.apk, CP_MHz2.apk

USE AT YOUR OWN RISK

There are two main stress test programs that can use multiple threads to exercise (presently) all CPU cores, one using floating point instructions and the other carrying out integer arithmetic. Further detail is covered in the earlier report, android benchmarks.htm. The third program monitors the MHz of up to 8 cores. Each of the stress test applications has five buttons:

RunB - Run Benchmark - Runs most combinations of number of threads, data sizes and, for the FPU tests, calculations per data word. This is mainly to help decide which options to use for stress testing. The benchmark runs using fixed parameters, carrying out exactly the same number of calculations using all thread combinations and data sizes. For the FPU tests, the pass count changes according to the number of calculations per word.

RunS - Run Stress Tests - Default running time is 15 minutes, with the middle data size, intended to be contained in L2 cache, using 8 threads and, in the FPU tests, 32 operations per word.

False Errors - These can be caused if the run button is tapped again when the tests are running. The main unique symptoms are multiple “End Time” message displays.

SetS - Specify run time parameters for stress test - These are 1, 2, 4, 8, 16 or 32 threads, 2, 8 or 32 Operations per word for FPU tests, 12.8 or 16 KB, 128 or 160 KB, 12.8 or 16 MB for FPU or Integer tests, and running time in minutes.

Info - Test description and details - This is essentially the same as details provided here.

Save - This provides alternative methods to divert the logged output. Currently I select the Google Drive option, allowing me to access the files on my PCs.

Unexpected Faster Speed - Performance depends on whether the data comes from caches or RAM. Increasing the number of threads can then lead to each CPU core working on a smaller share of the data, held in its dedicated, smaller and faster caches.

Sumchecks - The programs include sumchecks to show whether the arithmetic calculations produced the correct results, as shown in the benchmark results. For integers, each test section uses a different data pattern for all words, checked by the program after manipulation. Floating point numeric results depend on the number of calculations carried out, which is constant for each reported stress test time slot, so they are easily verified manually.
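The integer sumcheck idea can be illustrated with a small sketch. It fills every word of a section with one of the hexadecimal patterns that appear in the Sumcheck column of the integer logs in this report, applies a manipulation that should leave the data unchanged, then checks every word, giving the equivalent of the "Same All" column. The XOR-twice manipulation, the key value and the function names are assumptions for illustration only; the programs' actual integer operations are not described here.

```python
MASK32 = 0xFFFFFFFF

# Patterns as printed in the Sumcheck column of the integer stress test logs.
PATTERNS = (0x00000000, 0xFFFFFFFF, 0x5A5A5A5A, 0xAAAAAAAA, 0xCCCCCCCC, 0x0F0F0F0F)


def manipulate(words, key):
    """Hypothetical reversible manipulation: XOR every word with key, then undo it."""
    scrambled = [(w ^ key) & MASK32 for w in words]
    return [(w ^ key) & MASK32 for w in scrambled]


def verify(words, pattern):
    """True only if every word still holds the expected pattern."""
    return all(w == pattern for w in words)


def check_section(pattern, n_words=4096, key=0x3C3C3C3C):
    """Fill a section with one pattern, manipulate it and check it.

    Returns the first word as 8 hex digits and the 'Same All' flag.
    """
    words = manipulate([pattern] * n_words, key)
    return f"{words[0]:08X}", verify(words, pattern)


for p in PATTERNS:
    print(*check_section(p))
```

Any single corrupted bit in any word makes `verify` return False, which is what would show up as a "No" in a real log.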

CP_MHz2 measurements are instantaneous samples taken at a constant rate, not averages over the sampling interval. The program has Set, Run and Save buttons, as above. Default running time is 15 minutes and the default sampling interval 10 seconds.
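How CP_MHz2 obtains its readings is not stated here. As a sketch only, on Linux-based systems instantaneous per-core frequencies are commonly exposed through the cpufreq sysfs attribute `scaling_cur_freq`, reported in kHz, so sampling at a fixed interval might look like the following. The function names are hypothetical, the sysfs layout is an assumption about the device, and ordinary Android apps may be denied access to these files, in which case `None` is returned for that core.

```python
import os
import time

CPU_BASE = "/sys/devices/system/cpu"  # standard Linux cpufreq sysfs root (assumed present)


def read_core_mhz(core, base=CPU_BASE):
    """Instantaneous frequency of one core in MHz, or None if unreadable."""
    path = os.path.join(base, f"cpu{core}", "cpufreq", "scaling_cur_freq")
    try:
        with open(path) as f:
            return int(f.read().strip()) // 1000  # value is reported in kHz
    except (OSError, ValueError):
        return None  # offline core, missing file or no permission


def sample_mhz(cores=8, interval_s=10.0, samples=3, base=CPU_BASE):
    """Take `samples` snapshots of all cores, `interval_s` seconds apart.

    Each snapshot is a single reading per core, not an average over the interval.
    """
    rows = []
    start = time.monotonic()
    for i in range(samples):
        elapsed = round(time.monotonic() - start, 2)
        rows.append((elapsed, [read_core_mhz(c, base) for c in range(cores)]))
        if i < samples - 1:
            time.sleep(interval_s)
    return rows
```

For example, `sample_mhz(interval_s=30.0, samples=31)` would mirror a 15 minute run with 30 second samples, as used in the logs in this report.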

Further below are example results of the Stress Test Benchmarks, followed by extended reliability type tests. The stress test results are from logs using default parameters, with 15 minutes running time, and some of these include only the necessary detail. Examples of full output are as follows.

  ARM/Intel MP-Int Stress Test 4A8 09-Mar-2023 10.24.37
            Compiled for 64 bit ARM v8a

            Data                         Same All
 Seconds    Size Threads  MB/sec Sumcheck Threads

    8.7   160 KB     8     57397 00000000  Yes
   17.4   160 KB     8     56966 00000000  Yes


  ARM/Intel MP-FPU Stress Test 4A8 13-Mar-2023 11.59.35
            Compiled for 64 bit ARM v8a

            Data            Ops/          Nmeric
 Seconds    Size Threads    Word  MFLOPS Results

    9.4   128 KB       8      32   38431   35216
   18.6   128 KB       8      32   37721   35216
   

As seen with the CPU-Z utility app, core MHz values change extremely rapidly. Here, CP_MHz2.apk takes samples at a selected interval in seconds; these are representative snapshots, not averages. Example output:

  MHz Measurement Test 4A8 13-Mar-2023 12.00.55
  Running time 15 minutes, 30 second samples

                       MHz for Core
  Secs     0     1     2     3     4     5     6     7

  0.00  1805  1478  1805  1805  1805  1805  1651  1651
 30.10  1805  1805  1805  1805  1805  1805  2035  2035
  




Integer Stress Test Benchmark

Measured performance was similar to earlier tests, such as MP-RandMem Serial Read, but showed improved throughput using more than eight threads. Maximum single core integer MOPS (Million Operations Per Second) would be around 2400 for System 1 and 3800 for System 2, the latter in particular suggesting SIMD activity.

The usual relative performance attributes are shown to apply, with System 2 indicated as much faster with cache based data using 1 or 2 threads, but possibly slower at 4 and 8.
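Cache effects like these, and the "Unexpected Faster Speed" note earlier, can be reasoned about with a simple model. This sketch assumes the data is split evenly, one segment per thread, and uses hypothetical per-core cache sizes; real sizes differ between big and LITTLE cores and between SoCs. It only applies up to one thread per core: beyond eight threads on these eight-core devices, several threads share a core, so the per-core working set no longer shrinks in the same way.

```python
def per_thread_kb(total_kb, threads):
    # Assumption: the data is split evenly, one segment per thread
    return total_kb / threads


def where_it_fits(kb, per_core_caches):
    # Smallest per-core cache level that could hold one thread's share
    for name, size_kb in sorted(per_core_caches.items(), key=lambda item: item[1]):
        if kb <= size_kb:
            return name
    return "shared cache or RAM"


# Hypothetical per-core cache sizes in KB; real values vary by core type and SoC.
EXAMPLE_CACHES = {"L1D": 64, "L2": 256}

for total_kb in (16, 160, 16 * 1024):  # the 16 KB, 160 KB and 16 MB test sizes
    for threads in (1, 2, 4, 8):
        share = per_thread_kb(total_kb, threads)
        level = where_it_fits(share, EXAMPLE_CACHES)
        print(f"{total_kb:6} KB {threads:2} threads {share:8.1f} KB -> {level}")
```

With these assumed sizes, the 160 KB data moves from L2 at 1 or 2 threads to L1 at 4 or more, while the 16 MB data never fits in the per-core caches.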

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

 ARM/Intel MP-Int Stress Test 4A8 07-Mar-2023 10.40.16
            Compiled for 64 bit ARM v8a

                 MB/second 
               KB    KB    MB            Same All
  Secs Thrds   16   160    16  Sumcheck   Tests

   1.8   1  14159 14594 13354  00000000    Yes
   1.2   2  21954 29948 13697  FFFFFFFF    Yes
   1.1   4  32124 32881 13805  5A5A5A5A    Yes
   1.0   8  41607 40944 14064  AAAAAAAA    Yes
   1.0  16  42412 44068 13862  CCCCCCCC    Yes
   0.8  32  42941 50142 20698  0F0F0F0F    Yes

            End Time 07-Mar-2023 10.40.31


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)


  ARM/Intel MP-Int Stress Test 4A8 07-Mar-2023 10.44.17
            Compiled for 64 bit ARM v8a

                 MB/second 
               KB    KB    MB            Same All
  Secs Thrds   16   160    16  Sumcheck   Tests

   1.8   1  15333 14398 12557  00000000    Yes
   1.2   2  25656 25554 13615  FFFFFFFF    Yes
   1.2   4  29025 31166 13079  5A5A5A5A    Yes
   1.1   8  43667 40739 12317  AAAAAAAA    Yes
   1.0  16  39954 43161 13182  CCCCCCCC    Yes
   0.9  32  40849 42656 15047  0F0F0F0F    Yes

            End Time 07-Mar-2023 10.44.27


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

  ARM/Intel MP-Int Stress Test 4A8 07-Mar-2023 10.48.58
            Compiled for 64 bit ARM v8a

                 MB/second 
               KB    KB    MB            Same All
  Secs Thrds   16   160    16  Sumcheck   Tests

   2.8   1  11252 11433  6011  00000000    Yes
   1.9   2  20286 16018  8505  FFFFFFFF    Yes
   1.7   4  24332 23788  8086  5A5A5A5A    Yes
   1.5   8  36755 33932  8156  AAAAAAAA    Yes
   1.4  16  37736 39228  8096  CCCCCCCC    Yes
   1.1  32  35649 36291 12974  0F0F0F0F    Yes

            End Time 07-Mar-2023 10.49.16
  




Armv9 CPU Phone

Performance gains over the older device continued to be similar to those for MP-RandMem Serial Read, except for RAM speed improvements, which were more significant. This time, performance on power was significantly faster than on battery.

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

 System 4 Battery

  ARM/Intel MP-Int Stress Test 4A8 23-Apr-2023 14.41.16
            Compiled for 64 bit ARM v8a

                  MB/second
               KB    KB    MB          Same All
  Secs Thrds   16   160    16 Sumcheck  Tests

   1.6   1  19675 16316 13029 00000000    Yes
   1.1   2  31241 28440 15894 FFFFFFFF    Yes
   0.9   4  46282 40016 16222 5A5A5A5A    Yes
   0.7   8  59097 56981 18473 AAAAAAAA    Yes
   0.5  16  63286 67726 30086 CCCCCCCC    Yes
   0.4  32  65657 64560 61397 0F0F0F0F    Yes

            End Time 23-Apr-2023 14.41.27

 System 4 Power

  ARM/Intel MP-Int Stress Test 4A8 20-Apr-2023 20.51.13
            Compiled for 64 bit ARM v8a

                  MB/second
               KB    KB    MB          Same All
 Secs Thrds    16   160    16 Sumcheck  Tests

   1.2   1  23224 20831 19265 00000000    Yes
   0.9   2  38975 37282 18468 FFFFFFFF    Yes
   0.5   4  62257 66630 40302 5A5A5A5A    Yes
   0.4   8  82663 90286 51540 AAAAAAAA    Yes
   0.3  16  88619 89234 72478 CCCCCCCC    Yes
   0.3  32  94039 86710 74422 0F0F0F0F    Yes

            End Time 20-Apr-2023 20.51.21

 System 4 Power/System 2

               KB    KB    MB
     Thrds     16   160    16
         1   1.51  1.45  1.53
         2   1.52  1.46  1.36
         4   2.14  2.14  3.08
         8   1.89  2.22  4.18
        16   2.22  2.07  5.50
        32   2.30  2.03  4.95

 System 4 Power/Battery

               KB    KB    MB
     Thrds     16   160    16
         1   1.18  1.28  1.48
         2   1.25  1.31  1.16
         4   1.35  1.67  2.48
         8   1.40  1.58  2.79
        16   1.40  1.32  2.41
        32   1.43  1.34  1.21
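The ratio tables appear to be simple quotients of the MB/second figures in the logs above, rounded to two decimal places, with System 4 on power as the numerator of the System 4/System 2 comparison. A short check on three of the thread counts, using values copied from those logs; the dictionary and function names are illustrative only.

```python
# MP-Int Stress Test MB/second at 16 KB, 160 KB and 16 MB, copied from the logs above.
SYS2 = {1: (15333, 14398, 12557), 4: (29025, 31166, 13079), 32: (40849, 42656, 15047)}
SYS4_POWER = {1: (23224, 20831, 19265), 4: (62257, 66630, 40302), 32: (94039, 86710, 74422)}
SYS4_BATTERY = {1: (19675, 16316, 13029), 4: (46282, 40016, 16222), 32: (65657, 64560, 61397)}


def ratios(numerators, denominators, places=2):
    """Element-wise quotients rounded as in the report's ratio tables."""
    return tuple(round(n / d, places) for n, d in zip(numerators, denominators))


for threads in sorted(SYS2):
    print(threads,
          ratios(SYS4_POWER[threads], SYS2[threads]),          # System 4 Power/System 2
          ratios(SYS4_POWER[threads], SYS4_BATTERY[threads]))  # System 4 Power/Battery
```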
  


Floating Point Stress Test Benchmark

This program uses the same C code as MP-MFLOPS, with the addition of tests using 8 floating point calculations per data word read/written. Performance was also similar, including variations with multithreaded activity, apparent in results from multiple runs.

Again, at 12.8 and 128 KB, System 2 was much faster using 1 or 2 threads, but not with more than 2.
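The "Ops/Word" column can be pictured with kernels that apply a fixed number of floating point operations to every data word, and MFLOPS then follows from words, operations per word, passes and time. The kernel expressions, constants and names below are assumptions written in the style of such tests, not the MP-MFLOPS source. The word counts also assume 4-byte single precision values, which would make 12.8 KB, 128 KB and 12.8 MB correspond to 3,200, 32,000 and 3,200,000 words.

```python
# Hypothetical kernels; expressions and constants are assumptions, not the program's code.
def kernel_2ops(x, a, b):
    # 1 add + 1 multiply = 2 floating point operations per word
    return [(v + a) * b for v in x]


def kernel_8ops(x, a, b, c, d, e, f):
    # 3 adds, 3 multiplies, 1 subtract and 1 final add = 8 operations per word
    return [(v + a) * b - (v + c) * d + (v + e) * f for v in x]


def mflops(words, ops_per_word, passes, seconds):
    # Total floating point operations divided by time, in millions per second
    return words * ops_per_word * passes / seconds / 1e6


# Assumed 4-byte words for the three data sizes used in the tests.
WORDS = {"12.8 KB": 3_200, "128 KB": 32_000, "12.8 MB": 3_200_000}

print(mflops(WORDS["128 KB"], 32, 10_000, 1.0))
```

Because a kernel with more operations per word does more arithmetic for each word loaded or stored, it is less limited by cache or RAM bandwidth, which is consistent with the 32 Ops/Word results holding up at 12.8 MB.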


 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

  ARM/Intel MP-FPU Stress Test 4A8 07-Mar-2023 10.41.57
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.3   T1   2  9427  8174  3316   40392  76406  99700
   0.4   T2   2 12505  9288  2517   40392  76406  99700
   0.4   T4   2 11865 15337  2318   40392  76406  99700
   0.4   T8   2 14857 16797  2240   40392  76406  99700
   0.7   T1   8 12064 11755 11519   54760  85092  99819
   0.5   T2   8 22060 21418 10649   54760  85092  99819
   0.5   T4   8 26292 24186  9696   54760  85092  99819
   0.5   T8   8 26257 24723  8943   54760  85092  99819
   2.5   T1  32 12560 12096 11976   35218  66014  99520
   1.4   T2  32 20570 23527 22632   35218  66014  99520
   1.2   T4  32 25966 26414 25899   35218  66014  99520
   1.1   T8  32 28518 30202 28717   35218  66014  99520

            End Time 07-Mar-2023 10.42.09


 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

  ARM/Intel MP-FPU Stress Test 4A8 07-Mar-2023 10.46.20
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.4   T1   2  7773  7983  2859   40392  76406  99700
   0.4   T2   2  8975  7726  2545   40392  76406  99700
   0.4   T4   2  8026  7542  2467   40392  76406  99700
   0.4   T8   2 13882 11752  2336   40392  76406  99700
   0.7   T1   8 11229 10090 11035   54760  85092  99819
   0.6   T2   8 15553 17641 10259   54760  85092  99819
   0.6   T4   8 18031 15945 10135   54760  85092  99819
   0.5   T8   8 21272 21474  9410   54760  85092  99819
   2.5   T1  32 11955 11956 12435   35218  66014  99520
   1.4   T2  32 22202 22806 22787   35218  66014  99520
   1.3   T4  32 23857 24021 25369   35218  66014  99520
   1.0   T8  32 28250 32201 28726   35218  66014  99520

            End Time 07-Mar-2023 10.46.33


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

  ARM/Intel MP-FPU Stress Test 4A8 07-Mar-2023 10.50.13
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.7   T1   2  5440  4195  1617   40392  76406  99700
   0.5   T2   2  9855 10851  1781   40392  76406  99700
   0.5   T4   2  8167  8485  1881   40392  76406  99700
   0.5   T8   2 12014 10806  1847   40392  76406  99700
   1.3   T1   8  6384  6381  5647   54760  85092  99819
   0.8   T2   8 12496 12140  6674   54760  85092  99819
   0.8   T4   8 12311 11922  7397   54760  85092  99819
   0.6   T8   8 17907 17982  7476   54760  85092  99819
   4.5   T1  32  6903  6912  6866   35218  66014  99520
   2.2   T2  32 13696 13797 13740   35218  66014  99520
   2.0   T4  32 13620 16951 16788   35218  66014  99520
   1.4   T8  32 21211 21290 22181   35218  66014  99520

            End Time 07-Mar-2023 10.50.32
  



Armv9 CPU Phone

Battery/Power performance comparisons indicated wide variations, but overall elapsed time was much longer on power, unlike the integer stress testing benchmark where it was somewhat shorter.
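The elapsed times can be read from the start time in each log header and its End Time line. A minimal sketch, assuming both timestamps fall on the same day and accepting the one second resolution of the logs; the function name is illustrative.

```python
from datetime import datetime

TIME_FORMAT = "%H.%M.%S"  # e.g. "14.45.52", as printed in the log headers and End Time lines


def elapsed_seconds(start, end):
    """Seconds between two same-day log timestamps (does not handle runs past midnight)."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


# System 4 start and end times copied from the MP-FPU and MP-Int logs in this report.
RUNS = {
    "MP-FPU battery": ("14.45.52", "14.46.02"),
    "MP-FPU power": ("20.49.55", "20.50.18"),
    "MP-Int battery": ("14.41.16", "14.41.27"),
    "MP-Int power": ("20.51.13", "20.51.21"),
}
for name, (start, end) in RUNS.items():
    print(f"{name:15} {elapsed_seconds(start, end):3d} s")
```

This gives 10 against 23 seconds for the floating point benchmark and 11 against 8 seconds for the integer benchmark, on battery and power respectively.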

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710


 System 4 Battery

  ARM/Intel MP-FPU Stress Test 4A8 23-Apr-2023 14.45.52
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.2   T1   2 15743 13802  6168   40392  76406  99700
   0.1   T2   2 23790 22564  8635   40392  76406  99700
   0.1   T4   2 31487 16944 11190   40392  76406  99700
   0.1   T8   2 29239 16754 14704   40392  76406  99700
   0.5   T1   8 17614 16465 14614   54760  85092  99819
   0.4   T2   8 23473 21702 13270   54760  85092  99819
   0.4   T4   8 28836 22915 14793   54760  85092  99819
   0.3   T8   8 35877 33822 26051   54760  85092  99819
   1.7   T1  32 14379 21304 22032   35218  66014  99520
   1.1   T2  32 24714 27766 30000   35218  66014  99520
   0.7   T4  32 44493 37534 46516   35218  66014  99520
   0.7   T8  32 40943 39881 52404   35218  66014  99520

            End Time 23-Apr-2023 14.46.02

 System 4 Power

  ARM/Intel MP-FPU Stress Test 4A8 20-Apr-2023 20.49.55
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.2   T1   2 13959 13834  5427   40392  76406  99700
   0.1   T2   2 21365 24557  9061   40392  76406  99700
   0.1   T4   2 21907 21840 12173   40392  76406  99700
   0.1   T8   2 18322 31692 12821   40392  76406  99700
   0.5   T1   8 17088 17742 16266   54760  85092  99819
   0.4   T2   8 23468 22740 13810   54760  85092  99819
   0.4   T4   8 31470 24004 14281   54760  85092  99819
   0.3   T8   8 28966 26081 23677   54760  85092  99819
   1.7   T1  32 14975 20595 21972   35218  66014  99520
   1.2   T2  32 24720 26515 28342   35218  66014  99520
   0.8   T4  32 45125 33106 45770   35218  66014  99520
   0.7   T8  32 49057 37660 46982   35218  66014  99520

            End Time 20-Apr-2023 20.50.18


 System 4 Power/System 2
                   KB    KB    MB
      Thrd  Ops  12.8   128  12.8
        T1    2  1.80  1.73  1.90
        T2    2  2.38  3.18  3.56
        T4    2  2.73  2.90  4.93
        T8    2  1.32  2.70  5.49
        T1    8  1.52  1.76  1.47
        T2    8  1.51  1.29  1.35
        T4    8  1.75  1.51  1.41
        T8    8  1.36  1.21  2.52
        T1   32  1.25  1.72  1.77
        T2   32  1.11  1.16  1.24
        T4   32  1.89  1.38  1.80
        T8   32  1.74  1.17  1.64

 System 4 Battery/Power
                   KB    KB    MB
      Thrd  Ops  12.8   128  12.8
        T1    2  1.13  1.00  1.14
        T2    2  1.11  0.92  0.95
        T4    2  1.44  0.78  0.92
        T8    2  1.60  0.53  1.15
        T1    8  1.03  0.93  0.90
        T2    8  1.00  0.95  0.96
        T4    8  0.92  0.95  1.04
        T8    8  1.24  1.30  1.10
        T1   32  0.96  1.03  1.00
        T2   32  1.00  1.05  1.06
        T4   32  0.99  1.13  1.02
        T8   32  0.83  1.06  1.12
 


Integer Stress Tests

Following are results from 15 minute tests at 160 KB and 8 threads. MHz samples were taken at 30 second intervals, alongside the average MB/second measured over the same time slot. System 1 tests were run both on mains power and on battery, the latter starting with 5% available charge, without a major reduction in performance.

In all cases, the MHz of each of the six LITTLE CPU cores was essentially constant, performance degradation being imposed by MHz reductions on the two big cores. Performance of System 2 was better than System 1, in spite of its LITTLE CPU cores running at lower MHz. This is probably because the System 2 processor was produced using a later fabrication process. As expected, System 3, based on older technology, was the slowest.

 System              1 Power   1 Battery   2 Power    3 Power        
 Mean MB/second        48110      48088      54838      39839
 Usual Slow CPU MHz     2000       2000       1805       2002
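The Average column in the MHz tables appears to be the arithmetic mean of the eight core samples, rounded half up. Python's built-in `round()` rounds halves to even, giving 1862 rather than the logged 1863 for the System 2 rows, so this sketch rounds with integer arithmetic instead. The sample rows are copied from the tables in this section; the function name is illustrative.

```python
def average_mhz(core_mhz):
    """Mean of the per-core samples, rounded half up, using integer arithmetic."""
    n = len(core_mhz)
    return (2 * sum(core_mhz) + n) // (2 * n)


# Sample rows copied from the tables in this section (cores 0 to 7).
ROWS = {
    "System 1, 30 s": [2000] * 6 + [2050] * 2,
    "System 1, 90 s": [1275, 875, 1275, 1175, 1375, 1275, 1986, 1986],
    "System 2, 30 s": [1805] * 6 + [2035] * 2,
    "System 2, 900 s": [1709] * 6 + [1805] * 2,
}
for label, mhz in ROWS.items():
    print(label, average_mhz(mhz))
```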

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55
                                     MHz for Core
    Secs MB/sec      0      1      2      3      4      5      6      7 Average

       0  52349
      30  51608   2000   2000   2000   2000   2000   2000   2050   2050    2013
      60  48982   2000   2000   2000   2000   2000   2000   1796   1796    1949
      90  46641   1275    875   1275   1175   1375   1275   1986   1986    1403
     120  50087   2000   2000   2000   2000   2000   1800   1308   1308    1802
     150  49026   2000   2000   2000   2000   2000   2000   1530   1530    1883
     180  46743   2000   2000   2000   2000   2000   2000   1530   1419    1869
     210  48994   2000   2000   2000   2000   2000   2000   1733   1733    1933
     240  49110   2000   2000   2000   2000   2000   2000   1530   1530    1883
     270  48631   2000   2000   2000   2000   2000   2000   1419   1419    1855
     300  48052   2000   2000   2000   2000   2000   2000   1530   1530    1883
     330  48752   2000   2000   2000   2000   2000   2000   1530   1308    1855
     360  47384   2000   2000   2000   2000   2000   2000   1419   1530    1869
     390  48812   2000   2000   2000   2000   2000   2000   1530   1419    1869
     420  47352   2000   2000   2000   2000   2000   2000   1530   1530    1883
     450  46944   2000   2000   2000   2000   2000   2000   1419   1419    1855
     480  47086   2000   2000   2000   2000   2000   2000   1419   1419    1855
     510  47789   2000   2000   2000   2000   2000   2000   1419   1419    1855
     540  47799   2000   2000   2000   2000   2000   2000   1169   1308    1810
     570  46693   2000   2000   2000   2000   2000   2000   1308   1419    1841
     600  49389   2000   2000   2000   2000   2000   2000   1419   1308    1841
     630  48092   2000   2000   2000   2000   2000   2000   1419   1308    1841
     660  47454   2000   2000   2000   2000   2000   2000   1419   1419    1855
     690  46836   2000   2000   2000   2000   2000   2000   1530   1530    1883
     720  47261   2000   2000   2000   2000   2000   2000   1308   1419    1841
     750  47122   2000   2000   2000   2000   2000   2000   1419   1419    1855
     780  47362   2000   2000   2000   2000   2000   2000   1169   1419    1824
     810  48045   2000   2000   2000   2000   2000   2000   1419   1419    1855
     840  46429   1175   1933   2000   2000   2000   2000   1530   1419    1757
     870  46835   2000   2000   2000   2000   2000   2000   1419   1308    1841
     900  47738   1866   1866   1866   1866   2000   2000   1419   1530    1802


 System 1 Battery - Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55
       0  53347
      30  52694   2000   2000   2000   2000   2000   2000   1923   2050    1997
      60  48780   2000   2000   2000   2000   2000   2000   1733   1733    1933
      90  49702   2000   2000   2000   2000   2000   2000   1670   1530    1900
     120  49449   2000   2000   2000   2000   2000   2000   1530   1670    1900
     150  49864   1075   1375   1375   1375   1375   1075   1986   1419    1382
     180  49477   2000   2000   2000   2000   2000   2000   1530   1530    1883
     210  47739   2000   2000   2000   2000   2000   2000   1530   1530    1883
     240  47961   2000   2000   2000   2000   2000   2000   1530   1530    1883
     270  46765   2000   2000   2000   2000   2000   2000   1419   1419    1855
     300  48323   2000   2000   2000   2000   2000   2000   1670   1419    1886
     330  46877   2000   2000   2000   2000   2000   2000    919    919    1730
     360  48398   2000   2000   2000   2000   2000   2000   1670   1670    1918
     390  47699   2000   2000   2000   2000   2000   2000   1419   1419    1855
     420  46764   2000   2000   2000   2000   2000   2000   1419   1419    1855
     450  48355   2000   2000   2000   2000   2000   2000   1308   1419    1841
     480  46643   2000   2000   2000   2000   2000   2000   1419   1419    1855
     510  47094   1933   1933   1933   1933   1933   1933   1308   1085    1749
     540  47462   2000   2000   2000   2000   2000   2000   1419   1419    1855
     570  47156   2000   2000   2000   2000   2000   2000   1530   1530    1883
     600  47482   2000   2000   2000   2000   2000   2000   1419   1419    1855
     630  47205   2000   2000   2000   2000   2000   2000   1419   1419    1855
     660  46806   2000   2000   2000   2000   2000   2000   1419   1419    1855
     690  47632   2000   2000   2000   2000   2000   2000   1419   1419    1855
     720  45909   1800   1800   1800   1800   1800   1800   1419   1419    1705
     750  45615   1866   1866   1866   1866   1866   1866   1085   1419    1713
     780  47168   1866   1866   1866   1866   1866   1866   1419   1085    1713
     810  26772   2000   2000   2000   2000   2000   2000    774    774    1694
     840  46179   2000   2000   2000   2000   2000   2000   1419   1419    1855
     870  46743   1933   1933   1933   1933   1933   1933   1308   1419    1791
     900  45630   1933   1933   1933   1933   1933   1933   1419   1419    1805

 

Integer Stress Tests Continued

 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)
                                     MHz for Core
    Secs MB/sec      0      1      2      3      4      5      6      7 Average

       0  57397
      30  56976   1805   1805   1805   1805   1805   1805   2035   2035    1863
      60  56325   1805   1805   1805   1805   1805   1805   2035   2035    1863
      90  56726   1805   1805   1805   1805   1805   1805   2035   2035    1863
     120  56830   1805   1805   1805   1805   1805   1805   2035   2035    1863
     150  56265   1805   1805   1805   1805   1805   1805   2035   2035    1863
     180  56821   1805   1805   1805   1805   1805   1805   2035   2035    1863
     210  56761   1805   1805   1805   1805   1805   1805   2035   2035    1863
     240  56769   1805   1805   1805   1805   1805   1805   2035   2035    1863
     270  56569   1805   1805   1805   1805   1805   1805   2035   2035    1863
     300  56707   1805   1805   1805   1805   1805   1805   2035   2035    1863
     330  56857   1805   1805   1805   1805   1805   1805   2035   2035    1863
     360  56524   1805   1805   1805   1805   1805   1805   2035   2035    1863
     390  56576   1805   1805   1805   1805   1805   1805   2035   2035    1863
     420  56923   1805   1805   1805   1805   1805   1805   2035   2035    1863
     450  56738   1805   1805   1805   1805   1805   1805   2035   2035    1863
     480  56887   1805   1805   1805   1805   1805   1805   2035   2035    1863
     510  55698   1805   1805   1805   1805   1805   1805   2035   2035    1863
     540  56602   1805   1805   1805   1805   1805   1805   2035   2035    1863
     570  56645   1805   1805   1805   1805   1805   1805   2035   2035    1863
     600  56850   1805   1805   1805   1805   1805   1805   2035   2035    1863
     630  56741   1805   1805   1805   1805   1805   1805   2035   2035    1863
     660  56755   1805   1805   1805   1805   1805   1805   2035   2035    1863
     690  56257   1805   1805   1805   1805   1805   1805   2035   2035    1863
     720  55140   1805   1805   1805   1805   1805   1805   2035   2035    1863
     750  56556   1805   1805   1805   1805   1805   1805   2035   2035    1863
     780  56802   1805   1805   1805   1805   1805   1805   2035   2035    1863
     810  56824   1805   1805   1805   1805   1805   1805   2035   2035    1863
     840  30514   1805   1805   1805   1805   1805   1805   2035   2035    1863
     870  33652   1709   1709   1709   1709   1709   1709   1805   1805    1733
     900  50302   1709   1709   1709   1709   1709   1709   1805   1805    1733


 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55
                                     MHz for Core
    Secs MB/sec      0      1      2      3      4      5      6      7 Average

       0  44416
      30  44323   2002   2002   2002   2002   2002   2002   2002   2002    2002
      60  43513   2002   2002   2002   2002   2002   2002   1872   1872    1970
      90  43487   2002   2002   2002   2002   2002   2002   1536   1536    1886
     120  43751   2002   2002   2002   2002   2002   2002   1742   1742    1937
     150  43154   2002   2002   2002   2002   2002   2002   1229   1229    1809
     180  42516   2002   2002   2002   2002   2002   2002   1536   1536    1886
     210  42549   2002   2002   2002   2002   2002   2002   1482   1482    1872
     240  42621   2002   2002   2002   2002   2002   2002   1229   1229    1809
     270  40041   2002   2002   2002   2002   2002   2002   1742   1742    1937
     300  42976   2002   2002   2002   2002   2002   2002   1229   1229    1809
     330  39246   2002   2002   2002   2002   2002   2002   2002   2002    2002
     360  40390   2002   2002   2002   2002   2002   2002   1536   1536    1886
     390  38808   2002   2002   2002   2002   2002   2002   2002   2002    2002
     420  38806   2002   2002   2002   2002   2002   2002   1536   1536    1886
     450  39480   2002   2002   2002   2002   2002   2002   2002   2002    2002
     480  38574   2002   2002   2002   2002   2002   2002   2002   2002    2002
     510  38316   2002   2002   2002   2002   2002   2002   2002   2002    2002
     540  38770   2002   2002   2002   2002   2002   2002   1872   1872    1970
     570  38459   2002   2002   2002   2002   2002   2002   1229   1872    1889
     600  37892   2002   2002   2002   2002   2002   2002   2002   2002    2002
     630  38772   2002   2002   2002   2002   2002   2002   1536   1536    1886
     660  39099   2002   2002   2002   2002   2002   2002   2002   1229    1905
     690  38011   2002   2002   2002   2002   2002   2002   1229   1536    1847
     720  39059   2002   2002   2002   2002   2002   2002   2002   1742    1970
     750  39290   2002   2002   2002   2002   2002   2002   2002   1742    1970
     780  38913   2002   2002   2002   2002   2002   2002   1742   1742    1937
     810  39524   2002   2002   2002   2002   2002   2002   1872   1872    1970
     840  37500   2002   2002   2002   2002   2002   2002   1229   1229    1809
     870  24380   2002   2002   2002   2002   2002   2002   1872   1536    1928
     900  38368   2002   2002   2002   2002   2002   2002   1742   1742    1937



Armv9 CPU Phone

At least on this particular hardware and software, the MHz measuring program would not run properly in the background. An example is provided below, where recording stopped when the stress test started execution. Because of this, the table only provides performance measurements, using 8, 4, 2 and 1 threads, executed in that order.

Timeout variance - refer to other results

 System 4 Android 13 1x 2.80 GHz Cortex-X2, 4x 1.82 GHz Cortex A510, 3x 2.52 GHz Cortex A710

 Threads    8       4       2       1 
        Battery Battery Bat+Pow   Power
         30-Apr  30-Apr  30-Apr  30-Apr
 Start    15:00   15:17   15:44   16:06
 End      15:17   15:44   16:06   16:32

   Secs  MB/sec  MB/sec  MB/sec  MB/sec

     10  133083   96175   44760   16160
     30  119760   88868   47865   15773
     60  111445   82186   47151   15757
     90  111613   82591   43305   15771
    120  109574   81741   43289   15977
    150  109483   74503   44553   15769
    180  108523   80390   41614   15768
    210  106909   79071   43289   15770
    240  107657   76151   43296   15768
    270  104187   66732   41341   15731
    300  104027   73007   40234   15765
    330                   40548   15985
    360         Timeout   42721   15770
    390 Timeout   69770   39264   15766
    420           61693   38915   15991
    450           63592   41352   15768
    480           63941   40039   15770
    510  111579   62500   39279   15761
    540  111350   62786   39488   15769
    570  109626   62670   33665   15768
    600  108377   62609   39265   15769
    630  106509   62758   37640   15771
    660  106738   62372   38942   15721
    690  105756   62816   37879   16274
    720   90875   62794   38051   15769
    750   87526   62403   36682   15771
    780   89403   62037   37333   15708
    810   91222   62149   35351   15746
    840   90148   62758   35344   15765
    870   90497   62562   37108   15765
    900   88864   62803   33745   15769

Start S  133083   96175   44760   16160
End   E   88864   62803   33745   15769
%E/S         67      65      75      98
Benchmk   90286   66630   37282   20831
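The %E/S row is the final 900 second sample expressed as a percentage of the first sample, rounded to a whole number. A short check using the figures in the table above; the names are illustrative.

```python
# MB/second at the first (10 s) and last (900 s) samples, copied from the table above.
START = {8: 133083, 4: 96175, 2: 44760, 1: 16160}
END = {8: 88864, 4: 62803, 2: 33745, 1: 15769}


def end_as_percent_of_start(start, end):
    """Final throughput as a whole-number percentage of initial throughput."""
    return round(100 * end / start)


for threads in START:
    print(threads, end_as_percent_of_start(START[threads], END[threads]))
```

The single thread run lost only about 2% over 15 minutes, against roughly a third for 4 and 8 threads, consistent with thermal limits mainly affecting multithreaded throughput.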


  MHz Measurement Test 4A8 30-Apr-2023 15.01.35
  Running time 15 minutes, 30 second samples

                        MHz for Core
   Secs     0     1     2     3     4     5     6     7
   0.00   960   960  1152   960  1920  1632  1152  1344
  30.09  1728  1728  1728  1728  2112  1824  1824  2304
  60.32  1440  1728  1728  1728  1824  1824  1824   960
  90.53  1728  1728  1728  1728  2016  1728  1728  2208
 821.61  1344  1344  1344  1344  1536  1536  1536  1536
1277.15  1152  1056  1056  1056  2515  2515  2400  2400

            End Time 30-Apr-2023 15.23.14
 



Floating Point Stress Tests

These were also run for 15 minutes using 8 threads, but with 128 KB data. The testing arrangements were as used for the integer exercise. Performance is measured in MFLOPS. The significant observation here is that System 2 performed relatively better than in the integer stress tests, with all cores running at maximum MHz throughout the 15 minute test.

 System              1 Power   2 Power    3 Power        
 Mean  MFLOPS          31603     37395      22990
 Usual Slow CPU MHz     2000      1805       2002

 System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55

                                   MHz for Core
    Secs MFLOPS      0      1      2      3      4      5      6      7 Average

       0  34841
      30  32620   2000   2000   2000   2000   2000   2000   2050   2050    2013
      60  32965   2000   2000   2000   2000   2000   2000   1796   1796    1949
      90  32142   2000   2000   2000   2000   2000   2000   1733   1733    1933
     120  31115   2000   2000   2000   2000   2000   2000   1733   1733    1933
     150  31404   2000   2000   2000   2000   2000   2000   1670   1670    1918
     180  32130   2000   2000   2000   2000   2000   2000   1530   1796    1916
     210  31275   2000   2000   2000   2000   2000   2000   1670   1530    1900
     240  31024   2000   2000   2000   2000   2000   2000   1796   1796    1949
     270  31986   2000   2000   2000   2000   2000   2000   1670   1670    1918
     300  32255   2000   2000   2000   2000   2000   2000   1530   1530    1883
     330  32591   2000   2000   2000   2000   2000   2000   1530   1733    1908
     360  31627   2000   2000   2000   2000   2000   2000   1419   1670    1886
     390  31064   2000   2000   2000   2000   2000   2000   1530   1530    1883
     420  32626   2000   2000   2000   2000   2000   2000   1530   1530    1883
     450  31898   2000   2000   2000   2000   2000   2000   1530   1530    1883
     480  30940   1866   1933   2000   2000   2000   2000   1530   1530    1857
     510  31994   2000   2000   2000   2000   2000   2000   1860   1419    1910
     540  31563   2000   2000   2000   2000   2000   1933   1419   1419    1846
     570  30872   2000   2000   2000   2000   2000   2000   1733   1169    1863
     600  31143   2000   2000   2000   2000   2000   2000   1670   1670    1918
     630  31670   2000   2000   2000   2000   2000   2000   1419   1419    1855
     660  31703   2000   2000   2000   2000   2000   2000   1530   1530    1883
     690  30936   1866   1800   1800   1800   1800   1800   1670   1670    1776
     720  30664   2000   2000   2000   2000   2000   2000   1530   1530    1883
     750  31153   2000   2000   2000   2000   2000   2000   1530   1530    1883
     780  30367   1933   1933   2000   2000   2000   2000   1670   1308    1856
     810  30412   2000   2000   2000   2000   2000   2000   1733   1733    1933
     840  30837   2000   2000   2000   2000   2000   2000   1530   1530    1883
     870  30699   2000   2000   2000   2000   2000   2000   1419   1308    1841
     900  31165   2000   2000   2000   2000   2000   2000   1530   1530    1883
 
 System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)

                                  MHz for Core
    Secs MFLOPS      0      1      2      3      4      5      6      7 Average

       0  38431
      30  37700   1805   1805   1805   1805   1805   1805   2035   2035    1863
      60  37537   1805   1805   1805   1805   1805   1805   2035   2035    1863
      90  37643   1805   1805   1805   1805   1805   1805   2035   2035    1863
     120  37777   1805   1805   1805   1805   1805   1805   2035   2035    1863
     150  37524   1805   1805   1805   1805   1805   1805   2035   2035    1863
     180  37956   1805   1805   1805   1805   1805   1805   2035   2035    1863
     210  32704   1805   1805   1805   1805   1805   1805   2035   2035    1863
     240  37343   1805   1805   1805   1805   1805   1805   2035   2035    1863
     270  35775   1805   1805   1805   1805   1805   1805   2035   2035    1863
     300  37173   1805   1805   1805   1805   1805   1805   2035   2035    1863
     330  37469   1805   1805   1805   1805   1805   1805   2035   2035    1863
     360  37749   1805   1805   1805   1805   1805   1805   2035   2035    1863
     390  37643   1805   1805   1805   1805   1805   1805   2035   2035    1863
     420  37404   1805   1805   1805   1805   1805   1805   2035   2035    1863
     450  37339   1805   1805   1805   1805   1805   1805   2035   2035    1863
     480  37850   1805   1805   1805   1805   1805   1805   2035   2035    1863
     510  36378   1805   1805   1805   1805   1805   1805   2035   2035    1863
     540  37348   1805   1805   1805   1805   1805   1805   2035   2035    1863
     570  37537   1805   1805   1805   1805   1805   1805   2035   2035    1863
     600  37885   1805   1805   1805   1805   1805   1805   2035   2035    1863
     630  37787   1805   1805   1805   1805   1805   1805   2035   2035    1863
     660  37526   1805   1805   1805   1805   1805   1805   2035   2035    1863
     690  37721   1805   1805   1805   1805   1805   1805   2035   2035    1863
     720  37841   1805   1805   1805   1805   1805   1805   2035   2035    1863
     750  37871   1805   1805   1805   1805   1805   1805   2035   2035    1863
     780  37513   1805   1805   1805   1805   1805   1805   2035   2035    1863
     810  37863   1805   1805   1805   1805   1805   1805   2035   2035    1863
     840  37711   1805   1805   1805   1805   1805   1805   2035   2035    1863
     870  37709   1805   1805   1805   1805   1805   1805   2035   2035    1863
     900  37528   1805   1805   1805   1805   1805   1805   2035   2035    1863

 

Floating Point Stress Tests Continued

 System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55

                                  MHz for Core
    Secs MFLOPS      0      1      2      3      4      5      6      7 Average

       0  24716
      30  24173   2002   2002   2002   2002   2002   2002   2002   2002    2002
      60  23879   2002   2002   2002   2002   2002   2002   2002   2002    2002
      90  24361   2002   2002   2002   2002   2002   2002   1742   1742    1937
     120  24068   2002   2002   2002   2002   2002   2002   1872   1872    1970
     150  23441   2002   2002   2002   2002   2002   2002   2002   2002    2002
     180  23664   2002   2002   2002   2002   2002   2002   1872   1872    1970
     210  23991   2002   2002   2002   2002   2002   2002   1536   1536    1886
     240  23745   2002   2002   2002   2002   2002   2002   2002   2002    2002
     270  23953   2002   2002   2002   2002   2002   2002   2002   2002    2002
     300  23268   2002   2002   2002   2002   2002   2002   1872   1872    1970
     330  23559   2002   2002   2002   2002   2002   2002   2002   1872    1986
     360  23203   2002   2002   2002   2002   2002   2002   1536   1536    1886
     390  23776   2002   2002   2002   2002   2002   2002   1482   1482    1872
     420  22230   2002   2002   2002   2002   2002   2002   1536   1536    1886
     450  23387   2002   2002   2002   2002   2002   2002   2002   2002    2002
     480  23495   2002   2002   2002   2002   2002   2002   1742   1742    1937
     510  23657   2002   2002   2002   2002   2002   2002   2002   2002    2002
     540  23402   2002   2002   2002   2002   2002   2002   2002   2002    2002
     570  21686   1820   1820   1820   1820   2002   2002   1872   1872    1879
     600  23454   2002   2002   2002   2002   2002   2002   1872   1872    1970
     630  22161   2002   2002   2002   2002   2002   2002   1742   1742    1937
     660  20981   2002   2002   2002   2002   2002   2002   1536   2002    1944
     690  21042   2002   2002   2002   2002   2002   2002   2002   2002    2002
     720  22550   2002   2002   2002   2002   2002   2002   2002   2002    2002
     750  22236   2002   2002   2002   2002   2002   2002   2002   2002    2002
     780  22007   2002   2002   2002   2002   2002   2002   2002   2002    2002
     810  20619   2002   2002   2002   2002   2002   2002   1229   1229    1809
     840  21725   2002   2002   2002   2002   2002   2002   1229   1742    1873
     870  21958   2002   2002   2002   2002   2002   2002   1536   1536    1886
     900  22304   2002   2002   2002   2002   2002   2002   1742   1742    1937
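The Average column in these MHz tables appears to be the mean of the eight per-core readings. A minimal Python sketch, assuming that and assuming halves are rounded up: this reproduces, for example, 1879 at 570 seconds for System 3 and 1863 for System 2, where Python's built-in round(), which rounds halves to even, would give 1878 and 1862. The helper name `average_mhz` is hypothetical.

```python
def average_mhz(core_mhz: list[int]) -> int:
    """Mean clock across all cores, rounded half up to a whole MHz (assumed rule)."""
    n = len(core_mhz)
    # Integer round-half-up: add half the divisor before floor division.
    return (sum(core_mhz) + n // 2) // n

# Per-core MHz samples (cores 0 to 7) copied from the System 3 table above.
rows = {
    90: [2002] * 6 + [1742, 1742],
    120: [2002] * 6 + [1872, 1872],
    570: [1820] * 4 + [2002, 2002, 1872, 1872],
    840: [2002] * 6 + [1229, 1742],
}

for secs, cores in rows.items():
    print(secs, average_mhz(cores))
```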


Armv9 CPU Phone



 System 4 Android 13 1 x 2.80 GHz Cortex-X2,
 4 x 1.82 GHz Cortex-A510, 3 x 2.52 GHz Cortex-A710

Threads    8       8       4       2       1
        Battery   Power Battery Battery Battery
         27-Apr  27-Apr  30-Apr  30-Apr  30-Apr
Start     20:35   20:50   14:06   14:22   14:40
End       20:50   21:09   14:22   14:40   14:57
   Secs  MFLOPS  MFLOPS  MFLOPS  MFLOPS  MFLOPS

  Start   84416   75701   66146   40172   18014
     30   78275   72473   62252   40037   18003
     60   77460   61675   61739 Timeout   18000
     90   76556   65468   60870   41500   18007
    120   75133   62711   60758   38685   18011
    150   74824   62759   60320   39085   18002
    180   74002   62159   60111   38975   18017
    210   71878   58489   59853   38780   18014
    240   73117   58442   59472   38367   18006
    270   72064   55940   58854   38418   18005
    300   72885   53904   35216   37431   18002
    330   71437   55761   58663   36239   18015
    360   71531   54161   57187   36538
    390   70866   53668   54066   35590
    420   70526   53834   51857   35860
    450   69574   53701   55682   35227
    480   62070   53907   50873 Timeout Timeout
    510   62157   53930   52357   34290
    540   59206   53534   51482   34310
    570   57785   53970   49558   35564
    600   56564   53967 Timeout   36059
    630   59496   68216   59774   36938
    660   55328   53941   47969   35854   31675
    690   55826   59331   52595   34714   31642
    720   56265   57811   57567   36331   30553
    750   53968   58897   49164   36803   25729
    780   56221   55074   59303   34276   22781
    810   54436   56509   49458   34620   22491
    840   55442   56757   58579   35851   22494
    870   53653   53610   51860   35835   22493
    900   54026   52228   50180   34358   22486

Start S   84416   75701   66146   40172   18014
End   E   54026   52228   50180   34358   22486
%E/S         64      69      76      86     125
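The %E/S row gives the end MFLOPS as a percentage of the start value, so figures below 100 show how far speed fell over the 15-minute run. A minimal sketch that reproduces the row, assuming plain rounding to whole percent; `runs` and `end_over_start_pct` are illustrative names, not part of the benchmark.

```python
# Start and end MFLOPS for each run, copied from the System 4 table above.
runs = {
    "8 threads, battery": (84416, 54026),
    "8 threads, power": (75701, 52228),
    "4 threads, battery": (66146, 50180),
    "2 threads, battery": (40172, 34358),
    "1 thread, battery": (18014, 22486),
}

def end_over_start_pct(start: float, end: float) -> int:
    """End speed as a whole-number percentage of start speed."""
    return round(100 * end / start)

for label, (start, end) in runs.items():
    print(f"{label:20} {end_over_start_pct(start, end):4d}%")
```

This gives 64, 69, 76, 86 and 125, matching the table.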
 


More Integer Stress Tests

Following are summary results from 15-minute tests at 160 KB using 1, 2, 4 and 8 threads, comparing average, maximum and typical minimum multiprocessing (MP) gains. The typical minimum excludes the odd exceptionally slow result. Gains are relative to the best single-thread result.

The main observations are that average performance can fall over extended running time, and that MP gains can be nowhere near proportional to the number of CPU cores used. For example, on Systems 1 to 3, using 8 cores led to only around a three times improvement over a single core, with less than four times apparently inevitable. System 4 did better here, with an average 8-thread gain of 5.0 times.

System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55
System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)
System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55
System 4 Android 13 1 x 2.80 GHz Cortex-X2, 4 x 1.82 GHz Cortex-A510, 3 x 2.52 GHz Cortex-A710

 System          1             2             3             4
 Threads         MB/sec   Gain MB/sec   Gain MB/sec   Gain MB/sec   Gain

   1    Best      14594    1.0  14398    1.0  11433    1.0  20831    1.0

   2    Minimum   23529    1.6  30435    2.1  20842    1.8  33665    1.6
        Average   25460    1.7  30707    2.1  21712    1.9  40107    1.9
        Maximum   29863    2.0  30833    2.1  22919    2.0  47865    2.3

   4    Minimum   30093    2.1  30379    2.1  23305    2.0  61693    3.0
        Average   34008    2.3  35550    2.5  28169    2.5  69532    3.3
        Maximum   40437    2.8  36440    2.5  29441    2.6  96175    4.6

   8    Minimum   44260    3.0  50302    3.5  36674    3.2  87526    4.2
        Average   48066    3.3  55361    3.8  39996    3.5 104589    5.0
        Maximum   53708    3.7  57397    4.0  44521    3.9 133083    6.4
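The Gain columns appear to be each MB/sec figure divided by the same system's best single-thread result, rounded to one decimal place. A minimal Python sketch under that assumption, using the average 8-thread row; `mp_gain` and the dictionary names are illustrative only.

```python
# Best single-thread and average 8-thread MB/sec, copied from the integer table above.
best_1_thread = {1: 14594, 2: 14398, 3: 11433, 4: 20831}
avg_8_threads = {1: 48066, 2: 55361, 3: 39996, 4: 104589}

def mp_gain(mb_per_sec: float, single_thread_best: float) -> float:
    """MP gain relative to the best single-thread result, to one decimal place."""
    return round(mb_per_sec / single_thread_best, 1)

for system, best in best_1_thread.items():
    gain = mp_gain(avg_8_threads[system], best)
    # Share of an ideal 8x speedup, computed from the unrounded ratio.
    pct_of_ideal = 100 * avg_8_threads[system] / best / 8
    print(f"System {system}: gain {gain:.1f}, {pct_of_ideal:.0f}% of ideal 8x")
```

This yields gains of 3.3, 3.8, 3.5 and 5.0, matching the Average row for 8 threads.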

 



More Floating Point Stress Tests

These were run using the same profile as the integer stress tests, but MP gains were even worse. On Systems 1 to 3, running all eight threads was typically only around three times faster than using a single core (averages of 2.6 to 3.3 times), and System 4 averaged just 2.1 times, despite its much faster single core.


System 1 Android 11 2 x 2.05 GHz ARM Cortex-A76 and 6 x 2.0 GHz ARM Cortex-A55
System 2 Android 12 2.0 GHz Snapdragon 750 (2 x 2.0 GHz Cortex-A76 and 6 x 1.8 GHz Cortex-A55)
System 3 Android 13 2 x 2.0 GHz ARM Cortex-A75 and 6 x 2.0 GHz Cortex-A55
System 4 Android 13 1 x 2.80 GHz Cortex-X2, 4 x 1.82 GHz Cortex-A510, 3 x 2.52 GHz Cortex-A710

 System          1             2             3             4
 Threads         MFLOPS   Gain MFLOPS   Gain MFLOPS   Gain MFLOPS   Gain

   1    Best      12096    1.0  12413    1.0   6917    1.0  31675    1.0

   2    Minimum   22221    1.8  24629    2.0  13358    1.9  34276    1.1
        Average   23468    1.9  24896    2.0  13821    2.0  36783    1.2
        Maximum   24427    2.0  24990    2.0  13830    2.0  41500    1.3

   4    Minimum   21944    1.8  26128    2.1  16433    2.4  35216    1.1
        Average   25164    2.1  27510    2.2  16859    2.4  55459    1.8
        Maximum   28083    2.3  27807    2.2  17087    2.5  66146    2.1

   8    Minimum   29787    2.5  35775    2.9  20619    3.0  53653    1.7
        Average   31555    2.6  37249    3.0  22881    3.3  65709    2.1
        Maximum   34876    2.9  38431    3.1  24716    3.6  84416    2.7
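Dividing each floating point gain by the thread count gives a rough parallel efficiency. The sketch below is an illustration, not the author's method. It treats every thread as equal, which ignores the big.LITTLE mix of core types and clock speeds, so 100% is not a realistic target on any of these systems. `efficiency_pct` is a hypothetical helper, and the figures are the Average rows plus the single-thread Best from the table above.

```python
# Average MFLOPS by thread count; the 1-thread entry is the "Best" baseline.
results = {
    "System 2": {1: 12413, 2: 24896, 4: 27510, 8: 37249},
    "System 4": {1: 31675, 2: 36783, 4: 55459, 8: 65709},
}

def efficiency_pct(mflops: float, baseline: float, threads: int) -> float:
    """Achieved speedup as a percentage of perfect linear scaling."""
    return 100 * (mflops / baseline) / threads

for system, by_threads in results.items():
    baseline = by_threads[1]
    row = [
        f"{t}T {efficiency_pct(m, baseline, t):5.1f}%"
        for t, m in by_threads.items()
        if t > 1
    ]
    print(system, "  ".join(row))
```

With these figures, System 4 reaches only about 26% at 8 threads, against about 38% for System 2.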
  