Android, Linux and Windows FFT Benchmarks
For Intel and ARM CPUs

Contents


General
Comparison
Linux/Windows/Intel/AMD
Android/ARM/Intel + Windows 10
Linux/ARM
Numeric Checks
MFLOPS and MHz

Benchmarks Available

A series of Fast Fourier Transform benchmarks has been produced from the same C language code, to measure performance of Intel based PCs using Windows and Linux, Intel and ARM Android devices, and ARM CPUs using Linux (for Raspberry Pi). The Android apps, available from the download links below, automatically select code for Intel or ARM (or MIPS) processors, and 64 or 32 bit operation. Other benchmark execution programs and source codes are available in http://www.roylongbottom.org.uk/FFT Benchmarks.zip. Compilers used were gcc 4.8 for Linux, Android and Raspberry Pi and C/C++ Version 18 for Windows. The Windows benchmarks from the latter would not run on older systems, so the original version, from C/C++ Version 15, was used there, producing similar performance at 32 bits.

Download Android Apps


A Settings, Security option may need changing to allow installation of non-Market applications.

fft1.apk - Original FFT Benchmark
fft3c.apk - Optimised FFT Benchmark

All have an option to save results via Email.

For maximum and consistent performance, some units might need a CPU Mode setting changed (for example, ICS Settings, Developer Options, CPU Mode, change Normal to Performance).

General

The FFT benchmarks started life in early 2000, based on a program from Scott in the Compuserve PC Hardware Forum. I produced three Windows versions that provided graphical output. The first was optimised, all C code. The second was further optimised, including assembly language. The third had SSE SIMD assembly code and further tuning changes. The latest versions are all C code with text output: FFT1 is the original and FFT3c is the third one without assembly code. Android varieties are run by downloading and installing the apk files, Windows versions by clicking on an EXE file or from a Command Prompt, and those for Linux via a Terminal command. For further details of the original versions, and numerous results on PCs, see fftgraf results.htm.

The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running time in milliseconds. An example of Linux results is shown below. As shown, some checks of numeric calculations are carried out on the largest FFTs. These are subject to variation due to different rounding effects.


 ###################################################

  Assembler CPUID and RDTSC      
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4
         Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
  Measured - Minimum 3711 MHz, Maximum 3711 MHz
  Linux Functions
  get_nprocs() - CPUs 8, Configured CPUs 8
  get_phys_pages() and size - RAM Size 31.36 GB, Page Size 4096 Bytes
  uname() - Linux, roy-i7UB14, 3.13.0-43-generic
  #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014, x86_64

 ###################################################

   FFT 64 Bit Benchmark Version 1.0 Tue Sep  8 14:33:26 2015

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.015     0.014     0.027     0.016     0.016     0.016
    2     0.032     0.032     0.032     0.037     0.037     0.037
    4     0.074     0.074     0.074     0.110     0.109     0.111
    8     0.228     0.225     0.225     0.273     0.272     0.274
   16     0.569     0.570     0.566     0.677     0.671     0.673
   32     1.390     1.386     1.395     1.937     1.945     1.936
   64     3.938     3.943     3.949     4.547     4.527     4.537
  128     9.172     9.163     9.162    10.613    10.609    10.621
  256    21.554    21.511    21.500    24.491    24.560    24.542
  512    49.499    49.491    49.533    55.553    55.892    55.066
 1024   111.279   111.210   111.124   238.652   238.292   238.592

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

               End at Tue Sep  8 14:33:28 2015
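
The measurement pattern behind the log above (each FFT size timed three times, results in milliseconds) might be sketched in C as follows. This is an illustration only, not the benchmark's own source. The names now_ms, time_passes, spread, run_sizes and scale_points are hypothetical, the use of clock_gettime is an assumption about the timer, and scale_points is a placeholder workload rather than an FFT.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define PASSES 3   /* each FFT size is timed three times, as in the log */

/* Wall-clock time in milliseconds from a monotonic clock (assumed timer). */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Hypothetical workload signature: process n points held in buf. */
typedef void (*workload_fn)(float *buf, size_t n);

/* Placeholder workload, NOT an FFT: halves n interleaved complex floats. */
void scale_points(float *buf, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        buf[i] *= 0.5f;
}

/* Time PASSES runs of fn on n points, storing each time in ms[]. */
void time_passes(workload_fn fn, float *buf, size_t n, double ms[PASSES])
{
    assert(fn != NULL && buf != NULL && n > 0);
    for (int p = 0; p < PASSES; p++) {
        double start = now_ms();
        fn(buf, n);
        ms[p] = now_ms() - start;
    }
}

/* Variation between passes: (slowest - fastest) / fastest. */
double spread(const double ms[PASSES])
{
    double lo = ms[0], hi = ms[0];
    for (int p = 1; p < PASSES; p++) {
        if (ms[p] < lo) lo = ms[p];
        if (ms[p] > hi) hi = ms[p];
    }
    return lo > 0.0 ? (hi - lo) / lo : 0.0;
}

/* Print one log-style row per size, doubling from 1K up to max_k K points.
   buf must hold 2 * max_k * 1024 floats. */
void run_sizes(workload_fn fn, float *buf, size_t max_k)
{
    for (size_t k = 1; k <= max_k; k *= 2) {
        double ms[PASSES];
        time_passes(fn, buf, k * 1024, ms);
        printf("%6zu %9.3f %9.3f %9.3f\n", k, ms[0], ms[1], ms[2]);
    }
}
```

Calling run_sizes(scale_points, buf, 1024) with a buffer of 2 x 1024 x 1024 floats would print eleven rows, for 1K to 1024K points. Applied to the first single precision row of the log (0.015, 0.014, 0.027), spread reports a variation of about 93%, which is the kind of outlier the three passes are intended to expose.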
  



Comparison

Following are double precision execution times, in milliseconds, for the 64 bit compilation of the optimised version 3c.0 benchmark. The first column is for the same PC that produced the above log file for version 1.0.

Version 3c Improvements - On the Core i7, the optimised program is around 25% faster at small FFT sizes, rising to more than three times faster on the largest ones. These improvements are typical of results on the other systems, shown below.

SP, DP and Cache Effects - With data in the same cache, execution time can be roughly proportional to FFT size, and changes in this relationship can indicate different cache capacity. Single precision data uses half the space occupied at double precision. The Core i7, here, has cache sizes of 32 KB for L1 data, 256 KB for L2 and 10 MB for L3. The effects of the latter are clear in the log above, where DP calculations take more than twice as long as SP with 1024K FFTs. L2 effects are probably between 32K and 64K at single precision, with L1 at 4K to 8K. Without caching effects, DP performance can be only slightly slower, but see results for different systems.
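
The cache reasoning above can be checked with simple arithmetic. The sketch below assumes (this is not stated in the benchmark description) that each FFT point is a complex value held as two reals and transformed in place, ignoring any tables of sines and cosines. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by an FFT of the given number of points, ASSUMING
   interleaved complex data (two reals per point) transformed in place.
   bytes_per_real is 4 for single precision, 8 for double precision. */
size_t fft_data_bytes(size_t points, size_t bytes_per_real)
{
    assert(bytes_per_real == 4 || bytes_per_real == 8);
    return points * 2 * bytes_per_real;
}

/* Smallest cache level (1, 2, 3 ...) whose capacity holds the data,
   or 0 if it only fits in RAM. caches[] lists capacities in bytes,
   smallest level first. */
int smallest_fitting_level(size_t bytes, const size_t caches[], int levels)
{
    for (int i = 0; i < levels; i++)
        if (bytes <= caches[i])
            return i + 1;
    return 0;
}
```

Under that assumption, with the Core i7 capacities of 32 KB, 256 KB and 10 MB, a 1024K point FFT needs 8 MB at single precision, which fits in L3, but 16 MB at double precision, which does not. Likewise, 4K points at SP occupy exactly 32 KB and 32K points exactly 256 KB, in line with the L1 and L2 transitions suggested above.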

32 Bit and 64 Bit Compilations, Windows and Linux Versions - On the Core i7, these each produce performance differences varying from similar to 25%. Relative result ratios on the other systems can be different. The only 64 Bit Android results (A53), at the time, show improvement of up to 62%, with L2 cache based data.

Different Systems 64 Bit DP Version 3c - Below is a summary of double precision optimised benchmark results on PCs, Android devices and Raspberry Pi 2 (for Linux/ARM example). Pentium 4 results are also included, for comparison purposes. PCs demonstrate superior performance through higher CPU MHz and larger secondary caches, but Android devices appear to be catching up quickly. The Raspberry Pi 2 performance is quite good, compared with the higher MHz Pentium 4, and costs less than $50.


         Double Precision Version 3c.0 Milliseconds

                   Linux, P4 Windows
                                                   Rpi 2
         Core i7  Core 2  Phenom    Atom Pentium  ARM V7
           4820K    6600  II 945    N455       4 BCM2836
     MHz    3900    2400    3000    1666    1900     900

  K Size
       1    0.02    0.05    0.03    0.17    0.09    0.24
       2    0.04    0.12    0.05    0.42    0.20    0.55
       4    0.08    0.29    0.14    0.91    0.42    1.30
       8    0.18    0.64    0.41    1.98    0.98    3.07
      16    0.37    1.29    0.88    3.92    6.23    8.68
      32    0.78    2.86    2.11   10.01    15.2   23.24
      64    1.70    6.21    4.64   23.51    34.6   56.37
     128    3.66   14.49   10.45   53.03    73.7  125.95
     256    8.09   34.85   26.32  115.83   156.0  272.65
     512   21.05   81.85   79.23  245.59   344.0  587.89
    1024   65.15  178.81  197.41  538.71   804.0 1279.49

                       Android

 Version  32 Bit  32 Bit  32 Bit  64 Bit  32 Bit  32 Bit

        Snapdrag    Atom     ARM     ARM     ARM     ARM
             800   Z3745     A15     A53      A9     A53
     MHz    2150    1860    1700    1300    1200    1300

  K Size
       1    0.07    0.08    0.08    0.20    0.21    0.20
       2    0.19    0.20    0.17    0.48    0.55    0.47
       4    0.41    0.43    0.41    1.07    1.38    1.06
       8    0.90    0.96    0.90    2.40    3.09    2.33
      16    2.66    2.86    3.23    5.64    9.08    9.12
      32    6.07    5.56    8.88   15.40   22.02   22.93
      64   13.56   15.03   23.08   36.16   52.11   50.41
     128   33.09   34.77   53.11   82.23  118.45  112.46
     256   72.55   72.93  120.66  193.91  258.56  264.79
     512  156.30  157.56  264.30  424.72  552.00  550.88
    1024  337.07  332.37  586.18  960.28 1175.65 1206.83
  



Linux/Windows/Intel/AMD


Intel Core i7 4820K
AMD Phenom II 945
Intel Core 2 6600
Intel Atom N455

Disassembly of the Windows benchmarks showed that the main calculations used the same SSE type instructions for 64 bit and 32 bit compilations. The 64 bit versions could produce slightly faster performance, due to more registers being available for optimisation. With small FFT sizes, 32 bit versions could be faster, probably due to fewer multiple data instructions. FFT execution times were generally similar between Linux and Windows versions.

The new 32 bit version would not run on the Intel Atom based netbook, so the C/C++ 15 results are quoted, where performance is quite similar to that from Linux. Results for V15 are also provided for the Phenom based PC, with most not significantly different from the new benchmarks.

 

Core i7 4820K at 3.9 GHz, 32 KB L1, 256 KB L2, 10 MB L3

         Ubuntu 14.04                          Windows 10
           64 Bit             32 Bit             64 Bit             32 Bit
   K Size      SP      DP         SP      DP         SP      DP         SP      DP
Version 1.0
        1    0.01    0.02       0.02    0.02       0.02    0.02       0.02    0.02
        2    0.03    0.04       0.04    0.04       0.04    0.04       0.04    0.04
        4    0.07    0.11       0.09    0.12       0.08    0.12       0.09    0.12
        8    0.23    0.27       0.26    0.30       0.25    0.30       0.26    0.31
       16    0.57    0.67       0.64    0.74       0.62    0.77       0.65    0.76
       32    1.39    1.95       1.55    2.16       1.54    2.00       1.59    1.98
       64    3.94    4.53       4.48    5.00       3.94    4.69       4.12    4.74
      128    9.16   10.61      10.31   11.42       9.03   10.56       9.77   10.74
      256   21.51   24.56      23.21   26.11      21.33   21.99      22.48   22.11
      512   49.49   55.89      52.82   59.51      44.46   59.23      44.90   60.55
     1024  111.21  238.29     118.96  296.58     114.09  187.79     109.55  187.70

Version 3c.0
        1    0.01    0.02       0.02    0.02       0.01    0.01       0.02    0.02
        2    0.03    0.04       0.03    0.03       0.03    0.03       0.03    0.03
        4    0.06    0.08       0.07    0.08       0.06    0.07       0.07    0.08
        8    0.14    0.18       0.17    0.17       0.14    0.17       0.16    0.18
       16    0.31    0.37       0.38    0.43       0.33    0.38       0.38    0.41
       32    0.68    0.78       0.84    0.93       0.73    0.82       0.81    0.86
       64    1.47    1.70       1.80    2.00       1.56    1.75       1.86    1.95
      128    3.20    3.66       3.87    4.23       4.11    3.81       3.76    3.97
      256    6.97    8.09       8.36    9.76       8.57    8.87       8.31    9.12
      512   15.43   21.05      18.90   25.34      17.60   22.90      19.56   23.18
     1024   38.06   65.15      45.51   78.25      42.91   56.67      46.56   56.48

Phenom II 945 3.0 GHz, 64 KB L1, 512 KB L2, 6 MB L3

           64 Bit             32 Bit             64 Bit             32 Bit             32 Bit
         OpenSuse           Ubuntu             Windows 7                             V15 Compiler
   K Size      SP      DP         SP      DP         SP      DP         SP      DP         SP      DP
Version 1.0
        1    0.03    0.03       0.03    0.03       0.03    0.04       0.04    0.04       0.04    0.03
        2    0.06    0.07       0.07    0.07       0.08    0.09       0.08    0.09       0.09    0.08
        4    0.14    0.17       0.15    0.18       0.18    0.23       0.29    0.22       0.21    0.18
        8    0.34    0.78       0.35    0.78       0.47    0.85       0.45    0.83       0.49    0.79
       16    1.53    1.95       1.52    1.93       1.70    2.10       1.72    2.16       1.79    2.04
       32    3.91    4.87       3.87    5.04       4.22    5.12       5.06    5.21       4.47    4.98
       64   10.14   13.51      10.29   12.95      10.78   14.83      10.73   14.22      10.59   13.23
      128   26.69   32.18      25.89   30.18      30.27   33.78      28.82   32.11      29.03   31.43
      256   63.94   91.17      60.35   84.63      69.96   92.51      65.58   88.46      66.94  103.04
      512  178.48  317.27     165.89  269.39     186.37  275.06     188.56  280.71     207.74  317.79
     1024  534.61  762.15     490.83  694.37     538.48  678.87     522.42  641.49     603.57  738.49

Version 3c.0
        1    0.02    0.03       0.03    0.03       0.02    0.02       0.02    0.02       0.04    0.03
        2    0.05    0.05       0.06    0.05       0.05    0.04       0.05    0.05       0.08    0.05
        4    0.10    0.14       0.13    0.14       0.10    0.12       0.11    0.13       0.18    0.15
        8    0.26    0.41       0.31    0.40       0.27    0.50       0.29    0.40       0.42    0.52
       16    0.67    0.88       0.78    0.95       0.70    1.19       0.72    0.96       1.03    1.26
       32    1.61    2.11       1.82    2.22       1.71    2.76       1.81    2.29       2.38    2.87
       64    4.00    4.64       4.36    4.90       4.23    6.00       4.33    5.05       5.77    6.22
      128    9.31   10.45       9.95   11.06       9.54   12.94      10.06   11.44      13.09   13.66
      256   21.25   26.32      22.33   27.45      21.46   29.81      22.76   29.16      30.10   31.64
      512   50.53   79.23      52.80   80.46      48.58   76.09      53.72   82.36      73.98   79.61
     1024  149.60  197.41     151.82  200.81     131.05  170.16     152.89  204.64     200.29  177.68

Core 2 6600 2.4 GHz, 32 KB L1, 4 MB L2

           64 Bit             32 Bit             64 Bit             32 Bit
         OpenSuse           Ubuntu             Windows Vista
   K Size      SP      DP         SP      DP         SP      DP         SP      DP
Version 1.0
        1    0.06    0.07       0.04    0.04       0.04    0.04       0.04    0.04
        2    0.14    0.16       0.09    0.11       0.09    0.11       0.09    0.11
        4    0.33    0.45       0.23    0.30       0.21    0.30       0.23    0.31
        8    0.96    1.08       0.65    0.73       0.63    0.72       0.65    0.71
       16    1.47    2.29       1.56    1.68       1.50    1.68       1.51    1.63
       32    5.03    5.12       3.62    3.84       3.80    3.74       3.51    3.73
       64   11.29   11.39       8.21    8.59       7.58    8.41       7.54    8.64
      128   24.97   28.91      18.14   19.76      17.05   17.87      16.88   20.50
      256   61.32   83.74      39.01   58.01      40.27   60.55      40.31   56.01
      512  214.06  277.56     108.41  246.76     129.95  243.44     107.80  245.49
     1024  683.95  677.40     441.70  624.58     462.70  591.28     499.00  568.59

Version 3c.0
        1    0.04    0.05       0.03    0.03       0.03    0.03       0.04    0.04
        2    0.11    0.12       0.08    0.08       0.07    0.07       0.08    0.08
        4    0.25    0.29       0.18    0.19       0.16    0.18       0.18    0.20
        8    0.62    0.64       0.43    0.42       0.47    0.40       0.45    0.44
       16    1.35    1.29       0.95    1.04       0.86    0.91       0.94    1.03
       32    1.97    2.86       2.09    2.26       1.87    2.05       2.22    2.37
       64    6.48    6.21       4.78    4.86       4.17    4.66       4.65    5.39
      128   14.22   14.49       9.95   11.41       8.87   16.62      10.59   13.23
      256   32.12   34.85      22.68   28.41      21.52   29.47      28.63   31.75
      512   74.38   81.85      53.95   69.23      51.23   65.83      60.95   73.48
     1024  170.14  178.81     127.26  155.29     119.84  172.03     132.35  162.88

Atom N455 1.66 GHz, 24 KB L1, 512 KB L2

           64 Bit             32 Bit             64 Bit             32 Bit
         Ubuntu             Ubuntu             Windows            Windows XP V15 Compiler
   K Size      SP      DP         SP      DP         SP      DP         SP      DP
Version 1.0
        1    0.14    0.17       0.31    0.22      N/A 32 Bit OS       0.49    0.34
        2    0.39    0.55       0.45    0.56                          1.22    0.52
        4    0.94    1.11       1.39    1.59                          1.40    1.18
        8    2.08    2.37       3.08    2.74                          3.10    2.56
       16    4.61    5.58       5.88    6.19                          7.61    5.62
       32   10.18   26.36      12.89   22.07                         15.47   26.10
       64   55.42   92.95      51.73  103.15                         41.60  109.53
      128  188.96  214.09     217.04  233.85                        233.15  235.58
      256  437.06  449.59     448.49  476.09                        476.23  483.28
      512  930.17  962.97     957.59 1005.09                        998.35 1011.16
     1024 1980.58 2140.33    2164.96 2219.15                       2075.99 2152.18

Version 3c.0
        1    0.12    0.17       0.23    0.22                          0.52    0.22
        2    0.27    0.42       0.50    0.49                          1.12    0.48
        4    0.77    0.91       1.07    1.06                          2.43    1.26
        8    1.44    1.98       2.76    2.29                          3.14    2.41
       16    3.29    3.92       5.23    5.80                          7.26    5.66
       32    7.27   10.01      11.54   13.84                         15.64   13.25
       64   18.25   23.51      27.70   31.05                         36.28   29.98
      128   43.76   53.03      61.77   68.57                         83.01   68.37
      256   97.14  115.83     136.10  146.29                        177.92  147.35
      512  214.41  245.59     294.54  307.63                        391.90  315.94
     1024  455.59  538.71     620.95  665.59                        802.62  699.23

Android/ARM/Intel Plus Windows 10/Intel


Android - Cortex-A9 1.2 GHz 32 bit
Android - Cortex-A15 1.7 GHz 32 bit
Android - Qualcomm 800 2.1 GHz 32 bit
Android - ARM Cortex-A53 1.3 GHz 64 bit
Android - ARM Cortex-A53 1.3 GHz 32 bit
Android - Atom Z3745 1.86 GHz 32 bit
Android Dual boot with Windows - Atom Z8300 1.84 GHz 32 bit
Windows - Atom Z8300 1.84 GHz 32 bit
Windows - Atom Z8300 1.84 GHz 64 bit
Windows Dual boot with Android - Atom Z8300 1.84 GHz 32 bit
Windows Dual boot with Android - Atom Z8300 1.84 GHz 64 bit
Windows - Core i7 3.9 GHz 32 bit
Windows - Core i7 3.9 GHz 64 bit


Initially, only one tablet was available that runs at 64 bits, a Lenovo TAB 2 A8-50F using Android 5. In this case, 64 bit and 32 bit results were similar for the non-optimised version, but averaged 40% faster with the more efficient code.

All systems produced significant gains, using the optimised benchmark, but some struggled running the smaller FFTs. Typical SP execution times were 20% faster than DP, at large FFTs, but could be slower running small ones.

October 2015 - Upgrades A1 Asus MemoPad 7 Android 4.4.2 to 5.0 and T7 Nexus 7, Android 4.1.2 to 5.0.2 - ignoring exceptions, the upgrades produced somewhat faster average speeds, but some results were slower.

February 2016 - Intel version run on an Atom based Windows 10 tablet. Results for a rerun on a Core i7 PC are included for comparison.

April 2016 - Dual boot Windows/Android Intel Atom based tablet included.

  

               Single Precision and Double Precision Results in milliseconds 

         T7 Nexus 7                            T11 VOYO A15       T21 Kindle HDX 7
         Cortex-A9 1.2 GHz                     Cortex-A15 1.7 GHz Qualcomm 800 2.1 GHz
L1/L2 KB 32/1024                               32/2048            16/2048
         Android 4.1.2      Android 5.0.2      Android 4.2.2      Android 4.4.3
           32 Bit             32 Bit             32 Bit             32 Bit
   K Size      SP      DP         SP      DP         SP      DP         SP      DP
Version 1.0
        1    0.64    0.38       0.18    0.21       0.10    0.17       0.14    0.18
        2    0.77    0.97       0.40    0.67       0.22    0.36       0.33    0.53
        4    1.14    1.77       1.13    1.86       0.57    0.90       1.03    1.30
        8    3.28    4.40       3.26    5.12       2.12    2.31       2.50    3.09
       16    7.76    9.39       7.74    9.69       4.71    5.97       1.95    2.20
       32   17.80   22.26      18.09   22.73      10.76   11.37       4.18    5.77
       64   61.05  140.58      41.64   84.68      20.10   49.70      14.61   20.01
      128  153.19  289.15     139.98  274.54      77.67  213.70      33.19   60.52
      256  450.16  645.72     444.09  645.70     408.51  448.95     107.49  310.93
      512 1084.11 1457.85    1102.20 1438.29     782.85 1101.70     584.54  497.23
     1024 2388.33 3129.21    2388.56 3185.93    1799.89 2280.30     875.95  963.37

Version 3c.0
        1    0.66    0.21       0.27    0.25       0.23    0.08       0.35    0.07
        2    1.09    0.55       0.65    0.65       0.50    0.17       0.81    0.19
        4    2.67    1.38       1.67    1.45       1.07    0.41       1.66    0.41
        8    3.56    3.09       4.30    3.23       2.41    0.90       1.08    0.90
       16    7.78    9.08       8.33   10.35       5.26    3.23       3.36    2.66
       32   17.85   22.02      19.23   25.38      11.88    8.88       6.54    6.07
       64   39.52   52.11      46.41   58.90      23.75   23.08      12.57   13.56
      128   89.73  118.45     103.31  128.44      49.74   53.11      27.41   33.09
      256  203.34  258.56     221.99  267.12     100.25  120.66      63.39   72.55
      512  437.25  552.00     464.30  558.13     226.76  264.30     150.38  156.30
     1024  918.32 1175.65     933.05 1182.49     505.68  586.18     306.32  337.07


         T22 Lenovo TAB 2 A8-50F
         ARM Cortex-A53 1.3 GHz
L1/L2 KB 32/512
         Android 5.0.2
           64 Bit             32 Bit
   K Size      SP      DP         SP      DP
Version 1.0
        1    0.20    0.21       0.21    0.21
        2    0.44    0.50       0.43    0.53
        4    1.06    1.26       1.03    1.24
        8    2.52    3.03       2.52    2.85
       16    5.89    6.41       5.68    6.60
       32   14.09   25.29      13.05   30.59
       64   49.97  109.32      45.80   92.16
      128  188.37  256.98     153.25  221.98
      256  447.62  583.33     362.62  504.60
      512  826.77 1019.84     840.44 1107.14
     1024 1846.27 2299.97    1835.82 2423.72

Version 3c.0
        1    0.17    0.20       0.34    0.20
        2    0.37    0.48       0.74    0.47
        4    2.55    1.07       1.62    1.06
        8    1.93    2.40       3.63    2.33
       16    4.59    5.64       8.07    9.12
       32   10.68   15.40      18.20   22.93
       64   28.17   36.16      45.33   50.41
      128   66.87   82.23     101.38  112.46
      256  148.69  193.91     222.13  264.79
      512  347.25  424.72     501.52  550.88
     1024  760.74  960.28    1085.65 1206.83


                             Intel CPUs Android

                                               Dual Boot with W2
         A1 Asus MemoPad 7                     A5 Teclast X98 Plus
         Atom Z3745 1.86 GHz                   Atom Z8300 1.84 GHz
L1/L2/L3 KB 24/1024                            KB 24/1024/0
         Android 4.4.2      Android 5.0        Android 5.1
           32 Bit             32 Bit             32 Bit
   K Size      SP      DP         SP      DP         SP      DP
 Version 1.0
        1    0.09    0.11       0.10    0.09       0.09    0.12
        2    0.21    0.29       0.16    0.23       0.18    0.31
        4    0.61    0.66       0.48    0.52       0.61    0.57
        8    1.35    1.17       1.07    1.17       1.17    1.56
       16    3.20    2.57       2.38    2.59       3.15    3.34
       32    5.41    5.75       5.30    6.02       6.65    9.20
       64   11.74   29.95      11.77   28.31      15.62   45.48
      128   67.54   99.31      54.05   97.58      49.67  110.14
      256  194.13  225.94     189.11  219.98     222.78  264.65
      512  438.49  501.59     433.06  487.49     521.72  602.38
     1024  970.84 1121.61     968.37 1116.94    1187.13 1433.75

Version 3c.0
        1    0.09    0.08       0.10    0.08       0.15    0.13
        2    0.21    0.20       0.16    0.20       0.20    0.21
        4    0.50    0.43       1.66    0.43       0.45    0.52
        8    1.12    0.96       0.87    0.96       0.97    1.05
       16    2.64    2.86       2.01    2.34       2.14    2.61
       32    4.87    5.56       4.51    5.73       4.82    6.53
       64   11.11   15.03      10.01   14.47      11.10   17.79
      128   27.29   34.77      26.80   33.71      29.95   43.74
      256   62.57   72.93      61.16   72.04      77.43   86.13
      512  132.64  157.56     131.10  152.68     152.95  185.74
     1024  282.99  332.37     274.01  363.60     314.54  460.91


                             Intel CPUs - Windows

                                               Dual Boot with A5
         W1 Pipo W1S Tablet                    W2 Teclast X98 Plus
         Atom Z8300 1.84 GHz                   Atom Z8300 1.84 GHz
L1/L2/L3 KB 24/1024/0                          KB 24/1024/0
         Windows 10                            Windows 10
           32 bit             64 bit             32 Bit             64 Bit
    K Size     SP      DP         SP      DP         SP      DP         SP      DP
   Version 1.0
        1    0.11    0.12       0.10    0.12       0.11    0.12       0.10    0.12
        2    0.24    0.45       0.23    0.35       0.24    0.34       0.22    0.33
        4    0.67    0.75       0.63    0.74       0.65    0.74       0.72    0.74
        8    1.44    1.80       1.50    1.69       1.46    1.66       1.37    1.68
       16    3.29    3.71       3.16    3.65       3.25    3.61       3.21    3.78
       32    7.32    7.83       5.94    6.98       7.33    8.10       6.98    7.97
       64   14.36   31.51      13.95   25.44      16.40   28.29      15.96   29.96
      128   46.45  120.79      50.90  115.44      38.56  121.13      76.10  136.39
      256  209.39  235.36     203.02  266.34     232.47  266.35     259.73  298.24
      512  455.89  534.68     491.49  576.91     565.20  597.42     596.50  629.28
     1024 1024.78 1195.81    1040.39 1182.20    1205.59 1450.84    1288.20 1439.44

   Version 3c.0
        1    0.08    0.08       0.08    0.09       0.08    0.09       0.09    0.08
        2    0.19    0.20       0.20    0.22       0.19    0.23       0.18    0.19
        4    0.46    0.44       0.46    0.48       0.45    0.51       0.48    0.43
        8    1.20    0.97       1.06    1.07       1.00    1.12       1.08    0.93
       16    2.27    2.26       2.26    2.25       2.67    2.68       2.51    2.50
       32    5.11    5.54       5.31    5.83       5.54    5.59       5.74    6.06
       64   12.48   14.29      11.22   15.59      10.64   14.72      12.54   14.77
      128   27.62   34.25      27.47   31.65      32.82   36.71      28.28   36.95
      256   71.32   70.99      62.74   67.95      66.71   77.48      67.25   78.47
      512  143.07  144.60     140.50  146.76     157.72  153.43     150.14  168.63
     1024  298.00  322.13     289.98  334.07     332.39  365.36     300.79  370.48


         2015 Top End Desktop PC
         Corei7-4820K 3.9 GHz
L1/L2/L3 32 KB/256 KB/10 MB
         Windows 10
           32 bit             64 bit
   K Size      SP      DP         SP      DP
 Version 1.0
        1    0.02    0.02       0.02    0.02
        2    0.04    0.04       0.04    0.04
        4    0.09    0.12       0.08    0.12
        8    0.26    0.31       0.25    0.30
       16    0.65    0.77       0.62    0.76
       32    1.59    1.96       1.51    1.93
       64    4.33    4.87       3.91    4.78
      128    9.94   10.57       9.21   10.60
      256   21.87   22.00      21.01   22.06
      512   45.09   55.15      44.72   58.29
     1024  105.75  199.77     111.23  199.11

Version 3c.0
        1    0.02    0.02       0.01    0.01
        2    0.03    0.03       0.03    0.03
        4    0.07    0.08       0.06    0.07
        8    0.16    0.18       0.14    0.16
       16    0.37    0.41       0.33    0.38
       32    0.81    0.86       0.73    0.82
       64    1.76    1.86       1.56    1.75
      128    3.77    4.05       3.38    3.76
      256    8.24    9.36       7.38    8.78
      512   19.09   22.96      17.28   22.50
     1024   45.68   57.37      42.19   56.66


  

Linux/ARM

Following are results from a Raspberry Pi 2, with version 3c.0 around three times faster on larger FFTs (taking roughly 30% of the version 1.0 time) and SP/DP relationships similar to those on Android based devices.

   
 Raspberry Pi 2 ARM V7 900 MHz, Linux Debian, 32/512 KB L1/L2

            Version 1.0             Version 3c.0
  K Size      SP      DP              SP      DP

       1    0.31    0.36            0.35    0.24
       2    0.67    0.91            0.78    0.55
       4    1.71    2.42            1.82    1.30
       8    2.95    3.67            4.02    3.07
      16    6.76    9.34            6.38    8.68
      32   15.69   37.32           15.50   23.24
      64   57.98  130.56           40.70   56.37
     128  243.61  347.12           95.91  125.95
     256  667.43  808.14          212.87  272.65
     512 1553.41 1715.45          456.70  587.89
    1024 3220.45 3739.41          987.87 1279.49

  


Numeric Checks

As indicated earlier, checks of numeric calculations are carried out. These are for single precision and double precision 1024K sized FFTs. Below is a summary of results from systems initially tested. Differences are due to variations in rounding but should be consistent for a particular benchmark running on the same hardware platform.


      Square Check   Maximum Noise  Average Noise

  64 Bit Version 1.0 and 3c.0 Windows and Linux #####
  SP  9.999520e-001  3.346482e-006  4.565234e-011
  DP  1.000000e+000  1.133294e-023  1.428110e-028

  64 Bit Version 1.0 and 3c.0 Linux Intel Atom
  SP  As #####       3.346483e-06   As #####
  DP  As #####       As #####       As #####

  64 Bit Version 1.0 and 3c.0 Android 5 Cortex-A53+A104
  SP  As #####       As #####       As #####
  DP  As #####       As #####       As #####

  32 Bit Version 1.0 and 3c.0 Windows
  SP  As #####       As #####       As #####
  DP  As #####       As #####       1.428095e-028

  32 Bit Version 1.0 and 3c.0 Windows, including Atom,  V15 compiler
  SP  As #####       3.338028e-006  1.043382e-011
  DP  As #####       As #####       1.428096e-028

  32 Bit Version 1.0 Linux Core i7
  SP  As #####       As #####       4.565256e-011
  DP  As #####       1.134835e-23   1.428102e-28
  32 Bit Version 3c.0 Linux Core i7
  SP  As #####       As #####       As #####
  DP  As #####       As #####       As #####

  32 Bit Version 1.0 and 3c.0 Linux Phenom, Core 2
  SP  As #####       As #####       4.565256e-011
  DP  As #####       1.134835e-23   1.428102e-28

  32 Bit Version 1.0 and 3c.0 Atom
  SP  As #####       As #####       4.569256e-11
  DP  As #####       1.134835e-23   1.428088e-28

  32 Bit Version 1.0 and 3c.0, Android (so far)
  SP  As #####       As #####       As #####
  DP  As #####       As #####       As #####

  32 Bit Version Raspberry Pi
  SP  As #####       3.346483e-06   As #####
  DP  As #####       As #####       As #####
  



MFLOPS and MHz

Earlier, the benchmark program was modified to count the number of floating point operations at each FFT size. Million Floating Point Operations Per Second (MFLOPS), for each measurement, can be calculated from these counts (Op count/1000/milliseconds). The counts and resulting MFLOPS are shown below, for tests on a Core i7 CPU.

Next are maximum MFLOPS from tests on different systems and calculated MFLOPS/MHz, to reflect efficiency of the different platforms. As indicated earlier, Android single precision calculations could be slow on small FFTs. This leads to apparent poor relative performance, when comparisons are based on maximum MFLOPS.

Note, using SSE SIMD instructions, the i7 CPU could obtain up to 8 MFLOPS per MHz.


                           MFLOPS  MFLOPS
   FFT size   FP op count      SP      DP

       1024         53312    4443    3332
       2048        116864    4495    3339
       4096        254080    4381    3176
       8192        549120    4038    3102
      16384       1179904    3758    3233
      32768       2523648    3690    3223
      65536       5374464    3661    3161
     131072      11404288    3569    3118
     262144      24118272    3462    2983
     524288      50857984    3296    2416
    1048576     106956800    2810    1642


  CPU             Core i7         Phenom II       Core 2          Atom
  MHz             3900            3000            2400            1666
                    SP      DP      SP      DP      SP      DP      SP      DP

  MFLOPS          4495    3339    2647    2164    1904    1720     441     312
  MFLOPS/MHz      1.15    0.86    0.88    0.72    0.79    0.72    0.26    0.19

  CPU             Pentium 4       Cortex-A53 64b  Cortex-A53 32b  Atom Z3745
  MHz             1900            1300            1300            1860
                    SP      DP      SP      DP      SP      DP      SP      DP

  MFLOPS           726     602     316     261     159     269     573     635
  MFLOPS/MHz      0.38    0.32    0.24    0.20    0.12    0.21    0.31    0.34

  CPU             Qualcomm 800    Cortex-A15      Cortex-A9        ARM V7 RPi2
  MHz             2100            1700            1200             900
                    SP      DP      SP      DP      SP      DP      SP      DP

  MFLOPS           507     730     241     683     154     258     185     225
  MFLOPS/MHz      0.24    0.35    0.14    0.40    0.13    0.21    0.21    0.25

  CPU             Atom Z8300 64b  Atom Z8300 32b  Core i7 64b      Core i7 32b
  OS              Windows 10      Windows 10      Windows 10       Windows 10
  MHz             1840            1840            3900             3900
                    SP      DP      SP      DP      SP      DP      SP      DP

  MFLOPS           666     613     650     658    4443    4443    3652    3554
  MFLOPS/MHz      0.36    0.33    0.35    0.36    1.14    1.14    0.94    0.91
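
The calculations behind the tables above can be sketched as follows: MFLOPS from the operation count and running time (Op count/1000/milliseconds), then MFLOPS/MHz. The function names are hypothetical and for illustration only.

```c
#include <assert.h>

/* MFLOPS = floating point operation count / 1000 / milliseconds. */
double mflops(double fp_op_count, double millisecs)
{
    assert(millisecs > 0.0);
    return fp_op_count / 1000.0 / millisecs;
}

/* Efficiency measure: MFLOPS divided by CPU clock speed in MHz. */
double mflops_per_mhz(double mflops_value, double mhz)
{
    assert(mhz > 0.0);
    return mflops_value / mhz;
}
```

For example, the 1024K double precision FFT, with 106956800 operations in 65.15 milliseconds (the 64 bit Linux version 3c.0 time on the Core i7), gives about 1642 MFLOPS, and 4495 MFLOPS at 3900 MHz gives about 1.15 MFLOPS/MHz, matching the Core i7 entries.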
  




Roy Longbottom April 2016

The Official Internet Home for my Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection