Benchmarks Available
A series of Fast Fourier Transform benchmarks has been produced from the same C language code, to measure performance of Intel based PCs using Windows and Linux, Intel and ARM Android devices, and ARM CPUs using Linux (for Raspberry Pi). The Android apps, downloaded from the following buttons, automatically select code for Intel or ARM (or MIPS) processors, and 64 or 32 bit operation. Other benchmark execution programs and source code are available in
http://www.roylongbottom.org.uk/FFT Benchmarks.zip.
Compilers used were gcc 4.8 for Linux, Android and Raspberry Pi, and C/C++ Version 18 for Windows. The benchmarks from the latter would not run on older systems, so the original, compiled with C/C++ Version 15, was used there, producing similar performance at 32 bits.
Download Android Apps
A Settings, Security option may need changing to allow installation of non-Market applications
fft1.apk - Original FFT Benchmark
fft3c.apk - Optimised FFT Benchmark
All have an option to save results via Email
For maximum and consistent performance, some units might need a CPU Mode setting changed (for example: ICS Settings, Developer Options, CPU Mode, change Normal to Performance).
General
The FFT benchmarks started life in early 2000, based on a program from Scott in Compuserve PC Hardware Forum. Three of my Windows versions were produced that provided a graphical output, starting with one that was optimised, all C code. The second one was further optimised, including assembly language. The third had SSE SIMD assembly code and further tuning changes. The latest are all C code, with text output: FFT1 is the original and FFT3c is the third one without assembly code.
Android varieties are run by downloading and installing the apk files, Windows versions by clicking on an EXE file or from a Command Prompt function, and those for Linux via a Terminal command.
For further details of the original versions, and numerous results on PCs, see
fftgraf results.htm.
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running time in milliseconds. An example of Linux results is shown below. As shown, some checks of numeric calculations are carried out on the largest FFTs. These are subject to variation due to different rounding effects.
###################################################
Assembler CPUID and RDTSC
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4
Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
Measured - Minimum 3711 MHz, Maximum 3711 MHz
Linux Functions
get_nprocs() - CPUs 8, Configured CPUs 8
get_phys_pages() and size - RAM Size 31.36 GB, Page Size 4096 Bytes
uname() - Linux, roy-i7UB14, 3.13.0-43-generic
#72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014, x86_64
###################################################
FFT 64 Bit Benchmark Version 1.0 Tue Sep 8 14:33:26 2015
Size milliseconds
K Single Precision Double Precision
1 0.015 0.014 0.027 0.016 0.016 0.016
2 0.032 0.032 0.032 0.037 0.037 0.037
4 0.074 0.074 0.074 0.110 0.109 0.111
8 0.228 0.225 0.225 0.273 0.272 0.274
16 0.569 0.570 0.566 0.677 0.671 0.673
32 1.390 1.386 1.395 1.937 1.945 1.936
64 3.938 3.943 3.949 4.547 4.527 4.537
128 9.172 9.163 9.162 10.613 10.609 10.621
256 21.554 21.511 21.500 24.491 24.560 24.542
512 49.499 49.491 49.533 55.553 55.892 55.066
1024 111.279 111.210 111.124 238.652 238.292 238.592
1024 Square Check Maximum Noise Average Noise
SP 9.999520e-01 3.346482e-06 4.565234e-11
DP 1.000000e+00 1.133294e-23 1.428110e-28
End at Tue Sep 8 14:33:28 2015
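As a rough illustration of how the three repeated timings in each row can be used to judge variance, the following sketch parses one results row from the log above and reports the relative spread. The function names `parse_row` and `spread` are hypothetical, not part of the benchmark, and the row layout (size, three SP times, three DP times) is assumed from the example log.

```python
def parse_row(line):
    """Split one results row into (size_k, sp_times, dp_times).

    Assumes the layout seen in the example log: FFT size in K,
    then three single precision and three double precision times (ms).
    """
    fields = line.split()
    size_k = int(fields[0])
    times = [float(x) for x in fields[1:]]
    return size_k, times[:3], times[3:6]


def spread(times):
    """Relative spread of repeated timings: (max - min) / min."""
    return (max(times) - min(times)) / min(times)


# First data row of the example log (1K FFT on the Core i7).
row = "1 0.015 0.014 0.027 0.016 0.016 0.016"
size_k, sp, dp = parse_row(row)
print(f"{size_k}K  SP spread {spread(sp):.2f}  DP spread {spread(dp):.2f}")
```

A large spread on the smallest sizes, as in the SP values of this row, usually just reflects how short the measured interval is.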
Comparison
Following are double precision execution times, in milliseconds, for the 64 bit compilation of the optimised version 3c.0 benchmark. The first column is for the same PC that produced the above log file for version 1.0.
Version 3c Improvements - Gains from the optimised program on this PC are typical of those on the other systems shown below: around 25% at small FFT sizes, rising to more than three times faster on the largest ones.
SP, DP and Cache Effects - With data in the same cache, performance can be proportional to FFT size and changes in this relationship can indicate different cache capacity. Then, single precision data uses half the space occupied at double precision. The Core i7, here, has cache sizes of 32 KB for L1 data, 256 KB for L2 and 10 MB for L3. The effects of the latter are clear in the above, where DP calculations take more than twice as long as SP with 1024K FFTs. L2 effects are probably between 32K and 64K at single precision, with L1 4K to 8K. Without caching effects, DP performance can be only slightly slower, but see results for different systems.
32 Bit and 64 Bit Compilations, Windows and Linux Versions - On the Core i7, performance differences between these vary from negligible to around 25%. Relative result ratios on the other systems can be different.
The only 64 Bit Android results (A53), at the time, show improvement of up to 62%, with L2 cache based data.
Different Systems 64 Bit DP Version 3c - Below is a summary of double precision optimised benchmark results on PCs, Android devices and a Raspberry Pi 2 (as a Linux/ARM example). Pentium 4 results are also included, for comparison purposes. PCs demonstrate superior performance through higher CPU MHz and larger secondary caches, but Android devices appear to be catching up quickly. The Raspberry Pi 2 performs quite well compared with the higher MHz Pentium 4, and the board costs less than $50.
Double Precision Version 3c.0 Milliseconds

Linux (Pentium 4 under Windows)

           Core i7    Core 2    Phenom      Atom   Pentium     Rpi 2
             4820K      6600    II 945      N455         4    ARM V7
                                                             BCM2836
MHz           3900      2400      3000      1666      1900       900
K Size
1             0.02      0.05      0.03      0.17      0.09      0.24
2             0.04      0.12      0.05      0.42      0.20      0.55
4             0.08      0.29      0.14      0.91      0.42      1.30
8             0.18      0.64      0.41      1.98      0.98      3.07
16            0.37      1.29      0.88      3.92      6.23      8.68
32            0.78      2.86      2.11     10.01      15.2     23.24
64            1.70      6.21      4.64     23.51      34.6     56.37
128           3.66     14.49     10.45     53.03      73.7    125.95
256           8.09     34.85     26.32    115.83     156.0    272.65
512          21.05     81.85     79.23    245.59     344.0    587.89
1024         65.15    178.81    197.41    538.71     804.0   1279.49

Android

Version     32 Bit    32 Bit    32 Bit    64 Bit    32 Bit    32 Bit
        Snapdragon      Atom       ARM       ARM       ARM       ARM
               800     Z3745       A15       A53        A9       A53
MHz           2150      1860      1700      1300      1200      1300
K Size
1             0.07      0.08      0.08      0.20      0.21      0.20
2             0.19      0.20      0.17      0.48      0.55      0.47
4             0.41      0.43      0.41      1.07      1.38      1.06
8             0.90      0.96      0.90      2.40      3.09      2.33
16            2.66      2.86      3.23      5.64      9.08      9.12
32            6.07      5.56      8.88     15.40     22.02     22.93
64           13.56     15.03     23.08     36.16     52.11     50.41
128          33.09     34.77     53.11     82.23    118.45    112.46
256          72.55     72.93    120.66    193.91    258.56    264.79
512         156.30    157.56    264.30    424.72    552.00    550.88
1024        337.07    332.37    586.18    960.28   1175.65   1206.83
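The cache reasoning in the SP, DP and Cache Effects note above can be checked with a little arithmetic. The sketch below assumes that each FFT point is a complex value (two floats or two doubles) and ignores any work arrays or coefficient tables, neither of which is stated in the text; `smallest_fitting_cache` is a hypothetical helper name. Under those assumptions, the transition points it reports line up with the sizes quoted for the Core i7.

```python
# Assumption: one FFT point = one complex value (real + imaginary part),
# i.e. 2 * 4 bytes at single precision and 2 * 8 bytes at double precision.
BYTES_PER_POINT = {"SP": 8, "DP": 16}

# Core i7 4820K cache sizes quoted in the text, in bytes.
CACHES = [("L1", 32 * 1024), ("L2", 256 * 1024), ("L3", 10 * 1024 * 1024)]


def smallest_fitting_cache(size_k, precision):
    """Return the first cache level able to hold the whole data set, else 'RAM'."""
    data_bytes = size_k * 1024 * BYTES_PER_POINT[precision]
    for name, capacity in CACHES:
        if data_bytes <= capacity:
            return name
    return "RAM"


for size_k in (4, 8, 32, 64, 1024):
    print(size_k,
          smallest_fitting_cache(size_k, "SP"),
          smallest_fitting_cache(size_k, "DP"))
```

With these assumptions, single precision data leaves L1 between 4K and 8K and leaves L2 between 32K and 64K, while at 1024K only the double precision data exceeds the 10 MB L3.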
Linux/Windows/Intel/AMD
Disassembly of the Windows benchmarks showed that the main calculations used the same SSE type instructions for 64 bit and 32 bit compilations. The 64 bit versions could be slightly faster, due to more registers being available for optimisation. With small FFT sizes, 32 bit versions could be faster, probably due to fewer multiple data.
FFT execution times were generally similar between Linux and Windows versions.
The new 32 bit version would not run on the Intel Atom based netbook, so the C/C++ 15 results are quoted, where performance is quite similar to that from Linux. Results for V15 are also provided for the Phenom based PC, with most not significantly different from the new benchmarks.
Core i7 4820K at 3.9 GHz, 32 KB L1, 256 KB L2, 10 MB L3
Ubuntu 14.04 Windows 10
64 Bit 32 Bit 64 Bit 32 Bit
K Size SP DP SP DP SP DP SP DP
Version 1.0
1 0.01 0.02 0.02 0.02 0.02 0.02 0.02 0.02
2 0.03 0.04 0.04 0.04 0.04 0.04 0.04 0.04
4 0.07 0.11 0.09 0.12 0.08 0.12 0.09 0.12
8 0.23 0.27 0.26 0.30 0.25 0.30 0.26 0.31
16 0.57 0.67 0.64 0.74 0.62 0.77 0.65 0.76
32 1.39 1.95 1.55 2.16 1.54 2.00 1.59 1.98
64 3.94 4.53 4.48 5.00 3.94 4.69 4.12 4.74
128 9.16 10.61 10.31 11.42 9.03 10.56 9.77 10.74
256 21.51 24.56 23.21 26.11 21.33 21.99 22.48 22.11
512 49.49 55.89 52.82 59.51 44.46 59.23 44.90 60.55
1024 111.21 238.29 118.96 296.58 114.09 187.79 109.55 187.70
Version 3c.0
1 0.01 0.02 0.02 0.02 0.01 0.01 0.02 0.02
2 0.03 0.04 0.03 0.03 0.03 0.03 0.03 0.03
4 0.06 0.08 0.07 0.08 0.06 0.07 0.07 0.08
8 0.14 0.18 0.17 0.17 0.14 0.17 0.16 0.18
16 0.31 0.37 0.38 0.43 0.33 0.38 0.38 0.41
32 0.68 0.78 0.84 0.93 0.73 0.82 0.81 0.86
64 1.47 1.70 1.80 2.00 1.56 1.75 1.86 1.95
128 3.20 3.66 3.87 4.23 4.11 3.81 3.76 3.97
256 6.97 8.09 8.36 9.76 8.57 8.87 8.31 9.12
512 15.43 21.05 18.90 25.34 17.60 22.90 19.56 23.18
1024 38.06 65.15 45.51 78.25 42.91 56.67 46.56 56.48
Phenom II 945 3.0 GHz, 64 KB L1, 512 KB L2, 6 MB L3
64 Bit 32 Bit 64 Bit 32 Bit 32 Bit
OpenSuse Ubuntu Windows 7 V15 Compiler
K Size SP DP SP DP SP DP SP DP SP DP
Version 1.0
1 0.03 0.03 0.03 0.03 0.03 0.04 0.04 0.04 0.04 0.03
2 0.06 0.07 0.07 0.07 0.08 0.09 0.08 0.09 0.09 0.08
4 0.14 0.17 0.15 0.18 0.18 0.23 0.29 0.22 0.21 0.18
8 0.34 0.78 0.35 0.78 0.47 0.85 0.45 0.83 0.49 0.79
16 1.53 1.95 1.52 1.93 1.70 2.10 1.72 2.16 1.79 2.04
32 3.91 4.87 3.87 5.04 4.22 5.12 5.06 5.21 4.47 4.98
64 10.14 13.51 10.29 12.95 10.78 14.83 10.73 14.22 10.59 13.23
128 26.69 32.18 25.89 30.18 30.27 33.78 28.82 32.11 29.03 31.43
256 63.94 91.17 60.35 84.63 69.96 92.51 65.58 88.46 66.94 103.04
512 178.48 317.27 165.89 269.39 186.37 275.06 188.56 280.71 207.74 317.79
1024 534.61 762.15 490.83 694.37 538.48 678.87 522.42 641.49 603.57 738.49
Version 3c.0
1 0.02 0.03 0.03 0.03 0.02 0.02 0.02 0.02 0.04 0.03
2 0.05 0.05 0.06 0.05 0.05 0.04 0.05 0.05 0.08 0.05
4 0.10 0.14 0.13 0.14 0.10 0.12 0.11 0.13 0.18 0.15
8 0.26 0.41 0.31 0.40 0.27 0.50 0.29 0.40 0.42 0.52
16 0.67 0.88 0.78 0.95 0.70 1.19 0.72 0.96 1.03 1.26
32 1.61 2.11 1.82 2.22 1.71 2.76 1.81 2.29 2.38 2.87
64 4.00 4.64 4.36 4.90 4.23 6.00 4.33 5.05 5.77 6.22
128 9.31 10.45 9.95 11.06 9.54 12.94 10.06 11.44 13.09 13.66
256 21.25 26.32 22.33 27.45 21.46 29.81 22.76 29.16 30.10 31.64
512 50.53 79.23 52.80 80.46 48.58 76.09 53.72 82.36 73.98 79.61
1024 149.60 197.41 151.82 200.81 131.05 170.16 152.89 204.64 200.29 177.68
Core 2 6600 2.4 GHz, 32 KB L1, 4 MB L2
64 Bit 32 Bit 64 Bit 32 Bit
OpenSuse Ubuntu Windows Vista
K Size SP DP SP DP SP DP SP DP
Version 1.0
1 0.06 0.07 0.04 0.04 0.04 0.04 0.04 0.04
2 0.14 0.16 0.09 0.11 0.09 0.11 0.09 0.11
4 0.33 0.45 0.23 0.30 0.21 0.30 0.23 0.31
8 0.96 1.08 0.65 0.73 0.63 0.72 0.65 0.71
16 1.47 2.29 1.56 1.68 1.50 1.68 1.51 1.63
32 5.03 5.12 3.62 3.84 3.80 3.74 3.51 3.73
64 11.29 11.39 8.21 8.59 7.58 8.41 7.54 8.64
128 24.97 28.91 18.14 19.76 17.05 17.87 16.88 20.50
256 61.32 83.74 39.01 58.01 40.27 60.55 40.31 56.01
512 214.06 277.56 108.41 246.76 129.95 243.44 107.80 245.49
1024 683.95 677.40 441.70 624.58 462.70 591.28 499.00 568.59
Version 3c.0
1 0.04 0.05 0.03 0.03 0.03 0.03 0.04 0.04
2 0.11 0.12 0.08 0.08 0.07 0.07 0.08 0.08
4 0.25 0.29 0.18 0.19 0.16 0.18 0.18 0.20
8 0.62 0.64 0.43 0.42 0.47 0.40 0.45 0.44
16 1.35 1.29 0.95 1.04 0.86 0.91 0.94 1.03
32 1.97 2.86 2.09 2.26 1.87 2.05 2.22 2.37
64 6.48 6.21 4.78 4.86 4.17 4.66 4.65 5.39
128 14.22 14.49 9.95 11.41 8.87 16.62 10.59 13.23
256 32.12 34.85 22.68 28.41 21.52 29.47 28.63 31.75
512 74.38 81.85 53.95 69.23 51.23 65.83 60.95 73.48
1024 170.14 178.81 127.26 155.29 119.84 172.03 132.35 162.88
Atom N455 1.66 GHz, 24 KB L1, 512 KB L2
64 Bit 32 Bit 64 Bit 32 Bit
Ubuntu Ubuntu Windows Windows XP
V15 Compiler
K Size SP DP SP DP SP DP SP DP
Version 1.0
1 0.14 0.17 0.31 0.22 N/A 32 Bit OS 0.49 0.34
2 0.39 0.55 0.45 0.56 1.22 0.52
4 0.94 1.11 1.39 1.59 1.40 1.18
8 2.08 2.37 3.08 2.74 3.10 2.56
16 4.61 5.58 5.88 6.19 7.61 5.62
32 10.18 26.36 12.89 22.07 15.47 26.10
64 55.42 92.95 51.73 103.15 41.60 109.53
128 188.96 214.09 217.04 233.85 233.15 235.58
256 437.06 449.59 448.49 476.09 476.23 483.28
512 930.17 962.97 957.59 1005.09 998.35 1011.16
1024 1980.58 2140.33 2164.96 2219.15 2075.99 2152.18
Version 3c.0
1 0.12 0.17 0.23 0.22 0.52 0.22
2 0.27 0.42 0.50 0.49 1.12 0.48
4 0.77 0.91 1.07 1.06 2.43 1.26
8 1.44 1.98 2.76 2.29 3.14 2.41
16 3.29 3.92 5.23 5.80 7.26 5.66
32 7.27 10.01 11.54 13.84 15.64 13.25
64 18.25 23.51 27.70 31.05 36.28 29.98
128 43.76 53.03 61.77 68.57 83.01 68.37
256 97.14 115.83 136.10 146.29 177.92 147.35
512 214.41 245.59 294.54 307.63 391.90 315.94
1024 455.59 538.71 620.95 665.59 802.62 699.23
Android/ARM/Intel Plus Windows 10/Intel
Android Cortex-A9 1.2 GHz 32 bit
Android Cortex-A15 1.7 GHz 32 bit
Android Qualcomm 800 2.1 GHz 32 bit
Android ARM Cortex-A53 1.3 GHz 64 bit
Android ARM Cortex-A53 1.3 GHz 32 bit
Android Atom Z3745 1.86 GHz 32 bit
Android Dual boot with Windows Atom Z8300 1.84 GHz 32 bit
Windows Atom Z8300 1.84 GHz 32 bit
Windows Atom Z8300 1.84 GHz 64 bit
Windows Dual boot with Android Atom Z8300 1.84 GHz 32 bit
Windows Dual boot with Android Atom Z8300 1.84 GHz 64 bit
Windows Core i7 3.9 GHz 32 bit
Windows Core i7 3.9 GHz 64 bit
Initially, only one tablet was available that runs at 64 bits, a Lenovo TAB 2 A8-50F using Android 5. In this case, 64 bit and 32 bit results were similar for the non-optimised version, but averaged 40% faster with the more efficient code.
All systems produced significant gains, using the optimised benchmark, but some struggled running the smaller FFTs. Typical SP execution times were 20% faster than DP, at large FFTs, but could be slower running small ones.
October 2015 - Upgrades of A1 Asus MemoPad 7 from Android 4.4.2 to 5.0 and T7 Nexus 7 from Android 4.1.2 to 5.0.2 - ignoring exceptions, the upgrades produced somewhat faster average speeds, but some results were slower.
February 2016 - Intel version run on an Atom based Windows 10 tablet. Results from a rerun on a Core i7 PC are included for comparison.
April 2016 - Dual boot Windows/Android Intel Atom based tablet included.
Single Precision and Double Precision Results in milliseconds
T7 Nexus 7 T11 VOYO A15 T21 Kindle HDX 7
Cortex-A9 1.2 GHz Cortex-A15 1.7 GHz Qualcomm 800 2.1 GHz
L1/L2 KB 32/1024 32/2048 16/2048
Android 4.1.2 Android 5.0.2 Android 4.2.2 Android 4.4.3
32 Bit 32 Bit 32 Bit 32 Bit
K Size SP DP SP DP SP DP SP DP
Version 1.0
1 0.64 0.38 0.18 0.21 0.10 0.17 0.14 0.18
2 0.77 0.97 0.40 0.67 0.22 0.36 0.33 0.53
4 1.14 1.77 1.13 1.86 0.57 0.90 1.03 1.30
8 3.28 4.40 3.26 5.12 2.12 2.31 2.50 3.09
16 7.76 9.39 7.74 9.69 4.71 5.97 1.95 2.20
32 17.80 22.26 18.09 22.73 10.76 11.37 4.18 5.77
64 61.05 140.58 41.64 84.68 20.10 49.70 14.61 20.01
128 153.19 289.15 139.98 274.54 77.67 213.70 33.19 60.52
256 450.16 645.72 444.09 645.70 408.51 448.95 107.49 310.93
512 1084.11 1457.85 1102.20 1438.29 782.85 1101.70 584.54 497.23
1024 2388.33 3129.21 2388.56 3185.93 1799.89 2280.30 875.95 963.37
Version 3c.0
1 0.66 0.21 0.27 0.25 0.23 0.08 0.35 0.07
2 1.09 0.55 0.65 0.65 0.50 0.17 0.81 0.19
4 2.67 1.38 1.67 1.45 1.07 0.41 1.66 0.41
8 3.56 3.09 4.30 3.23 2.41 0.90 1.08 0.90
16 7.78 9.08 8.33 10.35 5.26 3.23 3.36 2.66
32 17.85 22.02 19.23 25.38 11.88 8.88 6.54 6.07
64 39.52 52.11 46.41 58.90 23.75 23.08 12.57 13.56
128 89.73 118.45 103.31 128.44 49.74 53.11 27.41 33.09
256 203.34 258.56 221.99 267.12 100.25 120.66 63.39 72.55
512 437.25 552.00 464.30 558.13 226.76 264.30 150.38 156.30
1024 918.32 1175.65 933.05 1182.49 505.68 586.18 306.32 337.07
T22 Lenovo TAB 2 A8-50F
ARM Cortex-A53 1.3 GHz
L1/L2 KB 32/512
Android 5.0.2
64 Bit 32 Bit
K Size SP DP SP DP
Version 1.0
1 0.20 0.21 0.21 0.21
2 0.44 0.50 0.43 0.53
4 1.06 1.26 1.03 1.24
8 2.52 3.03 2.52 2.85
16 5.89 6.41 5.68 6.60
32 14.09 25.29 13.05 30.59
64 49.97 109.32 45.80 92.16
128 188.37 256.98 153.25 221.98
256 447.62 583.33 362.62 504.60
512 826.77 1019.84 840.44 1107.14
1024 1846.27 2299.97 1835.82 2423.72
Version 3c.0
1 0.17 0.20 0.34 0.20
2 0.37 0.48 0.74 0.47
4 2.55 1.07 1.62 1.06
8 1.93 2.40 3.63 2.33
16 4.59 5.64 8.07 9.12
32 10.68 15.40 18.20 22.93
64 28.17 36.16 45.33 50.41
128 66.87 82.23 101.38 112.46
256 148.69 193.91 222.13 264.79
512 347.25 424.72 501.52 550.88
1024 760.74 960.28 1085.65 1206.83
Intel CPUs Android
Dual Boot with W2
A1 Asus MemoPad 7 A5 Teclast X98 Plus
Atom Z3745 1.86 GHz Atom Z8300 1.84 GHz
L1/L2/L3 KB 24/1024 24/1024/0
Android 4.4.2 Android 5.0 Android 5.1
32 Bit 32 Bit 32 Bit
K Size SP DP SP DP SP DP
Version 1.0
1 0.09 0.11 0.10 0.09 0.09 0.12
2 0.21 0.29 0.16 0.23 0.18 0.31
4 0.61 0.66 0.48 0.52 0.61 0.57
8 1.35 1.17 1.07 1.17 1.17 1.56
16 3.20 2.57 2.38 2.59 3.15 3.34
32 5.41 5.75 5.30 6.02 6.65 9.20
64 11.74 29.95 11.77 28.31 15.62 45.48
128 67.54 99.31 54.05 97.58 49.67 110.14
256 194.13 225.94 189.11 219.98 222.78 264.65
512 438.49 501.59 433.06 487.49 521.72 602.38
1024 970.84 1121.61 968.37 1116.94 1187.13 1433.75
Version 3c.0
1 0.09 0.08 0.10 0.08 0.15 0.13
2 0.21 0.20 0.16 0.20 0.20 0.21
4 0.50 0.43 1.66 0.43 0.45 0.52
8 1.12 0.96 0.87 0.96 0.97 1.05
16 2.64 2.86 2.01 2.34 2.14 2.61
32 4.87 5.56 4.51 5.73 4.82 6.53
64 11.11 15.03 10.01 14.47 11.10 17.79
128 27.29 34.77 26.80 33.71 29.95 43.74
256 62.57 72.93 61.16 72.04 77.43 86.13
512 132.64 157.56 131.10 152.68 152.95 185.74
1024 282.99 332.37 274.01 363.60 314.54 460.91
Intel CPUs - Windows
Dual Boot with A5
W1 Pipo W1S Tablet W2 Teclast X98 Plus
Atom Z8300 1.84 GHz Atom Z8300 1.84 GHz
L1/L2/L3 KB 24/1024/0 KB 24/1024/0
Windows 10 Windows 10
32 bit 64 bit 32 Bit 64 Bit
K Size SP DP SP DP SP DP SP DP
Version 1.0
1 0.11 0.12 0.10 0.12 0.11 0.12 0.10 0.12
2 0.24 0.45 0.23 0.35 0.24 0.34 0.22 0.33
4 0.67 0.75 0.63 0.74 0.65 0.74 0.72 0.74
8 1.44 1.80 1.50 1.69 1.46 1.66 1.37 1.68
16 3.29 3.71 3.16 3.65 3.25 3.61 3.21 3.78
32 7.32 7.83 5.94 6.98 7.33 8.10 6.98 7.97
64 14.36 31.51 13.95 25.44 16.40 28.29 15.96 29.96
128 46.45 120.79 50.90 115.44 38.56 121.13 76.10 136.39
256 209.39 235.36 203.02 266.34 232.47 266.35 259.73 298.24
512 455.89 534.68 491.49 576.91 565.20 597.42 596.50 629.28
1024 1024.78 1195.81 1040.39 1182.20 1205.59 1450.84 1288.20 1439.44
Version 3c.0
1 0.08 0.08 0.08 0.09 0.08 0.09 0.09 0.08
2 0.19 0.20 0.20 0.22 0.19 0.23 0.18 0.19
4 0.46 0.44 0.46 0.48 0.45 0.51 0.48 0.43
8 1.20 0.97 1.06 1.07 1.00 1.12 1.08 0.93
16 2.27 2.26 2.26 2.25 2.67 2.68 2.51 2.50
32 5.11 5.54 5.31 5.83 5.54 5.59 5.74 6.06
64 12.48 14.29 11.22 15.59 10.64 14.72 12.54 14.77
128 27.62 34.25 27.47 31.65 32.82 36.71 28.28 36.95
256 71.32 70.99 62.74 67.95 66.71 77.48 67.25 78.47
512 143.07 144.60 140.50 146.76 157.72 153.43 150.14 168.63
1024 298.00 322.13 289.98 334.07 332.39 365.36 300.79 370.48
2015 Top End Desktop PC
Corei7-4820K 3.9 GHz
L1/L2/L3 32 KB/256 KB/10 MB
Windows 10
32 bit 64 bit
K Size SP DP SP DP
Version 1.0
1 0.02 0.02 0.02 0.02
2 0.04 0.04 0.04 0.04
4 0.09 0.12 0.08 0.12
8 0.26 0.31 0.25 0.30
16 0.65 0.77 0.62 0.76
32 1.59 1.96 1.51 1.93
64 4.33 4.87 3.91 4.78
128 9.94 10.57 9.21 10.60
256 21.87 22.00 21.01 22.06
512 45.09 55.15 44.72 58.29
1024 105.75 199.77 111.23 199.11
Version 3c.0
1 0.02 0.02 0.01 0.01
2 0.03 0.03 0.03 0.03
4 0.07 0.08 0.06 0.07
8 0.16 0.18 0.14 0.16
16 0.37 0.41 0.33 0.38
32 0.81 0.86 0.73 0.82
64 1.76 1.86 1.56 1.75
128 3.77 4.05 3.38 3.76
256 8.24 9.36 7.38 8.78
512 19.09 22.96 17.28 22.50
1024 45.68 57.37 42.19 56.66
Linux/ARM
Following are results from a Raspberry Pi 2, with version 3c.0 around three times faster on the largest FFTs and SP/DP relationships similar to those on Android based devices.
Raspberry Pi 2 ARM V7 900 MHz, Linux Debian, 32/512 KB L1/L2
Version 1.0 Version 3c.0
K Size SP DP SP DP
1 0.31 0.36 0.35 0.24
2 0.67 0.91 0.78 0.55
4 1.71 2.42 1.82 1.30
8 2.95 3.67 4.02 3.07
16 6.76 9.34 6.38 8.68
32 15.69 37.32 15.50 23.24
64 57.98 130.56 40.70 56.37
128 243.61 347.12 95.91 125.95
256 667.43 808.14 212.87 272.65
512 1553.41 1715.45 456.70 587.89
1024 3220.45 3739.41 987.87 1279.49
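The gain from the optimised version can be read off the table above as a simple ratio of execution times. The following sketch, using a hypothetical `speedup` helper and three double precision rows copied from the Raspberry Pi 2 table, shows the calculation.

```python
def speedup(old_ms, new_ms):
    """How many times faster the new run is than the old one."""
    return old_ms / new_ms


# (K size, version 1.0 DP ms, version 3c.0 DP ms) from the Raspberry Pi 2 table.
rows = [(1, 0.36, 0.24), (64, 130.56, 56.37), (1024, 3739.41, 1279.49)]

for size_k, v1_ms, v3c_ms in rows:
    print(f"{size_k:5d}K  {speedup(v1_ms, v3c_ms):.2f}x")
```

The same ratio can be applied to any pair of columns in the other tables, for example 32 bit against 64 bit times.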
Numeric Checks
As indicated earlier, checks of numeric calculations are carried out. These are for single precision and double precision 1024K sized FFTs. Below is a summary of results from systems initially tested. Differences are due to variations in rounding, but results should be consistent for a particular benchmark running on the same hardware platform.
Square Check Maximum Noise Average Noise
64 Bit Version 1.0 and 3c.0 Windows and Linux #####
SP 9.999520e-001 3.346482e-006 4.565234e-011
DP 1.000000e+000 1.133294e-023 1.428110e-028
64 Bit Version 1.0 and 3c.0 Linux Intel Atom
SP As ##### 3.346483e-06 As #####
DP As ##### As ##### As #####
64 Bit Version 1.0 and 3c.0 Android 5 Cortex-A53+A104
SP As ##### As ##### As #####
DP As ##### As ##### As #####
32 Bit Version 1.0 and 3c.0 Windows
SP As ##### As ##### As #####
DP As ##### As ##### 1.428095e-028
32 Bit Version 1.0 and 3c.0 Windows, including Atom, V15 compiler
SP As ##### 3.338028e-006 1.043382e-011
DP As ##### As ##### 1.428096e-028
32 Bit Version 1.0 Linux Core i7
SP As ##### As ##### 4.565256e-011
DP As ##### 1.134835e-23 1.428102e-28
32 Bit Version 3c.0 Linux Core i7
SP As ##### As ##### As #####
DP As ##### As ##### As #####
32 Bit Version 1.0 and 3c.0 Linux Phenom, Core 2
SP As ##### As ##### 4.565256e-011
DP As ##### 1.134835e-23 1.428102e-28
32 Bit Version 1.0 and 3c.0 Atom
SP As ##### As ##### 4.569256e-11
DP As ##### 1.134835e-23 1.428088e-28
32 Bit Version 1.0 and 3c.0, Android (so far)
SP As ##### As ##### As #####
DP As ##### As ##### As #####
32 Bit Version Raspberry Pi
SP As ##### 3.346483e-06 As #####
DP As ##### As ##### As #####
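One way to judge whether two sets of check values differ only by rounding is to look at their relative difference. The sketch below is an illustration only: the helper name `relative_difference` is hypothetical and the benchmark itself is not known to apply any such comparison. The values are SP Maximum Noise figures copied from the table above.

```python
def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude of the two values."""
    return abs(a - b) / max(abs(a), abs(b))


reference = 3.346482e-06    # 64 bit Windows and Linux
atom_linux = 3.346483e-06   # 64 bit Linux, Intel Atom
v15_windows = 3.338028e-06  # 32 bit Windows, V15 compiler

print(f"Atom vs reference: {relative_difference(reference, atom_linux):.1e}")
print(f"V15 vs reference:  {relative_difference(reference, v15_windows):.1e}")
```

The Atom figure differs only in the last printed digit, whereas the V15 compiler result differs by roughly a quarter of a percent.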
MFLOPS and MHz
Earlier, the benchmark program was modified to count the number of floating point operations at each FFT size. Million Floating Point Operations Per Second (MFLOPS), for each measurement, can be calculated from these counts (Op count/1000/milliseconds). Operation counts and the resulting MFLOPS are shown below, for tests on a Core i7 CPU.
Next are maximum MFLOPS from tests on different systems and calculated MFLOPS/MHz, to reflect efficiency of the different platforms.
As indicated earlier, Android single precision calculations could be slow on small FFTs. This leads to apparent poor relative performance, when comparisons are based on maximum MFLOPS.
Note, using SSE SIMD instructions, the i7 CPU could obtain up to 8 MFLOPS per MHz.
MFLOPS MFLOPS
FFT size FP op count SP DP
1024 53312 4443 3332
2048 116864 4495 3339
4096 254080 4381 3176
8192 549120 4038 3102
16384 1179904 3758 3233
32768 2523648 3690 3223
65536 5374464 3661 3161
131072 11404288 3569 3118
262144 24118272 3462 2983
524288 50857984 3296 2416
1048576 106956800 2810 1642
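The formula quoted above (Op count/1000/milliseconds) can be applied to any timing in this document. As a minimal sketch, with a hypothetical function name `mflops`, the following uses two operation counts from the table above and two timings taken from the version 1.0 example log near the start (first SP run at 1K and first DP run at 1024K); these are not the runs behind the MFLOPS columns above.

```python
def mflops(op_count, milliseconds):
    """MFLOPS = floating point operations / 1000 / time in milliseconds."""
    return op_count / 1000 / milliseconds


# Operation counts per FFT size in K, from the table above.
OP_COUNTS = {1: 53312, 1024: 106956800}

print(round(mflops(OP_COUNTS[1], 0.015)))       # version 1.0, SP, 1K
print(round(mflops(OP_COUNTS[1024], 238.652)))  # version 1.0, DP, 1024K
```

Dividing by 1000 rather than 1,000,000 is correct because the time is in milliseconds, not seconds.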
CPU Core i7 Phenom II Core 2 Atom
MHz 3900 3000 2400 1666
SP DP SP DP SP DP SP DP
MFLOPS 4495 3339 2647 2164 1904 1720 441 312
MFLOPS/MHz 1.15 0.86 0.88 0.72 0.79 0.72 0.26 0.19
CPU Pentium 4 Cortex-A53 64b Cortex-A53 32b Atom Z3745
MHz 1900 1300 1300 1860
SP DP SP DP SP DP SP DP
MFLOPS 726 602 316 261 159 269 573 635
MFLOPS/MHz 0.38 0.32 0.24 0.20 0.12 0.21 0.31 0.34
CPU Qualcomm 800 Cortex-A15 Cortex-A9 ARM V7 RPi2
MHz 2100 1700 1200 900
SP DP SP DP SP DP SP DP
MFLOPS 507 730 241 683 154 258 185 225
MFLOPS/MHz 0.24 0.35 0.14 0.40 0.13 0.21 0.21 0.25
CPU Atom Z8300 64b Atom Z8300 32b Core i7 64b Core i7 32b
OS Windows 10 Windows 10 Windows 10 Windows 10
MHz 1840 1840 3900 3900
SP DP SP DP SP DP SP DP
MFLOPS 666 613 650 658 4443 4443 3652 3554
MFLOPS/MHz 0.36 0.33 0.35 0.36 1.14 1.14 0.94 0.91
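The MFLOPS/MHz rows above are simply maximum MFLOPS divided by CPU MHz. A short sketch, using a hypothetical `mflops_per_mhz` helper and a few SP values copied from the tables above:

```python
def mflops_per_mhz(mflops, mhz):
    """Efficiency measure: floating point throughput per MHz of CPU clock."""
    return mflops / mhz


# (maximum SP MFLOPS, CPU MHz) copied from the tables above.
systems = {
    "Core i7": (4495, 3900),
    "Atom": (441, 1666),
    "ARM V7 RPi2": (185, 900),
}

for name, (max_mflops, mhz) in systems.items():
    print(f"{name:12s} {mflops_per_mhz(max_mflops, mhz):.2f}")
```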