Linux PC Benchmarks
General
Both 32-Bit and 64-Bit versions of Ubuntu Linux were installed on an eSATA/USB hard disk and on USB Flash drives, to compile and assemble existing PC benchmarks via the compiler and assembler that are included in the package. The booting method used also enabled loading Ubuntu on a range of different PCs and laptops.
The benchmark programs, including source code and compile/link commands, are compressed in .tar.gz format. Copy the archive to your home directory or a subdirectory for extraction. Examine the README file for further directions. The benchmarks are simple execution files and do not need installing. The first ones run in a Terminal window via the normal ./name command or via clicking on a shell script containing the commands. Details are displayed when the tests are running and performance results are saved in a .txt file.
Configuration Details
All benchmarks include the same configuration details, some of which are produced via assembly language code. Example details shown are for an AMD Phenom quad core processor via 32-Bit Ubuntu and an Intel Core 2 Duo using the 64-Bit version.
######################################################################
Assembler CPUID and RDTSC
CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00100F42
AMD Phenom(tm) II X4 945 Processor
Measured - Minimum 2978 MHz, Maximum 3008 MHz
Linux Functions
get_nprocs() - CPUs 4, Configured CPUs 4
get_phys_pages() and size - RAM Size 7.88 GB, Page Size 4096 Bytes
uname() - Linux, roy-C2D, 2.6.35-22-generic-pae
#35-Ubuntu SMP Sat Oct 16 22:16:51 UTC 2010, i686
Assembler CPUID and RDTSC
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Measured - Minimum 2407 MHz, Maximum 2407 MHz
Linux Functions
get_nprocs() - CPUs 2, Configured CPUs 2
get_phys_pages() and size - RAM Size 3.87 GB, Page Size 4096 Bytes
uname() - Linux, roy-64Bit, 2.6.35-22-generic
#33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010, x86_64
Identified with Fedora Linux
uname() - Linux, localhost.localdomain, 2.6.34.7-61.fc13.x86_64
#1 SMP Tue Oct 19 04:06:30 UTC 2010, x86_64
######################################################################
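As a rough illustration of the "Linux Functions" part of the listing above, the following minimal Python sketch gathers similar details. It is not the benchmarks' own C code: the function name `configuration_details` is made up for this example, and the `os.sysconf` values are assumed to be the closest Python equivalents of `get_nprocs()` and `get_phys_pages()`. The CPUID and RDTSC details need assembly code and are not reproduced here.

```python
import os


def configuration_details():
    """Gather roughly the details shown under 'Linux Functions' in the log."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # pages of physical RAM
    u = os.uname()
    return {
        "cpus": os.sysconf("SC_NPROCESSORS_ONLN"),             # CPUs online
        "configured_cpus": os.sysconf("SC_NPROCESSORS_CONF"),  # CPUs configured
        "page_size": page_size,
        "ram_gb": phys_pages * page_size / 2**30,
        "uname": (u.sysname, u.nodename, u.release, u.version, u.machine),
    }


if __name__ == "__main__":
    d = configuration_details()
    print("CPUs %d, Configured CPUs %d" % (d["cpus"], d["configured_cpus"]))
    print("RAM Size %.2f GB, Page Size %d Bytes" % (d["ram_gb"], d["page_size"]))
    print(", ".join(d["uname"]))
```

The RAM size is divided by 2^30 here, which is an assumption that seems to fit the 7.88 GB reported for a nominal 8 GB system.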
32-Bit and 64-Bit Differences
The main advantage of 64-Bit working is that the amount of main memory installed and accessible is much larger than with 32-Bit operation. The downside can be worse performance if integer array variables are defined as 64 bits, leading to twice the data volumes being read and written.
The original x87 floating point instructions are not available using 64-Bit compilations. Instead, SSE instructions are used for 32-Bit Single Precision (SP) floating point numbers and SSE2 for 64-Bit Double Precision (DP). These are potentially Single Instruction Multiple Data (SIMD) instructions, where four SP results or two DP results can be produced per clock cycle or, with linked adds and multiplies, eight or four results. Unfortunately, it seems that only Single Instruction Single Data (SISD) operations are issued, where only one number is used in the 128 bit registers, and this can lead to slower performance than a program compiled for 32-Bits with x87 instructions.
The main performance gain at 64-Bits appears to come from the provision of twice as many general purpose and SSE registers which, with optimisation options, provides faster speeds by reducing the need to save and reload variables, operations that involve access to slower memory.
Some of these effects, for better and for worse, are reflected in the tables below.
Classic Benchmarks
The Classic Benchmarks are the first programs used to measure relative performance of computers. They are:
Livermore Kernels (Livermore Loops) - Produced for the first supercomputers and comprising 14 kernels in 1970, then 24 in the 1980s. The 24 kernels are run at three different data sizes. Results are in Millions of Floating Point Operations Per Second (MFLOPS) with one measurement for each kernel and some overall figures, where Geometric Mean is the official overall rating.
Whetstone Benchmark - The first general purpose benchmark that set industry standards of performance, particularly for minicomputers, introduced in 1972. The benchmark produced speed ratings in terms of Thousands of Whetstone Instructions Per Second (KWIPS). In 1978, self timing versions (by yours truly) produced speed ratings, for each of the eight test procedures, in MOPS (Millions of Operations Per Second) or MFLOPS, with an overall rating in MWIPS.
Dhrystone Benchmarks 1.1 and 2.1 - The Dhrystone benchmark, a sort of Whetstone without floating point, became the key standard benchmark, from 1984, with the growth of Unix systems. The second version (2.1) was produced to avoid over-optimisation problems encountered with version 1.1. Original performance ratings were in terms of Dhrystones per second. This was later changed to VAX MIPS by dividing Dhrystones per second by 1757, the DEC VAX 11/780 result.
Linpack Benchmark - This benchmark was produced from the "LINPACK" package of linear algebra routines. It became the primary benchmark for scientific applications from the mid 1980s, with a slant towards supercomputer performance, with speed measured in MFLOPS.
Further details and references can be found in classic.htm.
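The VAX MIPS conversion described in the Dhrystone entry above is a single division by 1757. A minimal Python sketch follows; the function names are invented for this illustration and are not taken from the benchmark source.

```python
# Dhrystones per second measured on the DEC VAX 11/780, as quoted in the text
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def vax_mips(dhrystones_per_second):
    """Convert a Dhrystones per second score to a VAX MIPS rating."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def dhrystones_per_second(vax_mips_rating):
    """Inverse conversion, from VAX MIPS back to Dhrystones per second."""
    return vax_mips_rating * VAX_11_780_DHRYSTONES_PER_SECOND


if __name__ == "__main__":
    # A rating of 13599 VAX MIPS corresponds to roughly 23.9 million
    # Dhrystones per second.
    print(dhrystones_per_second(13599))
```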
On starting execution, the programs go through a calibration phase to determine the number of passes to run for more than 2 seconds with Dhrystone, 1 second for each of 8 tests with Linpack, 1 second for each of 72 tests with Livermore Loops and 10 seconds overall with Whetstone. Displayed results demonstrate that running time is proportional to the number of passes.
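A calibration phase of this kind can be sketched as below. This is only an illustration under assumptions: the text does not say how the pass count is increased, so doubling is assumed here, and `calibrate` and `timed_workload` are hypothetical names rather than functions from the benchmark source.

```python
import time


def calibrate(time_for_passes, target_seconds, start_passes=1):
    """Double the pass count until one run lasts at least target_seconds.

    time_for_passes(n) must run the workload n times and return the
    elapsed time in seconds. Doubling is an assumed strategy.
    """
    passes = start_passes
    while True:
        elapsed = time_for_passes(passes)
        if elapsed >= target_seconds:
            return passes, elapsed
        passes *= 2


def timed_workload(passes):
    """Stand-in workload: time a simple summation repeated 'passes' times."""
    start = time.perf_counter()
    total = 0
    for _ in range(passes):
        total += sum(range(1000))
    return time.perf_counter() - start


if __name__ == "__main__":
    passes, elapsed = calibrate(timed_workload, 0.05)
    print("%d passes took %.3f seconds" % (passes, elapsed))
```

Because running time is proportional to the number of passes, the final doubling overshoots the target by less than a factor of two.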
For the benchmark execution codes and source files, download classic_benchmarks.tar.gz.
Four execution files are provided for each benchmark. They comprise 32-Bit and 64-Bit compilations, non-optimised and optimised varieties.
On downloading to Windows, the file appeared as classic_benchmarks.tar.tar but seemed to be fine with the name changed to classic_benchmarks.tar.gz.
Classic Benchmark Results
Results of these Linux based benchmarks are included with those run via Windows in the following reports. Some examples are given below, all using one CPU of a 2.4 GHz Core 2 Duo.
Whetstone Benchmark Optimised

          MWIPS  MFLOP  MFLOP  MFLOP    COS    EXP  FIXPT     IF  EQUAL
                     1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS
32 Bit     2280    815    811    576   56.5   22.6   4011   7413   3651
64 Bit     2560    865    885    589   65.7   29.1   3851   5314   1078
Livermore Loops MFLOPS 24 Kernels Optimised

Loop         1     2     3     4     5     6     7     8     9    10    11    12
            13    14    15    16    17    18    19    20    21    22    23    24

32 Bit    1953  1223  1584  1534   343  1238  2192  2385  2147  1187   795   479
           161   396   276   956  1368   959   509   385  1385   165  1182   560
64 Bit    1702  1340  1593  1531   341  1199  2422  3060  2057   770   798   861
           481   673   444   992  1029  1222   461   423  1251   351  1184   819
Dhrystone and Linpack

             Dhry1     Dhry1     Dhry2     Dhry2   Linpack   Linpack
             NoOpt       Opt     NoOpt       Opt    No Opt       Opt
          VAX MIPS  VAX MIPS  VAX MIPS  VAX MIPS    MFLOPS    MFLOPS
32 Bit        3428     13599      3348      5852       404      1288
64 Bit        3643     18738      3288     12265       378      1577
Maximum CPU Speeds
Benchmarks whatcpu32 and whatcpu64 are essentially the same as cpuid and cpuid64, produced for Windows, with description and results in WhatCPU results.htm.
The programs were written with a view towards demonstrating maximum CPU performance executing all types of arithmetic instructions. The execution files and source code are available for download in max_cpu_speeds.tar.gz.
The benchmark programs use assembler level instructions, including full SIMD operations where appropriate, to simply add values via 1, 2, 3 and 4 registers. Results are in MIPS and MFLOPS, millions of adds per second in both cases. The programs also check that the end totals are correct. The 32 bit version adds 32 bit integers, then 32 bit single precision and 64 bit double precision floating point numbers using the original x87 instructions. This is followed by adding 32 bit integers using MMX and SSE2 instructions and 64 bit integers also using SSE2 functions. Finally there are 32 bit floating point additions using SSE instructions plus 3DNow, using AMD processors, and 64 bit floating point sums with SSE2 operations.
MMX, x87 and 3DNow instructions are not available at 64 bit working, but normal integer instructions are provided to use 64 bit numbers which, in the case of this register based program, mainly run at the same speed as with 32 bit arithmetic.
Results below are for an AMD Phenom X4 and Intel Core 2 Duo, using one CPU in each case. These suggest three integer adds and two 64 bit MMX operations can be executed per clock cycle. Then SSE/SSE2 floating point calculation speed is based on one 128 bit register dealt with per cycle. Best is eight 32 bit SSE integer adds per cycle.
Here, the AMD processor appears to be more efficient than the Intel CPU, but later Intel i7 32 bit and 64 bit results correct some of this anomaly.
More Linux results are available.
                          Phenom II X4 945 3.0 GHz        Core 2 Duo 2.4 GHz
Speeds adding to          1 Reg  2 Reg  3 Reg  4 Reg    1 Reg  2 Reg  3 Reg  4 Reg

32 bit Version
32 bit Integer   MIPS      3314   6629   8664   9040     2629   4915   5356   6605
32 bit x87       MFLOPS     753   1506   2259   3013      801   1601   2402   2402
64 bit x87       MFLOPS     753   1506   2259   3013      801   1601   2402   2402
32 bit MMX Int   MIPS      3012   6026   9036  12054     4726   7116   8772   8734
32 bit SSE2 Int  MIPS      6024  12050  18073  24107     9490  13769  17545  17469
64 bit SSE2 Int  MIPS      3012   6025   9037  12053     2402   4575   4586   4575
32 bit SSE       MFLOPS    3012   6024   9037  12050     3202   6405   9608   9608
64 bit SSE2      MFLOPS    1506   3012   4518   6025     1601   3202   4804   4804
32 bit 3DNow     MFLOPS    1506   3012   4518   6025

64 bit Version
32 bit Integer   MIPS      3315   6629   8664   9040     2601   4410   5226   6606
64 bit Integer   MIPS      3315   6629   7701   8287     2612   3908   5525   5285
32 bit SSE2 Int  MIPS      6025  12053  18081  24107     9490  14641  17527  17471
64 bit SSE2 Int  MIPS      3013   6027   9040  12053     2402   4576   4585   4576
32 bit SSE       MFLOPS    3013   6025   9040  12053     3202   6405   9609   9609
64 bit SSE2      MFLOPS    1506   3013   4519   6027     1601   3202   4804   4804
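The per clock cycle figures quoted in the text can be checked by dividing the table speeds by the CPU clock rate. The Python sketch below does this for the Phenom results using four registers. It assumes the nominal 3000 MHz rather than the measured 2978 to 3008 MHz, and `results_per_clock` is a name made up for this example.

```python
def results_per_clock(millions_per_second, cpu_mhz):
    """Results produced per CPU clock cycle (MIPS or MFLOPS over MHz)."""
    return millions_per_second / cpu_mhz


PHENOM_MHZ = 3000  # nominal clock, an assumption for this illustration

if __name__ == "__main__":
    # 32 bit integer adds: about three per clock cycle
    print(round(results_per_clock(9040, PHENOM_MHZ), 2))
    # MMX holds two 32 bit integers per 64 bit register, so halving the
    # adds per clock gives about two MMX operations per cycle
    print(round(results_per_clock(12054, PHENOM_MHZ) / 2, 2))
    # 32 bit SSE2 integer adds: about eight per clock cycle
    print(round(results_per_clock(24107, PHENOM_MHZ), 2))
    # SSE single precision: about four results per cycle, which matches
    # one 128 bit register per cycle
    print(round(results_per_clock(12050, PHENOM_MHZ), 2))
```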
OpenMP Benchmarks
OpenMP is a system independent set of procedures and software that arranges automatic parallel processing of shared memory data when more than one processor is provided. This option is available in the C/C++ compiler included in the Linux Ubuntu Distribution.
In each case, four benchmarks are provided, compiled with and without OpenMP options, to run on 32 bit and 64 bit systems.
The execution files and source code, along with compile and run instructions, can be downloaded in linux_openmp.tar.gz. Details and results are provided in linux_openmp benchmarks.htm and a summary follows.
MemSpeed
The MemSpeed benchmark employs three different sequences of operations, on 64 bit double precision floating point numbers, 32 bit single precision numbers and 32 bit integers, via two data arrays. It uses data volumes of 4 KBytes upwards to indicate performance via caches and RAM. This version is a variation with evaluation mainly concentrating on the formula x[m] = x[m] + r * y[m].
Below is a sample log file with the 64 bit benchmark using four CPUs. The extremely slow performance at the smaller data sizes is due to the relatively high startup overheads of OpenMP. The 32 bit version produces even slower performance relative to the non-OpenMP compilation.
Original OpenMP Benchmark
The original benchmark used larger data array sizes of 0.4, 4.0 and 40 MBytes with 2, 8 and 32 floating point calculations per word (4 Bytes). The 32 bit version behaved in a similar way to the Windows compilation, showing performance gains on a four core processor of up to four times that of a single CPU.
The 64 bit OpenMP version behaved in a similar manner to the 32 bit variation but appears to be relatively worse when compared with speeds produced by the normal compilation. The reason is that the latter produces full SIMD operation, with four calculations per clock cycle, and the former SISD, with one calculation per clock (see 32-Bit and 64-Bit Differences above, where SIMD was not produced). Examples of results are given below.
Memory Reading Speed Test 64 Bit Version 1 by Roy Longbottom
Start of test Sun Dec 5 12:26:36 2010
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int64 Dble Sngl Int64 Dble Sngl Int64
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
4 2413 2340 2426 2408 2371 2593 1301 1302 1306
8 4642 4379 4655 4739 4488 5045 2562 2478 2583
16 8321 7942 8513 9215 8412 9668 4989 4695 4982
32 15714 12698 15446 16397 14036 17359 9112 7963 9159
64 25533 18268 24526 26971 21394 28979 16033 12269 16032
128 36147 23064 34023 40018 28460 42871 23255 16389 23172
256 45821 26908 42782 21679 34353 57114 31501 20370 31889
512 46924 28555 46191 55514 35557 54808 33583 22754 33376
1024 45478 28681 45098 48798 34662 47103 25081 22172 24993
2048 36642 26993 36187 36523 32366 36917 18354 17985 18388
4096 30960 24342 30259 32057 26483 32862 17172 15049 17153
8192 22963 20257 22754 23462 21376 23910 12203 11223 12176
16384 8927 8774 8888 8947 8803 8951 4469 4454 4487
32768 8938 8817 8875 8963 3681 8964 4494 4465 4488
65536 8956 8863 8910 8959 8849 8981 4500 4474 4502
131072 8979 8918 8951 8830 8808 9022 4513 4494 4517
262144 8784 8657 8706 8760 8826 8919 4436 4422 4433
524288 8774 8478 8789 8732 8643 8864 4374 3703 4435
1048576 8664 8559 8617 8689 8612 8678 4368 4360 4336
2097152 8661 8631 8643 8611 8597 8692 4364 4368 4367
Linux OpenMP MFLOPS 3 GHz Quad Core Phenom

                           32 Bits                           64 Bits
    Data  Ops/   1 CPU   1 CPU  2 CPUs  4 CPUs     1 CPU   1 CPU  2 CPUs  4 CPUs
   Words  Word   *Norm     OMP     OMP     OMP     *Norm     OMP     OMP     OMP

  100000     2    2439    1903    3575    5758      7624    1974    3597    5769
 1000000     2    2231    1787    3588    6710      4686    1913    3843    6674
10000000     2    1739    1509    2490    3062      2195    1590    2566    2944
  100000     8    3348    3518    6963   13353     14357    3437    6835   12126
 1000000     8    3195    3453    6943   13524     13376    3375    6802   12420
10000000     8    3080    3308    6541   11311      7473    3219    6379   10976
  100000    32    3881    3794    7566   14896     15336    3552    7084   13494
 1000000    32    3853    3774    7554   14969     15009    3533    7079   13540
10000000    32    3817    3735    7465   14883     14318    3490    6970   13450

Instructions       FPU     FPU     FPU     FPU      SIMD    SISD    SISD    SISD
                   x87     x87     x87     x87       SSE     SSE     SSE     SSE

*Norm - OpenMP directives not used
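The speedup claims can be checked directly from the table above. The short Python sketch below uses the 100000 word, 32 operations per word row; the helper name `speedup` is invented for this illustration.

```python
def speedup(multi_cpu_mflops, single_cpu_mflops):
    """Ratio of multi CPU speed to a single CPU reference speed."""
    return multi_cpu_mflops / single_cpu_mflops


if __name__ == "__main__":
    # 32 bit: four CPUs with OpenMP against one CPU with OpenMP
    print(round(speedup(14896, 3794), 2))
    # 64 bit: four CPUs with OpenMP against one CPU with OpenMP
    print(round(speedup(13494, 3552), 2))
    # 64 bit: four CPUs with OpenMP (SISD) against one CPU compiled
    # without OpenMP (SIMD); a ratio below 1 means four cores are slower
    print(round(speedup(13494, 15336), 2))
```

The last ratio illustrates why the 64 bit OpenMP version looks relatively worse: four cores issuing SISD instructions do not quite match one core issuing SIMD instructions.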
BusSpeed Benchmark
This benchmark is particularly designed to identify reading data in bursts over buses, with a 32 bit version using 32 bit integer words and one for 64 bits using 64 bit numbers. The program starts by reading a word, with address increments of 32 words for the next data. The increment is reduced to 16 words then halving until all data is read. The last test reads all data but using SSE2 instructions.
Below are 64 bit results on a Core 2 Duo, with sample results at 32 bits and both varieties on a Phenom processor. The data burst size over the memory bus is indicated at the point where performance becomes constant, like Inc8wds at 64 bits and Inc16wds at 32 bits, both suggesting 512 bits or 64 bytes. Burst reading speed is eight times the constant speed at 64 bits and 16 times at 32 bits, or around 6400 MB/second for the Core 2 Duo and 7200 for the Phenom. There also appears to be some burst reading from data in L2 cache.
Speeds via L1 cache are fairly constant up to ReadAll, indicating no burst reading but, with the data transfer speed at 32 bits being twice that for 64 bits, a constant instruction execution speed is suggested. This, in MIPS, is slightly less than CPU MHz for the Core 2 Duo and somewhat higher than MHz on the Phenom. The SSE2 test is identical at both bit versions with the Core 2 Duo showing better efficiency at nearly four 32 bit results (1 SSE register full) per CPU clock cycle.
The 32 bit and 64 bit benchmarks, source code and instructions can be downloaded in memory_benchmarks.tar.gz, with more details and results in Linux Results BusSpeed.
Speed in MB/Second - For MIPS 64 bit divide by 8 and 32 bit divide by 4
Core 2 Duo 2400 MHz - 1 CPU
Bus Speed Test 64 bit Version 2.0 Thu Dec 16 23:09:19 2010
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 15997 17525 18167 18540 18734 18804 37355
24 17759 18484 17865 17822 18531 18526 37980
96 4189 4158 4107 6724 9128 13435 19175
384 4182 4137 4091 6721 9133 13450 19206
768 4109 4123 4094 6723 9129 13448 19229
1536 3883 4086 4039 6643 9011 13280 18913
16380 657 691 800 1626 2949 5445 5882
131070 693 711 803 1622 2942 5440 5874
393210 698 713 803 1623 2948 5444 5865
Bus Speed Test 32 bit Version 2.0 - L1 cache, L2 cache and RAM
6 8568 9076 9176 9315 9412 9433 37350
96 2112 2053 3277 4561 6714 8097 19170
393210 356 401 815 1474 2730 5091 5870
Phenom II X4 3000 MHz - 1 CPU
Bus Speed Test 64 bit Version 2.0 - L1 cache, L2 cache and RAM
6 21407 22690 26285 27053 27050 26435 23784
96 2992 2973 2991 5992 11780 20725 23813
393210 869 901 918 1791 3729 6264 7391
Bus Speed Test 32 bit Version 2.0 - L1 cache, L2 cache and RAM
6 11287 12793 13466 13625 13407 13281 23648
96 1494 1490 2974 5854 10509 13147 23781
393210 447 453 901 1830 3097 5206 7276
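The burst size and burst speed estimates in the text follow from simple arithmetic on the RAM rows of the logs above. The Python sketch below reproduces them; the function names are hypothetical and the input figures are taken from the 393210 KB and 6 KB rows.

```python
def burst_bytes(increment_words, word_bytes):
    """Burst size implied by the increment where speed becomes constant."""
    return increment_words * word_bytes


def burst_speed(constant_mb_per_sec, increment_words):
    """Only one word per burst is used, so the bus moves this much more."""
    return constant_mb_per_sec * increment_words


def mips(mb_per_sec, word_bytes):
    """Instruction rate from data rate: divide by 8 at 64 bits, 4 at 32."""
    return mb_per_sec / word_bytes


if __name__ == "__main__":
    print(burst_bytes(8, 8), burst_bytes(16, 4))  # 64 bit and 32 bit bursts
    print(burst_speed(803, 8))    # Core 2 Duo, 64 bit, Inc8wds
    print(burst_speed(401, 16))   # Core 2 Duo, 32 bit, Inc16wds
    print(burst_speed(918, 8))    # Phenom, 64 bit, Inc8wds
    print(burst_speed(453, 16))   # Phenom, 32 bit, Inc16wds
    print(mips(18804, 8))         # Core 2 Duo L1 cache ReadAll, 64 bit
    print(mips(26435, 8))         # Phenom L1 cache ReadAll, 64 bit
```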
RandMem Benchmark
RandMem benchmark carries out eight tests at increasing data sizes to produce data transfer speeds in MBytes Per Second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests for 32 bit integers and 64 bit floating point numbers. In both cases, 32 bit integers are used.
The main purpose is to demonstrate how much slower performance can be through using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is redundant, and by the size of preceding caches.
Below, all 64 bit results are shown for a Phenom along with sample speeds at 32 bits and for a Core 2 Duo at 64 bits. Many of the low order speeds are similar at 32 bits and 64 bits but, using RAM, some relationships change, with integer random access becoming progressively worse at 64 bits. The lower GHz Core 2 Duo performs better on some tests.
The 32 bit and 64 bit benchmarks, source code and instructions can be downloaded in memory_benchmarks.tar.gz, with more details and results in Linux Results RandMem.
AMD Phenom(tm) II X4 945 Processor 3.0 GHz
Random/Serial Memory Test 64 Bit Version 2 Tue Dec 14 17:21:46 2010
Integer....................... Double/Integer................
Serial........ Random........ Serial........ Random........
RAM Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt
KB MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec
6 12542 9137 12636 9066 16812 13621 16795 13621
12 12613 9165 12676 9137 17022 13705 17013 13673
24 12647 9179 12734 9157 17129 13720 17130 13694
48 12664 9186 12775 9161 17183 13728 17183 13719
96 11989 8464 6866 5221 16934 11776 16496 11888
192 7778 8434 3703 3177 16902 11747 7146 6132
384 7778 8437 3001 2749 16918 11671 5116 4730
768 4956 7348 1954 1900 9978 9459 3670 3591
1536 4763 7201 1404 1388 9748 9346 2488 2474
3072 4016 6914 1078 1045 9531 9200 2048 2043
6144 3668 6769 750 661 9004 8719 1405 1280
12288 2771 3636 590 502 6688 5495 1012 848
24576 2850 3592 504 450 6706 5506 841 736
49152 2858 3583 439 402 6719 5332 727 659
98304 2679 3536 333 307 6697 5490 612 564
196608 2729 3548 266 241 6945 5445 459 422
393216 2866 3559 229 200 6931 5490 377 336
786432 2870 3547 192 167 6938 5499 327 283
At 32 bits
6 14488 11399 12852 11133 16741 20258 16789 19825
96 11088 9912 6861 5520 16960 16197 16554 14645
1536 8044 7528 1410 1390 9668 9223 2475 2461
393216 4296 3575 281 258 6668 5497 491 458
786432 4296 3562 238 212 6841 5492 396 361
Intel Core 2 CPU 6600 @ 2.40GHz
At 64 bits
6 9142 12213 9154 5161 13728 16211 13727 15654
96 8019 9473 4113 3701 11381 11971 7382 6419
1536 7978 8586 2691 2497 11269 11044 4760 4222
393216 3285 2273 238 207 5705 2999 503 374
786432 3297 2277 149 152 5637 3001 297 281
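To put a number on how much slower random access can be, the ratio of serial to random speed can be taken from the logs above. The Python sketch below uses the Phenom 64 bit results at the largest data size (786432 KB); `slowdown` is a name made up for this example.

```python
def slowdown(serial_mb_per_sec, random_mb_per_sec):
    """How many times slower random access is than serial access."""
    return serial_mb_per_sec / random_mb_per_sec


if __name__ == "__main__":
    # Integer read: serial 2870 MB/sec, random 192 MB/sec
    print(round(slowdown(2870, 192), 1))
    # Double/Integer read: serial 6938 MB/sec, random 327 MB/sec
    print(round(slowdown(6938, 327), 1))
    # For comparison, at 6 KB (L1 cache) serial and random reads are
    # about the same speed
    print(round(slowdown(12542, 12636), 2))
```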
SSEfpu Benchmark
This is a variation of the SSE3DNow Benchmark with extensions but excluding AMD 3DNow tests. The benchmark measures Single Precision (SP) and Double Precision (DP) Floating Point speeds, data streaming from caches and RAM. It uses SSE (SP) and SSE2 (DP) instructions, along with compiled C code that produces the old x87 instructions at 32 bits and SSE type for working on a 64 bit system.
The additional tests avoid intermediate register to register operations using s=(s+x[m])*y[m] and s=s+x[m]+y[m] to produce much faster speeds.
The AMD processor performs relatively better on the extra test, with linked add and multiply, at 7.11 floating point results per clock cycle.
The 32 bit and 64 bit benchmarks, source code and instructions can be downloaded in memory_benchmarks.tar.gz, with more details and results in Linux Results SSEfpu.
AMD Phenom(tm) II X4 945 Processor 3.0 GHz
SSE & SSE2 Memory Reading Speed Test 64-Bit Version 2.0
0.100 seconds per test, Start Tue Dec 21 12:18:05 2010
Memory --s=s+x[m]*y[m]--- --x[m]=x[m]+y[m]-- (s+x[m])*y[m]
KBytes SSE2 SSE Sngl SSE2 SSE Sngl +*SSE ++SSE
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
4 22773 22689 6156 43460 42950 23333 66361 41700
8 23421 23377 6089 45716 45433 23624 78620 44642
16 23623 23691 6059 42561 42562 23724 84534 45885
32 23834 23827 6043 45141 45140 23797 82980 46315
64 23921 23918 6035 44686 45478 23823 85405 46897
128 23859 23901 6029 22154 22157 17973 23785 23782
256 23821 23764 6027 21555 21535 18026 23888 23889
512 19300 19264 6010 17865 17840 16359 19219 19222
1024 10376 10379 5965 10168 10168 10228 10371 10373
2048 10369 10372 5966 10163 10163 10236 10369 10368
4096 10261 10281 5862 9975 9975 10025 10278 10278
8192 8053 8190 5362 6841 6836 6863 8029 8027
16384 7985 8095 5327 6572 6569 6651 7848 7883
32768 8074 8099 5314 6424 6531 6660 7858 7928
65536 8148 8151 5321 6599 6607 6674 7961 7961
131072 8092 8159 5320 6585 6412 6484 7891 7936
262144 8112 8173 5318 6580 6556 6665 7887 7960
524288 8117 8042 5327 6607 6604 6689 7861 7961
1048576 8147 8108 5328 6535 6581 6668 7941 7816
SSE2 SSE Norm SSE2 SSE Norm SSE SSE
Maximum DP SP SP DP SP SP SP SP
MFLOPS 2990 5980 1539 2857 5685 2978 21351 11724
MFLOPS/MHz 0.99 1.99 0.51 0.95 1.95 0.99 7.11 3.90
MB/sec at 32 bits
Different #####
8 23188 23276 6057 45641 43156 11688 78703 44729
128 23634 23692 5997 22418 22250 9893 23671 23664
1024 10248 10254 5930 10056 10053 8682 10253 10253
131072 8258 8276 5389 6680 6698 6098 7909 8091
Intel Core 2 CPU 6600 @ 2.40GHz
At 64 bits
Different ##### ##### #####
8 25420 25368 6506 37691 37692 13152 36503 36637
128 18481 18655 6406 17105 17107 12704 19725 19744
1024 18517 18749 6391 17136 17137 12690 19803 19822
131072 6444 6419 5455 3955 3956 3863 6399 6393
Maximum
MFLOPS/MHz 1.32 2.64 0.68 0.98 1.96 0.68 3.80 3.81
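The Maximum MFLOPS lines in the Phenom log above appear to be consistent with dividing the best MB/S figure by the number of bytes read per floating point operation. This relationship is inferred from the numbers, not taken from the benchmark source, and the names below are hypothetical. For s=s+x[m]*y[m] two words are read for two operations, and for x[m]=x[m]+y[m] two words are read for one operation.

```python
def mflops(mb_per_sec, word_bytes, words_read, flops):
    """MFLOPS inferred from MB/second and the work done per data item."""
    bytes_per_flop = word_bytes * words_read / flops
    return mb_per_sec / bytes_per_flop


def per_clock(mflops_value, cpu_mhz):
    """Floating point results per CPU clock cycle."""
    return mflops_value / cpu_mhz


if __name__ == "__main__":
    print(round(mflops(23921, 8, 2, 2)))  # SSE2 DP multiply and add
    print(round(mflops(23918, 4, 2, 2)))  # SSE SP multiply and add
    print(round(mflops(45478, 4, 2, 1)))  # SSE SP add only
    print(round(mflops(85405, 4, 2, 2)))  # linked add and multiply test
    # Using the nominal 3000 MHz gives a value close to the logged 7.11,
    # which was presumably based on the measured clock rate
    print(round(per_clock(21351, 3000), 2))
```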
nVidia CUDA Benchmarks and Burn-in Tests
CUDA, from nVidia, provides programming functions to use GeForce graphics processors for general purpose computing. These functions are easy to use in executing arithmetic instructions on numerous processing elements simultaneously. This is for Single Instruction Multiple Data (SIMD) operation, where the same instructions can be executed simultaneously on sections of data from a data array. For maximum speeds, the data array has to be large and with few or no references to graphics or host CPU RAM. To assist in this, CUDA hardware provides a large number of registers and high speed cache like memory.
The benchmarks measure floating point speeds in Millions of Floating Point Operations Per Second (MFLOPS). They demonstrate some best and worst case performance using varying data array size and increasing processing instructions per data access. There are five scenarios - New Calculations with data in and out, Update Data with just data out, Graphics Only Data using only graphics RAM and two extra tests with lower overheads.
The tests are run at three different data sizes, defaults 100,000 words repeated 2500 times, 1M words 250 times and 10M words 25 times. The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2, 8 or 32 adds or subtracts and multiplies on each data element. The Extra Tests are only run using 10M words repeated 25 times.
The 32 and 64 bit benchmarks, source code and instructions can be downloaded in linux_cuda_mflops.tar.gz, with more details and results in linux_cuda_mflops.htm, the latter showing how to use the benchmarks as reliability/burn-in tests. Example results are below.
Linux CUDA 3.2 x64 32 Bits SP MFLOPS Benchmark 1.4 Wed Dec 29 15:35:35 2010
CUDA devices found
Device 0: GeForce GTS 250 with 16 Processors 128 cores
Global Memory 999 MB, Shared Memory/Block 16384 B, Max Threads/Block 512
Using 256 Threads
Test 4 Byte Ops Repeat Seconds MFLOPS First All
Words /Wd Passes Results Same
Data in & out 100000 2 2500 1.035893 483 0.9295383095741 Yes
Data out only 100000 2 2500 0.514445 972 0.9295383095741 Yes
Calculate only 100000 2 2500 0.082464 6063 0.9295383095741 Yes
Data in & out 1000000 2 250 0.706176 708 0.9925497770309 Yes
Data out only 1000000 2 250 0.380928 1313 0.9925497770309 Yes
Calculate only 1000000 2 250 0.051266 9753 0.9925497770309 Yes
Data in & out 10000000 2 25 0.639933 781 0.9992496371269 Yes
Data out only 10000000 2 25 0.339051 1475 0.9992496371269 Yes
Calculate only 10000000 2 25 0.041672 11999 0.9992496371269 Yes
Data in & out 100000 8 2500 1.013196 1974 0.9569796919823 Yes
Data out only 100000 8 2500 0.490317 4079 0.9569796919823 Yes
Calculate only 100000 8 2500 0.088028 22720 0.9569796919823 Yes
Data in & out 1000000 8 250 0.666709 3000 0.9955092668533 Yes
Data out only 1000000 8 250 0.351320 5693 0.9955092668533 Yes
Calculate only 1000000 8 250 0.052704 37948 0.9955092668533 Yes
Data in & out 10000000 8 25 0.620265 3224 0.9995486140251 Yes
Data out only 10000000 8 25 0.335467 5962 0.9995486140251 Yes
Calculate only 10000000 8 25 0.044453 44992 0.9995486140251 Yes
Data in & out 100000 32 2500 1.057142 7568 0.8900792598724 Yes
Data out only 100000 32 2500 0.531691 15046 0.8900792598724 Yes
Calculate only 100000 32 2500 0.128706 62157 0.8900792598724 Yes
Data in & out 1000000 32 250 0.688714 11616 0.9880728721619 Yes
Data out only 1000000 32 250 0.375411 21310 0.9880728721619 Yes
Calculate only 1000000 32 250 0.075172 106423 0.9880728721619 Yes
Data in & out 10000000 32 25 0.644074 12421 0.9987990260124 Yes
Data out only 10000000 32 25 0.357000 22409 0.9987990260124 Yes
Calculate only 10000000 32 25 0.062001 129029 0.9987990260124 Yes
Extra tests - loop in main CUDA Function
Calculate 10000000 2 25 0.050288 9943 0.9992496371269 Yes
Shared Memory 10000000 2 25 0.009206 54313 0.9992496371269 Yes
Calculate 10000000 8 25 0.049608 40316 0.9995486140251 Yes
Shared Memory 10000000 8 25 0.017254 115916 0.9995486140251 Yes
Calculate 10000000 32 25 0.050531 158320 0.9987990260124 Yes
Shared Memory 10000000 32 25 0.046626 171580 0.9987990260124 Yes
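The MFLOPS column in the log above can be reproduced from the other columns as words times operations per word times repeat passes, divided by seconds. The Python sketch below shows this, along with the eight operation form of the calculation quoted in the text. It is a plain CPU illustration and not CUDA code, the function names are made up, and the constants passed for a to f are arbitrary values chosen for the example.

```python
def mflops(words, ops_per_word, repeat_passes, seconds):
    """Millions of floating point operations per second for one test."""
    return words * ops_per_word * repeat_passes / seconds / 1e6


def eight_op_update(x, a, b, c, d, e, f):
    """One data element update with 8 operations.

    Three adds inside the brackets, three multiplies, one subtract and
    one add between the terms.
    """
    return (x + a) * b - (x + c) * d + (x + e) * f


if __name__ == "__main__":
    # First log line: 100000 words, 2 ops, 2500 passes, 1.035893 seconds
    print(round(mflops(100000, 2, 2500, 1.035893)))
    # Fastest main test: 10000000 words, 32 ops, 25 passes, 0.062001 seconds
    print(round(mflops(10000000, 32, 25, 0.062001)))
    # Arbitrary constants, for illustration only
    print(eight_op_update(1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0))
```

Small differences from the logged values are to be expected, since the seconds column is itself rounded.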
Disk, Bus and LAN Benchmarks
These benchmark tests are based on those produced for Windows, where details and results can be found in DiskGraf Results.htm and CDDVDSpd Results.htm.
The tests comprise:
- Writing and Reading Large Files - Five files each of 8 MB, 16 MB and 32 MB are used. System is instructed not to cache the data.
- Writing and Reading Cached Data - Five files of 8 MB are used. Performance normally reflects memory speed.
- Reading Bus Speed - The same data is read repetitively at block sizes between 64 KB and 1 MB. This normally reads data from the disk’s buffer to show maximum bus speeds.
- Random Reading Speed - 1 KB blocks are read randomly from 7 file sizes between 2 MB and 128 MB. Results reflect the disk's buffer size and rotation speed.
- Writing and Reading Small Files - 500 files are written, read and deleted at 6 different file sizes each between 2 KB and 64 KB. Besides speed, milliseconds per file is provided to reflect overheads.
- Run time parameters - These are provided to write and read larger files and to specify the drive and file path to be used.
Besides testing disk and flash memory drives, it was intended to use the (drivespeed) benchmarks for measuring speed over connections such as Local Area Networks (LANs). In order to avoid data being cached in main memory by the Operating System, the program uses direct I/O (file open parameter O_DIRECT for Linux). This prevented the use of directories mounted over a LAN, so a second program (lanspeed) was produced, identical except with no direct I/O parameter. Compilations at both 32 bits and 64 bits were produced - drivespeed32, lanspeed32, drivespeed64 and lanspeed64.
The lanspeed tests can be used to measure speeds between Linux platforms and also between Linux and Windows systems. A Windows program, drivespeed32.exe is also provided and this can also be used as a LAN speed test.
The execution files and source code, along with compiling and running instructions, can be downloaded in linux_disk_usb_lan_benchmarks.tar.gz, with more details and results in linux_disk_usb_lan_benchmarks.htm.
The latest version has an added test to measure Random Writing Speed.
Example results are below.
Current Directory Path:
/media/f816ec76-8bf2-4dd3-9e98-62934909a779/roy/all64/drivespeed2
Total MB 11263, Free MB 9513, Used MB 1750
Linux Storage Speed Test 64-Bit Version 1.1, Tue Feb 1 14:20:39 2011
Copyright (C) Roy Longbottom 2011
8 MB File 1 2 3 4 5
Writing MB/sec 4.33 76.73 76.15 82.40 105.84
Reading MB/sec 57.37 86.62 83.40 80.74 82.34
16 MB File 1 2 3 4 5
Writing MB/sec 73.94 108.16 72.53 116.19 116.12
Reading MB/sec 70.39 103.31 120.31 121.53 121.48
32 MB File 1 2 3 4 5
Writing MB/sec 113.01 76.67 73.20 115.83 116.05
Reading MB/sec 105.19 102.41 113.15 121.55 120.59
---------------------------------------------------------------------
8 MB Cached File 1 2 3 4 5
Writing MB/sec 1271.71 1503.73 1496.38 1493.27 1491.68
Reading MB/sec 3406.70 4015.11 4079.82 4081.24 4080.77
---------------------------------------------------------------------
Bus Speed Block KB 64 128 256 512 1024
Reading MB/sec 84.93 102.31 112.31 121.03 116.41
---------------------------------------------------------------------
1 KB Reads File MB > 2 4 8 16 32 64 128
Random Read msecs 0.43 0.39 0.45 3.01 4.49 5.93 6.69
---------------------------------------------------------------------
500 Files Write Read Delete
File KB MB/sec ms/File MB/sec ms/File Seconds
2 7.54 0.27 7.67 0.27 0.015
4 17.19 0.24 22.27 0.18 0.018
8 20.24 0.40 27.21 0.30 0.017
16 33.27 0.49 47.16 0.35 0.019
32 52.67 0.62 67.20 0.49 0.016
64 55.43 1.18 75.49 0.87 0.015
End of test Tue Feb 1 14:21:29 2011
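In the small files section of the log above, the ms/File figures appear to follow from the file size and the MB/sec speed. The Python sketch below shows the relationship, which is inferred from the numbers and not taken from the source code. It assumes 1 KB is 1024 bytes and 1 MB is 1,000,000 bytes, and the function name is hypothetical.

```python
def ms_per_file(file_kb, mb_per_sec):
    """Milliseconds to transfer one file of file_kb KB at mb_per_sec.

    Assumes 1 KB = 1024 bytes and 1 MB = 1,000,000 bytes.
    """
    file_bytes = file_kb * 1024
    seconds = file_bytes / (mb_per_sec * 1e6)
    return seconds * 1000


if __name__ == "__main__":
    print(round(ms_per_file(2, 7.54), 2))    # writing 2 KB files
    print(round(ms_per_file(64, 55.43), 2))  # writing 64 KB files
    print(round(ms_per_file(64, 75.49), 2))  # reading 64 KB files
```

The time per file grows far more slowly than file size at the small end, which reflects the fixed overhead of each file operation.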
Burn-In and Reliability Testing Apps
A new set of programs has been designed for soak testing Linux based PCs. The execution files and source code, along with compile and run instructions, can be downloaded in linux_burn-in_apps.tar.gz. Full details and results are provided in linux burn-in apps.htm.
These programs are intended to stress test CPUs, caches, RAM, buses, disks and other drives using high processing speeds, to induce heating effects, and varying data bit order, to investigate possible pattern conscious faults. Common features are command line options to specify memory/storage demands, running time and different results log file names, for use in multiprocessor tests. Data read and results of calculations are also checked for correct or consistent values. Versions compiled to run on 32-Bit and 64-Bit processors are provided.
Three new programs provided are BurnInSSE, IntBurn and DriveStress but they can also be used in conjunction with programs produced earlier. BurnInSSE64 and BurnInSSE32 were compiled to use the same range of SSE floating point instructions, where GCC generates fast execution speeds. The IntBurn tests are based on assembly code with IntBurn32 using 32 bit integers and IntBurn64 accessing a larger number of 64 bit registers.
DriveStress32 and DriveStress64 were compiled from the same C code and measure drive and bus speeds (e.g. SATA or USB) whilst checking data read for correct values.
Earlier programs that also have reliability testing options, and are included in the package, are Livermore Loops and nVidia CUDA Benchmarks.
Successes - Three significant problems were identified during testing. The first was apparently excessive temperatures on a desktop PC, compared with earlier measurements via Windows. This was cured by clearing dust out of the CPU heatsink using a compressed air sprayer. Then there were two Linux peculiarities that seem to be affected by power saving options. A desktop PC with a Core 2 Duo CPU showed a throughput increase of three times using both cores. Here, using one core with “On-Demand” CPU GHz (via Frequency Scaling Monitor), the processor was running at 1.6 GHz instead of 2.4 GHz. Then a laptop, again with a Core 2 Duo CPU, overheated, causing the CPU to run at less than half speed. Unlike using Windows, with power on to Ubuntu, initial CPU temperatures were high with the fan not appearing to run as fast as it might. On an apparently random basis, the laptop started at a lower temperature and did not overheat, with the fan apparently running at high speed.
Paging/Swapping Tests - Running multiple copies of the processor exercise programs, with appropriate parameters to demand more main memory capacity than is available, will lead to data being swapped out/in to/from disk. However, with excessive demands, running times can be unpredictable.
Multitasking Scripts - Examples are provided showing how to mix and match programs and run time parameters to soak test complete systems for as long as is required. They also demonstrate how to organise dynamically displayed results in multiple X terminal windows.
The test programs display and log results of calculations and speeds at regular intervals. Examples are shown below, with interpretation and more details in
linux burn-in apps.htm.
IntBurn
Test 4 KB at 10x2 seconds per test, Start at Thu Mar 17 12:00:59 2011
Write/Read
1 10529 MB/sec Pattern 0000000000000000 Result OK 25705389 passes
2 10579 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 25826660 passes
3 10592 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 25858754 passes
4 10587 MB/sec Pattern 5555555555555555 Result OK 25846727 passes
5 10601 MB/sec Pattern 3333333333333333 Result OK 25880968 passes
6 10602 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 25883259 passes
Max 2236 64 bit MIPS
Read
1 16941 MB/sec Pattern 0000000000000000 Result OK 82719400 passes
2 16946 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 82744300 passes
3 16932 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 82676600 passes
4 16927 MB/sec Pattern 5555555555555555 Result OK 82653700 passes
5 16883 MB/sec Pattern 3333333333333333 Result OK 82439400 passes
6 16857 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 82311300 passes
Max 2515 64 bit MIPS
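The pattern checks logged above can be pictured with a minimal sketch. It is not the IntBurn assembly code: it assumes a hypothetical helper, pattern_pass, that fills a 4 KB buffer with one 64 bit pattern, reads it back and reports whether every word matched. Only the pattern values and the 4 KB size are taken from the log.

```python
import struct

# The six 64 bit patterns listed in the IntBurn log.
PATTERNS = [
    0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0xA5A5A5A5A5A5A5A5,
    0x5555555555555555, 0x3333333333333333, 0xF0F0F0F0F0F0F0F0,
]

def pattern_pass(pattern, size_bytes=4096):
    """Fill a buffer with one pattern, read it back and verify (hypothetical helper)."""
    words = size_bytes // 8
    buf = bytearray(struct.pack("<Q", pattern) * words)   # write phase
    read_back = struct.unpack("<%dQ" % words, buf)        # read phase
    return all(word == pattern for word in read_back)

def run_patterns(size_bytes=4096):
    """Return (pattern as hex text, 'OK' or 'ERROR') for every pattern."""
    return [("%016X" % p, "OK" if pattern_pass(p, size_bytes) else "ERROR")
            for p in PATTERNS]

if __name__ == "__main__":
    for number, (text, result) in enumerate(run_patterns(), start=1):
        print(number, "Pattern", text, "Result", result)
```

A real burn-in test would repeat such passes for the whole running time; a single pass in an interpreted language induces no meaningful heating.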
BurnInSSE
Using 400 KBytes, 32 Operations Per Word, For Approximately 1 Minutes
Pass 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
1 100000 32 67500 15.10 14304 0.356166393 Yes
2 100000 32 67500 15.11 14296 0.356166393 Yes
3 100000 32 67500 15.09 14312 0.356166393 Yes
4 100000 32 67500 15.33 14091 0.356166393 Yes
DriveStress
File size 10.25 MB x 4 files, minimum reading time 1 minutes
File 1 10.25 MB written in 0.12 seconds
File 2 10.25 MB written in 0.14 seconds
File 3 10.25 MB written in 0.11 seconds
File 4 10.25 MB written in 0.14 seconds
Start Reading Sun Apr 17 20:06:07 2011
Read passes 18 x 4 Files x 10.25 MB in 0.25 minutes
Read passes 36 x 4 Files x 10.25 MB in 0.51 minutes
Read passes 54 x 4 Files x 10.25 MB in 0.76 minutes
Read passes 72 x 4 Files x 10.25 MB in 1.01 minutes
Start Repeat Read Sun Apr 17 20:08:08 2011
Passes in 1 second(s) for each of 164 blocks of 64KB:
1440 1480 1480 1480 1480 1400 1480 1480 1480 1460 1380
1480 1480 1460 1480 1440 1440 1480 1480 1480 1440 1460
1480 1440 1480 1460 1500 1460 1480 1760 1540 1480 1480
1440 1480 1480 1480 1480 1460 1440 1480 1480 1480 1460
+ another 120 results
No errors found during reading tests
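The write, read and compare sequence shown in the DriveStress log can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's C code: the file names, the block contents and the helpers make_block, write_files and verify_files are all hypothetical.

```python
import os
import tempfile

def make_block(file_number, block_size=65536):
    """Build a deterministic 64 KB block; the content rule is an assumption."""
    return bytes((file_number + i) % 256 for i in range(block_size))

def write_files(directory, files=4, blocks_per_file=4):
    """Write the test files and return their paths."""
    paths = []
    for n in range(files):
        path = os.path.join(directory, "stress%d.dat" % n)
        with open(path, "wb") as handle:
            for _ in range(blocks_per_file):
                handle.write(make_block(n))
        paths.append(path)
    return paths

def verify_files(paths, blocks_per_file=4):
    """Read every file back, compare each block, return the number of bad blocks."""
    errors = 0
    for n, path in enumerate(paths):
        expected = make_block(n)
        with open(path, "rb") as handle:
            for _ in range(blocks_per_file):
                if handle.read(len(expected)) != expected:
                    errors += 1
    return errors

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        bad = verify_files(write_files(work))
        print("No errors found during reading tests" if bad == 0
              else "%d block errors" % bad)
```

Files this small are likely to be read back from the operating system's file cache rather than from the drive, so the sketch shows the checking logic only, not a valid drive or bus speed measurement.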
Multithreading Benchmarks
These multithreading tests are based on the above benchmarks, in turn,
Maximum CPU Speeds,
Whetstone Classic Benchmark,
Original OpenMP Benchmark,
MemSpeed Benchmark,
BusSpeed Benchmark and
RandMem Benchmark.
For further details, sample results, benchmark programs, source code and instructions see
linux multithreading benchmarks.htm and
linux_multithreading_apps.tar.gz.
Six benchmarks are provided that can run using up to 64 concurrent threads, with versions compiled to run using 64 bit or 32 bit systems. Performance is mainly measured as Millions of Instructions Per Second (MIPS), Millions of Floating Point Operations Per Second (MFLOPS) or Millions of Bytes per Second (MB/S).
Simple Add Tests - execute 32 bit or 64 bit integer instructions and 128 bit SSE floating point functions via assembly language. These use simple add operations with little access to external data. Resultant performance is generally proportional to the number of CPU cores with some gains also identified when Hyperthreading is available. Each thread executes independent code.
Whetstone Benchmark - is the first general purpose benchmark that set industry standards of computer system performance, mainly dependent on floating point speed but with some independently timed integer test functions. Data used is generally contained in L1 cache with performance gains again proportional to the number of cores. Each thread again executes independent code.
MP MFLOPS Program - uses the same functions as my CUDA and OpenMP benchmarks, comprising routines with 2, 8 and 32 add or multiply floating point calculations with data from higher level caches or RAM. The 64 bit version compiles using SSE floating point, where up to 6 MFLOPS per CPU MHz per core can be produced. The 32 bit program uses the much slower original 80387 FPU instructions. These programs can also be used as burn-in/reliability tests. Each thread executes the same functions but on a different segment of the data.
MP Memory Speed Tests - employ three sequences of operations, using double and single precision floating point numbers and integers, on data sized between 4 KB and 25% of RAM size. The operations are memory to memory transfers with 0, 1 and 2 arithmetic calculations. The 64 bit version again uses SSE functions but not as efficiently as MP MFLOPS. Again each thread has the same procedures using different segments of the data.
MP Memory Bus Speed Tests - read data at a range of sizes covering caches and RAM. Data is accessed with varying address increments to identify reading data in bursts over the bus and allow estimation of maximum bus/memory speed. This time, each thread reads all the data. The 64 bit version uses the double size 8 byte words, where data transfer speed can be twice that of the 32 bit compilation, demonstrating that 32 and 64 bit integer instructions can execute at the same speed.
MP Memory Random Access Speed Benchmark - comprises serial and random access read and read/write tests that cover cache and RAM data sizes. All threads access the same data but starting at different points. In this case, data could be corrupted with concurrent updates, but the Operating System appears to flush caches to avoid this, producing extremely slow performance. Extra tests (Mutex) avoid this conflict by executing one read/write test at a time, leading to some slower and some faster speeds. Random access can be affected by burst reading/writing with associated poor performance.
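Several of the descriptions above have each thread running the same function on a different segment of the data. A minimal sketch of that partitioning follows, assuming hypothetical helpers triad_segment and run_segmented and borrowing the x[m]=x[m]+s*y[m] operation from the MP Memory Speed test. It is not the benchmark code, and standard CPython threads will not show the multi-core speed gains the compiled programs measure.

```python
from concurrent.futures import ThreadPoolExecutor

def triad_segment(x, y, s, start, end):
    """Apply x[m] = x[m] + s * y[m] to one segment of the shared lists."""
    for m in range(start, end):
        x[m] = x[m] + s * y[m]

def run_segmented(x, y, s, threads=4):
    """Split the index range into near-equal segments, one per thread."""
    n = len(x)
    bounds = [(t * n // threads, (t + 1) * n // threads) for t in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(triad_segment, x, y, s, lo, hi)
                   for lo, hi in bounds]
        for future in futures:
            future.result()   # re-raise any exception from a worker
    return x

if __name__ == "__main__":
    print(run_segmented([1.0] * 10, [2.0] * 10, 0.5))
```

Because the segments do not overlap, no locking is needed; that is the property distinguishing these tests from the random access benchmark, where all threads touch the same data.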
Examples of results log format on a quad core 3.0 GHz Phenom II are given below.
Simple Add Tests
Multithreading Add Test 64 bit Version 1.0 Thu May 5 11:35:18 2011
Integer Additions 4 Threads
Thread 4 - 8281 64 bit Integer MIPS
Thread 2 - 7996 64 bit Integer MIPS
Thread 1 - 7815 64 bit Integer MIPS
Thread 3 - 7800 64 bit Integer MIPS
Total - 31892 64 Bit Integer MIPS
Aggregate - 31201 64 Bit Integer MIPS, based on last to finish
SSE Floating Point Additions 4 Threads
Thread 2 - 12030 32 Bit SSE MFLOPS
Thread 3 - 11976 32 Bit SSE MFLOPS
Thread 4 - 11861 32 Bit SSE MFLOPS
Thread 1 - 11692 32 Bit SSE MFLOPS
Total - 47559 32 Bit SSE MFLOPS
Aggregate - 46770 32 Bit SSE MFLOPS, based on last to finish
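The Total and Aggregate lines can be related with a short sketch. It assumes, and this is an assumption rather than something stated in the log, that every thread performs the same amount of work and that all threads start together, so the slowest thread sets the elapsed time. The function names are hypothetical.

```python
def total_rate(thread_rates):
    """Sum of the per-thread rates, as in the 'Total' line."""
    return sum(thread_rates)

def aggregate_rate(thread_rates):
    """Rate based on the last thread to finish.

    With equal work per thread and a common start, elapsed time is set by
    the slowest thread, giving thread count times the lowest rate.
    """
    return len(thread_rates) * min(thread_rates)

integer_mips = [8281, 7996, 7815, 7800]   # per-thread values from the log
print("Total    ", total_rate(integer_mips))
print("Aggregate", aggregate_rate(integer_mips))
```

For the integer results this gives a total of 31892 and an aggregate of 31200, against the logged 31201; the small difference is consistent with rounding of the per-thread figures.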
Whetstone MP Benchmark
Multithreading Single Precision Whetstones 64-Bit Version 1.0
Using 4 threads - Sat May 14 12:03:51 2011
MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
Thread 1 2 3 MOPS MOPS MOPS MOPS MOPS
1 2861 927 872 747 71 38 2947 2259 629
2 2865 875 892 745 71 38 3294 2198 641
3 2875 869 892 744 71 38 3408 2202 645
4 2896 906 895 744 72 38 3141 2232 651
Total 11496 3577 3550 2979 285 151 12790 8891 2566
MWIPS 11389 Based on time for last thread to finish
MP MFLOPS Benchmark
64 Bit MP SSE MFLOPS Benchmark 1, 4 Threads, Tue May 17 19:00:43 2011
Test 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
Data in & out 102400 2 10000 0.091754 22321 0.764063 Yes
Data in & out 1024000 2 1000 0.136134 15044 0.970753 Yes
Data in & out 10240000 2 100 0.632075 3240 0.997008 Yes
Data in & out 102400 8 10000 0.167023 49047 0.850923 Yes
Data in & out 1024000 8 1000 0.176219 46488 0.982342 Yes
Data in & out 10240000 8 100 0.658828 12434 0.998200 Yes
Data in & out 102400 32 10000 0.558509 58670 0.660143 Yes
Data in & out 1024000 32 1000 0.556450 58888 0.953631 Yes
Data in & out 10240000 32 100 0.722131 45377 0.995203 Yes
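The MFLOPS column can be reproduced from the other columns of the table. The sketch below uses a hypothetical function, mflops, and assumes the figure is words x operations per word x repeat passes, divided by seconds and by one million; three rows from the log are used as a check.

```python
def mflops(words, ops_per_word, passes, seconds):
    """Millions of floating point operations per second for one table row."""
    return words * ops_per_word * passes / seconds / 1e6

# (words, ops per word, repeat passes, seconds) copied from the log above.
rows = [
    (102400, 2, 10000, 0.091754),
    (1024000, 2, 1000, 0.136134),
    (10240000, 32, 100, 0.722131),
]

for row in rows:
    print(row, round(mflops(*row)))
```

The three rows give 22321, 15044 and 45377 MFLOPS after rounding, matching the logged values.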
MP Memory Speed
MP Memory Reading Speed Test 64 Bit Version 1 Using 4 Threads
Start of test Tue Jun 7 11:32:54 2011
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int64 Dble Sngl Int64 Dble Sngl Int64
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
4 15704 11347 10961 17813 12518 15904 13744 8714 8758
8 24188 15367 14929 26770 17870 21025 20789 10866 10234
16 33319 19229 18266 38724 23589 23124 31390 13114 13157
32 40697 20675 21180 51120 27260 25282 39385 13921 13960
65 45013 22913 22267 57143 30132 24875 42247 14314 14241
131 45569 23573 22953 61979 31356 27585 44688 14427 13289
262 48701 23759 22666 63235 32103 27892 44447 14200 14453
524 44900 22996 20417 53167 30753 25832 36085 14671 13403
1048 44929 23357 20300 54596 30302 25790 36207 14708 13590
2097 42017 22864 20927 42429 28809 24778 26734 13125 12659
4194 34909 20379 19542 36402 25268 21093 18592 12625 12821
8388 22498 17592 17006 23354 19577 18854 12489 9400 9657
16777 8906 8697 8781 8884 8841 8844 4433 4217 4440
33554 8848 8684 8606 8877 8436 8843 4412 4293 4422
67108 8423 8445 8433 8685 8506 8526 4228 4296 4273
134217 8704 8453 8572 8563 8426 8485 4383 4303 4346
268435 8623 8579 8539 8731 8652 8612 4408 4301 4322
536870 8683 8331 8534 8724 8658 8444 4371 4330 4325
MP Memory Bus Speed
MP Bus Speeds 32 bit Version 1.0, 4 Threads, Fri Jun 17 16:44:21 2011
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 3901 7614 14703 28644 29313 34882 74424
24 7466 14648 28660 29468 37750 40926 79860
96 4648 5085 8422 19230 33948 39486 74050
384 4774 5131 9864 19142 32406 41067 82021
768 2726 2746 5361 9874 17152 30193 42259
1536 2407 2543 4943 10058 17570 29261 41159
16380 812 837 1684 3635 6772 12743 16252
131070 786 813 1605 3444 6259 12161 14950
393210 807 855 1649 3333 6234 11625 14892
MP Memory Random Access
RandMemMP Speeds 64 Bit Version 1, 4 Threads, Sun Jun 26 18:00:21 2011
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
Serial RD 29630 53166 44120 44829 29620 29671 12108 11987
Serial RW 5040 7334 7442 7402 7353 7395 8532 6247
Random RD 28388 41211 27807 12265 8866 6611 2103 1271
Random RW 657 1096 1229 1283 1288 1376 1648 993
Mutex SRW 5962 8654 7998 7882 6982 6853 3579 3415
Mutex RRW 6243 8594 5838 2815 1970 1370 486 310
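The Mutex tests are described as executing one read/write test at a time. A minimal sketch of that idea follows, assuming hypothetical helpers read_write_pass and run_mutex_test: every thread covers all of the shared data from a different starting point, but a lock lets only one pass run at once. It illustrates the locking pattern only and is not the RandMemMP code.

```python
import threading

def read_write_pass(data, start, lock):
    """One read/write pass over all the shared data, beginning at `start`."""
    n = len(data)
    with lock:                      # only one thread's pass runs at a time
        for k in range(n):
            i = (start + k) % n     # wrap round to cover every word
            data[i] = data[i] + 1   # read, modify, write

def run_mutex_test(words=1000, threads=4):
    """Run one pass per thread over the same list and return the list."""
    data = [0] * words
    lock = threading.Lock()
    workers = [
        threading.Thread(target=read_write_pass,
                         args=(data, t * words // threads, lock))
        for t in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return data

if __name__ == "__main__":
    result = run_mutex_test()
    print("All words updated by every thread:", all(v == 4 for v in result))
```

Holding the lock for a whole pass serialises the threads, which fits the log showing the Mutex rows slower than the unprotected serial reads.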
Image Processing Benchmarks
SDL_bmpspd32 and SDL_bmpspd64 benchmarks execute the same tests as the Windows version, where details and results can be found in
bmpspeed results.htm.
They are 32 bit and 64 bit varieties compiled to run under Linux using Simple DirectMedia Layer (SDL) functions. The benchmarks generate BMP files and measure speed of saving, loading, scrolling, rotating and editing of 0.5, 1, 2, 4 etc. to 512 MB images.
The programs automatically adjust maximum image size used, depending on available main memory, but run time parameters can be used to change this.
The execution files, source code, compilation and running instructions can be found in
linux_image_processing_benchmarks.tar.gz
with further details in
linux image processing benchmarks.htm. Example results are below.
Besides the standard Configuration Details shown earlier, additional attributes are obtained for this benchmark and included in the following example results.
Hardware benchmarked for
the main report
were desktops, a laptop and a netbook using internal and external (eSATA) disk drives plus USB flash memory and disk drives. Linux versions used were 32-Bit and 64-Bit Ubuntu 10.10 with GNOME 2, 64-Bit Ubuntu 11.04 with Unity on two different graphics arrangements, 64-Bit Fedora 14 with GNOME 2 and 64-Bit OpenSuse 11.4 with KDE.
Additional System Details
#####################################################################
Memory stats from /proc/meminfo
MemTotal: 3963.8 MB A
MemFree: 3181.8 MB B
Buffers: 46.5 MB C
Cached: 297.5 MB D
Memory Used: 438.0 MB = A - B - C - D
Current Directory Path (getcwd) and drive space (statvfs):
/home/roy/all64/bmpspd
Total MB 11263, Free MB 9446, Used MB 1817
See files hd1.txt and hd2.txt for details of drive used
SDL_GetVideoInfo
hw_available flag is 0 - cannot create hardware surfaces
Display size 1280 x 1024 pixels at 32 bits
SDL_VideoDriverName = x11
Graphics (command - lspci | grep -i vga > vga.txt)
VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1)
#####################################################################
Image Editing Speeds 64 Bit Version 1, Sat Aug 6 09:45:47 2011
Input Enlarge Save Load Scroll Scroll Rotate Max MB
Image Display Display Repeat Overall 90 deg Memory
Mbytes Secs Secs Secs msecs MB/Sec Secs Used
0.5 0.02 0.01 0.01 0.83 601.15 0.01 440.2
1.0 0.02 0.05 0.02 1.63 612.30 0.02 441.9
2.0 0.02 0.02 0.03 3.31 634.52 0.02 445.4
4.0 0.03 0.04 0.06 5.66 625.44 0.03 451.6
8.0 0.05 0.08 0.11 6.73 584.70 0.05 464.7
16.0 0.09 0.16 0.20 6.77 580.53 0.08 489.5
32.0 0.16 0.29 0.31 6.70 587.05 0.16 541.1
64.0 0.29 0.59 0.71 6.94 566.85 0.32 672.4
128.0 0.59 1.32 1.22 6.64 592.54 0.65 785.3
256.0 1.14 2.35 2.60 6.63 593.46 3.51 1129.9
512.0 2.27 4.90 4.73 6.65 591.47 3.91 1822.9
End at Sat Aug 6 09:46:58 2011
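The "Memory Used = A - B - C - D" line in the system details can be reproduced from /proc/meminfo text. The sketch below assumes a hypothetical function, memory_used_mb, and the usual "Name: value kB" line layout with 1 MB taken as 1024 kB; the sample values are invented for illustration and are not from the log.

```python
def memory_used_mb(meminfo_text):
    """Return MemTotal - MemFree - Buffers - Cached in MB."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0]) / 1024.0   # kB to MB
    return (fields["MemTotal"] - fields["MemFree"]
            - fields["Buffers"] - fields["Cached"])

# Hypothetical sample in the /proc/meminfo layout.
SAMPLE = """MemTotal:        4096000 kB
MemFree:         3072000 kB
Buffers:           51200 kB
Cached:           307200 kB
SwapCached:            0 kB
"""

if __name__ == "__main__":
    print("Memory Used: %.1f MB" % memory_used_mb(SAMPLE))
```

On a Linux system the same function can be given the contents of /proc/meminfo, read with open("/proc/meminfo").read(). Applied to the logged figures, 3963.8 - 3181.8 - 46.5 - 297.5 gives the 438.0 MB shown.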
OpenGL Benchmark
The benchmarks, videogl32 and videogl64, are 32-Bit and 64-Bit Linux compilations of OpenGL code used for testing via Windows. Details and results can be found in
Linux OpenGL Benchmarks.htm.
The benchmarks measure graphics speed in terms of Frames Per Second (FPS) via six simple and more complex tests. The first four tests portray moving up and down a tunnel including various independently moving objects, with and without texturing. The last two tests represent a real application for designing kitchens. The first is in wireframe format, drawn with 23,000 straight lines. The second has colours and textures applied to the surfaces.
The textures are obtained from 24 bit BMP files that can be up to 256 x 256 pixels at 192 KB. The BMP files and Linux execution files can be found in
linux_opengl_benchmarks.tar.gz,
along with source code, compilation and running instructions. Windows benchmarks from the same source code are also included.
The benchmarks were run on a variety of Ubuntu, Fedora and OpenSuse distros and different PC hardware, with nVidia, ATI and Intel graphics. Newly installed Linux systems do not [so far] provide OpenGL hardware acceleration and, except for nVidia, finding a driver that works with a particular release seems impossible in some cases.
As a default, the benchmark runs using a full screen window, but input parameters allow different sized windows to be used, via Terminal commands or a script file. Following are example log files from tests using a Core 2 Duo CPU and GeForce 8600 GT graphics, using a default driver and one from nVidia.
Decreasing performance, as the window size increases, suggests a graphics speed limitation, with constant performance indicating that processor speed is the limiting factor.
#####################################################################
Linux OpenGL Benchmark 64 Bit Version 1, Wed Oct 26 22:29:24 2011
Running Time Approximately 5 Seconds Each Test
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
320 240 221.7 158.1 162.4 109.3 72.1 48.0
640 480 60.9 53.5 46.2 37.6 52.7 22.2
1024 768 23.7 22.0 18.4 15.6 34.9 10.7
1280 1024 15.6 14.6 12.0 10.3 28.5 7.4
End at Wed Oct 26 22:31:38 2011
#####################################################################
Linux OpenGL Benchmark 64 Bit Version 1, Tue Oct 25 18:36:45 2011
Running Time Approximately 5 Seconds Each Test
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
320 240 3670.2 2326.6 1160.9 678.8 401.0 229.2
640 480 2463.1 2033.9 896.3 666.3 414.5 231.3
1024 768 1089.2 987.3 541.6 440.9 401.8 214.6
1280 1024 727.0 680.8 412.1 338.3 400.2 194.0
End at Tue Oct 25 18:38:58 2011
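The rule of thumb given before these logs, that falling FPS with window size points to a graphics limit and constant FPS to a processor limit, can be sketched as a simple ratio test. The function name and the 1.1 threshold are hypothetical choices for illustration, not part of the benchmark.

```python
def limiting_factor(fps_smallest, fps_largest, flat_ratio=1.1):
    """Classify one test column from its FPS at the smallest and largest window.

    flat_ratio is a hypothetical threshold: a ratio at or below it is
    treated as constant performance.
    """
    ratio = fps_smallest / fps_largest
    return "processor" if ratio <= flat_ratio else "graphics"

# FPS at 320 x 240 and at 1280 x 1024, taken from the two logs above.
print("Default driver, coloured few objects:", limiting_factor(221.7, 15.6))
print("nVidia driver, wireframe kitchen:    ", limiting_factor(401.0, 400.2))
```

With the logged values, the default driver's coloured objects test is classed as graphics limited and the nVidia driver's wireframe kitchen test as processor limited.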
On-Line Benchmarks
A Java version of the Whetstone Classic Benchmark, that is executed via a downloaded HTML page, was produced in 1997.
Because of the timing considerations in those days, the benchmark ran for 100 seconds. It also included a measurement of graphics speed. Running this via Firefox and Linux identified some unacceptable text displays and measured speeds, due to over-optimisation. The code was modified slightly to avoid this, running time was reduced and graphics tests were excluded, for a new version, compiled via Java installed under Linux.
The benchmark is run via
WhetJava2.html
or indirectly from
online benchmarks.html,
which also includes tests to measure downloading speed of images (see below).
Performance results are displayed in graphics format, but these can be saved using Take Screenshot. A version of the new benchmark that runs from a Terminal command was also compiled, producing text output to the window and a log file. The format is the same as the graphics display and an example is given below.
Results via Linux and Windows are available in
Whetstone Benchmark Results - Java.
These show differences in 32 bit vs 64 bit, Windows vs Linux, On-line vs Off-line and same results with different browsers. The benchmarks, including source code, can be downloaded from
onlinetests.zip
or
onlinetests.tar.gz.
*************************************************************
Whetstone Benchmark Java Version, Dec 8 2011, 23:38:14
1 Pass
Test Result MFLOPS MOPS millisecs
N1 floating point -1.124750137 894.69 0.0215
N2 floating point -1.131330490 732.82 0.1834
N3 if then else 1.000000000 1027.81 0.1007
N4 fixed point 12.000000000 1735.54 0.1815
N5 sin,cos etc. 0.499110132 41.15 2.0220
N6 floating point 0.999999821 496.69 1.0860
N7 assignments 3.000000000 582.23 0.3174
N8 exp,sqrt etc. 0.825148463 33.54 1.1090
MWIPS 1991.45 5.0215
Operating System Linux, Arch. amd64, Version 2.6.34-12-desktop
Java Vendor Sun Microsystems Inc., Version 1.6.0_26
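The figures in this log appear to be consistent with an overall rating of 10 divided by the total running time in seconds for one pass. That relationship is inferred from the numbers shown, not taken from the benchmark source, and the function name below is hypothetical.

```python
# Per-test times in milliseconds, copied from the log above (N1 to N8).
TEST_MILLISECS = [0.0215, 0.1834, 0.1007, 0.1815, 2.0220, 1.0860, 0.3174, 1.1090]

def mwips_from_times(millisecs, passes=1):
    """Inferred rating: 10 x passes / total seconds (an assumption)."""
    total_seconds = sum(millisecs) / 1000.0
    return 10.0 * passes / total_seconds

if __name__ == "__main__":
    print("Total millisecs %.4f" % sum(TEST_MILLISECS))
    print("Inferred MWIPS  %.2f" % mwips_from_times(TEST_MILLISECS))
```

The times sum to the logged 5.0215 milliseconds and the inferred rating comes within about 0.02 of the logged 1991.45, the gap being consistent with rounding of the displayed times.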
Online Benchmark Downloading Tests measure the downloading time of 1 MByte or 100 KByte BMP, GIF and JPG files and of 200 or 400 70 Byte GIF files. Of particular note, the typical loading time of the 400 GIFs (28 KB in total) is twice as long as that for the 1 MB image files.
Booting Time
Below are booting times on two PCs, from boot menu selection to loaded desktop. The two PCs are a Netbook with a 1.66 GHz Atom CPU, originally running Windows XP, and a desktop PC with a 2.4 GHz Core 2 Duo and Windows Vista. Besides seconds to boot, MB/second reading speed of the drives is provided, derived from the Image Processing Benchmark results.
The first results show Windows booting time, for comparison purposes, the Core 2 Duo being particularly slow. The second and fastest results are for 64-Bit Ubuntu 10.10, booting from the Windows disk in the Netbook, and a fast (for 2009) eSATA disk on the desktop.
Figures for the next six entries are from USB sticks, booting 32-Bit and 64-Bit Ubuntu 10.10, 64-Bit Ubuntu 11.04, 64-Bit Fedora 14 and 64-Bit OpenSuse 11.4.
On moving the drives between systems, it seems that booting time of the next system used can be considerably longer than normal (needs to use alternative drivers?). Also, the first Linux installations were with Ubuntu and nVidia drivers were installed in order to run CUDA based benchmarks, probably the reason why these would only fully boot on using Recovery Mode on the Netbook, with its Intel graphics.
On the desktop, all Linux loading times are faster than Windows, using much slower drives, but the fastest flash drive does not necessarily produce the shortest booting time. Repeating the tests a number of times indicates that booting time depends on differing hardware/distro combinations. The last result is with OpenSuse on a USB disk drive, where the faster data transfer speed, compared to a flash drive, does not improve booting time much.
Netbook, WinXP, 5400 Desktop, Vista 7200 RPM
RPM Local Disk SATA and eSATA Disks
Drive Linux Boot1 Boot2 Disk Mode Boot1 Boot2 Disk Mode
Secs Secs MB/s Secs Secs MB/s
Windows Disk 64 50 70.0 Norm 170 170 47.8 Norm
Local Disk Ubuntu 10.10 37 35 56.0 Norm 22 23 108.0 Norm
Old Staples Ubuntu 10.10 100 66 9.3 Rec 76 71 8.8 Norm
4 GB Stick 64 Bit 95 71 Rec
PNY Attache Ubuntu 10.10 100 77 18.2 Rec 103 62 20.4 Norm
4 GB Stick 32 Bit
Cruzer U3 Ubuntu 10.10 50 51 16.4 Rec 57 57 16.9 Norm
4 GB Stick 64 Bit
Patriot Rage Ubuntu 11.04 46 57 24.3 Norm 76 48 26.8 Norm
8 GB Stick 64 Bit
Cruzer U3 Fedora 14 110 98 22.0 Norm 73 70 23.8 Norm
16 GB 64 Bit
Cruzer Blade OpenSuse 11.4 82 70 19.1 Norm 70 44 20.8 Norm
8 GB Stick 64 Bit
USB Disk OpenSuse 11.4 59 60 28.4 Norm 48 42 34.8 Norm
64 Bit
Roy Longbottom March 2012
The Official Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection