Four Core Eight Thread Computing Benchmarks
General
The Dual Core benchmark code (see DualCore.htm) has been modified to use eight threads. It was initially intended to measure the performance of four core processors with Hyperthreading, which Windows sees as eight processors.
Download QuadCore.zip for benchmark source code and EXE files at 64 bits and 32 bits.
Results below include those for a Quad Core Phenom II and a Quad Core i7 with Hyperthreading.
With one core in use, the latter processor can run at 3066 MHz using Turbo Boost, but this is reduced to 2933 MHz when more than one core is in use, or to the specified speed of 2800 MHz if hot. This behaviour makes the effects of Hyperthreading more difficult to determine.
Except for the Whetstone benchmark, which has program loops with few instructions, the test programs process long sequences of streamed data, some using efficient assembly code. For these, large performance gains are not really expected from using more than four threads on a quad core processor with Hyperthreading.
CPUIDMP CPU Only Benchmark
Programs CPUID8Thread32.exe and CPUID8Thread64.exe are the same programs but compiled for 32 and 64 bits. They execute three passes of simple additions to four different registers, via assembly code, attempting to demonstrate maximum CPU speeds. Firstly an integer (INT) and an SSE floating point test are run separately. They are then run as two threads, followed by 2 INT and 2 SSE, 3 INT and 3 SSE, then 4 INT and 4 SSE. Further information can be found in WhatCPU Results.htm.
The high speed operation achieved appears to leave only a little room to squeeze in additional hyperthreaded instructions on the Core i7. Even using four threads, integer throughput is disappointing and, between four and eight threads, the Phenom appears to be more efficient (on this particular code).
The slow i7 speeds could be due to the Turbo Boost clock being reduced from 3066 to 2933 MHz, in which case the maximum gain using four cores might be 2933 / 3066 x 4 x 100 = 383%.
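The Turbo Boost estimate can be checked with a few lines of Python. This is a minimal sketch assuming that throughput scales linearly with clock speed and core count; the function name `max_gain_percent` is hypothetical, not part of the benchmark.

```python
def max_gain_percent(single_core_mhz, multi_core_mhz, cores):
    """Best-case multi-core gain (%) relative to one core.

    Assumes throughput scales linearly with clock frequency and core count,
    so a lower all-core Turbo Boost clock caps the achievable gain.
    """
    return multi_core_mhz / single_core_mhz * cores * 100


print(round(max_gain_percent(3066, 2933, 4)))  # 383
print(round(max_gain_percent(3066, 2800, 4)))  # 365, if running at the rated 2800 MHz
```

The second call shows how much lower the ceiling would be if the processor dropped all the way to its rated speed.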
CPU Core 2 Athlon 64 Core 2 Phenom II Core i7
MHz 1830 2211 2400 3000 ####
CPUs/Hyperthreads 2/0 2/0 2/0 4/0 4/4
Windows Vis32 XPx64 Vis64 Win764 Win764
Separate Tests
32 bit SSE MFLOPS 6781 4400 9222 12020 10178
32 bit Integer MIPS 4556 6612 6296 9018 8611
Two Threads Equal Priority
32 bit SSE MFLOPS 6777 4384 9266 12003 10176
32 bit Integer MIPS 5117 6604 6740 9028 8606
Four Threads, First Normal Priority, Others Normal - 1
32 bit SSE MFLOPS 6816 4363 9086 11935 11032
32 bit SSE MFLOPS 0 2215 0 11986 9161
32 bit Integer MIPS 2508 3232 3257 8897 6739
32 bit Integer MIPS 2642 67 3671 8956 6566
Total SSE MFLOPS 6816 6578 9086 23921 20193
Total Integer MIPS 5150 3300 6929 17853 13305
Gain % SSE 101 150 99 199 198
Gain % Integer 113 50 110 198 155
Total 353
Six Threads, All Normal Priority
32 bit SSE MFLOPS 2200 1439 3059 5864 8166
32 bit SSE MFLOPS 2355 1450 3114 12012 9124
32 bit SSE MFLOPS 2358 1488 3111 11946 6653
32 bit Integer MIPS 1700 2192 2112 4452 4612
32 bit Integer MIPS 1669 2163 2249 4546 4612
32 bit Integer MIPS 1699 2257 2450 4519 4041
Total SSE MFLOPS 6913 4376 9284 29822 23942
Total Integer MIPS 5068 6612 6811 13517 13265
Gain % SSE 102 99 101 248 235
Gain % Integer 111 100 108 150 154
Total 389
Eight Threads, All Normal Priority
32 bit SSE MFLOPS 1705 1083 2283 4077 5445
32 bit SSE MFLOPS 1730 1067 2321 5867 5210
32 bit SSE MFLOPS 1730 1078 2321 11982 5194
32 bit SSE MFLOPS 1728 1130 2314 6141 4693
32 bit Integer MIPS 1252 1630 1680 4451 4032
32 bit Integer MIPS 1251 1634 1672 2973 4029
32 bit Integer MIPS 1411 1639 1671 4495 4036
32 bit Integer MIPS 1244 1732 1879 2968 4035
Total SSE MFLOPS 6893 4358 9240 28067 20541
Total Integer MIPS 5158 6635 6902 14887 16132
Gain % SSE 102 99 100 234 202
Gain % Integer 113 100 110 165 187
Total 389
#### Core i7 930 rated at 2800 MHz but running up to 3066 MHz using Turbo Boost
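The Total and Gain % rows can be reproduced from the per-thread figures. In this sketch (the helper name is hypothetical), a group's total is the simple sum of its thread speeds and the gain is that total as a percentage of the separate-test speed. The Total row appears to be the sum of the SSE and integer gains, an interpretation that matches all three Core i7 totals (353, 389 and 389).

```python
def total_and_gain(thread_speeds, single_test_speed):
    """Sum per-thread speeds and express the sum as % of the single-test speed."""
    total = sum(thread_speeds)
    gain = round(total / single_test_speed * 100)
    return total, gain


# Core i7 column, "Four Threads" section: two SSE and two integer threads
sse_total, sse_gain = total_and_gain([11032, 9161], 10178)  # separate SSE test: 10178 MFLOPS
int_total, int_gain = total_and_gain([6739, 6566], 8611)    # separate integer test: 8611 MIPS

print(sse_total, sse_gain)   # 20193 198
print(int_total, int_gain)   # 13305 155
print(sse_gain + int_gain)   # 353, the "Total" figure
```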
Whetstone Benchmark
The Whetstone Benchmark has various routines that execute floating point and integer instructions.
Speed of individual tests is in terms of Millions of Operations Per Second (MOPS), or MFLOPS for those using simple floating point arithmetic, and an overall rating in Millions of Whetstone Instructions Per Second (MWIPS).
Programs Whets8Thread32.exe and Whets8Thread64.exe are the same programs but compiled for 32 and 64 bits.
Unlike the dual core variety, this version uses common code and equal priority for all threads to produce more consistent performance.
Results and further details can be found in Whetstone Results.htm. Those at 64 bits are somewhat faster, due to improved optimisation.
The total (top line) results shown are calculated using a simple sum of speeds for each thread and can be distorted by threads finishing at different times. Using a harmonic mean makes little difference and the overall MWIPS rating is calculated using the sum of elapsed times of tests in all threads.
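The difference between the two ways of totalling thread speeds can be illustrated with the Phenom II six-thread MFLOP 1 figures from the table below. This is a sketch, not the benchmark's own code, and it assumes every thread performs the same amount of work; in that case n times the harmonic mean of the thread speeds equals total work divided by the average thread time.

```python
def simple_total(speeds):
    """Sum of per-thread speeds, as used for the top-line totals."""
    return sum(speeds)


def harmonic_total(speeds):
    """n times the harmonic mean of per-thread speeds.

    With equal work per thread, this equals total work divided by the
    average elapsed time per thread, so slow threads weigh more heavily.
    """
    n = len(speeds)
    return n * n / sum(1.0 / s for s in speeds)


# Phenom II, six threads, MFLOP 1 column
mflop1 = [621, 613, 617, 933, 604, 934]
print(simple_total(mflop1))           # 4322
print(round(harmonic_total(mflop1)))  # roughly 4% lower than the simple sum
```

Even with thread speeds as uneven as these, the two totals differ by only a few percent.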
Considering the four core Phenom results, consistent speeds are produced on all tests using two and four threads, giving performance gains of 200% and nearly 400%. It is not clear why, but average gains using six and eight threads were around 450%.
The Core i7 produces a 200% gain using two threads but less than 400% with four threads, probably because the Turbo Boost clock of 3066 MHz is reduced, possibly as far as the specified speed of 2800 MHz.
This benchmark appears to show Hyperthreading in a most favourable light, producing average gains of around 550% using six threads and 700% with eight threads. The main beneficiaries are the floating point tests, in this case compiled to SSE code as Single Instruction Single Data (SISD) operations rather than SIMD (Single Instruction Multiple Data).
MWIPS MFLOP MFLOP MFLOP COS EXP FIXPT IF EQUAL
CPU MHz 1 2 3 MOPS MOPS MOPS MOPS MOPS
Phenom II Win7 3000 3115 902 739 716 69.5 49.3 2509 3008 1289
Quad Core Thread 1 902 739 716 69.5 49.3 2509 3008 1289
Phenom II Win7 3000 6229 1811 1480 1432 139 98.6 5007 6022 2578
Quad Core Thread 1 906 738 716 69.5 49.3 2508 3010 1289
Thread 2 905 741 716 69.5 49.3 2499 3012 1288
Gain % 200 201 200 200 200 200 200 200 200
Phenom II Win7 3000 12414 3603 2950 2853 277 196 9988 11992 5139
Quad Core Thread 1 902 735 714 69.1 49.2 2481 2983 1278
Thread 2 903 739 715 69.4 49.0 2501 3000 1287
Thread 3 905 739 710 69.3 49.0 2499 2999 1285
Thread 4 893 736 714 69.5 49.2 2508 3009 1288
Gain % 399 399 399 398 399 398 398 399 399
Phenom II Win7 3000 14101 4322 3550 3374 325 231 12239 14250 6019
Quad Core Thread 1 621 767 725 46.3 49.5 2655 1995 860
Thread 2 613 510 482 46.5 32.8 1722 3009 859
Thread 3 617 496 477 46.4 33.0 1741 2116 862
Thread 4 933 767 726 69.7 49.6 1725 3077 1291
Thread 5 604 505 486 46.3 32.8 2651 2043 854
Thread 6 934 506 477 69.8 33.1 1744 2011 1293
Gain % 453 479 480 471 468 469 488 474 467
Phenom II 8 Threads Similar
Core i7 Win7 #### 3115 1065 886 738 79.3 39.7 2447 2936 1154
Quad Core Thread 1 1065 886 738 79.3 39.7 2447 2936 1154
Core i7 Win7 #### 6228 2130 1773 1474 159 79.4 4894 5872 2308
Quad Core Thread 1 1065 887 737 79.3 39.7 2447 2936 1154
Plus HT Thread 2 1065 886 737 79.3 39.7 2448 2936 1154
Gain % 200 200 200 200 201 200 200 200 200
Core i7 Win7 #### 12043 4243 3529 2930 302 156 9078 10207 4170
Quad Core Thread 1 1059 880 730 75.0 39.4 2102 2332 1018
Plus HT Thread 2 1064 881 733 76.9 38.7 2450 2498 1107
Thread 3 1057 881 729 74.1 38.6 2187 2439 1044
Thread 4 1063 887 738 76.4 39.0 2339 2938 1001
Gain % 387 398 398 397 381 393 371 348 361
Core i7 Win7 #### 17149 6705 5463 4426 422 224 12984 13145 4869
Quad Core Thread 1 1146 919 739 72.3 37.6 2019 1958 816
Plus HT Thread 2 1145 915 736 69.8 37.0 2044 2664 793
Thread 3 1143 916 744 71.8 37.0 2058 2083 793
Thread 4 1111 926 737 68.5 37.6 2398 2023 788
Thread 5 1097 916 742 72.2 37.8 2110 2124 827
Thread 6 1062 872 728 67.8 36.7 2355 2292 852
Gain % 551 630 617 600 532 564 531 448 422
Core i7 Win7 #### 21690 8676 7621 5844 531 291 16643 12027 5034
Quad Core Thread 1 1091 1027 728 66.4 36.5 2050 1501 629
Plus HT Thread 2 1089 1037 742 66.0 36.5 2090 1507 630
Thread 3 1090 946 742 66.8 36.5 2069 1534 631
Thread 4 1092 1037 727 66.6 36.6 2031 1501 630
Thread 5 1042 959 736 66.4 36.5 1912 1483 630
Thread 6 1091 874 723 66.6 36.1 2049 1507 629
Thread 7 1090 867 725 65.6 36.3 2094 1516 631
Thread 8 1091 874 722 66.3 36.3 2350 1476 624
Gain % 696 815 860 792 670 733 680 410 436
#### Core i7 930 rated at 2800 MHz, running at up to 3066 MHz using Turbo Boost
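The first figure in each Gain % row is the MWIPS rating as a percentage of the single-thread rating. A minimal sketch of that arithmetic for the Core i7 follows; the helper name `gain_percent` is hypothetical.

```python
def gain_percent(multi_thread, single_thread):
    """Multi-thread throughput as a percentage of single-thread throughput."""
    return round(multi_thread / single_thread * 100)


# Core i7 MWIPS ratings by thread count; one thread rated 3115 MWIPS
core_i7_mwips = {2: 6228, 4: 12043, 6: 17149, 8: 21690}
for threads, mwips in core_i7_mwips.items():
    print(threads, gain_percent(mwips, 3115))
# 2 200
# 4 387
# 6 551
# 8 696
```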
BusMP Maximum Data Flow Benchmark - MBytes/Second
Bus8Thread32.exe and Bus8Thread64.exe are the same programs but compiled for 32 and 64 bits. Results and further details can be found in BusSpd2K Results.htm.
One difference is that integers for the 64 bit version are declared as 64 bits, rather than the default 32. The first results below show major performance differences between the two varieties: speed in MBytes Per Second can be nearly twice as fast at 64 bits, indicating a processing speed limitation (64 bit integer arithmetic can be as fast as 32 bit).
The program starts by reading words with 32 word address increments, to identify memory bus burst reading speed, then reduces the increment to eventually read all words sequentially. Finally, a test loads data into 128 bit SSE registers. Burst reading is mainly 64 bytes at a time, so the maximum bus speed is likely to be 16 times the MB/second measured at 16 word increments for 32 bit numbers, or 8 times that at 8 word increments for 64 bit numbers.
From the single thread results, burst calculations suggest that the Phenom could achieve 7280 MB/second RAM speed from one CPU, similar to that obtained at 128 bits. The figure for the i7 is 11344 MB/second, higher than the 9485 MB/second achieved at 128 bits.
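The burst estimates can be reproduced from the single thread RAM results. This sketch assumes, as the text does, a 64 byte burst in which only one word is used per read; `burst_estimate` is a hypothetical helper.

```python
BURST_BYTES = 64  # bytes transferred per memory burst, as stated in the text


def burst_estimate(measured_mb_per_sec, word_bytes):
    """Estimate raw bus speed when each read uses one word of a 64 byte burst.

    Valid when the address increment is at least one burst (16 x 4 byte or
    8 x 8 byte words), so every read triggers a separate burst.
    """
    return measured_mb_per_sec * BURST_BYTES // word_bytes


print(burst_estimate(455, 4))  # Phenom II, 32 bit, RAM, Inc 16wds -> 7280
print(burst_estimate(709, 4))  # Core i7, 32 bit, RAM, Inc 16wds -> 11344
print(burst_estimate(911, 8))  # Phenom II, 64 bit, RAM, Inc 8wds -> 7288
```

The 64 bit Phenom figure at 8 word increments gives nearly the same estimate as the 32 bit one, which supports the 64 byte burst assumption.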
According to the specifications, maximum speeds are 21333 MB/second (at 667 MHz) for the Phenom and 17067 MB/second (at 533 MHz) for the i7. Multi-Thread tests achieve up to 15000 MB/second and nearly 14000 MB/second respectively.
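The specification figures follow from the DDR3 memory clock. The sketch below assumes two data transfers per I/O clock, 8 bytes per channel and dual channel memory in both systems (stated later for the Core i7, assumed here for the Phenom); the helper name is hypothetical.

```python
def ddr_peak_mb_per_sec(io_clock_mhz, channels, channel_bytes=8):
    """Theoretical DDR peak: two transfers per I/O clock, 8 bytes per channel."""
    return io_clock_mhz * 2 * channel_bytes * channels


# 667 and 533 MHz are rounded forms of 2000/3 and 1600/3 MHz (DDR3-1333 and DDR3-1066)
print(round(ddr_peak_mb_per_sec(2000 / 3, channels=2)))  # Phenom II: 21333
print(round(ddr_peak_mb_per_sec(1600 / 3, channels=2)))  # Core i7: 17067
```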
The second set of tables shows performance and gains using 1, 2, 4, 6 and 8 threads, for all caches and RAM, using the 32 bit compilation, for the Inc 32wds, Read All and 128b SSE2 tests.
Using 4 or more threads, the Phenom achieves performance gains of 360% to 390% via L1 and L2 caches, around 320% via L3 and 200% to 250% using RAM. With the Core i7, the only significant gain due to Hyperthreading is in the 128 bit SSE L1 cache test. Here, the maximum speed is likely to be one 16 byte result per CPU clock per core, or 16 x 2800 x 4 = 179,200 MB/second, which was nearly achieved using 8 threads. On the downside, the system appears to have been trying to use eight 1.5 MB blocks of L3 data (12 MB in total, more than the 8 MB L3 cache) at the same time, forcing data to be read from RAM.
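Both calculations in the previous paragraph are simple arithmetic. This sketch assumes one 128 bit SSE load per clock per core at the rated 2800 MHz, and that each of the eight threads works on its own 1536 KB block, as the text suggests.

```python
BYTES_PER_CLOCK = 16  # one 128 bit SSE load per clock per core (assumed, as in the text)
MHZ = 2800            # rated Core i7 930 clock
CORES = 4

peak = BYTES_PER_CLOCK * MHZ * CORES
print(peak)                        # 179200 MB/second
print(round(170513 / peak * 100))  # 95, percent of peak reached with 8 threads

# L3 test: each thread works on its own 1536 KB, against an 8192 KB shared L3
threads, kb_per_thread, l3_kb = 8, 1536, 8192
print(threads * kb_per_thread, threads * kb_per_thread > l3_kb)  # 12288 True
```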
Single Thread Cache MHz Inc Inc Inc Inc Inc Read 128b
Results RAM 32wds 16wds 8wds 4wds 2wds All SSE2
Phenom II 32b L1 3000 10606 13543 13819 13363 13463 14219 23691
Phenom II 32b L2 3000 1496 1495 2957 5972 11352 13145 23798
Phenom II 32b L3 3000 659 751 1377 2995 5656 9562 10838
Phenom II 32b RAM 3000 439 455 894 1846 3097 5214 7302
Phenom II 64b L1 3000 20650 21652 25936 25907 26860 27037 23718
Phenom II 64b L2 3000 2922 2970 2992 5927 11859 22500 23881
Phenom II 64b L3 3000 1419 1462 1492 2908 5958 11097 11891
Phenom II 64b RAM 3000 832 877 911 1784 3676 6237 7360
Core i7 930 32b L1 **** 10303 9510 9654 9122 9134 9023 23326
Core i7 930 32b L2 **** 1996 2041 3677 5980 8009 8643 22092
Core i7 930 32b L3 **** 1948 2004 3608 5848 8074 8614 21650
Core i7 930 32b RAM **** 526 709 1350 2352 4458 7063 9485
Core i7 930 64b L1 **** 20105 18713 19136 17974 18126 17910 23345
Core i7 930 64b L2 **** 3934 3999 4076 7064 12003 15793 21923
Core i7 930 64b L3 **** 3842 3909 4028 6979 11748 15845 21848
Core i7 930 64b RAM **** 949 1048 1419 2736 4698 8812 9459
L1 Cache Results in MBytes/Second - 6 KB % Gain
Cache CPUs/ MHz Inc Read 128b Inc Read 128b
KB HTs 32wds All SSE2 32wds All SSE2
Phenom II 64 4/0 3000 10606 14219 23691
764 2 Threads 128 21150 28435 47423 199 200 200
4 Threads 256 40763 54630 92595 384 384 391
6 Threads 256 31624 54370 88023 298 382 372
8 Threads 256 38638 53126 85948 364 374 363
Core i7 930 32 4/4 **** 10303 9023 23326
764 2 Threads 64 20590 18031 46677 200 200 200
4 Threads 128 29499 31104 91726 286 345 393
6 Threads 128 35391 35846 137181 344 397 588
8 Threads 128 41300 39292 170513 401 435 731
L2 Cache Results in MB/Second - 96 KB % Gain
Phenom II 512 4/0 3000 1496 13145 23798
2 Threads 1024 2983 26351 47336 199 200 199
4 Threads 2048 5761 51226 92184 385 390 387
6 Threads 2048 5863 48050 86055 392 366 362
8 Threads 2048 5380 48529 85650 360 369 360
Core i7 930 256 4/4 **** 1996 8643 22092
2 Threads 512 3378 17305 43722 169 200 198
4 Threads 1024 3866 26611 60836 194 308 275
6 Threads 1024 4049 33262 64866 203 385 294
8 Threads 1024 4178 37228 68711 209 431 311
L3 Cache - 1536 KB Data % Gain
Phenom II 6144 4/0 3000 659 9562 10838
2 Threads 1431 18082 22559 217 189 208
4 Threads 2222 29623 34566 337 310 319
6 Threads 2221 30682 34525 337 321 319
8 Threads 2240 31417 35148 340 329 324
Core i7 930 8192 4/4 **** 1948 8614 21650
2 Threads 3192 17141 42945 164 199 198
4 Threads 3772 30387 58809 194 353 272
6 Threads 2537 29429 43411 130 342 201
8 Threads 1060 19526 15886 54 227 73
RAM Results in MBytes/Second - 128 MB % Gain
Phenom II 4/0 3000 439 5214 7302
2 Threads 744 8920 12162 169 171 167
4 Threads 913 13000 14952 208 249 205
6 Threads 902 13183 15005 205 253 205
8 Threads 909 12701 14966 207 244 205
Core i7 930 4/4 **** 526 7063 9485
2 Threads 637 11883 12945 121 168 136
4 Threads 724 13600 13828 138 193 146
6 Threads 731 13572 13911 139 192 147
8 Threads 731 13750 13722 139 195 145
**** Core i7 930 rated at 2800 MHz, running at up to 3066 MHz using Turbo Boost
RandMP Serial/Random Access Benchmark
Rand8Thread32.exe and Rand8Thread64.exe are compiled from the same program, but for 32 and 64 bits. The program uses the same code for serial and random access via a complex indexing structure and comprises Read (RD) and Read/Write (RW) tests. These are run using data from L1 cache, L2 cache, L3 cache and RAM with 1, 2, 4, 6 and 8 threads. Results (32 and 64 bit versions) and further details can be found in RandMem Results.htm.
This benchmark uses data from the same array for all threads, but starting at different points. As with the dual core version, read/write tests, particularly random ones, run more slowly using more than one thread, as dedicated caches are flushed to maintain data coherency. Here, speed using shared L2 or L3 cache can be faster than using L1 cache.
Results for the 32 bit version below show the total throughput of all threads based on harmonic mean.
Data sizes are, again, 6 KB for L1 cache, 96 KB for L2 cache, 1536 KB for L3 cache but 96 MB for RAM.
On the Phenom, serial reading speed, from caches and RAM, is similar to that for the BusMP Read All tests. This also applies via caches for the Core i7 but, using RAM, the data transfer speed appears to be higher than possible, most likely due to efficient caching of shared data (the different data starting points probably suit the 8 MB L3 cache). This i7 RAM test is the only one where Hyperthreading has a major impact.
Random reading speed via L1 cache is similar to that for serial reading but becomes progressively slower through other caches and RAM. The Core i7 is faster using L3 cache and RAM with few threads, but the Phenom nearly catches up at 8 threads.
The i7 is clearly the faster of the two systems on most read/write tests, but still struggles to achieve a throughput gain greater than 2.0 using more than two threads. Note that, on random read/write of L1 cache sized data, the i7 is around five times faster using one thread than using multiple threads, and the Phenom up to ten times faster. For the latter, with multiple threads, using data in RAM is faster than using data that could sit within L1 cache.
CPUs MBytes Per Second Using Threads Gain At Threads
/HTs 1 2 4 6 8 2 4 6 8
Serial RD
Core i7 4/8 L1 11458 22661 37039 43717 46374 2.0 3.2 3.8 4.0
930 L2 10380 20832 32853 41711 42839 2.0 3.2 4.0 4.1
#### MHz L3 8828 17743 29610 38414 40330 2.0 3.4 4.4 4.6
Win 764 RAM 4266 8712 17347 24946 25589 2.0 4.1 5.8 6.0
Serial RW
Core i7 4/8 L1 15282 13724 16240 16209 18379 0.9 1.1 1.1 1.2
930 L2 12223 18216 25326 28104 27047 1.5 2.1 2.3 2.2
#### MHz L3 10234 19266 21931 24450 26351 1.9 2.1 2.4 2.6
Win 764 RAM 4533 7656 13876 14543 13390 1.7 3.1 3.2 3.0
Random RD
Core i7 4/8 L1 11266 22548 38174 45592 47141 2.0 3.4 4.0 4.2
930 L2 6233 12463 20059 24986 25667 2.0 3.2 4.0 4.1
#### MHz L3 3499 6915 9211 10002 9531 2.0 2.6 2.9 2.7
Win 764 RAM 459 909 1241 1398 1364 2.0 2.7 3.0 3.0
Random RW
Core i7 4/8 L1 14375 3027 2780 2901 3297 0.2 0.2 0.2 0.2
930 L2 5887 4555 6117 6693 7281 0.8 1.0 1.1 1.2
#### MHz L3 3104 4604 4721 5047 4933 1.5 1.5 1.6 1.6
Win 764 RAM 428 860 899 948 1026 2.0 2.1 2.2 2.4
#### 2.8 GHz running at up to 3.06 GHz via Turbo Boost, dual channel 1066 MHz DDR3 RAM
##################################################################################
CPUs MBytes Per Second Using Threads Gain At Threads
/HTs 1 2 4 6 8 2 4 6 8
Serial RD
Phenom II 4/0 L1 15212 29350 58904 58896 54909 1.9 3.9 3.9 3.6
3000 MHz L2 12236 24767 49039 50798 47318 2.0 4.0 4.2 3.9
Win 764 L3 8148 16402 30391 33436 32457 2.0 3.7 4.1 4.0
1333 MHz DDR3 RAM 3917 6983 11299 12484 12002 1.8 2.9 3.2 3.1
Serial RW
Phenom II 4/0 L1 7741 5100 5750 6598 6517 0.7 0.7 0.9 0.8
3000 MHz L2 7998 5906 7479 8466 8345 0.7 0.9 1.1 1.0
Win 764 L3 7132 13142 7489 8566 8582 1.8 1.1 1.2 1.2
1333 MHz DDR3 RAM 3589 5981 8576 7913 7802 1.7 2.4 2.2 2.2
Random RD
Phenom II 4/0 L1 14367 27877 56817 55300 54129 1.9 4.0 3.8 3.8
3000 MHz L2 7250 14355 28436 29723 27962 2.0 3.9 4.1 3.9
Win 764 L3 1560 3419 6641 7403 7410 2.2 4.3 4.7 4.8
1333 MHz DDR3 RAM 339 679 1140 1336 1242 2.0 3.4 3.9 3.7
Random RW
Phenom II 4/0 L1 7585 1381 752 833 757 0.2 0.1 0.1 0.1
3000 MHz L2 5985 1624 1230 1387 1245 0.3 0.2 0.2 0.2
Win 764 L3 1505 1724 1377 1545 1572 1.1 0.9 1.0 1.0
1333 MHz DDR3 RAM 313 634 1113 1157 1153 2.0 3.6 3.7 3.7
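The Gain At Threads columns are the multi-thread speeds divided by the one thread speed, rounded to one decimal place. A minimal sketch, with a hypothetical helper name, reproducing two rows:

```python
def gains_at_threads(speeds):
    """Throughput at 2, 4, 6 and 8 threads divided by the 1 thread figure."""
    one_thread = speeds[0]
    return [round(s / one_thread, 1) for s in speeds[1:]]


# Speeds in MB/second at 1, 2, 4, 6 and 8 threads
print(gains_at_threads([4266, 8712, 17347, 24946, 25589]))  # Core i7 Serial RD RAM: [2.0, 4.1, 5.8, 6.0]
print(gains_at_threads([7585, 1381, 752, 833, 757]))        # Phenom II Random RW L1: [0.2, 0.1, 0.1, 0.1]
```

The reciprocal of a gain of about 0.1 is where the "up to ten times faster" single thread figure for the Phenom comes from.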
OpenMP Benchmark - MFLOPS
OpenMP is a system independent set of compiler directives and library routines that arranges automatic parallel processing of shared memory data when more than one processor is available. This option is available in the latest Microsoft C++ compilers.
The benchmark executes the same functions, using the same data sizes, as the CUDA Graphics GPU Parallel Computing Benchmark, with varieties compiled for 32 bit and 64 bit operation, using old style i387 floating point instructions and more recent SSE code (OpenMP32MLOPS.exe and OpenMP64MLOPS.exe).
A run time Affinity option is available to execute the benchmark on a selected single processor.
These benchmarks and a non-OpenMP SSE version (SSE32MFLOPS.exe) can be downloaded via OpenMPMflops.zip.
Results and further details can be found in OpenMP MFLOPS.htm.
The benchmark demonstrates that OpenMP can make use of four CPUs, but gains little extra from Hyperthreading on the Core i7.
Each test reads 1000 MB and writes 1000 MB. At least the largest data size, of 10M words, will be read from and written to RAM, and performance could be limited by memory speed with only 2 floating point operations per word. Two example calculations of MB/second are shown below.
Core i7 930 2.8 GHz running at up to 3.06 GHz via Turbo Boost
Windows 7 64
CUDA CUDA
Data Ops/ Repeat SSE i387 i387 SSE 64b SSE 64b GeFrce No I/O
Words Word Passes 1 CPU 1 CPU 4/8 CPU 1 CPU 4/8 CPU GTX480 GTX480
100000 2 2500 3567 1248 4455 1574 4001 521 5554
1000000 2 250 3529 1420 5433 1861 4919 819 21493
10000000 2 25 2388 1364 3038 1735 3076xx 1014 31991
100000 8 2500 4655 2337 8798 3794 14581 2058 20129
1000000 8 250 4642 2413 9813 4149 17080 3306 82132
10000000 8 25 4453 2436 9581 4011 12457 4057 125413
100000 32 2500 3328 2957 12020 4324 16786 7768 52230
1000000 32 250 3329 3011 12339 4436 17599 13190 254306
10000000 32 25 3307 3003 12432 4418 17576yy 16077 425237
Maximum Gain 414% 412%
xx in 0.163 seconds - MB/Second = 2000 / 0.163 = 12270 (x 2/8 for MFLOPS)
yy in 0.455 seconds - MB/Second = 2000 / 0.455 = 4396 (x 32/8 for MFLOPS)
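The two example calculations can be written out as code. This sketch assumes 4 byte words (10M words x 25 passes then gives 1000 MB), so each word processed means 4 bytes read plus 4 bytes written; that is where the divisor of 8 in "x 2/8" and "x 32/8" comes from. The helper names are hypothetical.

```python
MB_MOVED = 2000      # 1000 MB read plus 1000 MB written per test
BYTES_PER_WORD = 8   # assumed: 4 byte word read plus 4 byte word written


def data_volume_mb(words, passes, word_bytes=4):
    """MB read (and, separately, written) in one test; assumes 4 byte words."""
    return words * word_bytes * passes / 1_000_000


def rates(seconds, ops_per_word):
    """MB/second moved and the implied MFLOPS for a given elapsed time."""
    mb_per_sec = MB_MOVED / seconds
    mflops = mb_per_sec * ops_per_word / BYTES_PER_WORD
    return mb_per_sec, mflops


print(data_volume_mb(10_000_000, 25), data_volume_mb(100_000, 2500))  # 1000.0 1000.0
print([round(r) for r in rates(0.163, 2)])   # [12270, 3067]
print([round(r) for r in rates(0.455, 32)])  # [4396, 17582]
```

The small differences from the tabulated 3076 and 17576 MFLOPS are presumably because the quoted times are rounded to three decimal places.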
Phenom II X4 3.0 GHz, Windows 7 64
CUDA CUDA
Data Ops/ Repeat SSE i387 i387 SSE 64b SSE 64b GeFrce No I/O
Words Word Passes 1 CPU 1 CPU 4 CPU 1 CPU 4 CPU GTS250 GTS250
100000 2 2500 3552 1920 5587 1822 5613 328 3054
1000000 2 250 3268 1919 5585 1870 7056 625 9672
10000000 2 25 1861 1625 2993 1563 2972 714 13038
100000 8 2500 4535 2115 7763 3637 12653 1336 12233
1000000 8 250 4341 2108 7975 3709 14518 2382 39481
10000000 8 25 4141 2100 8062 3543 11273 2949 51199
100000 32 2500 4012 2566 9675 3652 14092 5142 36080
1000000 32 250 3981 2552 10091 3663 14510 9427 108170
10000000 32 25 3941 2510 9902 3633 14034 11182 135041
Maximum Gain 395% 396%
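The Maximum Gain figures appear to be the largest ratio, across all data sizes, of the multi-CPU speed to the single CPU speed for the same instruction type. A sketch using the Phenom II i387 and SSE 64b columns (the helper name is hypothetical):

```python
def best_gain_percent(one_cpu, multi_cpu):
    """Largest multi-CPU / single-CPU ratio across data sizes, as a percentage."""
    return round(max(m / s for s, m in zip(one_cpu, multi_cpu)) * 100)


# Phenom II columns, rows in table order
i387_1cpu = [1920, 1919, 1625, 2115, 2108, 2100, 2566, 2552, 2510]
i387_4cpu = [5587, 5585, 2993, 7763, 7975, 8062, 9675, 10091, 9902]
sse64_1cpu = [1822, 1870, 1563, 3637, 3709, 3543, 3652, 3663, 3633]
sse64_4cpu = [5613, 7056, 2972, 12653, 14518, 11273, 14092, 14510, 14034]

print(best_gain_percent(i387_1cpu, i387_4cpu))    # 395
print(best_gain_percent(sse64_1cpu, sse64_4cpu))  # 396
```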
Multiple Tasks
Multitasking tests were run on the Core i7 using IntBurn64.exe and SSEBurn64.exe, which are described in BurnIn64.htm and BurnIn4CPU.htm.
The benchmark and source code are in More64bit.zip.
Tests run were one copy each of the integer and SSE floating point programs, four concurrent copies of the integer test, and four copies of both integer and SSE programs at the same time. Test durations were one minute each and results showed that all multitasking tests started and finished within the same second of clock time.
Each test used L1 cache sized data of 8 KB. The SSE tests used the Cache Test option, normally the fastest.
Single test results show that the integer test produces around one 64 bit result per clock cycle and the SSE test four 32 bit (128 bit) floating point results per cycle. As might be expected, the higher Turbo Boost CPU clock frequency when using one CPU means that four concurrent integer tests do not achieve a 400% performance level. However, running eight programs, with Hyperthreading, increases the combined integer plus SSE gain to between 428% and 450%.
                          1 Test  ----- 4 Concurrent Tests -----  Total  Gain
Four integer tests
Int Write/Read MB/sec      14195  13955  13902  13879  13905  55641  392%
Int Read MB/sec            20267  20206  20191  20179  20169  80746  398%
Four integer plus four SSE tests
Int Write/Read MB/sec              8127   8756   8345   8414  33641  237%
Int Read MB/sec                   10914  10794  10790  11049  43547  215%
SSE Calculate MFLOPS       11743   6231   6119   6144   6517  25011  213%
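The gains for the eight program run, and the per clock figures quoted above, can be checked as follows. This sketch assumes a single core clock of 3066 MHz (the Turbo Boost maximum) and that each 64 bit integer result corresponds to 8 bytes read.

```python
CLOCK_MHZ = 3066  # assumed single-core Turbo Boost clock

single = {"int_write_read": 14195, "int_read": 20267, "sse": 11743}
# Totals of four copies each, with four integer and four SSE programs running together
eight_programs = {"int_write_read": 33641, "int_read": 43547, "sse": 25011}

gain = {k: round(eight_programs[k] / single[k] * 100) for k in single}
print(gain)  # {'int_write_read': 237, 'int_read': 215, 'sse': 213}

# Combined integer plus SSE gain, for each integer measurement
combined = sorted(gain[k] + gain["sse"] for k in ("int_read", "int_write_read"))
print(combined)  # [428, 450]

# Single test results per clock cycle
print(round(single["int_read"] / 8 / CLOCK_MHZ, 2))  # 0.83 integer results per clock
print(round(single["sse"] / CLOCK_MHZ, 2))           # 3.83 floating point results per clock
```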
Roy Longbottom August 2010
The new Internet home for my PC benchmarks is Roy Longbottom's PC Benchmark Collection.