See Bus Speed and Results Index below
Description
The BusSpd2K benchmark is intended to demonstrate maximum data transfer rates from caches and RAM using 32 bit integer words and 64 bit MMX words. MOV and AND assembly code instructions are used, with 64 instructions in the inner loops for integers and 512 instructions for MMX. The program measures speeds with data sizes of 4, 8, 16, 32 etc. KBytes, up to a maximum of 50% of RAM size. Results are given in MBytes/second (MB/s), where M = 1,000,000. An approximation of processor execution speed in Millions of Instructions Per Second (MIPS) can be obtained by dividing MB/s for integer tests by 4 and those for MMX tests by 8, as each instruction loads 4 or 8 bytes.
Ten different tests load data to one CPU register or 2 registers alternately (MMX 1 or 8). Tests 5 and 6 use MOV instructions to 1 and 2 integer registers, with tests 7 and 8 the same except using AND. These identify differences between CPU models. Tests 9 and 10 use MMX MOV to 1 and 8 registers, normally demonstrating maximum data transfer speeds.
Tests 1 to 4 load a 32 bit word (4 bytes) with address increments of 64, 32, 16, 8 bytes respectively. These are intended to demonstrate bus operation and speed where data is transferred in bursts.
BusSpd2K also has burn-in and reliability testing options, used via the opening window or from BAT file commands. The latter method can also be used to test multiple processors - See DualCore.htm and BurnIn32.htm. The reliability testing has six write/read tests and six read only tests using different data patterns.
There is also a 64 bit version of the Reliability Tests compiled to use the larger registers - IntBurn64 in More64Bit.zip. See also BurnIn64.htm.
The latest addition is a paging test, using one of the write/read tests - See Paging.htm.
Example log files for reliability and paging tests are given below.
A pre-compiled version of the benchmark can be found in BusSpd2K.zip which also contains the source code, providing further explanatory comments.
BusMP (and BusMP64), a different version to test two CPUs, is also available - See DualCore.htm, Win64.htm and below.
The latest MP versions run tests using 1, 2, 4, 6 and 8 threads to test up to 8 processors (cores, or logical hyperthreaded CPUs). Details and results are given below.
Then there is My Main Page for other PC benchmarks and results.
Following is an example output for a Pentium MMX CPU. Variations in performance identify L1 and L2 cache sizes, but it should be noted that speed can be slower than normal when data size equals cache size.
           MovI  MovI  MovI  MovI  MovI  MovI  AndI  AndI  MovM  MovM
           Reg2  Reg2  Reg2  Reg2  Reg1  Reg2  Reg1  Reg2  Reg1  Reg8
      RAM Inc64 Inc32 Inc16  Inc8  Inc4  Inc4  Inc4  Inc4  Inc8  Inc8
       KB  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S
L1      4   699   735  1428  1444   776  1470   393   764  1568  1567
        8   678   740  1427  1460   775  1473   392   760  1579  1579
       16   405   436   700   977   677  1149   371   664  1223  1157
L2     32    52    53    75   131   235   235   192   231   264   263
       64    53    53    75   131   235   235   192   232   264   264
      128    53    53    75   131   235   236   193   232   265   265
      256    52    52    75   130   234   234   192   231   262   263
      512    51    52    74   130   233   233   191   230   262   261
RAM  1024    28    29    47    70   140   140   124   139   140   140
     2048    28    29    47    70   140   140   123   139   140   140
     4096    28    29    47    70   140   140   124   139   140   140
     8192    28    29    47    70   140   140   123   139   140   140
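As a rough illustration of how the variations in this table identify cache sizes, the following Python sketch flags each step in data size where speed falls sharply. The helper name and the 0.8 threshold are assumptions made for this example; the speeds are the MovI Reg2 Inc4 column from the table.

```python
# MovI Reg2 Inc4 speeds (MB/s) from the Pentium MMX example, keyed by data size in KB.
SPEEDS = {4: 1470, 8: 1473, 16: 1149, 32: 235, 64: 235, 128: 236,
          256: 234, 512: 233, 1024: 140, 2048: 140, 4096: 140, 8192: 140}

def speed_steps(results, drop=0.8):
    """Return (smaller_kb, larger_kb) pairs where speed falls below drop x the previous speed."""
    sizes = sorted(results)
    return [(a, b) for a, b in zip(sizes, sizes[1:])
            if results[b] < drop * results[a]]

print(speed_steps(SPEEDS))
```

The steps reported are 8 to 16 KB, 16 to 32 KB and 512 to 1024 KB. The first is the partial slowdown when data size equals cache size, while the other two mark data moving out of L1 cache and then out of L2 cache.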
Bus Speed
When registers are loaded using varying address increments, the size of a burst of data over a bus can be recognised as the increment at which data transfer speed becomes constant. In the case of the Pentium MMX above, and up to Pentium III, burst size is 32 bytes. As each 4 byte word loaded at 32 byte increments brings in a complete burst, the maximum bus transfer speed can be assumed to be eight times the measured speed, or 232 MB/sec from RAM (8 x 29). The bus on this PC operates at 66.7 MHz and, at 8 bytes wide, provides a theoretical bus transfer speed of 533 MB/sec, at which a 32 byte burst would take 4 bus clocks. At 232 MB/sec, the actual transfer takes 9.2 bus clocks, or 2 clocks per transfer and an average of 1.2 clocks overhead. This might be expressed as 3.2:2:2:2 for the burst.
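The arithmetic in this paragraph can be followed step by step in the short Python sketch below. All figures come from the text and the example table; the variable names are only for illustration.

```python
BUS_MHZ = 66.7        # bus clock of the example PC
BUS_WIDTH_BYTES = 8   # 64 bit wide data bus
BURST_BYTES = 32      # burst size up to Pentium III
WORD_BYTES = 4        # one 32 bit word is loaded per burst in the Inc32 test
MEASURED_MB_S = 29    # Inc32 speed from RAM in the example table

peak_mb_s = BUS_MHZ * BUS_WIDTH_BYTES                    # about 533 MB/s
burst_mb_s = MEASURED_MB_S * BURST_BYTES // WORD_BYTES   # 8 x 29 = 232 MB/s
ideal_clocks = BURST_BYTES // BUS_WIDTH_BYTES            # 4 bus clocks per burst
actual_clocks = ideal_clocks * peak_mb_s / burst_mb_s    # about 9.2 bus clocks

print(round(peak_mb_s, 1), burst_mb_s, ideal_clocks, round(actual_clocks, 1))
```

Spreading the 9.2 clocks over four transfers of 2 clocks each leaves 1.2 clocks of overhead, which is where the 3.2:2:2:2 pattern comes from.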
With the arrival of Pentium IIIE, and its cache running at CPU MHz speed, burst timings had reduced to the equivalent of 1.2:1:1:1 (efficiency 95%). Note that much of the overhead, such as indicated by CAS latency, disappears with continuous data transfer.
At least the later AMD CPUs use 64 byte bursts, that is via 8 bus clocks. Come the Athlon Thunderbird with a 133 MHz bus, efficiency could be 98% at an average of 1.2:1:1:1:1:1:1:1 bus clocks. Then along came DDR RAM which could, at least initially, have an efficiency of 92% at bus speeds up to 200 MHz, but with a much faster transfer rate via 16 bytes per bus clock. This might be designated as 1.3:1:1:1.
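The efficiency figures quoted for these burst patterns appear to be the ideal number of bus clocks (one per transfer) divided by the clocks actually taken. The Python sketch below works on that assumption; the function name is hypothetical.

```python
def burst_efficiency(pattern):
    """Efficiency of a burst pattern such as '1.2:1:1:1' (ideal clocks / actual clocks)."""
    clocks = [float(c) for c in pattern.split(":")]
    return len(clocks) / sum(clocks)

for pattern in ("3.2:2:2:2", "1.2:1:1:1", "1.2:1:1:1:1:1:1:1", "1.3:1:1:1"):
    print(pattern, f"{burst_efficiency(pattern):.0%}")
```

On this reading, 1.2:1:1:1 gives 95% and the eight transfer Thunderbird pattern gives 98%, as quoted. The DDR pattern 1.3:1:1:1 works out at about 93%, close to the 92% in the text, and the earlier Pentium MMX pattern of 3.2:2:2:2 comes to about 43%, the same ratio as 232 MB/sec to 533 MB/sec.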
Later, there is Dual Channel DDR, starting at around 80% efficiency (on these tests), but producing effective burst transfer speeds of 5000 MB/sec on two 200 MHz buses.
I was told that Intel P4 based PCs used 64 byte bursts. However, my latest benchmarks with BusSpeed varieties using 128 byte address increments indicate that bursts are at least 128 bytes (See DualCore.htm). The measured speed at 128 bytes is around half that at 64 bytes, so the latter can be used to calculate maximum burst speeds.
Early P4s had RDRAM, working on a dual pumped 400 MHz bus, 2 bytes wide, and with dual buses. This gave a potential throughput of 3200 MB/Sec. Although efficiency was less than 80%, the achieved speed was the best at the time. Later P4s have PC133, DDR or Dual Channel DDR RAM, with measured burst speeds similar to those of AMD based systems.
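The potential throughput figures used in this section are simple products of clock rate, bus width, transfers per clock and number of channels. A minimal Python sketch, with a hypothetical helper name and taking a DDR channel to be 8 bytes wide (consistent with the 16 bytes per bus clock mentioned above):

```python
def peak_mb_per_sec(clock_mhz, bytes_wide, transfers_per_clock=1, channels=1):
    """Theoretical peak transfer rate in MB/s, where M = 1,000,000."""
    return clock_mhz * bytes_wide * transfers_per_clock * channels

# RDRAM: dual pumped 400 MHz bus, 2 bytes wide, two buses
rdram = peak_mb_per_sec(400, 2, transfers_per_clock=2, channels=2)
# Dual Channel DDR on two 200 MHz buses, each 8 bytes wide
dual_ddr = peak_mb_per_sec(200, 8, transfers_per_clock=2, channels=2)

print(rdram, dual_ddr, round(5000 / dual_ddr, 2))
```

This gives 3200 MB/sec for RDRAM and 6400 MB/sec for Dual Channel DDR, so an effective burst speed of 5000 MB/sec corresponds to an efficiency of about 78%, in line with the "around 80%" quoted.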
When all data is loaded via the later CPUs, the fastest speeds are demonstrated via MMX instructions. In some cases, these are much slower than indicated by burst speeds. Information on maximum speeds when more processing is involved can be obtained from SSE3DNow results.htm and RandMem results.htm.
Results
Separate tables of speeds obtained via L1 cache, L2 cache and RAM are given below. Except when connected via the memory bus, performance via caches tends to be proportional to CPU MHz for a given type of processor. So, only a sample of results is provided. Details of cache sizes, speed and range of CPU MHz can be found in CPUSpeed.htm.
The latter also provides a range of performance comparisons based on %MIPS/MHz, which include BusSpd2K results.
L1 Cache Results - Speeds on Intel CPUs from Pentium II onwards are fairly constant on all tests when approximate CPU instruction execution speeds are calculated (divide results by 4 for integer tests and 8 for MMX). Here, MIPS are mainly between 95% and 98% of CPU MHz. Earlier Intel CPUs and later AMD processors execute some of the tests at greater than 180% of CPU MHz. Note that these percentages should be multiplied by two when MMX instructions are considered as processing two 32 bit words.
L2 Cache Results - These are sorted in the same order as L1 cache results which immediately indicates differences between Intel and AMD CPUs. The recorded speeds on the latter, particularly slow with address increments of 64 bytes, indicate that AMD use burst reading on L2 cached data.
RAM and Bus Speed Results - RAM performance is sorted by maximum burst speed. For example, where bursts are 32 bytes, this is the measured speed at 32 byte address increments multiplied by eight. Maximum bus speeds are also shown. See Bus Speed for further comments. Some examples of exceptionally poor performance (marked #) are included, presumably caused by a faulty mainboard, chipset or BIOS.
Dual Core Results - 32 bit and 64 bit - See below.
Eight Thread Benchmarks - 32 bit and 64 bit - See below.
Reliability Test Results - See below.
Dual and Quad Core Reliability Test Performance Results - 32 bit and 64 bit - See below.
Paging Test - 32 bit and 64 bit - See below.
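The L1 cache percentages quoted above can be checked against a few rows of the L1 cache table that follows. The Python sketch below is only an illustration; the function name is hypothetical and the four CPUs are an arbitrary sample.

```python
# (CPU, MHz, MovI Reg1 Inc4 MB/s, MovI Reg2 Inc4 MB/s) copied from the L1 cache table
SAMPLES = [
    ("Pentium MMX", 200, 776, 1470),
    ("Pentium III", 450, 1759, 1740),
    ("Pentium IIIE", 733, 2849, 2869),
    ("Athlon", 800, 5791, 5909),
]

def percent_mips_per_mhz(mb_per_sec, mhz, bytes_per_instruction=4):
    """Approximate MIPS (MB/s divided by bytes per instruction) as a percentage of CPU MHz."""
    mips = mb_per_sec / bytes_per_instruction
    return 100.0 * mips / mhz

for cpu, mhz, reg1, reg2 in SAMPLES:
    print(f"{cpu:<13}{percent_mips_per_mhz(reg1, mhz):6.1f}%"
          f"{percent_mips_per_mhz(reg2, mhz):7.1f}%")
```

The Pentium III and IIIE come out at around 97% on both tests. The Pentium MMX reaches about 184% when loading two registers alternately, and the Athlon exceeds 180% on both.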
L1 Cache Results in MBytes/Second - sorted by Inc64
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
MHz Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
80486 DX2 66 112 117 119 124 136 122 120 123 0 0
Pentium 100 316 355 637 679 385 713 195 380 0 0
Pentium Pro 200 679 748 769 764 775 779 758 756 0 0
Pentium MMX 200 699 735 1428 1444 776 1470 393 764 1568 1567
Celeron A 450 1617 1690 1729 1677 1724 1737 1703 1700 3517 3507
Pentium II 450 1515 1649 1745 1728 1763 1765 1710 1717 3520 3527
Pentium III 450 1624 1700 1738 1735 1759 1740 1742 1744 3513 3491
AMD K62 500 1633 1742 1685 1706 1783 1780 1725 1755 3593 3509
Celeron 2 566 2042 2136 2198 2184 2213 2210 2191 2194 4446 4445
Duron 700 2530 4807 5011 4985 4941 5034 2677 4935 10169 10151
Pentium IIIE 733 2652 2778 2852 2833 2849 2869 2844 2847 5768 5765
Athlon 800 2917 5508 5754 5735 5791 5909 3135 5777 11951 11888
Athlon Tbird 1000 3635 6738 6331 7084 7245 7391 3830 7224 14871 14808
PIII Tualatin 1266 4588 4802 4914 4862 4949 4967 4806 4804 9898 9897
Atom M 1600 5447 5628 5792 5901 5988 5973 5984 5970 12268 12480
Athlon 4 1533 5594 10379 11155 11496 11161 11173 6016 11122 22870 22845
Pentium 4 1700 6139 6343 6559 6639 6589 6540 6405 6428 13188 13276
Ath4 Barton 1800 6525 12165 12999 13367 13020 12917 7013 11689 26630 26632
Core 2 Duo M 1830 6700 6879 7061 7200 7251 7215 7214 7249 14461 14448
Pentium M 1862 6744 6875 7117 7320 7371 7381 7255 7374 14424 14658
Turion 64 1900 6872 13397 14159 14798 14190 14094 7294 14098 29407 29390
Opteron 2000 7237 14069 14802 15473 14822 15008 7783 14726 30715 30698
Celeron C2 M 2000 7362 7622 7792 7845 7877 7800 7597 7911 15124 15648
Athlon XP 2080 7585 14104 15097 15617 15011 14995 7982 15009 31009 30959
P4 Xeon 2200 7947 8301 8495 8593 8559 8561 8336 8414 17176 17184
Athlon 64 2210 8070 15711 16498 17247 16538 16763 8670 16454 34291 34254
Core 2 Duo 2400 8640 8820 9339 9451 9530 9530 9477 9523 18930 18909
Pentium 4E HT 3000 9686 11043 11233 11525 11562 11227 11054 11099 22804 22657
Pentium 4 3000 10915 11458 11710 11853 11784 11790 11426 11238 23589 23500
Core i7 930 **** 11251 11488 11620 11614 11712 11719 5873 11718 23391 23398
Core i7 860 #### 12977 13465 13645 11701 13556 13349 6794 13742 27450 26951
Pentium 4 3678 13412 13879 14306 14358 14252 14422 13007 13473 28713 28818
Phenom II 3000 22764 22849 23433 23768 23938 23934 12019 22553 46887 46911
#### i7 860 2800 MHz running using Turbo Boost at up to 3466 MHz
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
L2 Cache Results in MBytes/Second - same order as L1 speeds
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
MHz Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
80486 DX2 66 11 11 11 17 32 31 30 30 0 0
Pentium 100 26 26 40 75 124 139 96 117 0 0
Pentium Pro 200 133 132 234 317 488 487 454 453 0 0
Pentium MMX 200 53 53 75 131 235 235 192 232 264 264
Celeron A 450 306 305 548 793 975 975 974 976 1582 1619
Pentium II 450 179 179 359 709 829 824 831 832 1428 1433
Pentium III 450 180 180 359 531 846 846 843 846 1430 1437
AMD K62 500 29 59 117 218 436 436 429 429 436 436
Celeron 2 566 532 533 1125 1205 1392 1392 1389 1392 2410 2409
Duron 700 134 270 535 1029 1932 1955 1577 1533 2050 2008
Pentium IIIE 733 697 697 1466 1568 1805 1808 1809 1809 3135 3131
Athlon 800 106 211 424 846 1697 1698 1599 1588 1693 1698
Athlon Tbird 1000 198 360 788 1584 3144 3169 2572 2469 3104 3146
PIII Tualatin 1266 1701 1575 2513 2520 2882 2864 2879 2881 5038 5034
Atom M 1600 379 739 1385 2412 3624 3690 3683 3681 4769 4718
Pentium 4 1700 2617 3077 3544 3570 4658 4656 4598 4628 7143 7117
Ath4 Barton 1800 355 713 1421 2799 4863 4851 4009 4462 5682 5622
Core 2 Duo M 1830 1597 2523 3475 5130 6227 6234 6233 6012 7950 7976
Pentium M 1862 1214 2117 3289 4031 4731 4668 4732 4749 8077 8109
Turion 64 1900 429 831 1688 2976 5490 5383 5467 5457 5939 6086
Opteron 2000 670 1296 2588 4700 7167 7163 5870 6201 9480 9542
Celeron C2 M 2000 1791 2765 3799 5516 6812 6816 6812 6805 8747 8557
Athlon XP 2080 413 828 1645 3270 5637 5601 4597 5163 6564 6567
P4 Xeon 2200 4190 4021 4577 4630 6038 6038 6010 6020 9255 9258
Athlon 64 2210 651 1285 2411 4418 7786 7776 6448 6688 8936 8718
Core 2 Duo 2400 2131 3257 4597 6772 8187 8196 8168 8201 10549 10559
Pentium 4E HT 3000 2945 5640 6105 6624 7526 7536 7425 7470 13097 13303
Pentium 4 3000 5912 5521 6335 6385 8338 8337 8298 8322 12762 12779
Core i7 930 **** 3213 4805 7305 9467 10811 10810 5875 10805 14442 14408
Core i7 860 #### 3595 5003 8442 11028 12618 12639 6895 12408 16719 16788
Pentium 4 3678 7258 6719 7722 7808 10161 10201 10169 10064 15423 15560
Phenom II 3000 1500 2995 5986 11360 15036 15036 11918 15233 22377 22367
#### i7 860 2800 MHz running using Turbo Boost at up to 3466 MHz
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
L3 Cache Results in MBytes/Second
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
MHz Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
Core i7 930 **** 2004 3497 5958 9088 10447 10448 5870 10447 13857 13857
Core i7 860 #### 2262 3537 6992 10641 12204 12233 6319 10478 15059 16251
Phenom II 3000 745 1485 2974 5881 9833 9825 9615 9603 11726 11650
RAM and Bus Speed Results in MBytes/Second - sorted by Max Burst Speed
Max Max MovI MovI AndI AndI MMX
System MHz bus Burst Reg1 Reg2 Reg1 Reg2 Max
80486 DX2 B 66 133 32 25 24 23 24 0
Pentium B 100 400 96 73 79 64 73 0
Celeron 2 # 900 800 168 166 166 166 165 166
Pentium MMX B 200 533 232 140 140 123 139 140
Pentium Pro 200 533 256 225 225 240 240 0
AMD K6 B 550 800 272 238 238 237 238 238
Pentium IIIEB # 1000 1067 289 289 289 289 289 289
Celeron A 300 533 456 267 267 282 280 450
Celeron A 450 800 496 407 406 426 427 494
Pentium II H 400 800 488 314 314 322 322 484
Pentium II H 450 800 504 317 316 324 325 500
Celeron 2 600 533 504 324 326 343 343 511
Pentium III H 450 800 528 303 304 339 334 527
Ath4 Barton # 1800 2133 592 589 590 433 492 594
Athlon Tbird # 1200 1067 672 528 527 351 328 670
Athlon H 800 800 672 575 575 414 366 673
Pentium IIIE 800 800 752 463 462 477 476 764
Celeron 2 850 800 784 474 474 486 486 765
Athlon H 900 1067 912 648 648 461 416 879
PIII Tualatin 1266 1067 912 580 579 580 575 749
Duron 700 1067 994 682 685 512 516 977
Pentium IIIEB R 1000 1600 1024 411 412 420 420 794
Pentium 4 2400 1067 1027 987 989 982 990 1010
Pentium IIIEB 1000 1067 1035 509 516 537 537 908
Athlon Tbird 800 1067 1040 677 677 516 510 942
Athlon Tbird 950 1067 1040 680 680 463 417 950
Duron 1000 1067 1043 680 680 463 414 951
Pentium 4 1900 1067 1043 981 980 979 967 1007
Athlon Tbird D 1466 2133 1744 755 756 666 666 1217
Pentium 4 D 1800 2133 1952 1455 1455 1401 1415 1641
Athlon Tbird D 1333 2133 1968 756 756 659 657 1219
Pentium 4 D 3066 2133 2021 1826 1819 1812 1818 1913
Athlon 4 D 1725 2400 2032 888 878 668 745 1172
Athlon XP D 2080 2667 2336 1171 1167 903 986 1549
Pentium 4 R 1700 3200 2336 1478 1471 1402 1429 1660
P4 Xeon R 2200 3200 2448 1537 1538 1511 1515 1822
Athlon 64 D 2000 3200 2932 2778 2736 2669 2663 2963
Opteron D 2000 3200 3136 2123 2129 2070 2110 2476
Pentium 4 R 2533 4267 3216 2078 2100 2075 2084 2358
Atom M D2 1600 4267 3280 3011 2958 2998 2953 3250
Pentium M DC 1862 4267 3328 2379 2375 2258 2294 2545
Core 2 Duo a DC2 2400 8533 3456 4312 4314 4194 4342 4860
Pentium 4 DC 2533 4267 3529 2576 2578 2451 2448 2742
Celeron C2 M DC2 2000 8533 3632 2550 2843 2607 3351 3493
Turion 64 M DC2 1900 8533 4112 2513 2555 2430 2484 2689
Core 2 Duo M DC2 1830 10667 4800 3738 3758 3604 3643 4464
Pentium 4E DC 3000 6400 4976 3613 3623 3432 3564 3895
Athlon 64 DC 2210 6400 4992 2793 2791 2704 2803 2941
Pentium 4 DC 3678 6272 5021 3375 3381 3249 3273 3723
Core 2 Duo b DC2 2400 8533 5376 4435 4402 4413 4342 5161
Core 2 Duo c DC2 2400 12800 6272 5051 5061 4961 4893 5720
Phenom II DC32 3000 21333 7208 5397 5393 5263 5262 6950
Core i7 DC32 **** 17067 11264 7845 7840 5410 7853 8290
Core i7 DC32 #### 21333 13600 9095 9204 6275 9421 9794
#### i7 860 2800 MHz running using Turbo Boost at up to 3466 MHz
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
Key  B    L2 cache on memory bus         #     Example of poor results
     H    L2 at half CPU MHz or less     R     RDRAM
     D    DDR RAM                        DC    Dual Channel DDR RAM
     DC2  DDR 2                          DC32  DDR 3 2 Channel
     M    Mobile CPU
Core 2 Duo a = nForce 570 chipset, Core 2 Duo b-c = Intel 965 chipset
Dual Core Results
Programs BusMP64 and BusMP in DualCore.zip are compiled to run via Win64 and Win32, manipulating 64 bit and 32 bit integers.
The tests are run firstly as a single thread and secondly using two threads at the same priority level, demanding twice the shown memory space.
SSE2 integer instructions are used instead of MMX for compatibility with 64 bit Windows.
Another difference from BusSpd2K is that the first burst test has 32 word address increments, or 128 bytes at 32 bits and 256 bytes at 64 bits, compared with 64 bytes above. This appears to demonstrate that Pentium 4 bursts from RAM are 128 bytes, compared with 64 bytes with AMD and later Intel CPUs. Data transfer burst speeds are faster at 64 bits as twice as much data is being used.
The following is an example of log file results using Windows XP Pro x64.
Results below also include 64 bit scores on a Core 2 Duo via 64 bit Windows Vista with later results via 64-Bit Windows 7 using two of the four CPUs on Phenom II and Core i7 processors.
Athlon 64 X2 Dual Core 4200+ 2.21 GHz, DCDDR RAM, WinXP Pro X64
Part 1 - Single Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 14563 15920 16078 17827 17772 17443 17358
24 15001 16194 16241 17760 17706 17782 17492
96 1943 1947 1344 2406 4536 9654 8766
384 645 707 588 992 1502 2927 2957
768 641 713 587 986 1505 2909 2948
1536 642 717 586 988 1499 2897 2940
16380 642 704 591 986 1475 2870 2920
131070 639 698 592 983 1476 2860 2919
Part 2 - Two Threads Total MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 11780 12876 13348 18538 19771 21414 34617
24 11575 17226 17979 22147 22859 23625 34547
96 3121 3045 2569 4674 8281 12510 17488
384 552 630 645 1270 2244 4130 4746
768 556 629 643 1272 2254 4080 4713
1536 557 630 642 1270 2226 4076 4704
16380 563 634 644 1264 2218 4045 4688
131070 558 632 642 1263 2213 4063 4684
For 64 bit MIPS divide MB/Second by 8. SSE2 divide by 16 for 128 bit MIPS
For 32 bit MIPS divide MB/Second by 4. SSE2 divide by 16 for 128 bit MIPS
Below are available single and dual processor results for data in L1 Cache, L2 Cache and RAM.
The original version of the benchmark obtained slower results on the first L1 cache tests due to reading more data than it should. This has been corrected in Version 1.2. Note that increases in throughput using two threads are not as good as expected, particularly with the 64 bit version.
The single thread tests are calibrated to run for approximately 0.5 seconds and the same number of passes are used for each of the two thread tests. These sometimes both take much longer than half a second, on the early tests, and are often quite different on all the integer tests (run with command BusMP Debug or BusMP64 Debug to log thread milliseconds). This imbalance accounts for some of the lack of throughput using two CPUs, the remainder seemingly being due to Windows interference in data flow. Further details can be found in DualCore.htm.
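The calibration described above might look something like the following Python sketch. The text only says that the single thread tests are calibrated to about 0.5 seconds and that the same pass count is reused for the two thread tests; the doubling strategy, function name and parameters here are assumptions, not the BusMP code.

```python
import time

def calibrate_passes(run_passes, target_seconds=0.5, clock=time.perf_counter):
    """Double the pass count until one timed run lasts at least target_seconds."""
    passes = 1
    while True:
        start = clock()
        run_passes(passes)          # run the test loop `passes` times
        elapsed = clock() - start
        if elapsed >= target_seconds:
            return passes, elapsed
        passes *= 2

# Stand-in workload with a short target so the example finishes quickly.
passes, elapsed = calibrate_passes(lambda n: sum(range(n * 1000)), target_seconds=0.05)
print(passes, round(elapsed, 3))
```

Reusing the returned pass count for each of two threads keeps the work per thread equal to the single thread run, so any extra elapsed time shows up directly as lost throughput.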
L1 Cache Results in MBytes/Second - 6 KB
CPUs MHz Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
Celeron 1 450 1183 1021 1641 1631 1498 1706 0
2 Threads 1130 1225 1574 1564 1647 1701 0
Pentium III 1 731 1862 1793 2539 2417 2583 2460 0
2 Threads 1737 1315 2227 1765 2538 2429 0
Pentium 4 1 1900 3462 3218 6114 6435 6424 6501 13596
2 Threads 4508 4116 6381 6748 5994 6717 13195
Athlon XP 1 2088 5597 5676 8166 8340 8223 8368 0
2 Threads 5578 5664 8163 8490 8381 8557 0
Pentium 4E HT 1 3000 7586 7957 10609 10933 11140 10852 23035
2 Threads 5195 5325 9739 9895 9987 9895 23236
Atom M No HT 1 1600 & 4420 5268 5273 5532 5658 5519 23158
2 Threads 4543 5256 5373 5602 5387 5609 23100
Atom M HT 1 1600 & 4336 5140 5383 5515 5650 5500 23045
2 Threads 4433 5212 5300 5532 5612 5542 21674
Core 2 Duo C 64b 1 2400 & 14079 16065 16562 16952 17210 17268 36814
2 Threads 14155 16167 16460 17104 17211 17217 36797
Celeron C2 M 1 2000 & 6816 6830 7171 7123 7335 7466 30806
2 Threads 6926 6690 7051 7088 7393 7198 30791
Opteron 2 1992 5620 5749 8983 9101 8941 9010 15772
2 Threads 7267 7198 11410 13250 14778 16010 31434
Athlon MP 2 2000 5459 5456 7854 8073 7915 8068 0
2 Threads 5144 5639 7280 8856 10511 12518 0
Turion 64 Mob 2 1900 & 6645 6945 8766 8570 8306 8525 14507
V32 2 Threads 7965 8954 12494 13385 14431 14835 24929
Athlon 64 32b 2 2210 6209 6371 9953 10084 9887 9962 17458
X64 2 Threads 8292 8403 14317 16184 17352 18528 34743
Athlon 64 32b 2 2210 & 8131 8462 10388 10057 9882 9958 17391
X64 2 Threads 9234 10367 14787 16078 17366 18515 34466
Athlon 64 64b 2 2210 & 14563 15920 16078 17827 17772 17443 17358
X64 2 Threads 11780 12876 13348 18538 19771 21414 34617
Xeon P4 2 3065 7388 7715 8431 11206 11290 11467 23841
2 Threads 7285 7895 12031 11800 15783 18583 45757
Pentium 4D 2 3000 7759 8102 10658 11170 11249 11083 23160
2 Threads 6057 5219 7311 7962 9540 10156 44482
Core 2 Duo M 2 1830 5621 6790 6313 7011 6956 7017 27960
V32 2 Threads 7815 11005 11374 12728 12610 11894 50192
Core 2 Duo 2 2400 6654 6840 9122 9253 9207 9264 37031
2 Threads 5036 6453 9976 12462 14560 16123 74470
Core 2 Duo C 32b 2 2400 6751 6808 9050 9198 9270 9258 37375
V64 2 Threads 7897 9372 13674 15367 16514 17224 71915
Core 2 Duo C 32b 2 2400 & 7366 8999 8984 9201 9258 9263 37310
V64 2 Threads 7271 11341 13427 15096 16511 17116 66635
Core 2 Duo C 64b 2 2400 & 14261 16380 16736 17330 17414 17449 37143
V64 2 Threads 8106 12608 15512 17690 19165 20354 71247
Phenom II 32b 4 3000 & 11332 13930 14041 13667 13747 13617 23847
764 2 Threads 13214 18948 18754 21770 23912 25133 47214
Phenom II 64b 4 3000 & 19715 21711 23804 24300 24186 24340 23800
764 2 Threads 12003 19579 22114 23957 29049 29916 47175
Core i7 930 64b 4 **** & 17625 20139 20505 21273 21368 21392 23347
764 2 Threads 8266 13481 14606 19698 22879 24656 46620
Core i7 860 64b 4 #### & 17901 20518 20910 21679 21733 21836 23732
764 2 Threads 9120 15032 15019 21244 23986 25961 47356
L1 Cache At 24 KB
Phenom II 64b 4 3000 & 21010 22411 24797 24710 24188 24319 23833
764 2 Threads 16324 23701 27481 31141 30888 31868 47567
Core i7 930 64b 4 **** & 18462 20692 20694 21308 21412 21446 23465
764 2 Threads 11918 19370 22339 24596 25036 26062 46914
Core i7 860 64b 4 #### & 18856 21026 20935 21655 21845 21748 23935
764 2 Threads 12803 19288 22681 25950 26434 27325 47601
V32/V64 = 32 and 64 Bit Vista, X64 = 64 Bit XP, 764 = 64 Bit Windows 7, & = Version 1.2
#### i7 860 2800 MHz running using Turbo Boost at up to 3466 MHz (but detuned)
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
L2 Cache Results in MBytes/Second - 96 KB
CPUs MHz Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
Celeron 1 450 304 304 274 537 792 852 0
2 Threads 220 219 225 415 618 805 0
Pentium III 1 731 541 548 550 829 981 1275 0
2 Threads 534 537 546 845 1180 1620 0
Pentium 4 1 1900 1776 1976 1823 2511 3392 4725 10612
2 Threads 1906 2007 1971 2503 3368 4865 12725
Athlon XP 1 2088 408 406 817 1612 3252 4614 0
2 Threads 407 406 812 1609 3250 4606 0
Pentium 4E HT 1 3000 2209 2179 3466 3785 6348 7197 18538
2 Threads 3022 3091 4087 4606 6211 7496 18369
Atom M NO HT 1 1600 457 392 730 1368 2309 3490 5541
2 Threads 463 391 736 1358 2347 3535 5506
Atom M HT 1 1600 467 392 740 1368 2374 3551 5520
2 Threads 646 743 1311 2272 3615 4527 8999
Core 2 Duo C 64b 1 2400 4038 4152 3986 6370 9479 13521 18935
2 Threads 3999 4097 3939 6363 9395 13441 18823
Celeron C2 M 1 2000 1663 1605 2660 3725 5463 6583 15630
2 Threads 1579 1663 2548 3723 5368 6427 15511
Opteron 2 1992 756 595 1139 2119 4396 5799 8078
2 Threads 1689 1192 2256 4241 8727 11552 16099
Athlon MP 2 2000 391 390 780 1541 3101 4417 0
2 Threads 756 762 1489 2973 5985 8591 0
Turion 64 Mob 2 1900 611 443 832 1612 2750 5055 5873
2 Threads 1012 730 1400 2805 4894 8856 9564
Athlon 64 32b 2 2210 842 660 1259 2353 4865 6416 8959
2 Threads 1789 1317 2489 4693 9679 12783 17819
Athlon 64 64b 2 2210 1943 1947 1344 2406 4536 9654 8766
2 Threads 3121 3045 2569 4674 8281 12510 17488
Xeon P4 2 3065 4225 4258 3226 4603 5900 8086 18722
2 Threads 7484 7941 6271 8962 11620 16051 46299
Pentium 4D 2 3000 2235 2222 3502 3868 6331 7370 17523
2 Threads 3175 3192 4597 4823 6548 7295 34610
Core 2 Duo M 2 1830 1511 1519 2429 3367 5001 6038 14212
2 Threads 2310 2216 3815 5694 8935 11052 22895
Core 2 Duo 2 2400 2091 2049 3312 4539 6633 7971 18984
2 Threads 3225 3109 5502 7924 12257 15538 32128
Core 2 Duo C 32b 2 2400 2094 2028 3328 4529 6670 7949 19089
2 Threads 3157 3005 5369 7756 12129 14592 31545
Core 2 Duo C 64b 2 2400 4215 4225 4072 6452 9539 13629 19077
2 Threads 6007 6089 5723 9960 13424 17715 31282
Phenom II 32b 4 3000 & 1503 1496 2980 6016 10883 12878 23879
2 Threads 2982 2974 5957 11897 21840 25520 47516
Phenom II 64b 4 3000 & 2962 2980 2986 5776 11973 21594 23822
2 Threads 5470 5720 5676 10384 18099 27000 47542
Core i7 930 64b 4 **** & 6165 6213 6217 9562 15818 18899 23467
2 Threads 8331 9609 9783 15230 22275 24631 46926
Core i7 860 64b 4 #### & 6028 6149 5997 9732 16106 19282 23945
2 Threads 8509 9600 9708 15866 23228 25461 47521
L3 Cache
Phenom II 32b 4 3000 & 841 841 1678 3363 6342 10107 13311
2 Threads 1447 1492 2944 6090 11406 18736 25895
Phenom II 64b 4 3000 & 1488 1490 1499 2986 5688 11256 11967
2 Threads 2831 2584 2788 5621 10514 15690 22640
Core i7 930 64b 4 **** & 3881 3952 4067 7251 11769 17641 21841
2 Threads 4845 5424 6435 11727 17567 22863 41773
Core i7 860 64b 4 #### & 4159 4229 4333 7589 12026 18279 22468
2 Threads 5308 6214 6919 12387 18717 23999 43022
#### i7 860 2800 MHz running using Turbo Boost at up to 3466 MHz (but detuned)
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
RAM Results in MBytes/Second - 128 MB
CPUs MHz Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
Celeron 1 450 55 61 61 123 215 391 0
2 Threads 61 57 70 157 288 375 0
Pentium III 1 731 80 80 80 161 314 513 0
2 Threads 80 80 75 151 319 544 0
Pentium 4 1 1900 26 52 108 204 417 820 862
2 Threads 26 53 109 215 419 821 869
Athlon XP 1 2088 144 142 250 412 615 1025 0
2 Threads 143 142 250 412 625 1046 0
Pentium 4E HT 1 3000 142 279 563 999 1823 3545 4360
2 Threads 136 277 556 1039 1924 3643 4375
Atom M No HT 1 1600 103 207 413 806 1639 2920 3298
2 Threads 102 199 407 815 1620 2944 3260
Atom M HT 1 1600 100 193 398 810 1511 2842 3159
2 Threads 100 206 410 816 1654 3283 3280
Core 2 Duo C 64b 1 2400 588 629 746 1536 2820 5091 5587
2 Threads 565 625 743 1555 2825 5062 5614
Celeron C2 M 1 2000 219 214 448 818 1482 3010 3483
2 Threads 212 213 446 800 1689 2897 3253
Opteron 2 1992 172 170 328 557 929 1813 1851
2 Threads 178 229 365 922 1569 3300 2746
Athlon MP 2 2000 96 85 151 246 371 622 0
2 Threads 104 103 203 395 674 1110 0
Athlon 64 32b 2 2210 350 318 563 877 1414 2749 2915
2 Threads 322 336 671 1303 2358 4786 4710
Turion 64 Mob 2 1900 246 232 440 732 1246 2407 2600
2 Threads 226 231 471 875 1714 3289 3552
Athlon 64 64b 2 2210 639 698 592 983 1476 2860 2919
2 Threads 558 632 642 1263 2213 4063 4684
Xeon P4 2 3065 96 190 424 752 1235 2367 3309
2 Threads 86 192 404 821 1663 3268 3253
Pentium 4D 2 3000 161 302 564 1066 2147 4010 4697
2 Threads 119 245 450 863 1687 3273 3975
Core 2 Duo M 2 1830 268 278 555 1015 1930 3595 3933
2 Threads 289 290 574 1147 2215 4201 4707
Core 2 Duo A 2 2400 238 253 632 1150 2377 4019 4894
2 Threads 169 202 522 847 1833 3768 3620
Core 2 Duo B 2 2400 241 335 695 1271 2400 4545 5122
2 Threads 228 342 707 1460 2674 5090 5849
Core 2 Duo C 32b 2 2400 318 385 792 1436 2620 4902 5747
2 Threads 316 431 890 1787 3084 5765 7161
Core 2 Duo C 64b 2 2400 600 636 774 1585 2897 5127 5748
2 Threads 611 629 840 1690 3217 5260 7172
Phenom II 32b 4 3000 & 435 454 893 1847 3123 5212 7289
2 Threads 738 760 1484 3049 5192 8959 12146
Phenom II 64b 4 3000 & 822 881 897 1792 3518 5792 7372
2 Threads 1325 1453 1477 2895 5472 8734 12124
Core i7 930 64b 4 **** & 793 965 1440 2800 5055 8922 9589
2 Threads 1052 1165 1631 3321 6337 11170 13081
Core i7 860 64b 4 #### & 988 1186 1657 3210 5757 10108 10953
2 Threads 1313 1446 1968 3967 7507 13076 15560
Core 2 Duo A nForce 570 chipset 533 MHz, C2D Intel 965 chipset B 533 MHz, C 800 MHz
Core 2 Duo M mobile CPU 666 MHz DC2 RAM, Celeron C2 M 533 MHz DDR2 RAM
Phenom II and Core i7 860 using 1333 MHz DC3 RAM, i7 930 1067 MHz
#### i7 860 2800 MHz running using Turbo Boost at up to 3466 MHz
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
Eight Thread Benchmarks
Bus8Thread64.exe and Bus8Thread32.exe are the latest versions that carry out the same tests as BusMP, manipulating 64 bit and 32 bit integers, but running via 1, 2, 4, 6 and 8 threads. These were produced particularly for Core i7 processors, which have 4 cores but where system properties show 8 CPUs due to the availability of Hyperthreading.
For further information on these and other eight thread benchmarks see Quad Core 8 Thread.htm.
The BusMP programs have twelve functions for the six integer tests, attempting to avoid memory contention, but still generated some slow results using two threads. The latest versions pass different arrays and variables to six sets of common code, and producing these identified the reason for the poor BusMP scores. It seems that storing the different variables, used to produce checksums, led to unnecessary L1 cache flushing as they had adjacent memory addresses (and were probably loaded to two CPUs via burst reading). The solution was to use an array to pass the checksums, with a multi-word gap before the entry for the next thread.
An example of the log file output follows. This is for a four core Intel Core i7 processor. Note that eight CPUs are indicated by Windows GetSystemInfo.
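The effect of the multi-word gap can be illustrated by working out which cache line each thread's checksum falls in. The Python sketch below only models the array layout; the 64 byte line size, the gap of 8 words and the function names are assumptions for this example and are not taken from the benchmark source.

```python
CACHE_LINE_BYTES = 64                        # assumed line size, not stated in the text
WORD_BYTES = 8                               # 64 bit checksum words
GAP_WORDS = CACHE_LINE_BYTES // WORD_BYTES   # hypothetical gap between threads' entries

def checksum_slot(thread_index, gap_words=GAP_WORDS):
    """Array index of a thread's checksum entry."""
    return thread_index * gap_words

def cache_lines_used(threads, gap_words):
    """Cache line number holding each thread's checksum, assuming a line aligned array."""
    return [checksum_slot(t, gap_words) * WORD_BYTES // CACHE_LINE_BYTES
            for t in range(threads)]

print(cache_lines_used(8, 1))          # adjacent words: all eight share line 0
print(cache_lines_used(8, GAP_WORDS))  # padded: one line per thread
```

With adjacent words, every thread's store lands in the same line, so each CPU repeatedly invalidates the others' copy. With the gap, each thread writes to its own line.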
Comparative results are then provided.
#####################################################################
MP Bus Speed Test 64 bit Version 2.0 Fri Jul 30 16:03:06 2010
Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 20105 18713 19136 17974 18126 17910 23345
24 20871 19204 19273 17707 18075 17856 23453
96 3934 3999 4076 7064 12003 15793 21923
384 3833 3907 4013 6968 11774 15805 21853
768 3865 3929 4051 7002 11806 15859 21900
1536 3842 3909 4028 6979 11748 15845 21848
16380 975 1061 1423 2743 4691 8831 9448
131070 949 1048 1419 2736 4698 8812 9459
Part 2 - 2 Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 40146 37384 38154 35915 36233 35824 46624
24 41708 38517 38527 35568 36296 35721 46882
96 6491 6770 7952 13977 23749 31389 43696
384 6170 6403 7817 13827 23377 31569 43252
768 6264 6461 7901 13868 23409 31641 43200
1536 6193 6405 7852 13827 23333 31643 43116
16380 1401 1273 1645 3314 6475 12889 12993
131070 1348 1236 1646 3324 6488 12842 12985
Part 3 - 4 Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 64794 59294 60285 55420 54089 62505 91958
24 55801 55610 47873 49751 55189 53090 79253
96 7877 7747 8451 16454 30896 47690 61990
384 7468 7549 8300 15788 29865 49505 63389
768 7746 7735 8302 16429 29424 47516 65023
1536 7408 7545 8135 15619 29496 48324 61670
16380 1480 1418 1717 3471 6797 13632 13823
131070 1462 1430 1728 3481 6824 13543 13661
Part 4 - 6 Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 67762 66978 76645 74197 72576 77737 136889
24 53927 53535 57057 57339 55476 65243 100393
96 8127 8205 8685 16738 33008 60227 63355
384 7661 7912 8483 16577 32475 58539 61383
768 7967 7821 8571 17102 32517 60040 67265
1536 5130 5238 5499 11220 22209 41870 44368
16380 1497 1469 1740 3490 6949 13877 13976
131070 1460 1455 1733 3493 6961 13642 13878
Part 5 - 8 Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 78565 80765 80972 80495 86139 80267 142857
24 23359 24366 25512 38210 59649 74238 104945
96 8264 8443 8712 17459 34420 65860 68148
384 8122 8130 8497 17042 33674 65546 67427
768 8246 8216 8625 17209 33979 64447 67211
1536 1797 1857 2098 4508 8013 15861 15375
16380 1486 1472 1718 3461 6877 13851 13904
131070 1463 1467 1724 3450 6893 13804 13849
CPUID and RDTSC Assembly Code
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000106A5
Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz Measured 2806 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
AMD64 processor architecture, 8 CPUs
Windows NT Version 6.1, build 7600,
Memory 8184 MB, Free 6779 MB
User Virtual Space 8388608 MB, Free 8388543 MB
Results below are for dual core, quad core and quad core with Hyperthreading processors, with data in L1 cache, L2 cache and RAM.
With L1 cache data, average performance gain of dual core processors is shown to be slightly less than two times for 2 or more threads. Performance gains using the quad core Phenom average around 3.8 times using 4 or more threads. The Core i7 gains are slightly less using 4 threads, but Hyperthreading kicks in with 6 and 8 threads to provide 4.4 times improvement on integer tests, using 8 threads, and greater than 6 times using SSE2 instructions.
Note that Core i7 Turbo Boost MHz will be lower using four CPUs, probably with no boost at all.
These tests are limited by processor speed, with streamed data to a single register.
This means that speed in MB/second using 64 bit integers can be expected to be twice as fast as that using 32 bit numbers.
Approximate Millions of Instructions Per Second (MIPS) can be obtained by dividing the results by four for 32 bit integer tests, eight at 64 bits and sixteen for 128 bit SSE2 calculations. As with BusSpd2K, the Phenom looks good, with a higher MIPS/MHz ratio (Read All). Except for the Core 2 Duo, SSE2 MIPS/MHz ratios are quite low, possibly leaving more pipeline stages available for Core i7 Hyperthreading.
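The MIPS approximation described above can be sketched as follows. This is only an illustration of the arithmetic, not code from the benchmark; the function names are made up for the example and the input figures are single thread values copied from the 6 KB L1 cache table.

```python
# Bytes loaded per instruction for each test type, as stated in the text:
# 4 for 32 bit integers, 8 for 64 bit integers, 16 for 128 bit SSE2.
BYTES_PER_INSTRUCTION = {"int32": 4, "int64": 8, "sse2": 16}


def approx_mips(mbytes_per_sec, kind):
    """Approximate MIPS from a measured MB/second figure."""
    return mbytes_per_sec / BYTES_PER_INSTRUCTION[kind]


def mips_per_mhz(mbytes_per_sec, kind, mhz):
    """Approximate instructions per clock cycle (MIPS/MHz ratio)."""
    return approx_mips(mbytes_per_sec, kind) / mhz


# Single thread figures from the 64 bit version of the L1 cache table.
samples = [
    ("Phenom II Read All", 27037, "int64", 3000),
    ("Core 2 Duo Read All", 18425, "int64", 2400),
    ("Phenom II SSE2", 23718, "sse2", 3000),
    ("Core 2 Duo SSE2", 37001, "sse2", 2400),
]
for name, mbs, kind, mhz in samples:
    print(f"{name:20s} {approx_mips(mbs, kind):8.0f} MIPS "
          f"{mips_per_mhz(mbs, kind, mhz):5.2f} MIPS/MHz")
```

The Core i7 is left out of the sample list because its clock speed varies with Turbo Boost, so a MIPS/MHz ratio would depend on which MHz value is assumed.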
L1 Cache Results in MBytes/Second - 6 KB
CPUs/ MHz Inc Inc Inc Inc Inc Read 128b
HTs 32wds 16wds 8wds 4wds 2wds All SSE2
Core 2 Duo M 32b 2/0 1830 6299 6653 6665 6831 6614 6890 27543
V32 2 Threads 11279 12315 11952 12193 12596 12254 51667
4 Threads 11755 12907 12985 13172 12808 13101 52472
Athlon 64 32b 2/0 2210 7660 7990 10479 10094 9839 10061 17299
X64 2 Threads 14918 15689 20513 19824 19299 19795 34084
4 Threads 14755 15841 20817 19973 19627 19903 34554
Core 2 Duo 32b 2/0 2400 8468 8988 9012 9180 9213 9254 37049
V64 2 Threads 15305 16992 16431 17206 15852 17377 68187
4 Threads 15772 16660 17437 17623 17746 17742 70598
Phenom II 32b 4/0 3000 10606 13543 13819 13363 13463 14219 23691
764 2 Threads 21150 27063 27605 26633 26854 28435 47423
4 Threads 40763 51778 53876 52158 52049 54630 92595
6 Threads 31624 47862 51680 49239 48359 54370 88023
8 Threads 38638 48086 51190 52780 50112 53126 85948
Core i7 930 32b 4/4 **** 10303 9510 9654 9122 9134 9023 23326
764 2 Threads 20590 19009 19309 18196 18278 18031 46677
4 Threads 29499 28967 30452 32401 29121 31104 91726
6 Threads 35391 34197 35465 36962 37410 35846 137181
8 Threads 41300 41496 43446 38948 43781 39292 170513
64 Bit Version
Athlon 64 64b 2/0 2210 15335 16626 16436 18984 19388 19165 17442
X64 2 Threads 30464 32613 32540 37046 38068 37328 34369
4 Threads 30733 33244 32934 37547 38225 37961 34505
Core 2 Duo 64b 2/0 2400 16255 17621 17958 18346 18340 18425 37001
V64 2 Threads 30950 31475 33680 33412 32928 33573 70601
4 Threads 30116 33695 34040 34231 35342 34284 68674
Phenom II 64b 4/0 3000 20650 21652 25936 25907 26860 27037 23718
764 2 Threads 41151 43229 51655 51829 53580 54049 47380
4 Threads 78774 82990 99324 101024 104673 105713 91159
6 Threads 79061 80404 92704 96655 97729 102354 88050
8 Threads 78384 84959 99584 97863 100433 104259 88818
Core i7 930 64b 4/4 **** 20105 18713 19136 17974 18126 17910 23345
764 2 Threads 40146 37384 38154 35915 36233 35824 46624
4 Threads 64794 59294 60285 55420 54089 62505 91958
6 Threads 67762 66978 76645 74197 72576 77737 136889
8 Threads 78565 80765 80972 80495 86139 80267 142857
Windows: V32/V64 = 32 and 64 Bit Vista, X64 = 64 Bit XP, 764 = 64 Bit Windows 7
**** i7 930 2800 MHz running using Turbo Boost at up to 3066 MHz
Using L2 cache data, the two or more thread Core 2 Duo performance improvement is not that good, as this cache is shared. Athlon and Phenom multiple thread L2 cache performance gains are similar to those using L1 cache. Hyperthreading on the Core i7 does not lead to much improvement on L2 results and four thread tests are relatively slow. In this case, the Phenom is faster.
Results using 1.5 MB L3 cache based data are also shown, the Phenom having 6 MB and the Core i7 8 MB. The latter is around twice as fast as the AMD processor using 1 to 4 threads but, surprisingly, the Phenom is twice as fast using 8 threads.
L2 Cache Results in MBytes/Second - 96 KB
CPUs/ MHz Inc Inc Inc Inc Inc Read 128b
HTs 32wds 16wds 8wds 4wds 2wds All SSE2
Core 2 Duo M 32b 2/0 1830 1518 1505 2402 3355 4942 5970 13925
2 Threads 2237 2188 3852 5158 8742 10380 22791
4 Threads 2247 2253 3876 5510 8971 11288 23223
Athlon 64 32b 2/0 2210 763 655 1231 2312 4890 6571 8919
2 Threads 1485 1291 2417 4536 9562 12793 17491
4 Threads 1487 1314 2446 4620 9669 13015 17715
Core 2 Duo 32b 2/0 2400 2030 1993 3205 4486 6552 7870 18925
2 Threads 2941 2962 5050 7668 11655 13901 31289
4 Threads 3186 3036 5208 7482 11913 14866 31401
Phenom II 32b 4/0 3000 1496 1495 2957 5972 11352 13145 23798
2 Threads 2983 2990 5928 11974 22511 26351 47336
4 Threads 5761 5851 11557 23154 43838 51226 92184
6 Threads 5863 5819 10338 23460 44264 48050 86055
8 Threads 5380 5894 10895 23551 43718 48529 85650
Core i7 930 32b 4/4 **** 1996 2041 3677 5980 8009 8643 22092
2 Threads 3378 3981 7235 11826 15938 17305 43722
4 Threads 3866 4199 8289 15503 24780 26611 60836
6 Threads 4049 4355 8620 16846 28853 33262 64866
8 Threads 4178 4324 8449 16917 31200 37228 68711
64 Bit Version
Athlon 64 64b 2/0 2210 1776 1692 1353 2476 4589 9769 8881
2 Threads 3473 3256 2653 4847 9086 19163 17369
4 Threads 3514 3349 2667 4911 9218 19401 17299
Core 2 Duo 64b 2/0 2400 4026 4073 4011 6529 9015 13240 18870
2 Threads 5851 6384 5773 10640 15026 23496 30304
4 Threads 6392 6315 5934 10790 15485 23501 30601
Phenom II 64b 4/0 3000 2922 2970 2992 5927 11859 22500 23881
2 Threads 5812 5899 5948 11874 23749 44866 47624
4 Threads 11296 11605 11558 23004 46296 83573 92062
6 Threads 11583 10182 11529 23299 42600 83121 93145
8 Threads 10582 10629 11762 23418 42158 85956 93293
Core i7 930 64b 4/4 **** 3934 3999 4076 7064 12003 15793 21923
2 Threads 6491 6770 7952 13977 23749 31389 43696
4 Threads 7877 7747 8451 16454 30896 47690 61990
6 Threads 8127 8205 8685 16738 33008 60227 63355
8 Threads 8264 8443 8712 17459 34420 65860 68148
L3 Cache - 1536 KB Data
Phenom II 64b 4/0 3000 1419 1462 1492 2908 5958 11097 11891
2 Threads 2854 2801 2869 5727 10889 22017 22032
4 Threads 4432 4395 4371 8693 16905 31648 34088
6 Threads 4260 4415 4290 8449 17146 32420 32108
8 Threads 4358 4287 4423 8341 17342 32064 34862
Core i7 930 64b 4/4 **** 3842 3909 4028 6979 11748 15845 21848
2 Threads 6193 6405 7852 13827 23333 31643 43116
4 Threads 7408 7545 8135 15619 29496 48324 61670
6 Threads 5130 5238 5499 11220 22209 41870 44368
8 Threads 1797 1857 2098 4508 8013 15861 15375
Maximum data transfer speeds in MB/second using RAM are Core 2 Duo 1 - 8533, Athlon 64 - 6400, Core 2 Duo 2 - 12800, Phenom - 21333, Core i7 - 17067. With multiple threads, the two Core 2 systems achieve 55% of this, the two AMD processors 70% and the Core i7 80%. The latter is fastest using a single thread but is overtaken by the Phenom using four threads.
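The percentages quoted above can be reproduced approximately with a few lines of arithmetic. The sketch below assumes that the comparison uses the highest multiple thread 128b SSE2 figure for each system in the RAM results table; that choice, and the names used, are assumptions made for this example rather than a description of how the original figures were derived.

```python
# Maximum MB/second figures quoted in the text.
QUOTED_MAX = {
    "Core 2 Duo M": 8533,
    "Athlon 64": 6400,
    "Core 2 Duo": 12800,
    "Phenom II": 21333,
    "Core i7 930": 17067,
}

# Highest multiple thread 128b SSE2 entry per system in the RAM table
# (assumed to be the figures behind the quoted percentages).
BEST_MEASURED = {
    "Core 2 Duo M": 4707,
    "Athlon 64": 4652,
    "Core 2 Duo": 7093,
    "Phenom II": 15080,
    "Core i7 930": 13911,
}


def percent_of_max(measured, quoted_max):
    """Measured speed as a percentage of the quoted maximum."""
    return 100.0 * measured / quoted_max


for name, quoted in QUOTED_MAX.items():
    pct = percent_of_max(BEST_MEASURED[name], quoted)
    print(f"{name:13s} {BEST_MEASURED[name]:6d} of {quoted:6d} MB/s = {pct:4.1f}%")
```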
As a reminder, the first columns represent burst reading, and maximum MB/second can be estimated from them. For example, with an increment of 16 words, the recorded speeds are multiplied by 16 for 32 bit words (or 8 for 64 bit words) and normally produce results similar to the maximum measured speeds. All systems appear to use 64 Byte bursts (with 32 bit words, speed does not double from Inc32 to Inc16).
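As a rough illustration of this estimate, the sketch below scales an Inc16wds result up to a full 64 byte burst. It assumes that the increment is counted in 32 bit words in both the 32 bit and 64 bit versions, which is how the multipliers of 16 and 8 are read here; the function name is hypothetical.

```python
def burst_estimate(inc16_mb_per_sec, word_bits=32):
    """Estimate maximum MB/second from an Inc16wds result.

    One word is loaded for every 64 bytes stepped over (16 x 32 bit words,
    an assumption about how the increment is counted), so the recorded
    speed is scaled by 64 / bytes-per-word: 16 at 32 bits, 8 at 64 bits.
    """
    increment_bytes = 16 * 4
    factor = increment_bytes // (word_bits // 8)
    return inc16_mb_per_sec * factor


# Inc16wds and Read All entries, 8 threads, from the RAM results table.
print("Core i7 32 bit:", burst_estimate(861, 32), "estimated,", 13750, "measured")
print("Phenom II 64 bit:", burst_estimate(1777, 64), "estimated,", 14441, "measured")
```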
RAM Results in MBytes/Second - 128 MB
CPUs/ MHz Inc Inc Inc Inc Inc Read 128b
HTs 32wds 16wds 8wds 4wds 2wds All SSE2
Core 2 Duo M 32b 2/0 1830 263 273 510 964 1927 3310 4042
2 Threads 288 267 596 1136 2170 4271 4578
4 Threads 292 298 605 1176 2235 4337 4707
Athlon 64 32b 2/0 2210 347 311 543 858 1428 2751 2877
2 Threads 323 332 667 1274 2341 4717 4647
4 Threads 319 329 660 1276 2329 4696 4652
Core 2 Duo 32b 2/0 2400 298 374 754 1402 2512 4514 5617
2 Threads 314 425 846 1730 2990 5523 6886
4 Threads 313 422 882 1761 3044 5706 6997
Phenom II 32b 4/0 3000 439 455 894 1846 3097 5214 7302
2 Threads 744 763 1481 3063 5204 8920 12162
4 Threads 913 946 1875 3763 7177 13000 14952
6 Threads 902 947 1868 3767 6989 13183 15005
8 Threads 909 953 1891 3774 7074 12701 14966
Core i7 930 32b 4/4 **** 526 709 1350 2352 4458 7063 9485
2 Threads 637 824 1661 3227 6454 11883 12945
4 Threads 724 873 1725 3456 6895 13600 13828
6 Threads 731 867 1744 3464 6985 13572 13911
8 Threads 731 861 1724 3433 6925 13750 13722
64 Bit Version
Athlon 64 64b 2/0 2210 629 690 618 1099 1673 2804 2900
2 Threads 588 624 643 1287 2457 4532 4534
4 Threads 587 623 643 1290 2472 4571 4549
Core 2 Duo 64b 2/0 2400 578 616 737 1542 2766 5068 5600
2 Threads 601 619 843 1734 3536 5948 6921
4 Threads 606 629 848 1753 3558 6154 7093
Phenom II 64b 4/0 3000 832 877 911 1784 3676 6237 7360
2 Threads 1433 1492 1529 2946 6102 10404 12145
4 Threads 1815 1826 1898 3735 7524 14356 14998
6 Threads 1803 1831 1900 3719 7546 14324 14997
8 Threads 1791 1777 1915 3761 7567 14441 15080
Core i7 930 64b 4/4 **** 949 1048 1419 2736 4698 8812 9459
2 Threads 1348 1236 1646 3324 6488 12842 12985
4 Threads 1462 1430 1728 3481 6824 13543 13661
6 Threads 1460 1455 1733 3493 6961 13642 13878
8 Threads 1463 1467 1724 3450 6893 13804 13849
Reliability Test
BusSpd2K Reliability Test and IntBurn64 log file output has the following format. The write and read test uses a single pass of the data, resulting in numerous passes for the smaller sizes. This leads to loop and data checking overheads being significant and, in some cases, L2 cache speeds appearing to be faster than L1. After the example log file are results on 4 CPUs, including an estimate of overheads in microseconds, where the impact is highest on the Core 2 Duo.
Further results are given in Dual Core Reliability Test Performance Results.
For burn-in example results and associated CPU temperature increases see BurnIn32.htm, BurnIn64.htm and BurnIn4CPU.htm.
Reliability Test L1 Cache
Test 16 KB, 5 seconds per test, Mon Sep 24 13:06:10 2007
Write/Read
1 8044 MB/sec Pattern 0000000000000000 Result OK 245491 passes
2 8056 MB/sec Pattern 0000000000000000 Result OK 245864 passes
3 8081 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 246611 passes
4 8087 MB/sec Pattern 5555555555555555 Result OK 246811 passes
5 8072 MB/sec Pattern CCCCCCCCCCCCCCCC Result OK 246324 passes
6 8079 MB/sec Pattern 0F0F0F0F0F0F0F0F Result OK 246562 passes
Read
1 16623 MB/sec Pattern 0000000000000000 Result OK 1014600 passes
2 16619 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 1014400 passes
3 16633 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 1015300 passes
4 16605 MB/sec Pattern 5555555555555555 Result OK 1013500 passes
5 16624 MB/sec Pattern 3333333333333333 Result OK 1014700 passes
6 16633 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 1015200 passes
Reliability Write/Read Test MB/Sec 32 Bit BusSpd2K
KB Core 2 Duo Athlon 64 Pentium 4 Athlon 4
2400 MHz 2210 MHz 1900 MHz 2088 MHz
64-Bit Vista XP Pro x64 Win XP Win 2000
4 3933 8299 1861 2588
8 5991 11229 2891 4410
16 8087 13577 *3676 6865
32 8766 15151 *4407 9435
O/H usecs 0.7 0.2 1.5 1.3
* L2 Cache
Dual and Quad Core Reliability Test Performance Results (64 Bit Windows)
The following show data transfer speed in MB/second for increasing memory demands for BusSpd2K Reliability Test and IntBurn64 via 64 Bit Windows. The former uses MMX instructions for maximum speeds with 32 bit working and the latter has normal integer instructions using 64 bit registers. So the results are not directly comparable. Results are for one copy then two copies at the same time to measure dual core performance. The percentage speed gain is also shown.
As with the BusSpd2K Performance Test, the Athlon 64 L1 cache speeds, on reading using MMX instructions, are faster than the Core 2 Duo. It seems that the code used also favours the Athlon 64 using 64 bit integer instructions. The position is reversed for all other results, where most are comparing Core 2 Duo L2 cache speeds with those from Athlon 64 RAM. The shared Core 2 Duo L2 cache is surprisingly fast when being used by two CPUs, except where both could use most of it. Only the 16,000 KB measurements represent memory speeds on both systems. Here, the slower Athlon 64 RAM throughput improvement is better, when using two CPUs.
Later IntBurn64 results are for a quad core Phenom II processor using Windows 7 and showing speeds via L1 cache, dedicated L2 cache, shared L3 cache and RAM. These show that four CPUs, each using 2 MB data, can effectively share the 6 MB L3 cache and four CPUs are needed to obtain maximum memory throughput.
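The percentage rows in the results tables appear to be the combined speed of all copies divided by the single copy speed. A minimal sketch of that calculation follows, using a hypothetical function name; some table entries differ by one from the values computed this way, presumably because the originals were derived from unrounded speeds.

```python
def percent_gain(multi_copy_mb_per_sec, single_copy_mb_per_sec):
    """Combined speed of several copies as a percentage of one copy."""
    return 100.0 * multi_copy_mb_per_sec / single_copy_mb_per_sec


# Core 2 Duo, 32 Bit Wrt/Rd column: (KB, 1 CPU MB/s, 2 CPU MB/s).
rows = [(4, 3870, 7287), (16, 8051, 15761), (4000, 8350, 4095)]
for kb, one_cpu, two_cpu in rows:
    print(f"{kb:5d} KB  {percent_gain(two_cpu, one_cpu):4.0f}%")
```

A value near 200 indicates that each of two CPUs runs at its full single copy speed. A value below 100, as in the 4000 KB row, means that running two copies together is slower in total than running one.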
Example BAT file commands
Start BusSpd2k Reliability, KB 4, Seconds 1, Log Log1.txt
Start BusSpd2k Reliability, KB 4, Seconds 1, Log Log2.txt
Start IntBurn64 Auto, KB 4, Secs 1, P1, Log testCPU1.txt
Start IntBurn64 Auto, KB 4, Secs 1, P2, Log testCPU2.txt
Core 2 Duo 2400 MHz Vista Athlon 64 2210 MHz XP Pro
32KB L1 4MB L2 800 MHz RAM 64KB L1 512KB L2 400 MHz RAM
MBytes/second
32 Bit 64 Bit 32 Bit 64 Bit
KB CPUs Wrt/Rd Read Wrt/Rd Read Wrt/Rd Read Wrt/Rd Read
4 1 3870 15794 4322 16206 8514 20913 12437 22257
2 7287 31401 7737 32248 16926 41503 24684 44389
% 188 199 179 199 199 198 198 199
16 1 8051 16603 8499 16711 13670 22815 18559 23177
2 15761 33014 16483 33114 27304 45290 36821 45996
% 196 199 194 198 200 199 198 198
64 1 8844 13035 9033 12995 15442 23002 18699 23028
2 15945 24899 16185 24884 30833 45677 37533 45234
% 180 191 179 191 200 199 201 196
500 1 9715 13084 9911 13048 8545 9112 8104 10102
2 17222 25111 17243 25033 17031 18023 16168 19957
% 177 192 174 192 199 198 200 198
1000 1 9756 13098 9737 13007 2125 2897 2072 3050
2 17183 24670 17245 25035 2476 4736 2459 4917
% 176 188 177 192 116 163 119 161
2000 1 9567 12980 9664 12919 2101 2898 2074 3014
2 15611 23144 15672 23399 2480 4629 2445 4904
% 163 178 162 181 118 160 118 163
4000 1 8350 11902 8955 12159 2098 2879 2045 3011
2 4095 6720 4185 6657 2477 4693 2485 4873
% 49 56 47 55 118 163 121 162
16000 1 3466 5433 3370 5408 2086 2872 2055 3009
2 3687 6066 3598 6019 2454 4706 2478 4838
% 106 112 107 111 118 164 121 161
Example BAT file commands to test 4 CPUs
System - 3.0 GHz Phenom II, 1333 MHz DDR3 RAM, 64-Bit Windows 7
Caches L1 64 KB, L2 512 KB, L3 6 MB shared
Start IntBurn64 Auto, KB 65536, Secs 1, P1, Log quad1a.txt
Start IntBurn64 Auto, KB 65536, Secs 1, P2, Log quad2a.txt
Start IntBurn64 Auto, KB 65536, Secs 1, P3, Log quad3a.txt
Start IntBurn64 Auto, KB 65536, Secs 1, P4, Log quad4a.txt
MBytes/second
L1 Cache Data L2 Cache Data L3 Cache Data RAM Data
KB A 4 B 16 KB A 128 B 256 MB A 1 B 2 MB A 8 B 64
KB CPUs Wrt/Rd Read Wrt/Rd Read Wrt/Rd Read Wrt/Rd Read
A 1 18449 30442 14807 21445 9527 11330 5373 6142
2 35856 60888 29611 42917 18628 22099 7655 10519
% 194 200 200 200 196 195 142 171
4 67862 120605 58403 85383 34624 36651 8990 14567
% 368 396 394 398 363 324 167 237
B 1 25960 31681 14775 21518 9496 11062 5292 6263
2 52682 63357 29533 42984 18586 22123 7552 10475
% 203 200 200 200 196 200 143 167
4 101116 125675 58352 85636 31344 34221 8978 14482
% 390 397 395 398 330 309 170 231
Paging Tests
BusSpd2K and IntBurn64 standard Reliability Tests can be used to demonstrate memory speeds with paging but, with 12 tests, this can take a long time. So an additional Paging option has been included that uses just one of the write/read tests and is controlled by a BAT file command. Example command, output and results are below. More details, with results via 32 bit and 64 bit Windows, are in Paging.htm.
These results show that data transfer speed with paging can be much slower than using normal disk writes/reads. They also show that the maximum amount of data that can be used varies, from 1.2 GB using Windows XP to slightly less than 8 GB using 64-Bit Vista, then 14 GB with 64-Bit Windows 7. These limits probably depend on the RAM capacity provided. The results also show that Vista paging can be more efficient than other versions of Windows, with Windows 7 even better.
Command and Output (BusSpd2K)
BAT command - start busspd2k Reliability, Paging, KB 2000000, Secs 1
Test 2000000 KB, 1 seconds per test, Fri Sep 21 20:44:46 2007
Write/Read 1.9 Seconds 2048.0 MB x 1 passes x 2 (M = 1,000,000)
2139 MB/sec Pattern 5555555555555555 Result OK 1 passes
Example Results IntBurn64
4 GB RAM, 64-Bit Vista 8 GB RAM, 64-Bit Windows 7
KB Passes Seconds MB/sec KB Passes Seconds MB/sec
2000000 4 5.4 3056 6000000 2 3.1 4051
3000000 3 6.4 2878 7000000 2 3.5 4078
3500000 1 6.7 1075 8000000 1 227.0 72
4000000 1 145.3 56 9000000 1 697.0 26
5000000 1 1040.3 10 10000000 1 1231.0 17
7900000 1 771.0 21 14000000 1 2742.0 10
8000000 Cannot allocate memory 15000000 Cannot allocate memory
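The MB/sec figures above appear to follow the calculation shown in the log line "2048.0 MB x 1 passes x 2", divided by the elapsed seconds. The sketch below reproduces that arithmetic under the assumption that KB means 1024 bytes and M means 1,000,000; the function name is made up for the example, and small differences from the logged values are expected because the seconds are printed to one decimal place.

```python
def paging_mb_per_sec(kbytes, passes, seconds):
    """MB/second for a write/read paging test.

    Data size is converted from KB (1024 bytes) to MB (1,000,000 bytes),
    then doubled because each pass both writes and reads the data.
    """
    mbytes = kbytes * 1024 / 1_000_000
    return mbytes * passes * 2 / seconds


# (KB, passes, seconds, logged MB/sec) taken from the results above.
examples = [
    (3000000, 3, 6.4, 2878),
    (4000000, 1, 145.3, 56),
    (8000000, 1, 227.0, 72),
]
for kb, passes, secs, logged in examples:
    print(f"{kb:9d} KB  calculated {paging_mb_per_sec(kb, passes, secs):7.1f}  logged {logged}")
```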
Roy Longbottom August 2010
The new Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection