Description
The MemSpd2K benchmark runs three different sequences of operations, on 64-bit double precision floating point numbers, 32-bit single precision numbers and 32-bit integers, using two data arrays x and y:
Sum to register: s = s + x[m] * y[m] (integers: s = s + x[m] + y[m])
Sum to memory: x[m] = x[m] + y[m]
Memory to memory: x[m] = y[m]
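In plain C, the three loops look roughly like the sketch below. This is not the benchmark's source: MemSpd2K uses assembly code, and a C compiler may reorder or vectorise these loops, so timings from them would not match. The function names are hypothetical. Only the double precision and integer forms are shown; the single precision forms would use float in place of double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum to register: s = s + x[m] * y[m] (double precision). */
double sum_to_register(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double s = 0.0;
    for (size_t m = 0; m < n; m++)
        s = s + x[m] * y[m];
    return s;
}

/* Integer form: the multiply becomes an add, s = s + x[m] + y[m].
   The sum is kept unsigned so that overflow wraps instead of being
   undefined behaviour in C. */
uint32_t sum_to_register_int(const int32_t *x, const int32_t *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    uint32_t s = 0;
    for (size_t m = 0; m < n; m++)
        s = s + (uint32_t)x[m] + (uint32_t)y[m];
    return s;
}

/* Sum to memory: x[m] = x[m] + y[m] (reads both arrays, writes x). */
void sum_to_memory(double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + y[m];
}

/* Memory to memory: x[m] = y[m] (reads y, writes x). */
void memory_to_memory(double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t m = 0; m < n; m++)
        x[m] = y[m];
}
```

For example, with x = {1, 2, 3} and y = {4, 5, 6}, sum_to_register returns 32 and the integer form returns 21; the sum to register loops only read memory, while the other two also write the x array.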
These are executed using assembly code with the same instructions as the original MemSpeed benchmark. The memory loading speed is calculated in millions of bytes per second (MB/S). Measurements are made at 4, 8, 16 KB and so on, doubling the size each time, up to 25% of the main RAM size, so that speeds are obtained with data in each level of cache and in RAM.
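The MB/S figure and the sequence of test sizes can be sketched in C as follows. The function names are hypothetical, 1 KB is taken as 1024 bytes to match the KBytes column, and the number of bytes credited to each pass of the loops is not stated here, so `bytes` is simply whatever amount of data the timed loops moved.

```c
#include <assert.h>
#include <stddef.h>

/* Speed in millions of bytes per second (MB/S): 1 MB = 1,000,000 bytes. */
double mb_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1.0e6;
}

/* Hypothetical helper: write the test sizes 4, 8, 16, ... KB into sizes_kb,
   doubling each time while the size is no more than 25% of ram_kb.
   Returns how many sizes were written (at most max_sizes). */
int list_test_sizes_kb(long ram_kb, long *sizes_kb, int max_sizes)
{
    assert(sizes_kb != NULL && max_sizes > 0);
    int n = 0;
    for (long kb = 4; kb <= ram_kb / 4 && n < max_sizes; kb *= 2)
        sizes_kb[n++] = kb;
    return n;
}
```

For example, with 256 MB of RAM (an assumed figure), list_test_sizes_kb gives 15 sizes from 4 KB to 65536 KB, the same range as the MemSpd2K table, and 8,000,000 bytes moved in 0.5 seconds gives 16 MB/S.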
A pre-compiled version of the benchmark is in MemSpd2K.zip, which also contains the source code with further explanatory comments.
The original MemSpeed benchmark is in DOSTests.zip, as file MDTRDOS.exe.
See My Main Page for other PC benchmarks and results.
The two arrays are allocated at addresses that are a multiple of 2048 bytes apart. This exposes a design limitation of the Intel Pentium 4 CPU, causing false cache flushing and some very slow speeds. The earlier MemSpeed benchmark is not affected, so its results are also provided.
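The spacing condition is easy to check in code. The sketch below assumes both arrays are carved out of one buffer, which the text does not state; y_offset_bytes and is_2k_multiple_apart are hypothetical helpers, and the 64-byte pad in the example is an assumed value chosen only to move y off a 2048-byte boundary, not a documented fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two addresses are a whole multiple of 2048 bytes apart. */
int is_2k_multiple_apart(const void *x, const void *y)
{
    uintptr_t a = (uintptr_t)x;
    uintptr_t b = (uintptr_t)y;
    uintptr_t distance = a > b ? a - b : b - a;
    return distance % 2048 == 0;
}

/* Hypothetical helper: byte offset at which to place array y inside one
   buffer that starts with array x (x_bytes long). The offset is first
   rounded up to a multiple of 2048, reproducing the spacing described in
   the text; a non-zero pad below 2048 then moves y off that boundary. */
size_t y_offset_bytes(size_t x_bytes, size_t pad)
{
    assert(pad < 2048);
    return (x_bytes + 2047) / 2048 * 2048 + pad;
}
```

With x occupying 4096 bytes, y_offset_bytes(4096, 0) places y exactly 4096 bytes after x, a multiple of 2048, while y_offset_bytes(4096, 64) places it 4160 bytes after x, which is not.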
The problem appears to have been fixed on later P4 CPUs (see P4E), but the SSE3DNow benchmark is the preferred option, as it uses the same calculations to measure performance (for the benchmark see SSE3DNow.zip and SSE3DNow results.htm; also see BusSpd2K results.htm and RandMem results.htm).
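The following are example results from both MemSpeed and MemSpd2K on a 1.9 GHz Pentium 4 CPU with PC133 RAM. Changes in speed as the amount of data grows identify the L1 and L2 cache sizes.
Note that some MemSpd2K speeds with data in caches are slower than those with data in main memory: for example, the integer sum to register speeds are 90 to 166 MB/S for 4 to 256 KB, against about 300 MB/S from RAM.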
MemSpeed Results
    Memory  s=s+x[m]*y[m] (Int: +)  x[m]=x[m]+y[m]          x[m]=y[m]
    KBytes    Dble    Sngl     Int    Dble    Sngl     Int    Dble    Sngl     Int
    Used      MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S
L1       4    5607    2834    6417    9134    4096    5090    4774    2656    3413
         8    5689    2852    6320    9433    4000    5125    4769    2627    3466
L2      16    5885    2849    3697    7306    3507    4655    3870    2143    2932
        32    5896    2865    3712    7529    3523    4650    3893    2151    2942
        64    5909    2872    3683    7420    3514    4664    3908    2152    2937
       128    5860    2837    3644    7495    3497    4679    3863    2151    2926
       256    2069    1725    2202    1548    1267    1818     772     665     977
RAM    512     843     839     832     544     552     551     273     277     277
      1024     834     824     824     539     542     547     270     272     277
      2048     816     811     888     531     536     566     266     270     285
MemSpd2K Results
    Memory  s=s+x[m]*y[m] (Int: +)  x[m]=x[m]+y[m]          x[m]=y[m]
    KBytes    Dble    Sngl     Int    Dble    Sngl     Int    Dble    Sngl     Int
    Used      MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S    MB/S
         4    1549     305     166    1756    1498    1260    3997    1825    1626
         8    1740     344     159    2657    1523    1292    4138    1803    1547
        16    1722     376     141    1655    1516    1300    4067    1809    1394
        32    1727     830      96    2312    1141    1169    3462    1567    1325
        64    1719    1022      90    2389    1261    1170    3153    1554    1267
       128    1687     842      97    2243    1285    1161    2530    1472    1341
       256    1505     634     116    1888     988    1069    2104    1309    1203
       512     823     653     300     570     506     561     294     294     288
      1024     823     656     299     576     505     562     294     295     289
      2048     827     650     299     575     507     563     296     296     295
      4096     825     656     299     577     505     566     297     296     296
      8192     822     662     301     578     511     567     295     298     297
     16384     800     654     300     554     498     548     287     283     286
     32768     790     646     301     542     485     537     281     280     279
     65536     784     639     305     528     481     530     280     278     278
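The sizes at which speeds change can also be picked out mechanically. As a sketch, the hypothetical function find_speed_drops below (not part of either benchmark) scans one column of results and reports each size after which the speed falls below a chosen fraction of the previous value; the 0.8 threshold in the example is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: kb[] holds increasing data sizes and mbs[] the speeds
   measured at those sizes (one column of a results table). Each time a speed
   falls below `ratio` times the previous one, the size before the drop is
   written to out_kb. Returns the number of drops found (at most max_out). */
size_t find_speed_drops(const long *kb, const double *mbs, size_t n,
                        double ratio, long *out_kb, size_t max_out)
{
    assert(ratio > 0.0 && ratio < 1.0);
    size_t found = 0;
    for (size_t i = 1; i < n && found < max_out; i++) {
        if (mbs[i] < ratio * mbs[i - 1])
            out_kb[found++] = kb[i - 1];  /* last size before the drop */
    }
    return found;
}
```

Applied to the Int column of the MemSpeed sum to register results with a threshold of 0.8, it reports drops after 8 KB, 128 KB and 256 KB. This is consistent with the first drop marking the end of L1 cache and the last two bracketing the move from L2 cache to RAM, with the 256 KB row part-way between the two speeds.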
Results
Separate tables of speeds obtained via L1 cache, L2 cache and RAM are given below. For a given type of processor, performance via caches tends to be proportional to CPU MHz, except where the cache is connected via the memory bus, so only a sample of results is provided. Details of cache sizes, cache speeds and the range of CPU MHz can be found in CPUSpeed.htm.
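As a rough rule following from that proportionality, a cache speed measured on one CPU can be scaled to another CPU of the same type. This is only an estimate, assuming the same cache design and a cache not reached via the memory bus; the 2400 MHz target in the worked example is a hypothetical figure, and the starting value is the MemSpeed L1 double precision sum to register speed from the 1.9 GHz Pentium 4:

```latex
\text{MB/S}_{\text{new}} \approx \text{MB/S}_{\text{measured}} \times \frac{\text{MHz}_{\text{new}}}{\text{MHz}_{\text{measured}}},
\qquad \text{e.g. } 5607 \times \frac{2400}{1900} \approx 7083
```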
The benchmark is very sensitive to synchronisation between the CPU and memory, so measured speeds can vary. Large variations are also apparent between different mainboards.