Roy Longbottom at Linkedin  More OpenMP Parallel Computing Benchmarks

Contents

   MemSpeed
   Example Log Files
   Different Version Results
   Results On A Different Processor
   Other Benchmark Compilations

General

OpenMP is a system-independent set of procedures and software that arranges automatic parallel processing of shared memory data when more than one processor is available. This option is available in the latest Microsoft C++ compilers. The first benchmark, described in OpenMP MFLOPS, executes the same functions, using the same data sizes, as the CUDA Graphics GPU Parallel Computing Benchmark, with varieties compiled for 32 bit and 64 bit operation, using old style i387 floating point instructions and more recent SSE code.

It was decided to compile other existing benchmarks using the same Microsoft compiler and OpenMP directive, the first one being the Linpack Benchmark, where performance is mainly governed by a loop containing

   dy[i] = dy[i] + da * dx[i]

The speed measured by the OpenMP version was unexpectedly and extremely slow. So, it was decided to produce a variation of the MemSpeed Benchmark, with the same calculations, but using increasing data sizes to test caches and RAM. Other benchmarks were also converted to identify other slow functions. Some of these showed that careless use of OpenMP leads to programs producing wrong and inconsistent numeric results.
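As an illustration of the kind of loop involved, the following is a minimal sketch, assuming the directive is applied directly to the Linpack inner loop. The function name daxpy_omp is hypothetical and this is not the benchmark's actual source.

```c
/* Linpack-style inner loop: dy = dy + da * dx.
   The pragma takes effect only when OpenMP is enabled at compile time
   (for example /openmp with Microsoft C++ or -fopenmp with GCC);
   otherwise it is ignored and the loop runs serially. */
void daxpy_omp(int n, double da, const double *dx, double *dy)
{
    int i;  /* signed index, as older OpenMP versions require */

    #pragma omp parallel for
    for (i = 0; i < n; i++)
    {
        dy[i] = dy[i] + da * dx[i];
    }
}
```

Each iteration touches only its own element of dy, so the loop is safe to divide between threads. The cost is that a parallel region is started on every call, which matters when n is small.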

The new benchmarks are included for download in OpenMPMflops.zip. No installation is necessary - Extract All and click on EXE files.

The OpenMP benchmarks have also been ported to 32-Bit and 64-Bit Linux using the supplied GCC compiler (all free software) - see linux benchmarks.htm and linux openmp benchmarks.htm. Benchmark execution files, source code, and compile and run instructions can be downloaded in linux_openmp.tar.gz. When downloaded using Windows, the file was wrongly named linux_openmp.tar.tar but was fine when renamed linux_openmp.tar.gz.


MemSpeed

The MemSpeed benchmark employs three different sequences of operations, on 64 bit double precision floating point numbers, 32 bit single precision numbers and 32 bit integers, via two data arrays:

   Sum to register   r = r + x[m] * y[m] (Integer + y[m])
   Sum to memory     x[m] = x[m] + y[m]                    
   Memory to memory  x[m] = y[m]                           
   

MemSpd2K, the latest standard version, uses assembly code to execute the same instructions as the original MemSpeed benchmark. This special version for OpenMP is again all C code, with the first linked triad tests returning results to memory via:

   Sum to memory     x[m] = x[m] + r * y[m]                

Memory tested doubles up from 4 KB to 25% of RAM size, to use all caches and RAM. Speed measurements are data reading speeds in MegaBytes Per Second. For tests using arithmetic operations, speed in MFLOPS can be calculated as MB/second divided by 4 for single precision floating point tests and divided by 8 for those using double precision.
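The conversion can be written as a one-line helper. This is a minimal sketch using only the divisors stated above; the function name mbs_to_mflops is hypothetical. For instance, 9921 MB/second in single precision corresponds to about 2480 MFLOPS.

```c
/* Convert a MemSpeed data reading speed (MB/second) to MFLOPS,
   using the divisors given in the text: 4 for single precision
   tests and 8 for double precision tests. */
double mbs_to_mflops(double mb_per_sec, int is_double_precision)
{
    double divisor = is_double_precision ? 8.0 : 4.0;
    return mb_per_sec / divisor;
}
```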


Example Log Files

Below are OpenMP (MemSpdOMP.exe) results produced from running on a Quad CPU Phenom processor using 64-Bit Windows 7 and those for the same code produced without the OpenMP compiler parameter (MemSpdNotOMP.exe). The programs each identify the system hardware and software as shown before performance details. Of particular note are the extremely slow OpenMP speeds for the smaller data sizes.

The slowest original OpenMP floating point benchmark results on this PC were 1920 MFLOPS using one CPU and 5587 MFLOPS with four processors. This was at 100,000 words or 400 KBytes. This MemSpeed version is similar at 512 KB, with 9921 MB/second or 2480 single precision MFLOPS with one CPU, and 22009 MB/second or 5502 MFLOPS with four CPUs using OpenMP. The single processor speeds are faster with less data, using L1 cache but, unexpectedly, those for OpenMP are progressively slower.

As with other benchmarks running on this system, more than one processor must be in use to obtain maximum throughput from RAM.


  CPUID and RDTSC Assembly Code
  CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00100F42
  AMD Phenom(tm) II X4 945 Processor Measured 3013 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, Has 3DNow, 
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  Intel processor architecture, 4 CPUs 
  Windows NT  Version 6.1, build 7600, 
  Memory 4096 MB, Free 4096 MB
  User Virtual Space 4096 MB, Free 3005 MB


  OpenMP Version

      Memory Reading Speed Test OpenMP Version 4.0 by Roy Longbottom

      0.100 seconds per test, Start Wed Oct 13 12:27:26 2010

  Memory    x[m]=x[m]+s*y[m] Int+  x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl   Int    Dble   Sngl   Int    Dble   Sngl   Int
   Used     MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

      4      418    436    439    438    441    449    222    224    225
      8      874    862    866    849    873    867    443    445    443
     16     1727   1713   1700   1730   1708   1737    878    853    873
     32     3341   3234   3263   3378   3218   3287   1724   1680   1647
     64     6123   5792   5978   6280   5922   6024   3156   3103   3052
    128    10822   9932  10085  11262   9666  10149   5848   5335   5481
    256    17639  15485  16134  18178  15582  16453   9879   8871   8853
    512    25742  22009  22123  26990  21379  22327  13959  12877  13138
   1024    33657  27622  26572  35721  27548  27919  19185  16918  16260
   2048    37554  30171  31756  37599  31174  30073  22600  18869  19298
   4096    24280  22284  23117  26256  22540  22471  14475  11822  12494
   8192    16476  13555  15907  18268  14493  15129   9479   7495   8435
  16384     7394   7137   7077   7743   7004   7248   3920   3697   3692
  32768     7387   6969   7184   7644   7167   7124   3987   3618   3752
  65536     7486   7188   7240   7733   6975   7077   3974   3725   3773
 131072     7462   7163   7249   7775   7197   7258   3976   3603   3654
 262144     7578   7207   7280   7816   7208   7223   4029   3632   3812
 524288     7652   7405   7344   8009   7331   7487   4084   3837   3825
1048576     7720   7373   7469   8012   7181   7480   4112   3837   3789

                End of test Wed Oct 13 12:28:05 2010

 
  Normal Compilation

      Memory Reading Speed Test Version 4.0 by Roy Longbottom

      0.100 seconds per test, Start Wed Oct 13 12:26:33 2010

  Memory    x[m]=x[m]+s*y[m] Int+  x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl   Int    Dble   Sngl   Int    Dble   Sngl   Int
   Used     MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

      4    22924  11651  12725  23949  12063  12721  15055   7771   9346
      8    23536  11839  13242  24553  12230  13677  15577   7855   9488
     16    23834  11887  12790  24828  12294  13728  15816   7957   9557
     32    23407  11902  13659  23941  12159  12991  15478   7913   9482
     64    23669  11847  12910  24528  12337  13543  15626   7913   9464
    128    14703   9926  10290  14750  10443  10243   8688   6701   6981
    256    14644   9906  10175  14884  10130  10166   8593   6680   6927
    512    14302   9921  10171  14895  10376  10188   8611   6687   6899
   1024     8246   7091   7026   8596   7017   7190   4509   3911   3976
   2048     8166   6976   7142   8545   7019   7125   4452   3880   3937
   4096     8006   6898   6984   8392   6836   7003   4469   3788   3857
   8192     4416   3983   4175   4530   4037   4169   2341   2157   2202
  16384     4244   3888   3993   4484   3826   4010   2298   2093   2135
  32768     4249   3885   3958   4467   3888   3966   2256   2095   2123
  65536     4235   3892   3929   4424   3875   3991   2293   2079   2137
 131072     4264   3894   3965   4487   3904   3980   2302   2092   2125
 262144     4279   3870   3991   4394   3903   4007   2305   2090   2131
 524288     4235   3873   3968   4423   3906   3998   2222   2073   2127
1048576     4297   3922   3976   4520   3913   3976   2325   2107   2142

                End of test Wed Oct 13 12:27:12 2010


Results From Different Versions


The OpenMP version was also run using Task Manager Processes Affinity options to execute using one and two processors. With the smaller data sizes, these produced the same sort of speeds as the OpenMP log above. Viewing the Threads column, in Task Manager Processes, shows that four threads are used irrespective of the number of CPUs selected by Affinity settings. Calculations indicate that there is an OpenMP startup overhead, for all these tests, of around 9 microseconds with this Phenom processor. Note that, with the normal compilation, the time to read 100 KB is about 9 microseconds.
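The overhead estimate can be approximated from the logged speeds. The sketch below assumes 1 KB = 1024 bytes and 1 MB = 1,000,000 bytes, as the benchmark's own unit conventions are not stated here, and the function name usecs_per_pass is hypothetical. Taking the 4 KB single precision OpenMP speed of 436 MB/second from the log above gives about 9.4 microseconds per pass, nearly all of which would be startup overhead.

```c
/* Approximate time for one pass over the data, in microseconds.
   Assumption: 1 KB = 1024 bytes and 1 MB = 1,000,000 bytes. */
double usecs_per_pass(double kbytes, double mb_per_sec)
{
    double bytes = kbytes * 1024.0;

    /* bytes / (mb_per_sec * 10^6 bytes per second) seconds,
       multiplied by 10^6 microseconds per second */
    return bytes / mb_per_sec;
}
```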

The speed of the OpenMP tests, relative to that of the normal compilation, is shown in the graph. Maximum speeds are only achieved with data in the 6144 KB L3 cache. Performance with the larger data sizes is limited by RAM speed.


[Graph: Single Precision Floating Point x[m]=x[m]+s*y[m]]





Results Different Processors

Following are results of single and double precision calculations of the x[m]=x[m]+s*y[m] tests on a PC with a Core 2 Duo using 64-Bit Vista. The first two columns are for normal compilations, without OpenMP. The next four columns show data transfer speeds using one and two cores with OpenMP functions. Next are loss and gain ratios for the single precision speeds, where dual core throughput improvement is associated with data in the shared 4096 KB L2 cache. The last column reflects startup overheads of at least 9 microseconds.
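The loss and gain ratios in these tables appear to be the OpenMP speed divided by the speed of the normal compilation. This reading is an assumption, but it matches the tabulated values, for example 553 / 9185 = 0.06 at 4 KB with one CPU. A minimal sketch, with the hypothetical function name loss_gain_ratio:

```c
/* Loss/gain ratio: OpenMP speed relative to the normal compilation.
   A value below 1.0 is a loss from using OpenMP, above 1.0 a gain. */
double loss_gain_ratio(double openmp_mb_per_sec, double normal_mb_per_sec)
{
    return openmp_mb_per_sec / normal_mb_per_sec;
}
```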

Later results shown are for a dual core Core i5 that also has Hyperthreading (see configuration details - Intel processor architecture, 4 CPUs). Then, there are full results for a 4 core, 8 thread Core i7, with and without using OpenMP. Here, the impact of OpenMP on the smallest data sizes is even worse, with the single thread version being up to 100 times faster. With larger data sizes, OpenMP produces performance gains of up to 3.85 times using shared L3 cache and 2.0 times using RAM.


  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
  Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz Measured 2402 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  Intel processor architecture, 2 CPUs 
  Windows NT  Version 6.0, build 6002, Service Pack 2
  Memory 4095 MB, Free 1079 MB
  User Virtual Space 4096 MB, Free 3018 MB

                            x[m]=x[m]+s*y[m]
                                                       Loss/Gain SP
           Not OpenMP    OpenMP 1 CPU  OpenMP 2 CPUs   1 CPU 2 CPUs  2 CPUs
  KBytes   Dble   Sngl    Dble   Sngl    Dble   Sngl    Sngl   Sngl   usecs
    Used   MB/S   MB/S    MB/S   MB/S    MB/S   MB/S   ratio  ratio   /pass

       4  18490   9185     547    553     425    413    0.06   0.04       9
       8  18631   9349    1051   1005     842    830    0.11   0.09      10
      16  18903   9467    1903   1827    1681   1630    0.19   0.17      10
      32  18739   9487    3059   2831    2558   2640    0.30   0.28      11
      64  11535   7631    4552   3986    5148   4751    0.52   0.62      14
     128  11626   7584    6150   5234    7553   6765    0.69   0.89      18
     256  11634   7686    7263   5815   10645   8937    0.76   1.16      30
     512  11632   7524    8375   6395   12273  10469    0.85   1.39      46
    1024  11605   7638    8362   7131   13733   9631    0.93   1.26      87
    2048  11408   7298    8998   7118   15255  11028    0.98   1.51     162
    4096   8626   7057    7792   5856   13525  10211    0.83   1.45     350
    8192   4287   4222    3667   3685    4367   4318    0.87   1.02    2287
   16384   3690   3532    3360   3510    3409   3718    0.99   1.05    4421
   32768   3284   3431    3472   3315    2815   3017    0.97   0.88    9166
   65536   3572   3460    3458   3452    3570   3602    1.00   1.04   19270
  131072   3485   3550    3429   3376    3656   3466    0.95   0.98   36268
  262144   3504   3570    3638   2990    3727   3564    0.84   1.00   70469
  524288   3650   3533    3130   3500    3737   3637    0.99   1.03  143996
 1048576   3696   3534    3616   3598    3603   3002    1.02   0.85  285017


  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000206A7
  Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz Measured 1596 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  Intel processor architecture, 4 CPUs 
  Windows NT  Version 6.1, build 7601, Service Pack 1
  Memory 4096 MB, Free 4096 MB
  User Virtual Space 4096 MB, Free 3006 MB

                       x[m]=x[m]+s*y[m]

           Not OpenMP     OpenMP 2 CPUs    Loss/Gain   2 CPUs
  KBytes   Dble   Sngl     Dble   Sngl    Dble   Sngl   usecs
    Used   MB/S   MB/S     MB/S   MB/S   ratio  ratio   /pass

       4  19157   9719      250    262    0.01   0.03      15
       8  19932  10030      718    697    0.04   0.07      11
      16  20002   9768     1413   1372    0.07   0.14      12
      32  19766  10046     2723   2587    0.14   0.26      12
      64  17504   9708     4940   4536    0.28   0.47      15
     128  17415  10066     8351   7018    0.48   0.70      17
     256  17368   9676    12771   9624    0.74   0.99      25
     512   9736   6919    15949  11184    1.64   1.62      54
    1024   9944   6919    14707  10785    1.48   1.56      91
    2048   9763   6815    16064  10940    1.65   1.61     177
    4096   7895   6077    10684   9087    1.35   1.50     421
    8192   7646   6045     9156   8920    1.20   1.48     966
   16384   7643   5942     9096   9179    1.19   1.54    1751
   32768   7658   6031     9528   9655    1.24   1.60    3475
   65536   7718   6045    10187   9730    1.32   1.61    5767
  131072   7734   6061     9572   9638    1.24   1.59   14493
  262144   7934   6117    10563   9588    1.33   1.57   27239
  524288   8137   6248    10492  10612    1.29   1.70   49118
 1048576   8138   6221    11311  10512    1.39   1.69   98708

 ############################################################

 Windows 8.1 64-Bit, Core i7-4820K 3.7 GHz, 4 Channel DDR3 1600 MHz RAM

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4
  Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz Measured 3711 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  Intel processor architecture, 8 CPUs 
  Windows NT  Version 6.2, build 9200, 
  Memory 4096 MB, Free 4096 MB
  User Virtual Space 4096 MB, Free 2999 MB

      Memory Reading Speed Test OpenMP Version 4.0 by Roy Longbottom

      0.100 seconds per test, Start Tue Sep 30 10:24:44 2014

  Memory    x[m]=x[m]+s*y[m] Int+  x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl   Int    Dble   Sngl   Int    Dble   Sngl   Int
   Used     MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

      4      329    328    324    331    346    345    174    173    173 L1
      8      685    697    685    684    689    687    347    345    345
     16     1380   1362   1353   1381   1365   1369    698    683    666
     32     2727   2675   2711   2705   2708   2703   1358   1361   1375
     64     5250   5257   5242   5359   5221   5278   2711   2659   2670 L2
    128    10368   9981   9925  10466  10119  10164   5286   5156   5174
    256    19247  17801  17303  19893  18320  18449  10314   9539   9515
    512    33203  28253  28718  34933  30428  30399  18218  16123  16077 L3
   1024    48844  38945  40108  52994  42481  42635  27973  22975  23084
   2048    65318  49487  50672  68134  55093  48589  36840  30226  30346
   4096    79834  56326  58847  85096  63436  63283  45528  36018  35250
   8192    83167  59969  61809  87526  67789  66920  45066  38250  38200
  16384    26091  25915  25962  26063  26029  26043  13128  13026  13003 RAM
  32768    24690  23614  24635  24782  24723  24611  12502  12398  12381
  65536    24678  24595  24661  24865  24739  24760  12382  12511  12469
 131072    25203  25127  25129  25307  25101  25146  12752  12691  12673
 262144    25489  24881  25358  25433  25297  25346  12645  12777  12748
 524288    25639  25093  25400  25495  24977  25445  12838  12722  12825
1048576    25953  26054  25955  25926  25957  26063  13043  13033  12999


Not OpenMP

      Memory Reading Speed Test Version 4.0 by Roy Longbottom

      0.100 seconds per test, Start Tue Sep 30 10:21:40 2014

  Memory    x[m]=x[m]+s*y[m] Int+  x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl   Int    Dble   Sngl   Int    Dble   Sngl   Int
   Used     MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

      4    32502  16875  18651  37629  19264  18745  26795  14188  11974 L1
      8    33594  17065  18877  38727  19574  19350  28999  15089  12670
     16    35686  18063  19965  41194  20699  20035  30259  15408  12678
     32    35887  18064  19973  41221  20718  19996  26834  14694  12575
     64    31618  17918  19982  34163  20203  19993  23658  13109  12516 L2
    128    31641  17909  19967  34038  20273  20016  23503  13159  12557
    256    30443  17696  19894  32707  20076  19878  22051  12878  12348
    512    24592  17512  18549  25832  18477  18431  15095  10902  10701 L3
   1024    24667  17479  18608  25860  18479  18501  15028  10912  10887
   2048    24675  17485  18552  25867  18461  18540  15015  10910  10896
   4096    24160  17110  18235  25504  18092  18265  14896  10826  10485
   8192    22490  16639  17598  23718  17608  17487  13493  10413  10548
  16384    15145  13134  13423  15247  13157  13401   7646   7775   7842 RAM
  32768    14783  13029  13210  14894  12963  12855   7408   7587   7565
  65536    14827  13100  13226  14923  13016  13258   7432   7641   7645
 131072    14958  13033  13279  15007  13052  12410   7398   7664   7632
 262144    14901  13124  13266  15032  13097  13273   7489   7666   7647
 524288    14897  13124  13304  15077  13063  13273   7456   7688   7568
1048576    14813  12940  13165  15028  12947  13265   7411   7660   7662




Other Benchmark Compilations

The Livermore Loops Benchmark was converted to use OpenMP. This is the 1970s benchmark that set the standards for the first supercomputers (Cray 1 onwards). It has 24 kernels of numerical applications, with performance measured in MFLOPS. Each kernel produces a double precision floating point checksum to demonstrate accuracy of the system being tested, and this can vary slightly, depending on the compiler and options used. My C++ program checks these numbers against those built in for a particular compilation (for use as a reliability/burn-in test). The kernels are run three times using decreasing memory demands, mainly starting at 8 KB for each of one or more arrays.

The first results below are for the normal compilation, with checksums identical to the first successful run. The source code includes the “#pragma omp parallel for” directives, but they are ignored in this compilation. The other results are for runs with these directives enabled by using the /openmp compiler parameter. Kernels 16 and 17 have no loops for the pragma to apply.
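The way one source file serves both compilations can be sketched as follows. The function names are hypothetical; _OPENMP is the macro that conforming compilers define only when OpenMP is switched on.

```c
/* Report whether this file was compiled with OpenMP enabled. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Same source for both builds: without the OpenMP compiler option
   the pragma below is ignored and the loop runs on one thread. */
void scale_add(int n, double s, const double *y, double *x)
{
    int m;

    #pragma omp parallel for
    for (m = 0; m < n; m++)
    {
        x[m] = x[m] + s * y[m];
    }
}
```

Some compilers warn about the unrecognised pragma in the normal compilation, but the generated code is the same as if the directive were absent.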

The next results are with OpenMP using four processors, where a few tests are slightly faster than with the normal compilation, but many are much slower. Even worse, the calculations do not produce the same numeric checksums, and repeated runs show that the values can be unpredictable. The third results are with OpenMP using one CPU (but two threads), where identical wrong checksums appear to be produced on repeating the benchmark.
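One common cause of unpredictable sums under a plain "parallel for" is a shared accumulator updated by several threads at once. Whether this is what happens in the failing kernels here is an assumption. The sketch below uses an inner product loop of the kind found in the Livermore kernels, with a hypothetical function name, and adds the OpenMP reduction clause so that each thread keeps a private partial sum.

```c
/* Inner product with a single accumulator q.
   With a plain "parallel for", all threads would update q at the same
   time and the sum could differ from run to run. The reduction clause
   gives each thread its own partial sum and adds them together at the
   end of the loop. */
double inner_product(int n, const double *z, const double *x)
{
    double q = 0.0;
    int k;

    #pragma omp parallel for reduction(+:q)
    for (k = 0; k < n; k++)
    {
        q += z[k] * x[k];
    }
    return q;
}
```

Even with the reduction clause, the partial sums are combined in a different order from the serial loop, so the low order digits of a floating point checksum may still differ from the single thread value.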

There are a number of other OpenMP programming options and the simple directive used here is not suitable for many of the kernels. Anything more complex than the MemSpeed x[m]=x[m]+r*y[m] needs careful consideration to ensure that instructions are executed in a consistent sequence and functions run long enough to absorb startup delays. Maybe it is best to leave it to a compiler that can ensure that the correct and most efficient procedures are used.
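One of the other options is the OpenMP if clause, which runs a loop in parallel only when a condition is true. The following is a minimal sketch, assuming a size threshold below which the startup delay outweighs any gain. The threshold value and the names PARALLEL_MIN_WORDS and triad_guarded are hypothetical and would need tuning on each system.

```c
/* Hypothetical threshold: loops shorter than this run serially. */
#define PARALLEL_MIN_WORDS 65536

/* MemSpeed-style triad that uses multiple threads only when the loop
   is long enough to absorb the parallel startup delay. */
void triad_guarded(int n, float s, const float *y, float *x)
{
    int m;

    #pragma omp parallel for if(n >= PARALLEL_MIN_WORDS)
    for (m = 0; m < n; m++)
    {
        x[m] = x[m] + s * y[m];
    }
}
```

This avoids using multiple threads for short loops, although some overhead may remain for evaluating the condition and entering the region.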

Later results are for a dual core 4 thread Core i5 CPU and a quad core, 8 thread Core i7 processor, showing the same (or worse) degradation effects with Intel.


############################################################

  AMD Phenom(tm) II X4 945 Processor Measured 3013 MHz

  Normal MFLOPS for 24 loops

 2622.5 1851.1  887.0 1454.3  336.3  779.3 3405.7 3011.2 2861.3 1428.9  207.0 1394.6
  280.2  559.9 1162.3  989.0  999.2 2087.7  522.9 1177.1 1815.8  282.1  964.3  661.7

  Numeric results were as expected


  OpenMP MFLOPS for 24 loops

  522.9    6.2  210.0  133.9  193.1   86.5 1560.6  371.6  189.8   99.4   98.6  108.2
   44.5  228.4  279.3  939.7  999.2  154.5   32.9  480.1   22.3  159.0  116.6  108.0

  Section 1 Test  6  result was 4.312366077873135e+003 expected 4.375116344729986e+003
  Section 1 Test 13  result was 1.202533952702805e+011 expected 1.202533961842805e+011
  Section 1 Test 14  result was 3.165549299821230e+009 expected 3.165553044000335e+009
  Section 1 Test 20  result was 3.042067004051425e+007 expected 3.040644339351239e+007

  Section 2 Test 13  result was 9.816387759644356e+010 expected 9.816387810944356e+010
  Section 2 Test 19  result was 5.421816884714813e+002 expected 5.421816960147207e+002

  Section 3 Test 19  result was 1.268230668053491e+001 expected 1.268230698051004e+001


  Different Results Next Run

  Section 1 Test  6  result was 4.345898038418117e+003 expected 4.375116344729986e+003
  Section 1 Test 14  result was 3.165550475680920e+009 expected 3.165553044000335e+009
  Section 1 Test 19  result was 5.421816884714813e+002 expected 5.421816960147207e+002
  Section 1 Test 20  result was 3.042636088846063e+007 expected 3.040644339351239e+007

  Section 3 Test 19  result was 1.268230698051474e+001 expected 1.268230698051004e+001


  Affinity Set To Use 1 CPU - Consistent Results

  MFLOPS for 24 loops

  466.8    6.6  182.7  106.8  141.2  216.7 1169.0  359.1  186.4   93.3   76.4  104.9
   42.3  233.6  235.2  892.8 1001.5  152.8   32.9  838.0   22.7  117.1  113.4  101.3

  Section 1 Test  2  result was 1.542092319263005e+003 expected 1.539721811668385e+003
  Section 1 Test 19  result was 5.421816947167190e+002 expected 5.421816960147207e+002

  Section 2 Test  2  result was 1.542092319263005e+003 expected 1.539721811668385e+003
  Section 2 Test 19  result was 5.421816947167190e+002 expected 5.421816960147207e+002

  Section 3 Test  2  result was 3.958295105509222e+001 expected 3.953296986903060e+001
  Section 3 Test  3  result was 2.699309089320673e-001 expected 2.699309089320672e-001
  Section 3 Test 19  result was 1.268230657539253e+001 expected 1.268230698051004e+001


############################################################

  Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz Measured 1596 MHz

  Normal MFLOPS for 24 loops

 2094.0 1711.7  964.3 1254.7  286.7  809.9 2761.5 3030.6 3373.6 1285.8  256.4 1127.4
  520.9  681.1  864.9 1250.6 1001.4 1547.4  568.4  892.5 1645.5  238.5  941.4  902.4

  OpenMP MFLOPS for 24 loops

  359.3    4.8  141.7   74.9  104.1  134.4  745.8  221.2  110.0   61.2   67.0   71.8
   30.8  208.0  175.5  873.2  696.8   80.3   20.9  502.8   15.1  102.2   79.0   73.3


 ############################################################

 Windows 8.1 64-Bit, Core i7-4820K 3.7 GHz

  Normal MFLOPS for 24 loops

 4901.9 3628.6 2568.4 2640.4  564.7 1590.4 4685.2 5227.3 5595.2 2833.1  441.3 1932.3
  996.0 1245.6 2289.5 2245.1 1778.9 3549.2 1069.2 1883.2 2827.2  411.1 1598.2 1621.3

  OpenMP MFLOPS for 24 loops

  440.7    5.0  175.8  108.0  170.7  224.0 1361.9  305.1  150.1   78.1   85.9   87.3
   37.6  296.0  237.8 2258.1 1784.1  125.7   26.5 1461.5   17.9  140.6   93.7   84.8






Roy Longbottom October 2014



The Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection