Classic Computer Benchmarks

Roy Longbottom

Index

Older Files Lists For DOS, OS/2, Windows and Linux
1970 Livermore Loops Numeric benchmark for supercomputers
1972 Whetstone Floating point benchmark for minicomputers
1979 Linpack Floating point benchmark for workstations
1984 Dhrystone Integer benchmark for UNIX systems

Note - Here, most of the links to benchmark downloads and results are from the 2013 versions of the benchmarks.

Introduction

The Classic Benchmarks are the first programs that set standards of performance for computers. They were the initial benchmarks in the Roy Longbottom PC Benchmark Collection, produced via C/C++ in 1996, following the same execution format:

1: Running time to be calibrated to run for a noticeable time
2: Results to be displayed as programs are running
3: Results or summary to be saved in text log files
4: Facilities to type details of the system used for inclusion in the log (collected automatically in later versions)
5: Two versions to be provided, optimised and non-optimised
6: Installation not to be required, just click and go
7: Execution and source code to be provided in zip files

The initial series of Classic Benchmarks, to run via Windows, are available to download from BenchNT.zip, with later 32 bit and 64 bit versions in Win64.zip. Then there are precompiled versions for DOS and OS/2 in DosTests.zip and OS2Tests.zip. Also there are 32 bit and 64 bit versions to run on PCs using Linux in classic_benchmarks.tar.gz. See lists of files below.

Original ARMv7 Android apps can be downloaded from Android Benchmarks.htm that also contains results, with project files in Android Benchmarks.zip. Then newer ones, that automatically select benchmark code for ARM, Intel or MIPS processors at run time, for 32 bit architecture or 64 bit operation, can be obtained via the web site index along with 32 bit and 64 bit compilations for the ARM CPU based Raspberry Pi systems.

These Classic Benchmarks were first made available in 1996. With the tools then available, performance was measured using low resolution timers, so a minimum running time of 5 seconds for each individually timed function was chosen to give more accurate results. This led to extended running times for the Whetstone Benchmark and, particularly, the Livermore Loops program. With later high resolution timers, minimum test times were reduced to 1 second.

Early versions of the benchmarks produced text based logs showing the same details as running displays. Later, the text files provided a summary of the important performance measurements. Details below include copies of the running displays. Of particular significance are details of running times during calibration, which determines the number of repeat passes needed for the required running time. The code would be suspect if running time did not increase linearly with pass count.

Where appropriate, the program source codes include tables of numeric results of floating point calculations that can depend on different compilers and CPU technology. The results are checked against these tables and, when inconsistent, the program reports “Was” and “Should be” numbers. The “Was” values can then be included in a table for a new compilation.





Older Files

The following files have been uploaded, containing 71 benchmark execution files and source code to use via DOS, OS/2, Windows or Linux.

dostests.zip 32 bit

Dhry1ND.exe   - Dhrystone 1 Benchmark Non-Optimised 
Dhry1OD.exe   - Dhrystone 1 Benchmark Optimised 
Dhry2ND.exe   - Dhrystone 2 Benchmark Non-Optimised 
Dhry2OD.exe   - Dhrystone 2 Benchmark Optimised 
LinpCND.exe   - Linpack Benchmark Non-Optimised
LinpCOD.exe   - Linpack Benchmark Optimised
LiveCND.exe   - Livermore Loops Benchmark Non-Optimised
LiveCOD.exe   - Livermore Loops Benchmark Optimised
WhetCND.exe   - Whetstone Benchmark Non-Optimised Single Precision
WhetCOD.exe   - Whetstone Benchmark Optimised Single Precision
WhetDCD.exe   - Whetstone Benchmark Optimised Double Precision

cb16bit.zip - DOS 16 bit

DHRY1C16.EXE  - Dhrystone 1 compiled with Watcom 10.5 C/C++
WHETC16.EXE   - Whetstone compiled with Watcom 10.5 C/C++
WHETBAS.EXE   - Whetstone BASIC compiled with BASCOM
WHETPROF.EXE  - Whetstone FORTRAN compiled with PROFORT
WHETMS5F.EXE  - Whetstone FORTRAN compiled with MS 5.1

os2tests.zip

Dhry1NO2.exe   - Dhrystone 1 Benchmark Non-Optimised 
Dhry1OO2.exe   - Dhrystone 1 Benchmark Optimised 
Dhry2NO2.exe   - Dhrystone 2 Benchmark Non-Optimised 
Dhry2OO2.exe   - Dhrystone 2 Benchmark Optimised 
LinpCNO2.exe   - Linpack Benchmark Non-Optimised
LinpCOO2.exe   - Linpack Benchmark Optimised
LiveCNO2.exe   - Livermore Loops Benchmark Non-Optimised
LiveCOO2.exe   - Livermore Loops Benchmark Optimised
WhetCNO2.exe   - Whetstone Benchmark Non-Optimised Single Precision
WhetCOO2.exe   - Whetstone Benchmark Optimised Single Precision
WhetDCO2.exe   - Whetstone Benchmark Optimised Double Precision

benchnt.zip, for Windows, includes source codes

Dhry1NNT.exe   - Dhrystone 1 Benchmark Non-Optimised 
Dhry1ONT.exe   - Dhrystone 1 Benchmark Optimised 
Dhry2NNT.exe   - Dhrystone 2 Benchmark Non-Optimised 
Dhry2ONT.exe   - Dhrystone 2 Benchmark Optimised 
LinpCNNT.exe   - Linpack Benchmark Non-Optimised
LinpCONT.exe   - Linpack Benchmark Optimised
LiveCNNT.exe   - Livermore Loops Benchmark Non-Optimised
LiveCONT.exe   - Livermore Loops Benchmark Optimised
WhetCNNT.exe   - Whetstone Benchmark Non-Optimised Single Precision
WhetCONT.exe   - Whetstone Benchmark Optimised Single Precision
WhetDCNT.exe   - Whetstone Benchmark Optimised Double Precision

WhetsMS6.exe   - Whetstone Benchmark Optimised Single Precision via MS6 C
whetvb4.exe    - Whetstone Benchmark compiled by Visual Basic 4

Win64.zip, later 32 bit and 64 bit source codes in newsource.zip     

32 bit           64 bit             64b large integers

dhry132.exe      dhry164int32.exe    dhry164int64.exe 
dhry232.exe      dhry264int32.exe    dhry264int64.exe
linp32SSE2.exe   linpack64.exe
lloops32SSE2.exe lloops64.exe
whets32SSE.exe   

classic_benchmarks.tar.gz, for Linux, includes source codes

32 bit             64 bit

dhrystone1_NoOpt   dhrystone1_64_NoOpt
dhrystone1         dhrystone1_64
dhrystone2_NoOpt   dhrystone2_64_NoOpt   
dhrystone2         dhrystone2_64
linpack_NoOpt      linpack_64_NoOpt
linpack            linpack_64
lloops_NoOpt       lloops_64_NoOpt
lloops             lloops_64
whetstone_NoOpt    whetstone_64_NoOpt
whetstone          whetstone_64




Livermore Kernels (Livermore Loops)

This supercomputer benchmark was first introduced in 1970, initially comprising 14 kernels of numerical application, written in Fortran. The number of kernels was increased to 24 in the 1980's. Performance measurements are in terms of Millions of Floating Point Operations Per Second or MFLOPS. The program also checks the results for computational accuracy. One main aim was to avoid producing single number performance comparisons, the 24 kernels being executed three times at different Do-loop spans to produce short, medium and long vector performance measurements. If overall averages are quoted, the benchmark reference below indicates that the geometric mean may be interpreted as a characteristic rate of computation but it would be more realistic to retain the range of statistics in terms of geometric, harmonic and arithmetic means, minimum and maximum. See details in Livermore Loops Results.htm.

The original Fortran Version is available from Netlib, plus the Kernels in C. I used the latter and data generation converted from Fortran to C, to produce a program more suitable to run on PCs. The benchmark ran successfully until gcc 6 was used for Raspberry Pi 3 at 64 bits, when a few minor changes were required for that platform.

On screen Log


L.L.N.L. 'C' KERNELS: MFLOPS   P.C.  VERSION 4.0

Optimisation  Optimised

Calculating outer loop overhead
      1000 times   0.00 seconds
     10000 times   0.00 seconds
    100000 times   0.00 seconds
   1000000 times   0.01 seconds
  10000000 times   0.05 seconds
  20000000 times   0.07 seconds
  40000000 times   0.15 seconds
  80000000 times   0.29 seconds
Overhead for each loop  3.6829e-009 seconds

Calibrating part 2 of 3 (There are three displays similar to the following)

Loop count          8  0.01 seconds
Loop count         32  0.04 seconds
Loop count        128  0.16 seconds

Loops  200 x  2 x Passes

Kernel       Floating Pt ops
No  Passes E No    Total      Secs.  MFLOPS Span     Checksums          OK
------------ -- ------------- ----- ------- ---- ---------------------- --
 1  40 x 534  5 4.314720e+009  1.00 4320.00  101 5.253344778937972e+002 16
 2  40 x 553  4 3.433024e+009  1.00 3435.39  101 1.539721811668384e+003 15
 3  53 x 171  2 7.322904e+008  1.02  716.27  101 1.009741436578952e+000 16
 4  70 x 475  2 1.596000e+009  1.00 1593.73  101 5.999250595473891e-001 16
 5  55 x 128  2 5.632000e+008  1.00  565.94  101 4.589031939600982e+001 16
 6   7 x 246  2 6.612480e+008  0.95  695.38   32 8.631675645333210e+001 16
 7  22 x 361 16 5.133709e+009  0.95 5407.69  101 6.345586315784055e+002 16
 8   6 x 287 36 4.909766e+009  0.95 5195.20  100 1.501268005625795e+005 15
 9  21 x 355 17 5.120094e+009  0.93 5494.16  101 1.189443609974981e+005 16
10  19 x  50  9 3.454200e+008  0.95  362.60  101 7.310369784325296e+004 16
11  64 x 168  1 4.300800e+008  0.95  453.49  101 3.433560407475758e+004 16
12  68 x 554  1 1.506880e+009  0.95 1592.06  100 7.127569130821465e-006 16
13  41 x  23  7 8.449280e+007  0.96   87.93   32 9.816387810944356e+010 15
14  10 x  49 11 2.177560e+008  0.95  229.34  101 3.039983465145392e+007 15
15   1 x 118 33 7.788000e+008  0.95  820.44  101 3.943816690352044e+004 15
16  27 x 458 10 1.384992e+009  0.95 1458.55   40 6.480410000000000e+005 16
17  20 x  82  9 5.963040e+008  0.96  624.40  101 1.114641772902486e+003 16
18   1 x 371 44 3.232152e+009  0.95 3403.80  100 1.015727037502299e+005 15
19  23 x 112  6 6.244224e+008  0.95  654.34  101 5.421816960147207e+002 16
20   8 x  72 26 5.990400e+008  0.96  626.43  100 3.126205178815432e+004 16
21   1 x  72  2 1.800000e+009  0.95 1891.86   50 7.824524877232093e+007 16
22   7 x 106 17 5.096056e+008  0.96  533.42  101 2.938604376566698e+002 15
23   5 x 134 11 1.459260e+009  0.95 1536.42  100 3.549900501563624e+004 15
24  31 x 661  1 8.196400e+008  0.91  901.82  101 5.000000000000000e+001 16

         Overall     Maximum   Rate 5670.13
                     Average   Rate 1761.87
                     Geometric Mean 1080.28
                     Harmonic  Mean  608.53
                     Minimum   Rate   81.48

Livermore Reference

F.H. McMahon, The Livermore Fortran Kernels: A Computer Test Of The Numerical Performance Range, Lawrence Livermore National Laboratory, Livermore, California, UCRL-53745, December 1986.




Whetstone Benchmark

The Whetstone benchmark was written by Harold Curnow of CCTA, the British government computer procurement agency, based on work by Brian Wichmann of the National Physical Laboratory. An Algol version of the benchmark was released in November 1972 and Fortran single and double precision varieties in April 1973. The Fortran codes became the first general purpose benchmarks that set industry standards of performance.

The benchmark produced speed ratings in terms of Thousands of Whetstone Instructions Per Second (KWIPS). In 1978, self timing versions (by Roy Longbottom, also of CCTA) produced speed ratings in MOPS (Millions of Operations Per Second) and MFLOPS (Floating Point), with an overall rating in MWIPS.

Whetstone benchmark source code can also be downloaded in Fortran, C, Basic, Java, Visual Basic and Excel Spreadsheet Visual Basic. All follow the same format, with self timing for PCs. Performance from these is included in the benchmark results file. Please excuse all code looking like Fortran.

For results on PCs see the following Whetstone Benchmark History and Results and Whetstone Benchmark Detailed Results On PCs, which includes performance ratings of computers from those produced in the 1960's to today's PCs. At the end of the latter there are tables demonstrating efficiency of the different PC interpreters and compilers, from up to 12 benchmarks, starting with an 8088/8087 CPU, up to a multithreaded Core i7. The first table provides overall MWIPS ratings and the second shows efficiency, based on %MWIPS per MHz, the latter varying between 0.4 and 1000 for a Core i7 based PC.

With the original benchmark producing a single performance measure (and using manual timing), results were subject to manipulation. As is usual, compilers were arranged to produce the most efficient code, but some computer manufacturers arranged for unrealistic over optimisation. Loop N6 (see below) used to be the dominant procedure, by far, and was most affected. It should never produce faster speeds than N1 or N2. Hence, the timing modifications were introduced to identify irregular behaviour. N5 and N8 function tests now tend to consume the most time. Particularly with the availability of more registers with 64 bit working, N3, N4 and N7 can be subject to over optimisation, but this does not affect the overall performance rating much.

On screen Log


 Single Precision C/C++ Whetstone Benchmark Optimised

Calibrate
       0.01 Seconds          1   Passes (x 100)
       0.03 Seconds          5   Passes (x 100)
       0.08 Seconds         25   Passes (x 100)
       0.41 Seconds        125   Passes (x 100)

Use 3048  passes (x 100)

 Single Precision C/C++ Whetstone Benchmark Optimised


Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475025653839100       872.850              0.067
N2 floating point     -1.12274754047393800       813.164              0.504
N3 if then else        1.00000000000000000                 697.500    0.452
N4 fixed point        12.00000000000000000                3934.509    0.244
N5 sin,cos etc.        0.49904659390449520                  81.161    3.125
N6 floating point      0.99999988079071040       517.339              3.178
N7 assignments         3.00000000000000000                 898.894    0.627
N8 exp,sqrt etc.       0.75110864639282230                  52.437    2.162

MWIPS                                           2942.474             10.359

Whetstone Reference

H J Curnow and B A Wichmann, "A Synthetic Benchmark", Computer Journal Vol 19, No 1 1976




Linpack Benchmark

This benchmark was produced by Jack Dongarra from the "LINPACK" package of linear algebra routines. Following initial release in 1979, it became the primary benchmark for scientific applications from the mid 1980's with a slant towards supercomputer performance.

The original version was produced in Fortran but a "C" version appeared later. The standard "C" version operates on 100x100 matrices in double precision with rolled/unrolled and single/double precision options. The pre-compiled versions are double precision, rolled. Other versions are available with different sizes of matrices.

Performance rating is in terms of MFLOPS. For original results on PCs see Linpack Results.htm

This program's arrangements for measuring and excluding overheads, and for running with different array leading dimensions, were in the original Linpack Benchmark. My variation, linpack-pc.c, for PCs was accepted by Netlib and can be downloaded from here. Note that clicking on the link might produce details without appropriate line feeds. Using Windows, you might have to download the file and open it with WordPad or a suitable language editor.

On screen Log


Unrolled Double Precision Linpack Benchmark - PC Version in 'C/C++'

Optimisation Optimised

norm resid      resid           machep         x[0]-1          x[n-1]-1
   0.4   7.41628980e-014  1.00000000e-015 -1.49880108e-014 -1.89848137e-014

Times are reported for matrices of order          100
1 pass times for array with leading dimension of  201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00047    0.00002    0.00049    1402.78     0.0014     0.0087

Calculating matgen overhead
        10 times   0.00 seconds
       100 times   0.02 seconds
      1000 times   0.10 seconds
     10000 times   0.95 seconds
     20000 times   1.87 seconds
Overhead for 1 matgen      0.00009 seconds

Calculating matgen/dgefa passes for 1 seconds
        10 times   0.00 seconds
       100 times   0.04 seconds
      1000 times   0.35 seconds
      2000 times   0.71 seconds
      4000 times   1.42 seconds
Passes used       2823

Times for array with leading dimension of 201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00026    0.00001    0.00027    2545.49     0.0008     0.0048
    0.00026    0.00001    0.00027    2545.55     0.0008     0.0048
    0.00026    0.00001    0.00027    2547.66     0.0008     0.0048
    0.00027    0.00001    0.00028    2457.21     0.0008     0.0050
    0.00026    0.00001    0.00027    2545.73     0.0008     0.0048
Average                              2528.33

Calculating matgen2 overhead
Overhead for 1 matgen      0.00009 seconds

Times for array with leading dimension of 200

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00026    0.00001    0.00027    2541.88     0.0008     0.0048
    0.00026    0.00001    0.00027    2543.99     0.0008     0.0048
    0.00026    0.00001    0.00027    2540.51     0.0008     0.0048
    0.00026    0.00001    0.00027    2539.97     0.0008     0.0048
    0.00026    0.00001    0.00027    2541.71     0.0008     0.0048
Average                              2541.61

Unrolled Double  Precision     2528.33 Mflops

Linpack Reference

Jack Dongarra, Performance of Various Computers Using Standard Linear Algebra Software in a Fortran Environment. This includes numerous results and can be downloaded from Here.




Dhrystone Benchmarks

The Dhrystone "C" benchmark, a sort of Whetstone without floating point, became the key standard benchmark, from 1984, with the growth of Unix systems. The first version was produced by Reinhold P. Weicker in Ada and translated to "C" by Rick Richardson.

Two versions are available: Dhrystone versions 1.1 and 2.1. The second version was produced to avoid over-optimization problems encountered with version 1. Although it is recommended that advanced optimization levels should be avoided with the latter, it is clear from published results that the recommendation is usually ignored. The default option in the Watcom compiler produces high levels of optimization and omits some constant calculations from the timing loop. Version 2 is compiled from three source files.

Original versions of the benchmark gave performance ratings in terms of Dhrystones per second. This was later changed to VAX MIPS by dividing Dhrystones per second by 1757, the DEC VAX 11/780 result.

For early results see Dhrystone Results.htm. My versions are identical to the original, except for the calibration, timing and results checking modifications.

On screen Log


Dhrystone Benchmark, Version 2.1 (Language: C or C++)

Optimisation    Optimised
Register option not selected

       10000 runs   0.00 seconds
      100000 runs   0.01 seconds
     1000000 runs   0.05 seconds
     2000000 runs   0.10 seconds
     4000000 runs   0.20 seconds
     8000000 runs   0.40 seconds
    16000000 runs   0.80 seconds
    32000000 runs   1.59 seconds
    64000000 runs   3.25 seconds

Final values (* implementation-dependent):

Int_Glob:      O.K.  5  Bool_Glob:     O.K.  1
Ch_1_Glob:     O.K.  A  Ch_2_Glob:     O.K.  B
Arr_1_Glob[8]: O.K.  7  Arr_2_Glob8/7: O.K.    64000010
Ptr_Glob->              Ptr_Comp:       *    4320480
  Discr:       O.K.  0  Enum_Comp:     O.K.  2
  Int_Comp:    O.K.  17 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->         Ptr_Comp:       *    4320480 same as above
  Discr:       O.K.  0  Enum_Comp:     O.K.  1
  Int_Comp:    O.K.  18 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:     O.K.  5  Int_2_Loc:     O.K.  13
Int_3_Loc:     O.K.  7  Enum_Loc:      O.K.  1
Str_1_Loc:                             O.K.  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:                             O.K.  DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone:         0.05
Dhrystones per Second:                        19695907
VAX  MIPS rating =                              11209.96

Dhrystone Reference

Reinhold P. Weicker, CACM Vol 27, No 10, 10/84, pg. 1013, plus results Dhrystone 1 Dhrystones per second and Dhrystone 1 and 2 VAX MIPS.
