Quad Core Burn-In/Performance Tests Windows 7
Summary
Tests were run on a new PC, with a 3 GHz Phenom II processor, a GeForce GTS 250 graphics card and Windows 7, to measure system temperatures and multiprocessor performance. Temperature measurements available were CPU case, CPU core, motherboard and graphics processor. A slightly upmarket copper heatpipe CPU cooler was fitted, and this might explain why the CPU case temperature, normally at room temperature + 13°C when idling, only increased by a maximum of 14°C.
Initially, the processor ran at 800 MHz and the first 2°C rise was almost instantaneous with the CPU switching to full speed. Maximum temperature was often reached within 5 minutes.
Reported CPU core temperatures were lower than those for the case but increases under load were higher. It then became apparent that the reported core temperatures are not real but used purely for thermal management purposes, with a maximum value of 70°C.
IntBurn64 - this carries out 64 bit integer arithmetic with high speed data from caches or RAM. Four different sizes were used for data in dedicated L1 and L2 caches, shared L3 cache and RAM, to use one then four processors. With dedicated caches, the average throughput gain from using four CPUs was 3.94 times.
The L1 cache test produced the highest speeds at 17,914 Millions of Instructions Per Second (MIPS) but maximum case/core temperature increases of 14/18°C were produced by the L2 cache test, at up to 12,613 MIPS.
For the L3 cache tests, each processor requested 1 MB of the 6 MB shared space. Resulting gains on using four CPUs averaged 3.5 times. Temperatures were similar to those for L1 cache tests.
RAM speed with one CPU was less than a third of possible maximum data transfer rate but achieved 68%, or 14.5 GB/second, using four processors.
SSEBurn64 - this uses SSE or SSE2 floating point instructions and has CPU only and L1 cache tests, with resultant speeds measured in Millions of FLOating Point instructions per Second (MFLOPS). Maximum 32 bit SSE MFLOPS for a 3 GHz processor is 12,000, or possibly 24,000 with linked add and multiply instructions. Maximum 64 bit SSE2 MFLOPS are half those for SSE. Memory size has to be specified for other tests that stress caches or RAM, with results calculated as MB/second. Performance gains, using four processors, were similar to the integer tests.
The SSE CPU test, using registers instead of cache, ran at nearly 12,000 MFLOPS on each CPU but temperature increases were the lowest.
The SSE Cache test generated 16,400 MFLOPS per CPU and produced the same high temperatures as those with the integer tests.
Memory Tests - L2 cache tests gave rise to the highest temperatures of this group, with a data transfer speed half that using L1 cache.
Livermore Loops - This was the original key benchmark for supercomputers, having 24 kernels of numerical applications with speeds calculated in MFLOPS. The benchmark was known to produce errors on an earlier overclocked processor. Here, maximum temperatures were moderately high, with four CPUs each running at up to nearly 3900 MFLOPS (i387 instructions not SSE type).
VideoD3D9 - This is a DirectX 9 benchmark where any one of the 8 tests can be run as a burn-in program at a specified window size. Speeds are recorded in Frames Per Second (FPS). The tests were run individually to find the hottest one. This was then run at the same time as three SSE Cache tests. Recorded temperature increases were just about the highest - Case +14 to 48°C, Core +19 to 45°C, Board +3 to 36°C and Graphic Processor +26 to 69°C. SSE MFLOPS were degraded somewhat, as the graphics test uses the equivalent of 120% CPU utilisation.
Other Tests - The graphics/SSE test was repeated using integrated Radeon HD 4200 graphics on the Asus motherboard. The results were much lower temperatures and FPS but higher MFLOPS. CUDA provides nVidia programming extensions to use a GPU for general purpose computing. The test using these ran at 159,600 MFLOPS with temperature of the GPU changing by +30 to 73°C. OpenMP functions enable shared data calculations over available CPUs. The MS compiler is not efficient in translating to SSE instructions, only achieving 3700 MFLOPS on one CPU and 14,446 MFLOPS using four, with associated low temperature increases.
A game benchmark was run a number of times and this increased GPU temperature by 29°C. It used more than three CPUs but heating effects were not high.
Reliability or Burn-in Benchmarks
This report provides results of a series of performance and reliability tests on a 3.0 GHz Phenom II X4 CPU using 64-Bit Windows 7. The hardware comprises a 4 processor Phenom II 945 on an Asus M4A785TD-V motherboard, with 8 GB DCDDR3 RAM, a WD 5400 RPM Green SATA disk and a GeForce GTS 250 graphics card, plus Radeon HD 4200 on the motherboard. The processor has a Titan TTC-NK34TZ "Super Quiet 22dBA Triple Copper heatpipe CPU Cooler".
Reliability/Burn-in programs used were mainly compiled for 64 bit working, including tests using integer instructions, DirectX 3D graphics functions and floating point arithmetic calculations. CPU, motherboard and graphics processor temperatures were recorded as each test was run. Besides being used for stress testing, the programs provide useful performance information of multi-core processors. These results are saved in text log files, examples being given below. The compiled programs, source code, descriptions and performance results on numerous systems can be obtained via the links given below.
System Information
All my latest benchmarks and test programs include the following, where Windows NT Version 6.1, build 7600 indicates Windows 7. Memory shown could be 8192 MB but, with integrated graphics enabled, this is reduced by 256 MB.
Data cache sizes for this processor are L1 64 KB dedicated, L2 512 KB dedicated, L3 6 MB shared.
Hardware Information
CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00100F42
AMD Phenom(tm) II X4 945 Processor Measured 3013 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, Has 3DNow,
Windows Information
AMD64 processor architecture, 4 CPUs
Windows NT Version 6.1, build 7600,
Memory 7936 MB, Free 6403 MB
User Virtual Space 8388608 MB, Free 8388561 MB
Temperature Monitor
During the tests, CPU case and core temperatures were measured using CPUID Hardware Monitor and, when applicable, confirmed using Asus Probe II. The accuracy of these measurements is unknown and the temperatures and increases seem particularly low. Below is an example of maximum monitored temperatures using 4 CPUs, with room air at 21°C. The core temperatures are very low when the CPUs are idle, but suddenly increase significantly when the tests are started. Core voltage also increases from 0.93V to 1.3V.
CPUZ indicated that the CPU was running at 800 MHz when idle, increasing to 3.0 GHz.
Core/Case temperatures with the CPU running at 3 GHz, but idling, were typically around 4°C/ 2°C higher than at the lower MHz. All measurements produced maximum temperatures on the same sort of time scale and Prime95 showed similar maximum temperature.
Turning off Cool’n’Quiet in BIOS increases CPU idling speed to 3 GHz and, at the same room temperature, case and core temperatures to 36°C and 30°C respectively. These temperatures were also confirmed after installing AMD’s OverDrive utility.
It was then found that there is only one Core measurement, known as Tctl Processor Temperature Control Value. It does not represent an actual temperature but is a relative reading that can be used for thermal management. Maximum value is 70°C.
CPUID Hardware Monitor
                  Value     Min       Max
 CPU VCore        0.94 V    0.93 V    1.30 V
Temperatures
 CPU case         35 °C     34 °C     47 °C
 Motherboard      33 °C     33 °C     34 °C
Fans
 CPU RPM          2732      2721      2755
 Fan 2 RPM        1280      1273      1293
Temperatures
 Core #0          26 °C     25 °C     43 °C
 Core #1          26 °C     25 °C     43 °C
 Core #2          26 °C     25 °C     43 °C
 Core #3          26 °C     25 °C     43 °C
IntBurn64 Reliability Test
This program uses assembly code and eight 64 bit integer registers (r8 to r15) that are not used with 32 bit code. It has twelve tests, adding and subtracting different data patterns. The first six tests alternately write and read data and the others are read only. The results are checked for correct calculations. Drop down lists are provided to select memory size used, between 4 KB and RAM size minus 64 MB, to test using data in L1 cache, L2 cache or RAM. Another list allows running time to be selected between 1 and 1000 seconds per test. An example of the log file is shown below.
For testing multiple processors, the program can also be run via commands in a BAT file that also has options to display the run time window at different screen positions and to save results in different log files. Example commands to use four CPUs are included below.
Test 4 KB at 5 seconds per test, Start at Wed Nov 25 13:49:00 2009
Write/Read
1 17857 MB/sec Pattern 0000000000000000 Result OK 10898787 passes
2 18014 MB/sec Pattern 0000000000000000 Result OK 10994602 passes
3 18027 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 11002711 passes
4 17687 MB/sec Pattern AAAAAAAAAAAAAAAA Result OK 10795160 passes
5 17659 MB/sec Pattern CCCCCCCCCCCCCCCC Result OK 10778002 passes
6 18089 MB/sec Pattern 0F0F0F0F0F0F0F0F Result OK 11040616 passes
Max 3816 64 bit MIPS
Read
1 30453 MB/sec Pattern 0000000000000000 Result OK 37174500 passes
2 30445 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 37164100 passes
3 30425 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 37139800 passes
4 30443 MB/sec Pattern 5555555555555555 Result OK 37161800 passes
5 30418 MB/sec Pattern 3333333333333333 Result OK 37131500 passes
6 30455 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 37177000 passes
Max 4521 64 bit MIPS
Example commands to test four CPUs:
Start IntBurn64 Auto, KB 4, Secs 25, P1, Log qL11.txt
Start IntBurn64 Auto, KB 4, Secs 25, P2, Log qLl2.txt
Start IntBurn64 Auto, KB 4, Secs 25, P3, Log qLl3.txt
Start IntBurn64 Auto, KB 4, Secs 25, P4, Log qLl4.txt
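The four command lines above differ only in the P number and the log file name, so a BAT file for any number of copies can be generated rather than typed. The following is a minimal Python sketch that assumes the parameter layout is exactly as in the example commands. The function name and the log_prefix value are hypothetical, chosen for illustration only.

```python
def intburn_commands(copies=4, kb=4, secs=25, log_prefix="run"):
    """Build one 'Start IntBurn64 ...' line per copy.

    The parameter order mirrors the example commands in the text.
    log_prefix is a hypothetical name, not one used by the author.
    """
    lines = []
    for n in range(1, copies + 1):
        lines.append(
            f"Start IntBurn64 Auto, KB {kb}, Secs {secs}, P{n}, Log {log_prefix}{n}.txt"
        )
    return lines


if __name__ == "__main__":
    # Print the lines so they can be redirected into a .bat file.
    print("\n".join(intburn_commands()))
```

Changing kb to 256, 1024 or 65536 would give command sets for the other data sizes used in the multi-processor tests.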
IntBurn64 MP Performance and Temperatures
The following show performance levels in MBytes per second and temperature increases, running four copies of IntBurn64 at the same time, plus the speed using one CPU.
Total data transfer rates and performance gains using four processors are also shown, along with Millions of Instructions Per Second (MIPS), calculated from known instruction counts. Optional data size parameters used were 4 KB, 256 KB, 1 MB and 64 MB.
The tests were run for 5 minutes, with the one producing the highest temperature gains repeated for 10 minutes.
Multi-processor performance is as good as might be expected using data in L1 and L2 caches, and better than expected when sharing L3 cache space. The highest data throughput from RAM requires demands from more than one processor.
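The four data sizes were chosen to land in different levels of the memory hierarchy. As a rough illustration, the Python sketch below maps a data size per copy to the level it should mostly fit in, using the cache sizes quoted earlier (L1 64 KB and L2 512 KB dedicated, L3 6 MB shared). It is a simplification that assumes data fits a level if it is no larger than that level, and that the shared L3 is divided among the running copies. The function name is hypothetical.

```python
# Cache sizes for this processor, in KB, as quoted in System Information.
L1_KB = 64          # dedicated per CPU
L2_KB = 512         # dedicated per CPU
L3_KB = 6 * 1024    # shared by all CPUs


def expected_level(kb_per_copy, copies=4):
    """Return the memory level a test of this size should mostly exercise.

    Simplifying assumption: a data set 'fits' a level when it is no larger
    than that level. L3 is shared, so all copies must fit in it together.
    """
    if kb_per_copy <= L1_KB:
        return "L1"
    if kb_per_copy <= L2_KB:
        return "L2"
    if kb_per_copy * copies <= L3_KB:
        return "L3"
    return "RAM"


if __name__ == "__main__":
    for kb in (4, 256, 1024, 65536):
        print(kb, "KB ->", expected_level(kb))
```

With four copies, 1 MB each requests 4 MB of the 6 MB shared space, which is why that size was used for the L3 tests.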
                                                                      Temperature
Test          1 CPU  Copy1  Copy2  Copy3  Copy4   Total  Gain  Total  Case  Core  MBrd
               MB/s   MB/s   MB/s   MB/s   MB/s    MB/s         MIPS  34°C  26°C  33°C
L1 Cache
 Write/Read   17889  17482  16402  17643  17543   69070  3.86  14571   +12   +16    +1
 Read Only    30440  30181  29963  30200  30329  120673  3.96  17914
L2 Cache - 10 minutes
 Write/Read   14744  14722  14563  14694  14470   58449  3.96  12330   +14   +18    +1
 Read Only    21426  21394  21205  21377  21007   84983  3.97  12613
L3 Cache
 Write/Read    9464   8798   8812   8712   8699   35021  3.70   7389   +14   +16    +2
 Read Only    11364   9289   9310   9392   9227   37218  3.28   5526
RAM
 Write/Read    5280   2250   2242   2235   2236    8963  1.70   1891    +9   +12     0
 Read Only     6228   3632   3609   3630   3632   14503  2.33   2153
Adjust for idle temperature at maximum core volts and CPU GHz           -2    -4
Typical room air 21°C, case 34°C and core 26°C idling at low GHz
Four processors each executing one instruction per cycle would produce 12000 MIPS
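The Total and Gain columns in the table above follow directly from the per-copy speeds: the total is the sum of the four copies and the gain is that total divided by the single CPU speed. A minimal Python sketch, using two rows from the table as input (the function name is hypothetical):

```python
def mp_summary(single_mb_s, copies_mb_s):
    """Return (total MB/s over all copies, gain relative to one CPU)."""
    total = sum(copies_mb_s)
    gain = total / single_mb_s
    return total, round(gain, 2)


# L1 cache Write/Read and RAM Read Only rows from the table.
l1_write_read = mp_summary(17889, [17482, 16402, 17643, 17543])
ram_read_only = mp_summary(6228, [3632, 3609, 3630, 3632])

if __name__ == "__main__":
    print("L1 Write/Read:", l1_write_read)
    print("RAM Read Only:", ram_read_only)
```

The same calculation applies to the SSEBurn64 results, with MFLOPS in place of MB/second.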
SSEBurn64
SSEBurn64 uses SSE or SSE2 Single Instruction Multiple Data (SIMD) floating point instructions to soak test the CPU, Cache or RAM at high speeds whilst checking results for correct values. SSE and SSE2 Run buttons are provided for separate CPU, Cache and RAM tests. The program produces 1024 random floating point numbers used in all tests. For the CPU test, 32 add or multiply instructions manipulate a few at a time from registers within a loop. The Cache test uses the same 32 instructions but with data from L1 cache within the main loop. The RAM test is biased towards fast data transfer and can also use cache sized data. Every fifth pass the memory is filled with 16 or 32 of the random numbers with the first set being read and checked for correctness. The main loop uses 8 load/add and 8 load/subtract instructions to produce a sum check of zero. CPU and Cache tests check that results are the same as the first pass which also calibrates the testing loops to run for up to one second (on a fast CPU).
Drop down lists are provided to select running time (1 minute to 24 hours), and memory size used, between 4 KB and 4 MB for Cache tests and 4 KB to 8192 MB for RAM tests. The former is an L1 cache test, using part of the data in turn. The latter is for testing using data in L1 cache, L2 cache or RAM. Speed of CPU and Cache tests is measured in Millions of Floating Point Operations Per Second (MFLOPS) with results for the RAM test in MBytes/second. An example of the log file is shown below.
#########################################################################
SSE and SSE2 Reliability Test Version 1.0 for 64 bit OS
SSE CPU Test at 5 minutes, Start at Sat Nov 28 14:26:25 2009
1.01 Minutes at 12020 MFLOPS, No Errors
2.01 Minutes at 12022 MFLOPS, No Errors
3.00 Minutes at 12022 MFLOPS, No Errors
4.01 Minutes at 12023 MFLOPS, No Errors
5.01 Minutes at 12023 MFLOPS, No Errors
Reliability Test Ended Sat Nov 28 14:31:25 2009
SSE Cache Test at 5 minutes and 4 KB, Start at Sat Nov 28 14:50:34 2009
1.01 Minutes at 16792 MFLOPS, No Errors
SSE Memory Test at 5 minutes and 32 KB, Start at Sat Nov 28 15:06:29 2009
Pass 1 write & read 0.0328 MB, 0.0655 Total MB in 0.00018555 Seconds = 353 MB/Sec
Pass 2 read only 0.0328 MB, 0.0328 Total MB in 0.00000408 Seconds = 8035 MB/Sec
1.00 Minutes at 47465 MB/Sec, No Errors
Example commands to test four CPUs:
Start SSEBurn64 SSE, CPU, Mins 5, auto, P1, Log TCP1.txt
Start SSEBurn64 SSE, CPU, Mins 5, auto, P2, Log TCP2.txt
Start SSEBurn64 SSE, CPU, Mins 5, auto, P3, Log TCP3.txt
Start SSEBurn64 SSE, CPU, Mins 5, auto, P4, Log TCP4.txt
Other example commands:
Start SSEBurn64 SSE2, CPU, Mins 5, auto, P1, Log Tests1.txt
Start SSEBurn64 SSE, Cache, KB 4, Mins 10, auto, P2, Log Tstx4.txt
Start SSEBurn64 SSE, RAM, KB 65536, Mins 5, auto, P3, Log Testrams2.txt
Start SSEBurn64 SSE, RAM, KB 1024, Mins 5, auto, P4, Log TestramL34.txt
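When four copies run for long periods, it can be convenient to scan the log files for errors and speed changes rather than read them by eye. The Python sketch below parses the per-minute lines in the format of the log example above. It assumes every result line follows the pattern "N.NN Minutes at SPEED UNIT, MESSAGE" and treats any message other than "No Errors" as a failure. The wording of a real error message is not shown in the source, so that part is an assumption.

```python
import re

# Matches lines such as "1.01 Minutes at 12020 MFLOPS, No Errors".
RESULT_LINE = re.compile(r"^\s*([\d.]+) Minutes at (\d+) (MFLOPS|MB/Sec), (.+)$")


def parse_log(text):
    """Return a list of (minutes, speed, unit, ok) tuples from log text."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            minutes = float(match.group(1))
            speed = int(match.group(2))
            unit = match.group(3)
            ok = match.group(4).strip() == "No Errors"
            results.append((minutes, speed, unit, ok))
    return results


# Sample lines copied from the log example in the text.
SAMPLE = """SSE CPU Test at 5 minutes, Start at Sat Nov 28 14:26:25 2009
1.01 Minutes at 12020 MFLOPS, No Errors
2.01 Minutes at 12022 MFLOPS, No Errors
1.00 Minutes at 47465 MB/Sec, No Errors
"""

if __name__ == "__main__":
    for row in parse_log(SAMPLE):
        print(row)
```

Header lines and the Pass 1 and Pass 2 calibration lines are skipped because they do not match the pattern.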
SSEBurn64 MP Performance and Temperatures
Following are performance details in MFLOPS and MBytes per second, along with temperature increases, running four copies of SSEBurn64 at the same time, plus the speeds using one CPU. Total MFLOPS or MB/second are shown for four processors and performance gain ratios. Tests indicating highest temperature increases were again run for 10 minutes.
Temperature increases and multi-processor performance gains are similar to IntBurn64 above.
Maximum SSE speed in MFLOPS, executing instructions such as add or multiply, is four times CPU MHz, 4 x 3000 or 12000 MFLOPS in this case. The cache tests use multiply followed by add, demonstrating more than 16000 MFLOPS per CPU. Maximum SSE2 64 bit floating point speeds are half those using 32 bit SSE instructions.
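The peak figures quoted above can be reproduced with simple arithmetic. The Python sketch below assumes one 128 bit SSE instruction per clock cycle, holding four 32 bit or two 64 bit values, and optionally doubles the result for the linked add and multiply case mentioned in the Summary. The function name and parameters are hypothetical.

```python
def peak_sse_mflops(cpu_mhz, double_precision=False, linked_mul_add=False):
    """Theoretical peak MFLOPS for one CPU core.

    Assumes one 128 bit SIMD instruction per cycle: 4 x 32 bit values for
    SSE or 2 x 64 bit values for SSE2. Linked multiply and add is modelled
    as two such instructions completing per cycle.
    """
    lanes = 2 if double_precision else 4
    per_cycle = lanes * (2 if linked_mul_add else 1)
    return cpu_mhz * per_cycle


if __name__ == "__main__":
    print("SSE add or multiply:", peak_sse_mflops(3000))
    print("SSE linked mul/add: ", peak_sse_mflops(3000, linked_mul_add=True))
    print("SSE2 add or multiply:", peak_sse_mflops(3000, double_precision=True))
```

On this basis, the measured 16,802 MFLOPS for the single CPU SSE cache test is about 70% of the 24,000 MFLOPS linked maximum.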
                                                                Temperature
Test              1 CPU  Copy1  Copy2  Copy3  Copy4   Total  Gain  Case  Core  Mbrd
                  Mflps  Mflps  Mflps  Mflps  Mflps   Mflps        34°C  26°C  33°C
CPU               12022  11931  11901  11867  12007   47706  3.97    +7   +11    +1
Cache SSE 10m     16802  16478  16466  16381  16410   65735  3.91   +13   +18    +1
Cache SSE2 10m     8258   8090   8107   8166   8103   32465  3.93   +14   +18    +1
                   MB/s   MB/s   MB/s   MB/s   MB/s    MB/s
L1 32KB           47484  47109  46907  47553  47222  188791  3.98   +11   +15    +1
L2 256KB 10m      23919  23577  23907  23807  23690   94980  3.97   +13   +17    +1
L3 1024KB         11250   9171   9225   9264   9224   36884  3.28   +11   +14    +0
RAM 64MB           7041   3708   3796   3756   3730   14990  2.13   +10   +14    +1
Adjust for idle temperature at maximum core volts and CPU GHz        -2    -4
Typical room air 21°C, case 34°C and core 26°C idling at low GHz
Livermore Loops
This was the original key benchmark for supercomputers, having 24 kernels of numerical applications with speeds calculated in Millions of Floating Point Operations Per Second or MFLOPS. Overall performance characteristics are identified by geometric, harmonic and arithmetic means, minimum and maximum. The program also checks the results for computational accuracy and this was adopted to check for consistent numeric results for a burn-in test.
Multi-processor tests can be run using the commands shown below with results written to a common log file (that appears to work). The 24 kernels are run three times. So, at 5 seconds per test, total running time should be around six minutes.
The benchmark was converted to a burn-in test as the original was known to produce incorrect numeric results on overclocked Pentium Pro CPUs. In this case, maximum temperatures were not as high as in the other tests.
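The summary statistics and the running time estimate mentioned above are straightforward to calculate. The Python sketch below shows the three means for a list of per-kernel MFLOPS values, plus the expected duration for 24 kernels run three times. The input values in the example are made up for illustration and are not results from this system.

```python
import math


def summarise(mflops):
    """Return maximum, arithmetic, geometric and harmonic means, and minimum."""
    n = len(mflops)
    return {
        "maximum": max(mflops),
        "average": sum(mflops) / n,
        "geomean": math.exp(sum(math.log(x) for x in mflops) / n),
        "harmean": n / sum(1.0 / x for x in mflops),
        "minimum": min(mflops),
    }


def run_seconds(kernels=24, passes=3, secs_per_test=5):
    """Expected total running time when each kernel runs for a fixed time."""
    return kernels * passes * secs_per_test


if __name__ == "__main__":
    # Hypothetical per-kernel speeds, for illustration only.
    print(summarise([100.0, 400.0, 1600.0]))
    print(run_seconds() / 60, "minutes")
```

The harmonic mean is dominated by the slowest kernels, which is why it is the lowest of the three means in the results table.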
                                                       Temperature
         Maximum  Average  Geomean  Harmean  Minimum  Case  Core  Mbrd
          MFLOPS   MFLOPS   MFLOPS   MFLOPS   MFLOPS  34°C  26°C  33°C
1 CPU       3883     1070      644      384       64
Copy 1      3866     1064      641      383       64   +11   +16    +1
Copy 2      3832     1059      637      380       63
Copy 3      3838     1062      639      382       64
Copy 4      3878     1066      641      382       64
Expected End Messages
Numeric results were as expected
Commands for four CPUs
Start LiveCONT RunSecs 5
Start LiveCONT RunSecs 5
Start LiveCONT RunSecs 5
Start LiveCONT RunSecs 5
VideoD3D9_64 and SSEBurn64
VideoD3D9 is a DirectX 9 benchmark where any one of the 8 tests can be run as a burn-in test at a specified window size. Speeds are logged in Frames Per Second (FPS) over each minute of the tests. With the CPU burn-in and Direct3D tests in the same folder, a BAT file, with the commands shown below, can run them both at the same time.
Firstly, all DirectX 9 tests were run individually for 5 minutes at 1680 x 1050 pixel monitor setting to identify the hottest test. Perfmon performance monitor was run at the same time to log CPU utilisation, where that recorded is average per CPU.
Next, the Vertex Shader routine and three copies of SSEBurn64 cache tests were run for 10 minutes. The former produced 30% CPU utilisation, averaged over four CPUs, or 120% of one CPU. This was reflected in multiprocessor performance, where MFLOPS per CPU was reduced from around 16,500 to 13,600.
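The utilisation figures above are related by simple arithmetic: an average recorded across four CPUs, multiplied by four, gives the equivalent load on a single CPU. A minimal Python sketch of this, and of the resulting MFLOPS reduction (function names are hypothetical):

```python
def single_cpu_equivalent(avg_util_percent, n_cpus=4):
    """Convert utilisation averaged over all CPUs to a percentage of one CPU."""
    return avg_util_percent * n_cpus


def slowdown_percent(before, after):
    """Percentage reduction in speed from 'before' to 'after'."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print(single_cpu_equivalent(30), "% of one CPU")
    print(round(slowdown_percent(16500, 13600), 1), "% fewer MFLOPS per CPU")
```

With the graphics test needing more than one CPU's worth of time, the three SSE copies cannot each have a processor to themselves, which is consistent with the lower MFLOPS.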
Maximum CPU case and motherboard temperatures were just about the highest at 48°C and 36°C. Asus Probe II monitor has default maximum settings of 77°C and 60°C but the AMD specification for the former appears to be 71°C. Maximum GeForce GTS 250 temperatures are shown as 105°C, 69°C being recorded for these tests.
The graphics and three CPU test was repeated using the slower motherboard integrated graphics. This did not even lead to a larger increase in board temperature and only produced low CPU utilisation.
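The results tables give temperatures as increases over typical idle readings, so the maximum temperatures quoted above are the idle value plus the rise. The Python sketch below makes that conversion explicit, using the idle readings from the table headings (case 34°C, core 26°C, board 33°C, GPU 43°C). The dictionary keys and function name are hypothetical.

```python
# Typical idle readings in °C, as shown in the table headings.
IDLE = {"case": 34, "core": 26, "board": 33, "gpu": 43}


def absolute_temps(rise):
    """Add each recorded increase to the matching idle temperature."""
    return {sensor: IDLE[sensor] + delta for sensor, delta in rise.items()}


# Increases recorded for the Vertex Shader test with three SSE copies.
combined = absolute_temps({"case": 14, "core": 19, "board": 3, "gpu": 26})

if __name__ == "__main__":
    print(combined)
```

For this test the result is a case temperature of 48°C and a GPU temperature of 69°C, matching the figures in the text.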
                                     Case   Core  Board    GPU  % CPU
                              FPS    34°C   26°C   33°C   43°C   Util
1. Egg Gouraud shading       2425      +6     +8     +2    +23     12
2. Wireframe egg Vsync         60      +2     +5     +2     +5      6
3. Wireframe 500 Cubes        182      +2     +5     +1    +22     13
4. Textured Tunnel            873      +6     +9     +2    +25     12
5. Plain Colour Objects      1331      +7    +10     +1    +24     24
6. Textured Objects           825      +7    +10     +1    +24     22
7. Pixel Shader               773      +6     +9     +1    +23     21
8. Vertex Shader             1044      +8    +11     +2    +25     30
Graphics and 3 x SSE - GeForce GTS 250 1680 x 1050
                          FPS and
                           MFLOPS
8. Vertex Shader             1032     +14    +19     +3    +26     95
3 x SSEBurn64               40896
Graphics and 3 x SSE - On board Radeon HD 4200 1280 x 1024
                          FPS and
                           MFLOPS
8. Vertex Shader              153     +13    +16     +2            80
3 x SSEBurn64               46801
BAT File Commands
Start VideoD3D9_64 Auto, Width 1680, Height 1050, Test 2, Secs 600, P1
Start SSEBurn64 SSE, Cache, KB 4, Mins 10, auto, P2, Log X2stx2.txt
Start SSEBurn64 SSE, Cache, KB 4, Mins 10, auto, P3, Log X2stx3.txt
Start SSEBurn64 SSE, Cache, KB 4, Mins 10, auto, P4, Log X2stx4.txt
Example Results
SSE Cache Test at 10 minutes and 4 KB, Start at Thu Dec 03 16:36:34 2009
1.01 Minutes at 13668 MFLOPS, No Errors
2.00 Minutes at 13662 MFLOPS, No Errors
DirectX9 D3D Test 64 Bit Version 1.1, Thu Dec 03 16:36:34 2009
Vertex Shader 2.0 at 1680 x 1050 x 32 bits
1028.5 Frames Per Second over 60 seconds
1031.2 Frames Per Second over 60 seconds
Other Tests
CUDA, from nVidia, provides programming functions to use GeForce graphics processors for general purpose computing. These functions are easy to use in executing arithmetic instructions on numerous processing elements simultaneously. As a benchmark, tests are run using different data sizes and increasing numbers of floating point calculations per data element, with and without transferring data from/to main processor RAM. The reliability test uses the calculate only option with 32 calculations per word.
OpenMP is a system independent set of procedures and software that arranges automatic parallel processing of shared memory data when more than one processor is provided. This option is available in the latest Microsoft C++ compilers. Potential performance gains due to hardware SIMD with SSE instructions are not realised due to compiler limitations and this enhances the comparative benefit of CUDA GPU parallel processing. The benchmark executes the same range of functions, using the same data sizes, as the CUDA benchmark, but only with data in and out.
OpenMP tests at least show that a quad processor can achieve up to a near four times performance gain over a single CPU, but the relatively slow speed leads to low temperature increases. CPU temperatures are even lower with the CUDA test, with processor utilisation equivalent to 100% of one CPU. On the other hand, that huge MFLOPS speed gives rise to the highest graphics processor temperature. For further details of these programs, see references below.
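The near four times gain for OpenMP can be checked from the two maximum speeds reported, 14,446 MFLOPS using four CPUs and 3700 MFLOPS using one. A minimal Python sketch of the speedup and the corresponding parallel efficiency (function names are hypothetical):

```python
def speedup(multi_cpu_speed, single_cpu_speed):
    """Ratio of multi-CPU speed to single CPU speed."""
    return multi_cpu_speed / single_cpu_speed


def efficiency_percent(multi_cpu_speed, single_cpu_speed, n_cpus):
    """Speedup as a percentage of the ideal n_cpus times gain."""
    return speedup(multi_cpu_speed, single_cpu_speed) / n_cpus * 100.0


if __name__ == "__main__":
    print(round(speedup(14446, 3700), 2), "times")
    print(round(efficiency_percent(14446, 3700, 4), 1), "% of ideal")
```

Both figures are maximum values, so they may not come from the same data size and should be read as an upper bound on the gain.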
Game - Grand Theft Auto IV built-in benchmark was run (only running for 4 to 5 minutes in 10). This fully utilised more than 3 CPUs, not particularly generating a lot of CPU heat but giving rise to a 29°C increase on the GPU. A fast game player might have more impact.
                                      Case   Core  Board    GPU  % CPU
                            MFLOPS    34°C   26°C   33°C   43°C   Util
CUDA Graphics Processing    159600      +6    +11     +2    +30     25
OpenMP 4 CPUs maximum        14446      +9    +14     +1     +2
  CPU Utilisation average                                           88
OpenMP 1 CPU maximum          3700
Grand Theft Auto IV FPS         47     +10    +14     +3    +29     86
Reference Files
SSEBurn64 and IntBurn64 - Benchmark and source code in More64Bit.zip - Further results and description in BurnIn64.htm
Livermore Loops - Benchmark and source code in Benchnt.zip - Results and description in Livermore Loops Results.htm
VideoD3D9_64 - Benchmark and source code in Video64.zip - Further results and description in 64 Bit Graphics Tests.htm and Direct3D Results.htm
CUDA - Benchmark and source code in CudaMFLOPS.zip - Further results and description in Cuda1.htm
OpenMP - Benchmark and source code in OpenMPMFLOPS.zip - Further results and description in OpenMP MFLOPS.htm
Other - Burnin32.htm, Burnin64.htm, Vista64.htm, Win64.htm
Roy Longbottom December 2009
The new Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection