Raspberry Pi Pico, Pi 4 and Pi 400 Python and C Basic Beginners Bit Banging Benchmarks

Roy Longbottom


Contents

Introduction
Initial Python Tests Pi 4 and Pi 400
More Python Tests Pi 400
Initial C Tests Pi 4 and Pi 400
More C Tests Pi 400
Pico Python Tests
Pico C Tests
Power
Wiring
Code Format and Execution Notes
Pi 4 Python and C Code
Pico Python and C Code
Pi 4 13 Output Code Extensions
Pico 13 Output Code Extensions
Pi 4 C Input Monitor
Pi 4 C Monitor Results
Pico C CPU Benchmarks
Whetstone Benchmark
Dhrystone Benchmark
MemSpeed CPU Benchmark


Summary

The Pico is a microcontroller with many advanced options, such as DMA, ADC, UART, I2C and PWM. Beginners in this area might initially be interested in exploiting general purpose input/output (GPIO). This report covers measuring Pico performance driving 1 and 13 destinations (mainly LEDs), comparing it with Raspberry Pi 4/400 results, with further considerations, using programs written in C and (new to me) Python.

The four single output programs used (with listings below, and having some completely different formatting arrangements) covered expected rates of 5 to 500000 on/off cycles per second, in six steps, with increasing repetition counts, aimed at producing 20 second tests. In most cases, the latter was not achieved, due to unsuitable sleep timers, excessive overheads and slow processing, with further complications running 13 output tests.

To identify reasons for the poor performance, all the tests were repeated with sleep timers and no outputs, then with outputs and no sleeps, the latter providing the maximum possible cycles per second. These involve equal on and off output times, which can be interpreted as two bits per cycle, allowing maximum Bit Banging Megabits per second (BB Mbps) ratings to be calculated for this sort of activity.
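The Mbps conversion used throughout is simple arithmetic: each ON/OFF cycle counts as two bits, so Mbps is cycles per second times two, divided by one million, and multiplied by the number of outputs for an aggregate figure. A minimal Python sketch, where the function name bb_mbps is illustrative only, applied to the Pi 400 1800 MHz C no sleep results reported later:

```python
def bb_mbps(cycles_per_second: float, outputs: int = 1) -> float:
    """Bit Banging Mbps: two bits per ON/OFF cycle, summed over all outputs."""
    return cycles_per_second * outputs * 2 / 1_000_000


# Pi 400 at 1800 MHz, C, no sleeps, 10 million loops
one_output = bb_mbps(31_823_078)           # CPS for a single pin
thirteen = bb_mbps(2_579_794, outputs=13)  # per-pin CPS, 13 pins
print(f"{one_output:.2f} Mbps, {thirteen:.2f} Mbps")  # 63.65 Mbps, 67.07 Mbps
```

These match the 63.65 and 67.07 figures in the summary table.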

Maximum Bit Banging Speed - Following is a summary of all test results with no sleeps, where running time depends on loop control overheads (see Microsecs/loop), with Python taking more than 500 times longer than C. There was little difference in Mbps between 1 and 13 outputs, the latter tests running for a much longer time. The results show that Pi 400 speed was proportional to CPU MHz, with Pico speed, using C, equivalent to an 1100 MHz Pi 400. Comparing Mbps per MHz indicates that Pico speed does not depend closely on its CPU MHz.

               Bit Banging Mbps                  Microsecs/loop     
               Python           C                Python  C      Time
         MHz   1 out  13 out    1 out  13 out    1 out   1 out  Py/C

Pi 400  1800    0.12    0.12    63.65   67.07       17   0.031   548
Pi 400   600    0.04    0.04    21.21   22.32       50   0.094   532
Pico     125    0.06    0.04    41.67   51.59       31   0.048   646

Sleep Timer Overheads - Following are results of tests using sleep timers with no output. Two timers were used for Pico C tests (T1 sleep_us and T2 busy_wait_us), the latter with 100% accuracy. Alternative timers might be available elsewhere, but those used here were subject to average overheads of between 70 and 200 microseconds for two sleeps. At lower frequencies, microsecond parameters could often be adjusted to produce the required cycles per second. The strangest results were from Pi 400 C, with the overhead applying up to 5000 CPS but being much less significant at faster rates.

                               Cycles Per Second                              
             Python                             C                             
Expected     5     50    500   5000 500000      5     50    500   5000  500000

Pi 1800    5.0   49.7    470   3078   7809    5.0   49.7    471   3115  499460
Pi 600     5.0   49.5    453   2646   5629    5.0   49.6    462   2755  235101
Pico       5.0   49.8    484   3738  14771                                    
Pico T1    5.0                                5.0   50.0    500   5000  362069
Pico T2    5.0                                5.0   50.0    500   5000  500000

Pico Output Plus Sleeps - The best C program indicated the same 100% timing accuracy with one output, but not quite with 13, where it was 100% accurate down to 10 microsecond sleeps but obtained 499999.9 CPS at one microsecond. Python still obtained the 5 CPS speed but gradually became less accurate than the above, by up to a further 27%.

Output Frequency Checker - A program was produced to validate output on a Pi CPU and appeared to produce accurate measurements up to 500 CPS, but reduced to 499324.08 at 500000 CPS (code provided).

Pico USB Power - Steady voltage was noted, with current reaching 50 mA running the most demanding test.

CPU Benchmarks - I converted my C Whetstone, Dhrystone and MemSpeed benchmarks to compile and run on the Pico. A zip file containing these can be downloaded. Comparisons with a Pi 4B are provided, where the Pico's lack of floating point hardware and less efficient integer instructions lead to Pi 4 relative performance being significantly greater than the 12 times CPU MHz difference. MemSpeed includes floating point and integer calculations, where Pico data transfer speeds were no higher than around 100 MBytes per second, but this equates to a bit banging rate of nearly 800 Megabits per second, much higher than possible with my simple tests.



Introduction

As a member of the Raspberry Pi Alpha Testing Team, I have been writing and running some simple extended LED flashing type benchmarks, initially on a Pi 4 and Pi 400 then moving on to the Pico. Both Python and Bit Banging (of this form) were new to me, so this represents a first step.

I produced two varieties of Python and C programs to test minimum and near maximum configurations for this type of activity: output to one LED, then 13 outputs, comprising 11 LEDs, one purely resistive load and one connection to an input on a different device. The size of the 13 output configuration was chosen because its measured current consumption was near the maximum Pi 4 GPIO specification.

To check the data sent from the Pico or Pi GPIO, a further C program was written to run on a Pi 4 or 400. This measures signal transitions, ON to OFF and OFF to ON, and, up to a frequency limit, is acceptable for confirming data transfers at the correct frequency.
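The counting principle can be illustrated offline. The sketch below is not the author's monitor program; it is a hypothetical Python function that counts OFF to ON and ON to OFF transitions in a list of 0/1 samples and divides by the sampling time. It also shows why such a monitor only works up to a limit: if the input changes state more than once between two samples, transitions are lost. The ON and OFF counts can also differ by one, depending on where the sampling window starts.

```python
from typing import Sequence, Tuple


def transitions_per_second(samples: Sequence[int], seconds: float) -> Tuple[float, float]:
    """Count OFF->ON and ON->OFF changes in 0/1 samples, per second of sampling."""
    rising = falling = 0
    for previous, current in zip(samples, samples[1:]):
        if previous == 0 and current == 1:
            rising += 1    # OFF to ON
        elif previous == 1 and current == 0:
            falling += 1   # ON to OFF
    return rising / seconds, falling / seconds


# Hypothetical samples of a square wave over a 1 second window
on_rate, off_rate = transitions_per_second([0, 1, 1, 0, 0, 1, 1, 0, 0, 1], 1.0)
print(on_rate, off_rate)  # 3.0 2.0
```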

Detailed software installation and running methods for Pico are not provided, nor are detailed wiring diagrams. For the latter, the programs identify physical pin numbers. All these issues are better covered by studying the vast amount of Raspberry Pi documentation available online and in book form. Starting points are raspberry-pi-pico-python-sdk.pdf, raspberry-pi-pico-c-sdk.pdf and the Raspberry Pi Forum.

For testing purposes, I ended up with two identical breadboard setups. One has a ribbon cable/plug to connect to Pi 4 or Pi 400 and the other for Pico connections.


Initial Python Tests Pi 4 and Pi 400

The first programs were written in Python, starting with the usual one to generate a single flashing LED: switch the LED on, sleep for a while, switch it off, then sleep again, all repeated a fixed number of times. The second was the same but controlled 11 LEDs and two resistive outputs. Each was run with increasing loop counts and decreasing sleep times, for a constant nominal total of 20 seconds. Sleeps started at 100000 microseconds each, reducing to 1, with loops starting at 100 and increasing to 10 million. The programs also calculated ON/OFF Cycles Per Second (CPS) as loops divided by elapsed time. Pi 4 and Pi 400 results are shown below.
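The test schedule keeps loops times the two sleeps per loop constant at 20 seconds. The following is a minimal Python sketch of that schedule and of a timing harness, not the author's program. The callables set_outputs and sleep_fn are hypothetical stand-ins for the GPIO write and sleep functions (on real hardware these would come from a GPIO library, which is deliberately not used here), so the harness runs anywhere.

```python
import time
from typing import Callable, List, Tuple


def schedule() -> List[Tuple[int, int]]:
    """(loops, microseconds per sleep): 100 x 100000 up to 10 million x 1."""
    return [(100 * 10**i, 100_000 // 10**i) for i in range(6)]


def run_test(loops: int, sleep_us: int,
             set_outputs: Callable[[bool], None],
             sleep_fn: Callable[[int], None]) -> Tuple[float, float]:
    """ON, sleep, OFF, sleep per loop; return (runsecs, cycles per second)."""
    start = time.perf_counter()
    for _ in range(loops):
        set_outputs(True)
        sleep_fn(sleep_us)
        set_outputs(False)
        sleep_fn(sleep_us)
    runsecs = time.perf_counter() - start
    return runsecs, loops / runsecs


for loops, us in schedule():
    # Two sleeps per loop, so the nominal sleep total is 20 seconds every time
    print(f"{loops:>9} loops {us:>6} us  nominal {2 * loops * us / 1e6:.0f} s")

# Dry run with do-nothing stand-ins for the GPIO write and the sleep
secs, cps = run_test(1000, 0, lambda _on: None, lambda _us: None)
```

Any time measured beyond the nominal 20 seconds is overhead from the loop, the output calls and the sleep function itself, which is what the following tables expose.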

Time taken and performance were clearly better on the Pi 400, though not in proportion to the CPU MHz difference of 1.2 times with one output, but approaching that with 13 outputs. Also, running times became increasingly longer than the 20 seconds total sleep time might suggest. Comparing running times with one and 13 outputs, the latter took up to about two and a half times as long, indicating neither purely serial nor parallel activity, but possibly affected by timer issues.

In order to help to identify performance and timing differences, it was decided to carry out further tests executing output functions without sleep timers and sleep timers with no output. These are considered in the next sections, using Pi 400 at 1800 and 600 MHz (frequency scaling governor powersave setting) and Pico, via both Python and C programmed versions.

    Pi 4 1500 MHz

    Python One Output + Sleep                      Python 13 Outputs + Sleep

     Loops microsecs   runsecs  cycles/sec         Loops microsecs   runsecs  cycles/sec

       100    100000    20.028         5.0           100    100000    20.054         5.0
      1000     10000    20.176        49.6          1000     10000    20.416        49.0
     10000      1000    21.612       462.7         10000      1000    24.066       415.5
    100000       100    36.071      2772.3        100000       100    59.988      1667.0
   1000000        10   178.132      5613.8       1000000        10   417.749      2393.8
  10000000         1  1618.362      6179.1      10000000         1  4012.817      2492.0

    Pi 400 1800 MHz

    Python One Output + Sleep                      Python 13 Outputs + Sleep

     Loops microsecs   runsecs  cycles/sec         Loops microsecs   runsecs  cycles/sec

       100    100000    20.027         5.0           100    100000    20.047         5.0
      1000     10000    20.152        49.6          1000     10000    20.345        49.2
     10000      1000    21.514       464.8         10000      1000    23.451       426.4
    100000       100    34.834      2870.8        100000       100    54.019      1851.2
   1000000        10   168.419      5937.6       1000000        10   358.707      2787.8
  10000000         1  1520.876      6575.2      10000000         1  3428.312      2916.9


  Pi 400 / Pi 4 excluding 20 seconds sleep time

  10000000         1                  1.06      10000000         1                  1.17
 


More Python Tests Pi 400

No Sleeps - These indicate maximum operational speeds in ON/OFF cycles per second, where each cycle can be interpreted as two bits, giving speed in megabits per second. Here this was around 0.12 Mbps with both one and 13 outputs at 1800 MHz. For both, the derived microseconds per loop are effectively constant, at around 17 with one output and thirteen times longer with 13 outputs. This appears to confirm that there is no parallel operation and that these on plus off switching times per output are unsuitable for high speed bit banging.
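If the 13 outputs are switched one after another, loop time should grow by a roughly fixed amount per extra output. The sketch below fits that serial model to just two points (an assumption, not a measured breakdown), using the no sleep microseconds per loop from the tables below and, for comparison, from the Pico Python tests later in this report. The function name is illustrative.

```python
def per_output_cost_us(one_out_us: float, n_out_us: float, n_outputs: int) -> float:
    """Extra microseconds per loop for each output beyond the first."""
    return (n_out_us - one_out_us) / (n_outputs - 1)


pi400 = per_output_cost_us(16.8, 221.0, 13)  # Pi 400 1800 MHz Python, no sleeps
pico = per_output_cost_us(31.0, 624.6, 13)   # Pico MicroPython, no sleeps
print(f"Pi 400 {pi400:.1f} us, Pico {pico:.1f} us per extra output")
# Pi 400 17.0 us, Pico 49.5 us per extra output
```

On the Pi 400 each extra output costs about the same as the single output loop, consistent with purely serial switching. On the Pico each extra output costs more than the first, consistent with its 13 output runs taking just over 20 times as long as one output.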

With one output, performance at 1800 MHz was almost exactly three times that at 600 MHz (2.98 times), but not quite so with 13 outputs (2.79 times).

Just Sleeps - Accurate sleep timing would make the running time of every test 20 seconds, with cycles per second of 5, 50, 500, 5000, 50000 and 500000. This is clearly not the case, with the excess time being around 124 microseconds per loop using the Pi 400 at 1800 MHz, apparently due to the overhead of calling the sleep function.
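The overhead figures come from subtracting the 20 second nominal sleep total from the measured run time and dividing by the number of loops. Each loop contains two sleep calls, so half of that is the cost per sleep. A small Python sketch, with an illustrative function name, using the 1000000 loop row of the Pi 400 1800 MHz Python Just Sleep table below:

```python
def sleep_overhead_us(runsecs: float, loops: int, nominal_secs: float = 20.0):
    """Excess microseconds per loop and per sleep call (two sleeps per loop)."""
    per_loop = (runsecs - nominal_secs) / loops * 1_000_000
    return per_loop, per_loop / 2


# Pi 400 1800 MHz, Python just sleep, 1000000 loops of 10 microsecond sleeps
per_loop, per_sleep = sleep_overhead_us(144.315, 1_000_000)
print(f"{per_loop:.1f} us per loop, {per_sleep:.1f} us per sleep")
# 124.3 us per loop, 62.2 us per sleep
```

The same calculation on the Pico Python sleep only result (87.38 seconds for 1000000 loops) gives about 67 microseconds per loop.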

Checking Output Frequency - The last example here demonstrates my output checking program, which was never intended to be 100% accurate. This time it was run on a Pi 400, measuring the output from a GPIO pin. It also demonstrates that performance driving a resistive load was virtually the same as controlling a flashing LED. The monitoring program samples the input for 10 seconds. In this case, the samples were taken during the last test, with results varying but similar to the performance measured by the test program.


     Loops   micro       run    cycles microsecs         run    cycles     Total microsecs
           seconds   seconds   /second     /Loop     seconds   /second       CPS     /Loop

Python One Output No Sleep                        Python 13 Outputs no Sleep

Pi 400 1800 MHz                                   Pi 400 1800 MHz

       100       0     0.002     57339      20.0       0.023      4352     56575     230.0
      1000       0     0.019     53712      19.0       0.225      4446     57792     225.0
     10000       0     0.179     55887      17.9       2.231      4483     58282     223.1
    100000       0     1.707     58581      17.1      22.315      4481     58257     223.2
   1000000       0    16.901     59169      16.9     220.669      4532     58912     220.7
  10000000       0   167.929     59549      16.8    2209.995      4525     58824     221.0

  10000000 Mbps                   0.12                                      0.12                 

Pi 400 600 MHz                                    Pi 400 600 MHz

       100       0     0.005     19751      50.0       0.064      1566     20358     640.0
      1000       0     0.052     19331      52.0       0.630      1587     20627     630.0
     10000       0     0.512     19550      51.2       6.385      1566     20362     638.5
    100000       0     5.066     19741      50.7      61.508      1626     21135     615.1
   1000000       0    50.371     19853      50.4     616.302      1623     21094     616.3
  10000000       0   500.346     19986      50.0    6160.182      1623     21103     616.0

Max 1800/600                      2.98                                      2.79

Pi 400 1800 MHz                                   Pi 400 600 MHz

Python Just  Sleep                                Python Just  Sleep

   Loops   micro       run    cycles      over         run    cycles      over
         seconds   seconds   /second     heads     seconds   /second     heads

     100  100000    20.027       5.0     0.027      20.036       5.0     0.036
    1000   10000    20.128      49.7     0.128      20.209      49.5     0.209
   10000    1000    21.263     470.3     1.263      22.086     452.8     2.086
  100000     100    32.492    3077.7    12.492      37.793    2646.0    17.793
 1000000      10   144.315    6929.3   124.315     193.387    5171.0   173.387
10000000       1  1280.522    7809.3  1260.522    1776.560    5628.9  1756.560

Approximate microsecs/loop                 124                             174
Approximate microsecs/sleep                 62                              87

-------------------------------------------------------------------------------

Python One Output No Sleep to Pi 400 Input  Pi 400 reading cycles from GPIO

     Loops microsecs   runsecs  cycles/sec   Cycles Per Second Last Test

       100         0     0.002     58175.7   60345.38 ON and 60345.38 OFF
      1000         0     0.017     59017.8   60398.88 ON and 60398.78 OFF
     10000         0     0.170     58728.2   59653.38 ON and 59653.38 OFF
    100000         0     1.671     59845.3   60752.08 ON and 60752.08 OFF
   1000000         0    16.599     60245.2   60456.54 ON and 60456.64 OFF
  10000000         0   165.210     60528.9   60625.78 ON and 60625.78 OFF
End
   



Initial C Tests Pi 4 and Pi 400

These programs were compiled with gcc, using the WiringPi GPIO access library. Execution with sudo is recommended, but on the Pi 400 the programs would only execute without sudo.

Results of the programs are reported below. These use the delayMicroseconds(number) function for sleeping. There are various reports of inaccuracies when using this function, and these are apparent here.

Weird Results - A variation of the program produced the weird results shown below. This was run from a command line using the time command, which reports elapsed (real) time and CPU time in user and system modes. With sleep times of 100 microseconds or greater, CPU utilisation was extremely low, increasing to 100% at 99 microseconds and below. The table also demonstrates that the expected cycles per second might be achievable by reducing the sleep time specified by the program.
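The reduced sleep requests in the table can be estimated by assuming a fixed overhead per loop: measure the CPS at the nominal sleep, infer the overhead, then subtract half of it from each of the two sleeps. This is a sketch under that assumption, not necessarily how the values in the table were chosen, and the function name is hypothetical.

```python
def corrected_sleep_us(nominal_us: float, measured_cps: float) -> float:
    """Sleep request expected to give the intended cycles per second.

    Assumes the excess loop time measured at nominal_us is a fixed overhead
    per loop, shared equally between the two sleeps in each ON/OFF cycle.
    """
    intended_loop_us = 2 * nominal_us            # ON sleep + OFF sleep
    measured_loop_us = 1_000_000 / measured_cps  # actual time per cycle
    overhead_per_sleep = (measured_loop_us - intended_loop_us) / 2
    return nominal_us - overhead_per_sleep


print(round(corrected_sleep_us(1000, 471.2)))   # 939, for 500 CPS
print(round(corrected_sleep_us(10000, 49.7)))   # 9940, for 50 CPS
```

These are close to the 939 and 9938 microsecond requests used in the table. The fixed overhead assumption only holds for requests of 100 microseconds and above. Below that, CPU utilisation jumps to 100%, which appears consistent with the WiringPi source, where delayMicroseconds busy-waits for delays under 100 microseconds and otherwise calls nanosleep.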

Because of the inclusion and variability of sleep times, it is not really possible to compare performance accurately across different CPUs or MHz. But, at least, it does show improvement with increasing frequency, particularly on approaching maximum speed, at 1 microsecond sleep time. At this point, the total cycles per second for 13 outputs on the Pi 400 at 1800 MHz was 6.49 million, or 12.98 Mbits per second, near the maximum possible with these sleeps.

Python Comparison - The last table here indicates that the C program was 75 times faster than the Python version driving one output pin, and 171 times faster with 13 outputs. However, it should be noted that the two versions could produce the same performance at lower frequencies (longer sleep times), possibly by adjusting Python sleep time requests to produce the required cycles per second (as shown using the C based program).


                   One output + sleep                13 Outputs + Sleep
     Loops   micro       run    cycles      over         run    cycles     Total      over
           seconds   seconds   /second     heads     seconds   /second       CPS     heads

 Pi 4 1500 MHz C
       100  100000    20.013       5.0     0.013      20.013       5.0        65     0.013
      1000   10000    20.125      49.7     0.125      20.127      49.7       646     0.127
     10000    1000    21.248     470.6     1.248      21.255     470.5      6117     1.255
    100000     100    32.273    3098.6    12.273      32.327    3093.4     40214    12.327
   1000000      10    20.010   49975.9     0.010      20.299   49264.4    640437     0.299
  10000000       1    20.021  499474.9     0.021      29.996  333383.2   4333982     9.996

 Pi 400 1800 MHz C
       100  100000    20.012       5.0     0.012      20.016       5.0        65     0.016
      1000   10000    20.122      49.7     0.122      20.125      49.7       646     0.125
     10000    1000    21.204     471.6     1.204      21.224     471.2      6126     1.224
    100000     100    31.940    3130.8    11.940      32.009    3124.1     40613    12.009
   1000000      10    20.031   49922.0     0.031      20.008   49980.9    649752     0.008
  10000000       1    20.030  499255.9     0.030      20.021  499475.6   6493183     0.021

  Pi 400 600 MHz C
       100  100000     20.02       5.0     0.020      20.018       5.0        65     0.018
      1000   10000    20.172      49.6     0.172      20.175      49.6       645     0.175
     10000    1000    21.638     462.1     1.638      21.631     462.3      6010     1.631
    100000     100    36.325    2752.9    16.325      36.232    2760.0     35880    16.232
   1000000      10    22.426   44591.6     2.426      23.678   42233.1    549030     3.678
  10000000       1    43.946  227552.0    23.946      55.172  181252.9   2356288    35.172


   Weird Results - Pi 400 1800 MHz, One output + sleep

     Loops microsecs   runsecs  cycles/sec       real        user         sys

      1000      9938    20.000       50.0   0m20.005s    0m0.024s    0m0.000s
     10000      1000    21.222      471.2   0m21.226s    0m0.029s    0m0.172s
     10000       939    19.999      500.0   0m20.003s    0m0.005s    0m0.198s
    100000       100    32.111     3114.2   0m32.115s    0m0.109s    0m1.789s
    100000        99    19.802     5050.0   0m19.806s    0m6.349s   0m13.457s
   1000000        10    20.002    49994.5   0m20.006s    0m5.239s   0m14.767s


  Pi 400 1800 MHz Maximum Speeds 10000000 Loops

                   One output + sleep                13 Outputs + Sleep
  Program    micro       run    cycles      over         run    cycles     Total      over       
           seconds   seconds   /second     heads     seconds   /second       CPS     heads

  Python         1  1520.876    6575.2  1500.876    3428.312    2916.9     37908  3408.312
  C              1    20.030  499255.9     0.030      20.021  499475.6   6493183     0.021

  C / Python Gain                  75                             171
  



More C Tests Pi 400

These were run on the Pi 400 at 1800 and 600 MHz, in preparation for comparisons with Pico results.

No Sleeps - The sub-microsecond times of these output control operations were up to around 560 times faster than those from the Python versions but, of course, they do not represent arithmetic calculation speeds. Bit banging speed reached up to 67.1 million bits per second.

Monitor Confirmation - A longer running 13 output, no sleep version was compiled to check with monitoring options on the Pi 400, as shown below. Using the time command confirmed the running time and indicated 100% CPU utilisation. Then my input speed monitoring program was run, confirming performance of around 2.57 million cycles per second from the input connection, also indicating 67 million bits per second overall from 13 outputs.

Pi 400 1800 Versus 600 MHz - Results confirmed that performance was proportional to CPU MHz.

Sleep Only Tests - As implied by the sub-microsecond output speeds indicated above, these results were almost identical to those from running the tests with outputs and sleeping, with the same weird running times. Speed gains over Python were not as high as in the no sleep tests, due to the inclusion of a constant 20 seconds of sleeping time.

                   C One Output no Sleep         C 13 Outputs No Sleep

    Loops   micro      run   cycles microsecs        run   cycles    total microsecs
          seconds  seconds  /second     /loop    seconds  /second      CPS     /loop

Pi 400 1800 MHz
      100       0    0.000 20971520                0.000  2452809 31886520
     1000       0    0.000 30840470                0.000  2584291 33595780
    10000       0    0.000 28747800                0.004  2568151 33385960     0.400
   100000       0    0.003 31581236     0.030      0.039  2547669 33119697     0.390
  1000000       0    0.031 31863136     0.031      0.388  2578077 33514998     0.388
 10000000       0    0.314 31823078     0.031      3.876  2579794 33537325     0.388

C      10M Mbps                63.6                                   67.1
Python 10M Mbps                0.12                                   0.12
C/Python                        530                                    559

Pi 400 600 MHz
      100       0    0.000  7231559                0.000   830555 10797218
     1000       0    0.000 10618491                0.001   858257 11157346
    10000       0    0.001 11087243                0.012   846308 11002008     1.200
   100000       0    0.009 10648415     0.090      0.117   855015 11115200     1.170
  1000000       0    0.097 10357508     0.097      1.165   858464 11160027     1.165
 10000000       0    0.943 10606199     0.094     11.650   858359 11158668     1.165

C      10M  Mbps               21.2                                   22.3
Python 10M  Mbps              0.040                                  0.042
C/Python                        530                                    531

 ----------------------------------------------------------------------------------
 No Sleep Time Monitoring  13 Outputs No Sleep - around 67 Mbps

     Loops microsecs   runsecs  cycles/sec   Time Results

 100000000         0    38.703  2583751.0  real 0m38.708s  user 0m38.705s sys  0m0.000s

 No Sleep Speed Monitoring   13 Outputs No Sleep - around 67 Mbps
  
     Loops microsecs   runsecs  cycles/sec   ./incount cycles per second

 100000000         0    38.782  2578491.8    2564947.90 ON and 2564948.00 OFF
 100000000         0    38.844  2574369.0    2565509.59 ON and 2565509.59 OFF
 100000000         0    38.780  2578670.5    2566093.49 ON and 2566093.49 OFF
 ----------------------------------------------------------------------------------

C Just sleep

                  Pi 400 1800 MHz               Pi 400 600 MHz
   Loops   micro      run   cycles      over        run   cycles    total      over
         seconds  seconds  /second     heads    seconds  /second      CPS     heads

      100  100000   20.012      5.0     0.012     20.017      5.0       65     0.017
     1000   10000   20.125     49.7     0.125     20.171     49.6      645     0.171
    10000    1000   21.222    471.2     1.222     21.651    461.9     6005     1.651
   100000     100   32.103   3114.9    12.103     36.302   2754.7    35811    16.302
  1000000      10   20.003  49993.0     0.003     22.249  44945.1   584286     2.249
 10000000       1   20.022 499459.8     0.022     42.535 235100.6  3056308    22.535

Maximum Speeds 10000000 Loops Just sleep

Python          1 1280.522   7809.3  1260.522   1776.560   5628.9    73176  1756.560
C               1   20.022 499459.8     0.022     42.535 235100.6  3056308    22.535

C/Python              64.0     64.0     57296       41.8     41.8     41.8      77.9

  



Pico Python Tests

Output With Sleeps - In this case, utime.sleep_us(microsecs) was used, for better performance than time.sleep(), which is limited by millisecond resolution. Comparison with Python results from tests run on an 1800 MHz Pi 400 provides mixed messages: the Pico was faster driving one output, but slower with thirteen.

Output With No Sleeps - For the larger loop counts, running time is normally sufficient to produce consistent performance. Python results for Pi 400, running at both 1800 and 600 MHz, are provided, where performance differences were nearly proportional to CPU MHz and running time for 13 outputs around 13 times longer than with one output.

Pico performance relationships were somewhat different, certainly not in proportion to its CPU MHz, said to be 125 MHz. Performance with one output was equivalent to that of a Pi 400 running at just under 1000 MHz (ratios of 0.54 against 1800 MHz and 1.61 against 600 MHz), then similar to that of a Pi 400 at 600 MHz driving 13 outputs. For the Pico, the running time for 13 outputs was just over 20 times longer than for one output. An additional test was run using 8 outputs, where the increase over one output was around 13 times.
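The "equivalent MHz" comparisons assume that Pi 400 GPIO speed scales linearly with CPU MHz, which the 1800 versus 600 MHz ratio of 2.98 for one output supports. A sketch of the arithmetic, with an illustrative function name, using the one output no sleep CPS figures from the tables below:

```python
def equivalent_mhz(pico_cps: float, pi_cps: float, pi_mhz: float) -> float:
    """Pi 400 MHz that would match the Pico, assuming speed proportional to MHz."""
    return pi_mhz * pico_cps / pi_cps


from_1800 = equivalent_mhz(32_214, 59_549, 1800)  # against Pi 400 at 1800 MHz
from_600 = equivalent_mhz(32_214, 19_986, 600)    # against Pi 400 at 600 MHz
print(f"{from_1800:.0f} MHz and {from_600:.0f} MHz")  # 974 MHz and 967 MHz
```

The two estimates agree closely, which is what linear scaling would predict.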

Sleep Only Tests - Results for Pico and the 1800 MHz Pi 400 are provided. Because of the overheads, both fell short of the expected cycles per second, the Pico timer suffering less and so producing higher apparent throughput.


    Loops   micro      run   cycles      over        run   cycles     over
          seconds  seconds  /second     heads    seconds  /second    heads

                   Pico one output + sleep       Pico 13 outputs + sleep
      100  100000    20.01      5.0      0.01      20.07      5.0     0.07
     1000   10000    20.12     49.7      0.12      20.76     48.2     0.76
    10000    1000    21.19    472.0      1.19      27.56    362.8     7.56
   100000     100    31.85   3139.5     11.85      95.63   1045.7    75.63
  1000000      10   138.51   7219.9    118.51     776.34   1288.1   756.34
 10000000       1   927.56  10781.0    907.56    7249.31   1379.4  7229.31

Pi 400 1800 MHz GPIO Python
    10000    1000    21.51    464.8      1.51     23.451    426.4     3.45
 10000000       1  1520.88   6575.2   1500.88   3428.312   2916.9  3408.31


          Pico one output no sleeps             Pico 13 outputs no sleeps

    Loops   micro      run   cycles microsecs        run   cycles    Total microsecs
          seconds  seconds  /second     /Loop    seconds  /second      CPS     /Loop

      100       0     0.00    25000       0.0       0.05     1887    24528     500.0
     1000       0     0.03    32258      30.0       0.63     1600    20800     630.0
    10000       0     0.31    32154      31.0       6.25     1601    20810     625.0
   100000       0     3.11    32206      31.1      62.47     1601    20812     624.7
  1000000       0    31.04    32213      31.0     624.65     1601    20812     624.7
 10000000       0   310.42    32214      31.0    6246.46     1601    20812     624.6

 10000000       0                    8 outputs   4133.32     2419    19352     413.3


 Pi 400 1800 and 600 MHz

              MHz
 10000000    1800   167.93    59549      16.8    2210.00     4525    58824     221.0
 10000000     600   500.35    19986      50.0    6160.18     1623    21103     616.0
 Ratios
 Pi 400 MHz   3.0     2.98                          2.79
 Pico/Pi 1800         0.54                          0.35 
 Pico/Pi  600         1.61                          0.99



                   Pico Python Sleep only        Pi 400 1800 MHz Python      Expected
    Loops   micro      run   cycles      over        run   cycles     over    cycles
          seconds  seconds  /second     heads    seconds  /second    heads   /second

      100  100000    20.01      5.0      0.01      20.03      5.0     0.03       5.0
     1000   10000    20.07     49.8      0.07      20.13     49.7     0.13      50.0
    10000    1000    20.67    483.7      0.67      21.26    470.3     1.26     500.0
   100000     100    26.75   3738.3      6.75      32.49   3077.7    12.49    5000.0
  1000000      10    87.38  11444.3     67.38     144.32   6929.3   124.32   50000.0
 10000000       1   677.01  14770.8    657.01    1280.52   7809.3  1260.52  500000.0
  



Pico C Tests

Output With Sleeps - For these tests, sleep_us(microsecs) was used initially, but results were unsatisfactory for shorter sleep times. This was mainly corrected some time later. Meanwhile, busy_wait_us(microsecs) was tried. As shown, this appeared to provide a perfect solution when handling a single output, with only a minor variation at the highest rate attempted with 13 outputs. The best possible running times imply that output switching time is so fast that it does not affect running time, within the displayed range.
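According to the Pico SDK documentation, busy_wait_us() simply spins until the requested time has passed, whereas sleep_us() attempts a lower power sleep where possible. The spinning technique can be illustrated in CPython using time.perf_counter_ns(). This is a sketch of the idea, not Pico code: the function name only mirrors the SDK call, and under MicroPython time.ticks_us() with time.ticks_diff() would play the same role.

```python
import time


def busy_wait_us(microseconds: int) -> None:
    """Spin until at least `microseconds` have elapsed, never yielding the CPU."""
    deadline = time.perf_counter_ns() + microseconds * 1000
    while time.perf_counter_ns() < deadline:
        pass


start = time.perf_counter_ns()
busy_wait_us(50)
elapsed_us = (time.perf_counter_ns() - start) / 1000
print(f"requested 50 us, waited {elapsed_us:.1f} us")
```

Spinning keeps the CPU fully busy for the whole wait, which is the price paid for the timing accuracy seen with busy_wait_us.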

As indicated earlier, for Pi 400 tests, the sleep timer produced weird timing variations in the mid range but came good with the shorter delays. The results indicate that maximum performance, in the range down to one microsecond, was effectively the same from a Pi 400 GPIO and a Pico. For these tests with sleeping, the Pico C compilations were up to 362.5 times faster than those from Python.

Output With No Sleeps - As indicated earlier, these represent maximum data transfer speeds, where cycles per second can be converted to Megabits per second, in this case with Pico C achieving up to 51.6 Mbps. Comparisons show that the Pico performed at up to 77% of an 1800 MHz Pi 400, equivalent to a Pi 400 at 1386 MHz. The C version was up to 1239.3 times faster than the Python variety.

Sleep Only Tests - Using busy_wait_us(microsecs), maximum performance in all areas was the same as in the full tests, so new comparisons are unnecessary. Results using the updated sleep_us(microsecs) are provided, showing lower accuracy with 1 microsecond sleeps but not with 2. Results from my Pi 4 based input frequency monitor are also provided.

                    One output + sleep              13 Outputs + Sleep
    Loops   micro      run   cycles      over        run   cycles     over
          seconds  seconds  /second     heads    seconds  /second    heads

      100  100000    20.00      5.0      0.00      20.00      5.0     0.00
     1000   10000    20.00     50.0      0.00      20.00     50.0     0.00
    10000    1000    20.00    500.0      0.00      20.00    500.0     0.00
   100000     100    20.00   5000.0      0.00      20.00   5000.0     0.00
  1000000      10    20.00  50000.0      0.00      20.00  50000.0     0.00
 10000000       1    20.00 500000.0      0.00      20.00 499999.9     0.00

Pi 400 1800 MHz C
    10000    1000    21.20    471.6     1.204      21.22    471.2     1.22
 10000000       1    20.03 499255.9     0.030      20.02 499475.6     0.02

Pico Python
  1000000      10   138.51   7219.9    118.51     776.34   1288.1   756.34
 10000000       1   927.56  10781.0    907.56    7249.31   1379.4  7229.31

C/Python       10              6.9                           38.8                        
C/Python        1             46.4                          362.5

                   One Output No Sleep           13 Outputs No Sleep
    Loops   micro      run   cycles microsecs        run   cycles    Total microsecs
          seconds  seconds  /second     /loop    seconds  /second      CPS     /loop

Pico C
      100       0    0.000 11111111                0.000  1470588 19117647
     1000       0    0.000 20408164                0.001  1984127 25793651
    10000       0    0.000 20833334                0.005  1984127 25793650     0.500
   100000       0    0.005 20833332     0.050      0.050  1984127 25793651     0.500
  1000000       0    0.048 20833334     0.048      0.504  1984127 25793651     0.504
 10000000       0    0.480 20833334     0.048      5.040  1984127 25793651     0.504

 Maximum Mbps               41.66                                  51.60

Pi 400 1800 MHz C
 10000000       0    0.314 31823078     0.031      3.876  2579794 33537325     0.388
Pico Python
 10000000       0   310.42    32214      31.0    6246.46     1601    20812     624.6

Pico C / Pi 400 C              0.65                                   0.77
Pico C / Python              646.72                                1239.30 
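The Mbps figures assume two bits per no-sleep cycle (one ON, one OFF). The desktop Python sketch below recomputes them from the 13 output Total CPS column and, assuming Pi 400 speed scales linearly with CPU MHz as these results suggest, converts the Pico/Pi 400 ratio into an equivalent clock speed. Names are illustrative.

```python
def bb_mbps(cycles_per_second):
    """Bit Banging Mbps: two bits (one ON, one OFF) per cycle."""
    return 2 * cycles_per_second / 1000000


PI400_MHZ = 1800
pico_c = 25793651      # Pico C, 13 outputs, Total CPS
pi400_c = 33537325     # Pi 400 C at 1800 MHz, 13 outputs, Total CPS
pico_python = 20812    # Pico MicroPython, 13 outputs, Total CPS

ratio = pico_c / pi400_c
print(f"Pico C {bb_mbps(pico_c):.2f} Mbps, Pi 400 C {bb_mbps(pi400_c):.2f} Mbps")
# Assumes Pi 400 bit banging speed is proportional to CPU MHz
print(f"Pico C / Pi 400 C {ratio:.2f}, like a Pi 400 at {ratio * PI400_MHZ:.0f} MHz")
print(f"Pico C / Python {pico_c / pico_python:.1f}")
```

Using the unrounded ratio gives an equivalent of about 1384 MHz, slightly below the 1386 MHz obtained from the rounded 77%.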

 Just Sleep  using busy_wait_us(microsecs)       using sleep_us(microsecs)
    Loops   micro      run   cycles      over        run   cycles     over
          seconds  seconds  /second     heads    seconds  /second    heads

      100  100000   20.000      5.0     0.000     20.000      5.0    0.000
     1000   10000   20.000     50.0     0.000     20.000     50.0    0.000
    10000    1000   20.000    500.0     0.000     20.000    500.0    0.000
   100000     100   20.000   5000.0     0.000     20.000   5000.0    0.000
  1000000      10   20.000  50000.0     0.000     20.000  50000.0    0.000
 10000000       1   20.000 500000.0     0.000     27.619 362068.9    7.619
  5000000       2                                 20.000 250000.0    0.000

 ./incount for Raspberry Pi GPIO Frequency
  10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
  10.00 Seconds for Cycles Per Second 50.10 ON and 50.00 OFF
  10.00 Seconds for Cycles Per Second 500.00 ON and 500.10 OFF
  10.00 Seconds for Cycles Per Second 4995.67 ON and 4995.67 OFF
  10.00 Seconds for Cycles Per Second 49988.64 ON and 49988.74 OFF
  10.00 Seconds for Cycles Per Second 499324.08 ON and 499324.08 OFF



Power

I carried out various tests, measuring voltages and current. There were, of course, variations in current, but the USB input voltage stayed close to 5.26 volts and the Pico 3.3V pin close to 3.27 volts. A split cable was used for measuring USB current.

  USB Current C 13 Outputs

 50.0 mA - C Continuous output ON, no sleeps
 28.2 mA - C Program inactive


 Current to ground - on breadboard

 32.3 mA - C Continuous On output
 16.5 mA - C Continuous On/Off output


 USB Current MicroPython 13 outputs
 
 19.0 mA     - Thonny Python open    
 35.3 mA     - output ON/OFF no delays 
 20 to 45 mA - 13 outputs flashing


 USB Current CPU C Benchmarks - see later

  7.9 mA         - Waiting to copy uf2 file
 19.2 to 20.4 mA - Whetstone
 20.2 to 20.3 mA - Dhrystone
 19.0 to 20.5 mA - MemSpeed
 17.9 mA         - Finished

Longer Test - I also ran a two hour continuous C output test (32.3 mA), measuring temperatures with an infrared thermometer. At a room temperature of 21°C, maximum Pico board readings increased from 25°C to only 27°C. Meanwhile, the effectively inactive Pi 4 CPU was at 47°C.


Wiring

As indicated earlier, I put together two test beds, one attached to a ribbon cable for plugging into either a Pi 4 or Pi 400, the other wired to a Pico. Both have the same setup, with 11 LEDs connected to ground, each driven via a 220 ohm resistor. There is also one output connected directly to ground via a 330 ohm resistor, plus another output connected via 1000 ohms to a Pi 4 input pin, for monitoring transmitted signal frequencies.

The simple diagrams below show which of the Pi 4/400 and Pico physical pins are used. As shown later, I included these physical pin numbers in the program pin names, to help in understanding the different program structures. In the programs, the names are assigned the corresponding logical pin numbers: the standard GPIO numbers for the Pico and, for the Pi computers, BCM GPIO numbers in Python (gpiozero) and those required by WiringPi in C.

The top three Pico connections to the Pi 4 are for serial I/O, allowing program output to be displayed in a Pi 4 or Pi 400 Terminal window after running the appropriate minicom command.

          Pi 4 or Pi 400                                      Pico Top
          _________________
                           |                                    USB
                  1       2|                       Pi 4 10<  1      40
                  3       4|                       Pi 4  8<  2      39
                  5       6| >GROUND               Pi 4 14<  3      38
                  7       8|        < Pico 2                 4      37
                  9      10|        < Pico 1                 5      36
                 11      12|                           LED<  6      35
 Pi 4/400 INPUT> 13      14|        < Pico 3           LED<  7      34
            LED< 15      16| >LED                            8      33
                 17      18| >LED                      LED<  9      32
                 19      20|                           LED< 10      31
                 21      22| >INPUT                    LED< 11      30
                 23      24|                           LED< 12      29
                 25      26|                                13      28
                 27      28|                           LED< 14      27
            1kR< 29      30|                           LED< 15      26
            LED< 31      32| >LED                      LED< 16      25
            LED< 33      34|                           1kR< 17      24
            LED< 35      36| >LED                           18      23 >GROUND
            LED< 37      38| >LED                      LED< 19      22
                 39      40| >LED                      LED< 20      21 >Pi 4/400 INPUT
                           |
                           | < Pi 400 bottom
  



Program Code Format and Execution Notes

Required Documentation - The documents raspberry-pi-pico-python-sdk.pdf and raspberry-pi-pico-c-sdk.pdf should be consulted to install the appropriate software and to produce application programs.

One Output - Following are two Python and two C program listings for tests driving one output with sleep delays, firstly the Pi 4 versions, followed by those for the Pico. These differ in pin allocation and usage functions, in timing procedures and, particularly, in print formatting. To ease wiring, the common pin program names, P4Pin40 and PicoPin20, include the physical pin numbers.

These programs can be temporarily modified, by changing the printed title and commenting out either the sleep or the output functions, for the “Output With No Sleeps” or “Sleep Only” tests.

Pi 4 Python Operation - Assuming the Thonny Python IDE is installed, clicking on the .py file loads it into the IDE, where it can be executed by clicking on the Run button, with the output displayed by the IDE.

Pi 4 C Operation - In the supplied format, the programs require WiringPi to be installed. Normal Terminal commands are used for compiling and running, as in the following example. Note that the program failed to run properly on a Pi 400 when the recommended sudo was included.

gcc -O3 -o Pi4OneOut Pi4OneOut.c -lwiringPi
sudo ./Pi4OneOut

Pico Python - This requires installation of the Raspberry Pi Pico Python SDK and copying the MicroPython UF2 file to the Pico. This is too complicated to explain here, but the details are easily found by searching the Internet. With this UF2 file installed, Thonny Python can be loaded to create or copy a new file, save it on the Pico and run it, with data displayed by Thonny. When opening an existing Python file, a choice is provided to access it from the computer or from the MicroPython device.

Pico C - Pico SDK installation is required for this. The end result is a folder containing the C source code files, along with CMakeLists.txt, identifying the project name and source and destination file names, plus the standard pico_sdk_import.cmake file. The following commands are then used, from a normal Terminal, to set up the build and compile the program as a UF2 file.

mkdir build
cd build
export PICO_SDK_PATH=../../pico-sdk
cmake ..
make

The resulting UF2 file then has to be copied to the Pico, as with the MicroPython UF2 above, where execution begins immediately. Beforehand, a new Terminal should be opened and minicom started, as shown below, to display the output. After any changes, the program can normally be recompiled by just executing the make command, and the copy to the Pico repeated.

minicom -b 115200 -o -D /dev/serial0

13 Outputs - The four short program listings are followed by details of the changes made to drive thirteen outputs, which, if anything, emphasise the different program structures used.







Pi 4 Python and C Code

Pi4OneOut.py

import time
from gpiozero import LED
from time import sleep

loops = 100
microsecs = 100000

P4Pin40 = LED(21)

print("Python One Output + Sleep\n")
print("     Loops microsecs   runsecs  cycles/sec")
for m in range(6):
    startTime = time.perf_counter()
    for i in range(loops): 
        P4Pin40.on()
        sleep(microsecs/1000000)

        P4Pin40.off()
        sleep(microsecs/1000000)
    endTime = time.perf_counter()
    runTime = endTime - startTime
    cps = loops/runTime
    print(f"{loops:10d}{microsecs:10.0f}{runTime:10.3f}{cps:12.1f}")
    loops = loops * 10
    microsecs = microsecs / 10
print ("End\n")

Pi4OneOut.c

#include "stdio.h"
#include "wiringPi.h"
#include "time.h"

#define P4Pin40 29

int loops = 100;  
unsigned int microsecs = 100000;
float  cps;
double  runSecs = 0;
double  startSecs;
double  theseSecs;
double  endSecs;
struct  timespec tp1;

double getSecs()
{
    clock_gettime(CLOCK_REALTIME, &tp1);
    theseSecs =  tp1.tv_sec + tp1.tv_nsec / 1e9;               
    return theseSecs;
}

int main(int argc, char *argv[])
{
  if (wiringPiSetup () == -1)return 1 ;
  pinMode (P4Pin40, OUTPUT);
  printf("One Output + Sleep\n\n");
  printf("     Loops microsecs   runsecs  cycles/sec\n");

    for (int r = 0; r < 6; r++)
    {
       startSecs = getSecs();    
      for (int i=0; i < loops; i++) 
      {
        digitalWrite (P4Pin40, 1) ;    
        delayMicroseconds(microsecs);   
        digitalWrite (P4Pin40, 0) ;
        delayMicroseconds(microsecs);
      }
      endSecs = getSecs();
      runSecs = endSecs - startSecs;
      cps = (double)loops / runSecs;
      // microsecs is unsigned int, so %u rather than %ld
      printf("%10d %9u %9.3f %10.1f \n", loops, microsecs, runSecs, cps);
      loops = loops * 10;
      microsecs = microsecs / 10;
    }
    printf(" End\n\n");
    return 0;
}


Pico Python and C Code

PicoOneOut.py

import machine
import time
import utime

loops = 100
microsecs = 100000

PicoPin20 = machine.Pin(15, machine.Pin.OUT)

print(' Pico Python One Output + Sleep')
print('     Loops  microsecs   runsecs  cycles/sec')
for j in range (6):
    startTime = utime.ticks_ms()
    for i in range(loops):
        PicoPin20.value(1)
        utime.sleep_us(int(microsecs))

        PicoPin20.value(0)
        utime.sleep_us(int(microsecs))
    endTime = utime.ticks_ms()
    runTime = utime.ticks_diff(endTime,startTime)/1000     
    cps = loops/runTime
    print('{:10d} {:9.0f} {:9.2f} {:11.1f}'
          .format(loops, microsecs, runTime, cps))
    loops = loops * 10
    microsecs = microsecs / 10
print ("End")

PicoOneOut.c

#include "stdio.h"
#include "pico/stdlib.h"
#include "hardware/gpio.h"

const uint PicoPin20 = 15;

uint loops = 100;
uint64_t microsecs = 100000;
uint64_t startTime;
uint64_t endTime;
float runSecs;
float cps;
    
int main() 
{
   setup_default_uart();

   gpio_init(PicoPin20);

   gpio_set_dir(PicoPin20, GPIO_OUT);
   printf("One Output + Sleep\n\n");
   printf("     Loops microsecs   runsecs  cycles/sec\n");

   for (int r = 0; r < 6; r++)
   {
      startTime =  time_us_64 ();
      for (uint i = 0; i < loops; i++) 
      {
         gpio_put(PicoPin20, 1);
         busy_wait_us(microsecs); 
         gpio_put(PicoPin20, 0);  
         busy_wait_us(microsecs); 
      }
      endTime =  time_us_64 ();
      runSecs = (float)(endTime - startTime) / 1000000.0;
      cps = (float)loops / runSecs;
      // loops is uint and microsecs is uint64_t, so use %u and cast microsecs
      printf("%10u %9u %9.3f %10.1f \n", loops, (unsigned)microsecs, runSecs, cps);
      loops = loops * 10;
      microsecs = microsecs / 10;
   }
   printf(" End\n\n");
   return 0;
}


Pi 4 Python and C Code Extensions For 13 Outputs

For 13 outputs, change references to P4Pin40, in the above Pi4OneOut.py or Pi4OneOut.c, to the appropriate following lists. Note that the correct space/tab indentation is critical with Python.

Pi4ThirteenOut.py

P4Pin40 = LED(21)
P4Pin38 = LED(20)
P4Pin36 = LED(16)
P4Pin32 = LED(12)
P4Pin37 = LED(26)
P4Pin35 = LED(19)
P4Pin33 = LED(13)
P4Pin31 = LED(6)
P4Pin29 = LED(5)
P4Pin22 = LED(25)
P4Pin18 = LED(24)
P4Pin16 = LED(23)
P4Pin15 = LED(22)

        P4Pin40.on()
        P4Pin38.on()
        P4Pin36.on()
        P4Pin32.on()
        P4Pin37.on()
        P4Pin35.on()
        P4Pin33.on()
        P4Pin31.on()
        P4Pin29.on()
        P4Pin22.on()
        P4Pin18.on()
        P4Pin16.on()
        P4Pin15.on()
    
        P4Pin40.off()
        P4Pin38.off()
        P4Pin36.off()
        P4Pin32.off() 
        P4Pin37.off()
        P4Pin35.off()
        P4Pin33.off()
        P4Pin31.off()
        P4Pin29.off()
        P4Pin22.off()
        P4Pin18.off()
        P4Pin16.off()
        P4Pin15.off()
 
Pi4ThirteenOut.c

#define P4Pin40 29
#define P4Pin38 28
#define P4Pin36 27
#define P4Pin32 26
#define P4Pin37 25
#define P4Pin35 24
#define P4Pin33 23
#define P4Pin31 22
#define P4Pin29 21
#define P4Pin22 6
#define P4Pin18 5
#define P4Pin16 4
#define P4Pin15 3   

  pinMode (P4Pin40, OUTPUT);
  pinMode (P4Pin38, OUTPUT);
  pinMode (P4Pin36, OUTPUT);
  pinMode (P4Pin32, OUTPUT);
  pinMode (P4Pin37, OUTPUT);
  pinMode (P4Pin35, OUTPUT);
  pinMode (P4Pin33, OUTPUT);
  pinMode (P4Pin31, OUTPUT);
  pinMode (P4Pin29, OUTPUT);
  pinMode (P4Pin22, OUTPUT);
  pinMode (P4Pin18, OUTPUT);
  pinMode (P4Pin16, OUTPUT);
  pinMode (P4Pin15, OUTPUT);

        digitalWrite (P4Pin40, 1);    
        digitalWrite (P4Pin38, 1);
        digitalWrite (P4Pin36, 1);
        digitalWrite (P4Pin32, 1);
        digitalWrite (P4Pin37, 1);
        digitalWrite (P4Pin35, 1);
        digitalWrite (P4Pin33, 1);
        digitalWrite (P4Pin31, 1);
        digitalWrite (P4Pin29, 1);
        digitalWrite (P4Pin22, 1);
        digitalWrite (P4Pin18, 1);
        digitalWrite (P4Pin16, 1);
        digitalWrite (P4Pin15, 1);
        delayMicroseconds(microsecs);

        digitalWrite (P4Pin40, 0);
        digitalWrite (P4Pin38, 0);
        digitalWrite (P4Pin36, 0);
        digitalWrite (P4Pin32, 0);
        digitalWrite (P4Pin37, 0);
        digitalWrite (P4Pin35, 0);
        digitalWrite (P4Pin33, 0);
        digitalWrite (P4Pin31, 0);
        digitalWrite (P4Pin29, 0);
        digitalWrite (P4Pin22, 0);
        digitalWrite (P4Pin18, 0);
        digitalWrite (P4Pin16, 0);
        digitalWrite (P4Pin15, 0);
        delayMicroseconds(microsecs);

  


Pico Python and C Code Extensions For 13 Outputs

For 13 outputs, change references to PicoPin20, in above PicoOneOut.py or PicoOneOut.c, to the appropriate following lists. Note that the correct space/tab indent is critical with Python.

PicoThirteenOut.py

PicoPin20 = machine.Pin(15, machine.Pin.OUT)
PicoPin19 = machine.Pin(14, machine.Pin.OUT)
PicoPin17 = machine.Pin(13, machine.Pin.OUT)
PicoPin16 = machine.Pin(12, machine.Pin.OUT)
PicoPin15 = machine.Pin(11, machine.Pin.OUT)
PicoPin14 = machine.Pin(10, machine.Pin.OUT)
PicoPin12  = machine.Pin(9, machine.Pin.OUT)
PicoPin11  = machine.Pin(8, machine.Pin.OUT)
PicoPin10  = machine.Pin(7, machine.Pin.OUT)
PicoPin9  = machine.Pin(6, machine.Pin.OUT)
PicoPin7  = machine.Pin(5, machine.Pin.OUT)
PicoPin6  = machine.Pin(4, machine.Pin.OUT)
PicoPin21 = machine.Pin(16, machine.Pin.OUT)

        PicoPin20.value(1)
        PicoPin19.value(1)
        PicoPin17.value(1)
        PicoPin16.value(1)
        PicoPin15.value(1)
        PicoPin14.value(1)
        PicoPin12.value(1)
        PicoPin11.value(1)
        PicoPin10.value(1)
        PicoPin9.value(1)
        PicoPin7.value(1)
        PicoPin6.value(1)
        PicoPin21.value(1)

        PicoPin20.value(0)
        PicoPin19.value(0)
        PicoPin17.value(0)
        PicoPin16.value(0)
        PicoPin15.value(0)
        PicoPin14.value(0)
        PicoPin12.value(0)
        PicoPin11.value(0)
        PicoPin10.value(0)
        PicoPin9.value(0)
        PicoPin7.value(0)
        PicoPin6.value(0)
        PicoPin21.value(0)
   
PicoThirteenOut.c



const uint PicoPin20 = 15;
const uint PicoPin19 = 14;
const uint PicoPin17 = 13;
const uint PicoPin16 = 12;
const uint PicoPin15 = 11;
const uint PicoPin14 = 10;
const uint PicoPin12 = 9;
const uint PicoPin11 = 8;
const uint PicoPin10 = 7;
const uint PicoPin9 = 6;
const uint PicoPin7 = 5;
const uint PicoPin6 = 4;
const uint PicoPin21 = 16;

   gpio_init(PicoPin20);
   gpio_init(PicoPin19);
   gpio_init(PicoPin17);
   gpio_init(PicoPin16);
   gpio_init(PicoPin15);
   gpio_init(PicoPin14);
   gpio_init(PicoPin12);
   gpio_init(PicoPin11);
   gpio_init(PicoPin10);
   gpio_init(PicoPin9);
   gpio_init(PicoPin7);
   gpio_init(PicoPin6);
   gpio_init(PicoPin21);

   gpio_set_dir(PicoPin20, GPIO_OUT);
   gpio_set_dir(PicoPin19, GPIO_OUT);
   gpio_set_dir(PicoPin17, GPIO_OUT);
   gpio_set_dir(PicoPin16, GPIO_OUT);
   gpio_set_dir(PicoPin15, GPIO_OUT);
   gpio_set_dir(PicoPin14, GPIO_OUT);
   gpio_set_dir(PicoPin12, GPIO_OUT);
   gpio_set_dir(PicoPin11, GPIO_OUT);
   gpio_set_dir(PicoPin10, GPIO_OUT);
   gpio_set_dir(PicoPin9, GPIO_OUT);
   gpio_set_dir(PicoPin7, GPIO_OUT);
   gpio_set_dir(PicoPin6, GPIO_OUT);
   gpio_set_dir(PicoPin21, GPIO_OUT);

         gpio_put(PicoPin20, 1);
         gpio_put(PicoPin19, 1);
         gpio_put(PicoPin17, 1);
         gpio_put(PicoPin16, 1);
         gpio_put(PicoPin15, 1);
         gpio_put(PicoPin14, 1);
         gpio_put(PicoPin12, 1);
         gpio_put(PicoPin11, 1);
         gpio_put(PicoPin10, 1);
         gpio_put(PicoPin9, 1);
         gpio_put(PicoPin7, 1);
         gpio_put(PicoPin6, 1);
         gpio_put(PicoPin21, 1);
        
         gpio_put(PicoPin20, 0);  
         gpio_put(PicoPin19, 0);  
         gpio_put(PicoPin17, 0);
         gpio_put(PicoPin16, 0);
         gpio_put(PicoPin15, 0);
         gpio_put(PicoPin14, 0);
         gpio_put(PicoPin12, 0);
         gpio_put(PicoPin11, 0);
         gpio_put(PicoPin10, 0);
         gpio_put(PicoPin9, 0);
         gpio_put(PicoPin7, 0);
         gpio_put(PicoPin6, 0);
         gpio_put(PicoPin21, 0);

  


Pi 4 C Input Monitor

/*
gcc -O3 -o incount incount.c -lwiringPi
sudo ./incount
*/

#include "stdio.h"
#include "wiringPi.h"
#include "time.h"
 
#define  P4Pin13 2  // WiringPi pin address 

double  startSecs;
double  theseSecs;

struct  timespec tp1;
double  minTime = 10.0;

double getSecs()
{
    clock_gettime(CLOCK_REALTIME, &tp1);
    theseSecs =  tp1.tv_sec + tp1.tv_nsec / 1e9;
    return theseSecs;
}

int main (void)
{
    int     i;
    double     count1 = 1;
    double     count2 = 1;
    double     cycles1 = 0;
    double     cycles0 = 0;
    double     runTime = 0;

    printf ("Raspberry Pi GPIO Frequency\n");
    
    if (wiringPiSetup () == -1) return 1;
    pinMode (P4Pin13, INPUT);
    
    startSecs = getSecs();
    while (runTime < minTime) 
    {
        for (i=0; i < 1000; i++)
        {
            if (digitalRead(P4Pin13)) 
            {
                if (count1 == 1)
                {
                    cycles1 = cycles1 + 1;
                    count1 = 0;
                    count2 = 1;
                }
            }
            else
            {
                if (count2 == 1)
                {
                    cycles0 = cycles0 + 1;
                    count1 = 1;
                    count2 = 0;
                }
            }
        }
        runTime = getSecs() - startSecs;
    }
    if (cycles1 == 0)
    {
        printf (" No cycles recorded\n");
    }
    else
    {
       printf (" %6.2f Seconds for Cycles Per Second "
       "%.2f ON and %.2f OFF\n", runTime, cycles1/runTime, cycles0/runTime);
    }
    return 0;
}
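incount.c counts the first ON sample after an OFF period and the first OFF sample after an ON period, using count1 and count2 as flags. The same logic can be checked away from the hardware with a Python sketch working on a list of sampled levels; the function name and sample data are hypothetical, and real GPIO polling is not modelled. As with the C version, a level shorter than one polling interval can be missed, so the counts are only reliable while polling is faster than the signal changes.

```python
def count_transitions(samples):
    """Count entries into the ON and OFF states, as incount.c does.

    samples: sequence of 0/1 levels, standing in for repeated
    digitalRead() results. As in incount.c, both flags start at 1,
    so the very first sample is always counted.
    """
    cycles1 = 0   # ON periods seen
    cycles0 = 0   # OFF periods seen
    count1 = 1    # 1 means "ready to count the next ON"
    count2 = 1    # 1 means "ready to count the next OFF"
    for level in samples:
        if level:
            if count1 == 1:
                cycles1 += 1
                count1, count2 = 0, 1
        else:
            if count2 == 1:
                cycles0 += 1
                count1, count2 = 1, 0
    return cycles1, cycles0


# Hypothetical polled levels: OFF, ON, OFF, ON, OFF periods
print(count_transitions([0, 0, 1, 1, 1, 0, 0, 1, 0]))  # (2, 3)
```

Dividing each count by the run time gives the ON and OFF cycles per second printed by incount.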
  


Pi 4 C Monitor Results



Pi 4 performance
0.0   ARM MHz=1500, core volt=0.8625V, CPU temp=56.0'C, pmic temp=51.4'C

Pico

13 Outputs + Sleep using busy_wait_us(microsecs)

     Loops microsecs   runsecs  cycles/sec 
       100    100000    20.000        5.0
      1000     10000    20.000       50.0
     10000      1000    20.000      500.0
    100000       100    20.000     5000.0
   1000000        10    20.000    50000.0
  10000000         1    20.000   499999.7
 End

Pi 4

  pi@raspberrypi:~/picoME/picoc $ ./incount
  10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
  10.00 Seconds for Cycles Per Second 50.00 ON and 50.10 OFF
  10.00 Seconds for Cycles Per Second 500.10 ON and 500.00 OFF
  10.00 Seconds for Cycles Per Second 4999.78 ON and 4999.78 OFF
  10.00 Seconds for Cycles Per Second 49997.47 ON and 49997.37 OFF
  10.00 Seconds for Cycles Per Second 499560.23 ON and 499560.23 OFF


Pi 4 powersave
0.0   ARM MHz= 600, core volt=0.8625V, CPU temp=54.5'C, pmic temp=51.4'C

Pico

13 Outputs + Sleep using busy_wait_us(microsecs)
    
     Loops microsecs   runsecs  cycles/sec

       100    100000    20.000        5.0
      1000     10000    20.000       50.0
     10000      1000    20.000      500.0
    100000       100    20.000     5000.0
   1000000        10    20.000    50000.0
  10000000         1    20.000   499999.6
 End

Pi 4

  pi@raspberrypi:~/picoME/picoc $ ./incount
  10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
  10.00 Seconds for Cycles Per Second 50.10 ON and 50.00 OFF
  10.00 Seconds for Cycles Per Second 500.09 ON and 499.99 OFF
  10.00 Seconds for Cycles Per Second 4999.18 ON and 4999.18 OFF
  10.00 Seconds for Cycles Per Second 49894.83 ON and 49894.83 OFF
  10.00 Seconds for Cycles Per Second 496711.77 ON and 496711.87 OFF


  


Pico C CPU Benchmarks

I have run some of my normal C benchmarks on the Pico. The changes needed were to references to CPU configuration, file output and timing. Initially, performance appeared to be unacceptable, as I had not realised, at the time, that the Pico has no floating point hardware and that its Cortex-M0+ cores, although 32 bit, use mainly 16 bit Thumb instructions. Results from a Pi 4B are also included, for comparison purposes. The benchmarks run were:
  • Whetstone - Mainly floating point calculations, including functions such as cos and sqrt, but also some integer functions.
  • Dhrystone - All integer handling and arithmetic.
  • MemSpeed - Double precision and single precision floating point and integer calculations, using stored data that can be in caches and RAM.

The benchmark programs are calibrated to run for a reasonable finite time, about 10 seconds for Whetstone and Dhrystone and a minimum of 0.1 seconds for individual MemSpeed tests.

The benchmarks were run on the Pico CPU, which operates at 125 MHz, and on a 1500 MHz Raspberry Pi 4B, with a twelve times higher clock speed. The Pi 4 then measured 244 times faster with Whetstone, influenced by the lack of floating point hardware in the Pico, 38 times faster with Dhrystone and significantly faster again using MemSpeed. Performance is often quoted on a per MHz basis, where the Pico comes out badly. A complete contrast was apparent when running the bit banging type tests.
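The per MHz view can be made explicit by dividing each speed ratio by the 12 times clock ratio. A small desktop Python sketch, using the MWIPS and VAX MIPS results listed in the Whetstone and Dhrystone sections; the function name is illustrative.

```python
def pi4_advantage(pico, pi4, pico_mhz=125, pi4_mhz=1500):
    """Return (Pi 4 speed ratio, Pi 4 speed ratio per MHz) over the Pico."""
    speed_ratio = pi4 / pico
    return speed_ratio, speed_ratio / (pi4_mhz / pico_mhz)


# (Pico, Pi 4B) ratings from the Whetstone and Dhrystone sections
for name, pico, pi4 in [("Whetstone MWIPS", 8.338, 2031.394),
                        ("Dhrystone VAX MIPS", 142.29, 5397.10)]:
    overall, per_mhz = pi4_advantage(pico, pi4)
    print(f"{name}: Pi 4 {overall:.0f} times faster, {per_mhz:.1f} times per MHz")
```

This gives about 20 times per MHz for Whetstone and about 3.2 for Dhrystone, the latter agreeing with the DMIPS/MHz comparison in the Dhrystone section.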

During the earlier tests, simply measuring pin output speeds, a Pi 400 was found to be capable of transferring a maximum of 67.1 Mega bits/second (Mbps), with a single CPU core running at 100% utilisation. That could be rated as 0.037 Bit Bangs per MHz (BB/MHz). The Pico achieved 51.6 Mbps or 0.41 BB/MHz, more than eleven times more efficient, clearly not dependent on CPU MHz.
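BB/MHz is simply the maximum Mbps divided by the CPU clock in MHz. A short desktop Python sketch of the comparison above, using the 67.1 and 51.6 Mbps figures; the function name is illustrative.

```python
def bb_per_mhz(mbps, mhz):
    """Bit Banging Mbps delivered per MHz of CPU clock (BB/MHz)."""
    return mbps / mhz


pi400 = bb_per_mhz(67.1, 1800)   # Pi 400, C, maximum Mbps
pico = bb_per_mhz(51.6, 125)     # Pico, C, maximum Mbps
print(f"Pi 400 {pi400:.3f}, Pico {pico:.2f} BB/MHz, ratio {pico / pi400:.1f}")
```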

The benchmarks are available for downloading in PicoBenchmarks.zip, which contains the C source code and .uf2 Pico execution programs, along with the CMakeLists.txt file needed for compilation, plus example Pico results.







Pico Whetstone CPU Benchmark

The first benchmark to run successfully was Whetstone, with single precision floating point and integer operations and an overall rating in Millions of Whetstone Instructions Per Second (MWIPS).

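Note the difference in numerical results between the Pico and Pi 4 tests. However, the Pico numbers match the Pi 4 values rounded to nine decimal places, which is more than the precision of 32 bit floating point numbers anyway. The differences are probably due to the Pico printf implementation, which appears to print no more than nine decimal places and pad the rest with zeros, rather than to processor hardware variations.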

The Pi 4 produced an impossibly high MOPS score for the IF test, shown as N/A, caused by compiler optimisation (in effect, the test loop only needs to be executed once). The time for this test, when running as intended, is inevitably so short that it has no real influence on the MWIPS rating.

Pico 125 MHz
##########################################  
                                              
Single Precision C Whetstone Benchmark  
                                                  
Calibrate                                                                                 
       1.20 Seconds          1   Passes (x 100)                                           
       5.99 Seconds          5   Passes (x 100)                                           
                                                                                          
Use 8  passes (x 100)                                                                     
                                                                                          
          Single Precision C/C++ Whetstone Benchmark                                      
                                                                                          
Loop content                  Result              MFLOPS      MOPS   Seconds              
                                                                                          
N1 floating point     -1.12475013700000000         1.493              0.103               
N2 floating point     -1.12274742100000000         1.495              0.719               
N3 if then else        1.00000000000000000                  93.729    0.009               
N4 fixed point        12.00000000000000000                   5.716    0.441               
N5 sin,cos etc.        0.49911010300000000                   0.160    4.171               
N6 floating point      0.99999982100000000         1.531              2.819               
N7 assignments         3.00000000000000000                  53.567    0.028               
N8 exp,sqrt etc.       0.75110864600000000                   0.228    1.306               
                                                                                          
MWIPS                                              8.338              9.595               


Pi 4B 1500 MHz
##########################################                           

          Single Precision C/C++ Whetstone Benchmark

Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475013732910156       524.661              0.074
N2 floating point     -1.12274742126464844       533.855              0.511
N3 if then else        1.00000000000000000                     N/A    0.000
N4 fixed point        12.00000000000000000                2497.509    0.256
N5 sin,cos etc.        0.49911010265350342                  55.124    3.065
N6 floating point      0.99999982118606567       387.309              2.829
N7 assignments         3.00000000000000000                 998.853    0.376
N8 exp,sqrt etc.       0.75110864639282227                  26.174    2.887

MWIPS                                           2031.394              9.998

  



Pico Dhrystone CPU Benchmark

This is one of ARM’s normal performance specifications, quoted as DMIPS instead of VAX MIPS. The benchmark results can vary according to the compiler used, but the quote for the ARM Cortex-M0+ (that I found) was 0.93 DMIPS/MHz. Here, assuming the Pico runs at 125 MHz (and is the same M0+ CPU), the rating was 1.14 DMIPS/MHz, with the Pi 4 at 3.60, more than three times greater.
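The VAX MIPS (DMIPS) rating is conventionally Dhrystones per second divided by 1757, the rate of the VAX 11/780 taken as 1 MIPS, and the results below are consistent with that. A desktop Python sketch converting them into DMIPS/MHz; the function names are illustrative.

```python
VAX_11_780_DHRYSTONES = 1757   # Dhrystones per second rated as 1 VAX MIPS


def dmips(dhrystones_per_second):
    return dhrystones_per_second / VAX_11_780_DHRYSTONES


def dmips_per_mhz(dhrystones_per_second, mhz):
    return dmips(dhrystones_per_second) / mhz


for name, dps, mhz in [("Pico", 250000, 125), ("Pi 4B", 9482703, 1500)]:
    print(f"{name}: {dmips(dps):.2f} DMIPS, {dmips_per_mhz(dps, mhz):.2f} DMIPS/MHz")
```

This prints 142.29 DMIPS and 1.14 DMIPS/MHz for the Pico, and 5397.10 DMIPS and 3.60 DMIPS/MHz for the Pi 4B.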

Pico 125 MHz
##########################################                           
                                                                      
Dhrystone Benchmark, Version 2.1 (Language: C or C++)                 
                                                                      
Register option not selected                                          
                                                                      
       10000 runs   0.04 seconds                                      
      100000 runs   0.40 seconds                                      
      200000 runs   0.80 seconds                                      
      400000 runs   1.60 seconds                                      
      800000 runs   3.20 seconds                                      
                                                                      
Final values (* implementation-dependent):                            
                                                                      
Int_Glob:      O.K.  5  Bool_Glob:     O.K.  1                        
Ch_1_Glob:     O.K.  A  Ch_2_Glob:     O.K.  B                        
Arr_1_Glob[8]: O.K.  7  Arr_2_Glob8/7: O.K.      800010               
Ptr_Glob->              Ptr_Comp:       *    536884992                
  Discr:       O.K.  0  Enum_Comp:     O.K.  2                        
  Int_Comp:    O.K.  17 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME G
Next_Ptr_Glob->         Ptr_Comp:       *    536884992 same as above  
  Discr:       O.K.  0  Enum_Comp:     O.K.  1                        
  Int_Comp:    O.K.  18 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME G
Int_1_Loc:     O.K.  5  Int_2_Loc:     O.K.  13                       
Int_3_Loc:     O.K.  7  Enum_Loc:      O.K.  1                        
Str_1_Loc:                             O.K.  DHRYSTONE PROGRAM, 1'ST G
Str_2_Loc:                             O.K.  DHRYSTONE PROGRAM, 2'ND G
                                                                      
Nanoseconds one Dhrystone run:      4000.00                           
Dhrystones per Second:               250000                           
VAX  MIPS rating =                   142.29  


Pi 4B 1500 MHz
##########################################                           

Nanoseconds one Dhrystone run:       105.46
Dhrystones per Second:              9482703
VAX  MIPS rating =                  5397.10



  


Pico MemSpeed CPU Benchmark

This benchmark uses two arrays that can cover handling data from caches and RAM, as is apparent in the Pi 4 results. With the Pico providing consistent performance at all data sizes, it is not clear whether caches are available or whether this is simply due to slow processing.

The Pico is specified as having 264 KB of RAM. For the benchmark, K is 1024, where 256 KB is 262.144 decimal KB. The program had to be run with a maximum of two 64 KB arrays (128 KB in total) to fit.

Directly comparing these Pico and Pi 4 results is not really appropriate, with the Pi 4 making use of advanced SIMD vector instructions, to say the least. Looking at the slow Pico speeds, the 6 MBytes/second floating point rate equates to 48 Mbits/second, similar to the Bit Banging maximum, and the 97 MBps integer rate to 776 Mbps, much greater than the Bit Banging capabilities for the types of operation considered in this report.
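The conversion above is just a factor of eight bits per byte. A desktop Python sketch relating the Pico MemSpeed x[m]=x[m]+s*y[m] results to the 51.6 Mbps Pico C bit banging maximum; names are illustrative.

```python
def mbytes_to_mbits(mbytes_per_second):
    """8 bits per byte."""
    return mbytes_per_second * 8


PICO_BB_MBPS = 51.6   # Pico C maximum bit banging rate from the earlier tests

# Pico MemSpeed x[m]=x[m]+s*y[m] results, MB/second
for label, mbytes in [("floating point", 6), ("Int32", 97)]:
    mbits = mbytes_to_mbits(mbytes)
    print(f"{label}: {mbits} Mbps, {mbits / PICO_BB_MBPS:.1f} times bit banging")
```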

Pico 125 MHz
##########################################                           

                  Memory Reading Speed Test Pico                                          
                                                                                      
  Memory  x[m]=x[m]+s*y[m] Int+  x[m]=x[m]+y[m]         x[m]=y[m]                     
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32              
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S              
                                                                                      
       8       6      6     97     18     11     88    107     95     95              
      16       6      6     97     18     11     88    108     95     95              
      32       6      6     97     18     11     88    108     95     95              
      64       6      6     97     18     11     88    108     95     95              
     128       6      6     97     18     11     88    108     95     95              
   End of test


Pi 4B 1500 MHz
##########################################                           

  Memory  x[m]=x[m]+s*y[m] Int+  x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S

       8   11761   8660  11894  11787   9516  11889  10318   5225   7796
      16   11874   8690  11921  11886   9552  11919  10479   5118   7892
      32   10592   8195  10732  10719   8832  10728   8853   4468   7360
      64   10093   8361  10407   9996   9082  10400   8704   4632   7541
     128    9997   8521  10535   9948   9309  10529   8143   4750   7491
     256    9987   8536  10569   9956   9320  10568   7990   4928   7644
     512    9124   8336  10168   9321   9085  10215   7992   4929   7681
    1024    3736   6332   6594   3696   6424   6717   5179   3849   4296

  
