Raspberry Pi Pico, Pi 4 and Pi 400 Python and C Basic Beginners Bit Banging Benchmarks
Roy Longbottom
These programs were compiled with gcc, using the WiringPi GPIO access library. It is recommended that WiringPi programs are executed with sudo access, but on the Pi 400 the programs would only execute without sudo.
One output + sleep 13 Outputs + Sleep
Loops micro run cycles over run cycles Total over
seconds seconds /second heads seconds /second CPS heads
Pi 4 1500 MHz C
100 100000 20.013 5.0 0.013 20.013 5.0 65 0.013
1000 10000 20.125 49.7 0.125 20.127 49.7 646 0.127
10000 1000 21.248 470.6 1.248 21.255 470.5 6117 1.255
100000 100 32.273 3098.6 12.273 32.327 3093.4 40214 12.327
1000000 10 20.010 49975.9 0.010 20.299 49264.4 640437 0.299
10000000 1 20.021 499474.9 0.021 29.996 333383.2 4333982 9.996
Pi 400 1800 MHz C
100 100000 20.012 5.0 0.012 20.016 5.0 65 0.016
1000 10000 20.122 49.7 0.122 20.125 49.7 646 0.125
10000 1000 21.204 471.6 1.204 21.224 471.2 6126 1.224
100000 100 31.940 3130.8 11.940 32.009 3124.1 40613 12.009
1000000 10 20.031 49922.0 0.031 20.008 49980.9 649752 0.008
10000000 1 20.030 499255.9 0.030 20.021 499475.6 6493183 0.021
Pi 400 600 MHz C
100 100000 20.02 5.0 0.020 20.018 5.0 65 0.018
1000 10000 20.172 49.6 0.172 20.175 49.6 645 0.175
10000 1000 21.638 462.1 1.638 21.631 462.3 6010 1.631
100000 100 36.325 2752.9 16.325 36.232 2760.0 35880 16.232
1000000 10 22.426 44591.6 2.426 23.678 42233.1 549030 3.678
10000000 1 43.946 227552.0 23.946 55.172 181252.9 2356288 35.172
Weird Results - Pi 400 1800 MHz, One output + sleep
Loops microsecs runsecs cycles/sec real user sys
1000 9938 20.000 50.0 0m20.005s 0m0.024s 0m0.000s
10000 1000 21.222 471.2 0m21.226s 0m0.029s 0m0.172s
10000 939 19.999 500.0 0m20.003s 0m0.005s 0m0.198s
100000 100 32.111 3114.2 0m32.115s 0m0.109s 0m1.789s
100000 99 19.802 5050.0 0m19.806s 0m6.349s 0m13.457s
1000000 10 20.002 49994.5 0m20.006s 0m5.239s 0m14.767s
Pi 400 1800 MHz Maximum Speeds 10000000 Loops
                One output + sleep                      13 Outputs + Sleep
Program    micro       run    cycles      over       run    cycles    Total      over
         seconds   seconds   /second     heads   seconds   /second      CPS     heads
Python         1  1520.876    6575.2  1500.876  3428.312    2916.9    37908  3408.312
C              1    20.030  499255.9     0.030    20.021  499475.6  6493183     0.021
C / Python Gain         75                           171
No Sleeps - The sub-microsecond speeds of these output control operations were up to around 560 times faster than those of the Python versions but, of course, they do not represent arithmetic calculation speeds. Maximum bit banging speed was 67.1 million bits per second.
Monitor Confirmation - A longer running version of the 13 outputs, no sleep test was compiled to check the results with monitoring options on the Pi 400, as shown below. The time command confirmed the running time and indicated 100% CPU utilisation. Then my input speed monitoring program was run, confirming performance of around 2.58 million cycles per second from the input connection, which also indicates 67 million bits per second overall from 13 outputs.
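The Mbps figures in this section appear to follow from counting two bit changes (one ON and one OFF) per output for every cycle. The following is a minimal sketch of that arithmetic, assuming this interpretation. The function name is illustrative and is not part of the benchmark programs.

```python
def bit_bang_mbps(cycles_per_second, outputs=1):
    """Convert on/off cycles per second to millions of bits per second.

    Assumes each cycle writes every output twice (once ON, once OFF).
    """
    return cycles_per_second * outputs * 2 / 1e6


# Values from the Pi 400 1800 MHz results with 10000000 loops
one_output = bit_bang_mbps(31823078)
thirteen_outputs = bit_bang_mbps(2579794, outputs=13)

print(f"One output : {one_output:.1f} Mbps")
print(f"13 outputs : {thirteen_outputs:.1f} Mbps")
```

With the measured cycles per second, this reproduces the 63.6 and 67.1 Mbps shown in the results table.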
Pi 400 1800 Versus 600 MHz - The results confirmed that performance was proportional to CPU MHz.
Sleep Only Tests - As implied by the sub-microsecond output speeds indicated above, these results were almost identical to those from running the tests with both output and sleeping, with the same weird running times.
Speed gains over Python were not as high as in the no sleep tests, because the running times include a constant 20 seconds of sleeping.
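The run time, cycles per second and overhead columns in the sleep tables are related by simple arithmetic: each loop sleeps twice, so every test should ideally take 20 seconds. The sketch below assumes that reading of the columns. The function and key names are hypothetical.

```python
def sleep_test_metrics(loops, microsecs, run_seconds, outputs=1):
    """Derive the table columns from one measured run time."""
    expected_seconds = loops * 2 * microsecs / 1e6   # two sleeps per loop
    cycles_per_second = loops / run_seconds
    return {
        "expected_seconds": expected_seconds,
        "cycles_per_second": cycles_per_second,
        "total_cps": cycles_per_second * outputs,
        "overhead_seconds": run_seconds - expected_seconds,
    }


# Pi 4 1500 MHz C, one output, 100000 loops of 100 microsecond sleeps
metrics = sleep_test_metrics(100000, 100, 32.273)
for name, value in metrics.items():
    print(f"{name:18s} {value:12.3f}")
```

For this row the overhead of about 12.3 seconds is larger than for the shorter 10 and 1 microsecond delays, which is the mid-range anomaly noted in the text.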
                C One Output No Sleep                  C 13 Outputs No Sleep
    Loops   micro      run    cycles microsecs      run   cycles     total microsecs
          seconds  seconds   /second     /loop  seconds  /second       CPS     /loop
Pi 400 1800 MHz
      100       0    0.000  20971520              0.000  2452809  31886520
     1000       0    0.000  30840470              0.000  2584291  33595780
    10000       0    0.000  28747800              0.004  2568151  33385960     0.400
   100000       0    0.003  31581236     0.030    0.039  2547669  33119697     0.390
  1000000       0    0.031  31863136     0.031    0.388  2578077  33514998     0.388
 10000000       0    0.314  31823078     0.031    3.876  2579794  33537325     0.388
C 10M Mbps                      63.6                                  67.1
Python 10M Mbps                 0.12                                  0.12
C/Python                         530                                   559
Pi 400 600 MHz
      100       0    0.000   7231559              0.000   830555  10797218
     1000       0    0.000  10618491              0.001   858257  11157346
    10000       0    0.001  11087243              0.012   846308  11002008     1.200
   100000       0    0.009  10648415     0.090    0.117   855015  11115200     1.170
  1000000       0    0.097  10357508     0.097    1.165   858464  11160027     1.165
 10000000       0    0.943  10606199     0.094   11.650   858359  11158668     1.165
C 10M Mbps                      21.2                                  22.3
Python 10M Mbps                0.040                                 0.042
C/Python                         530                                   531
----------------------------------------------------------------------------------
No Sleep Time Monitoring 13 Outputs No Sleep - around 67 Mbps
Loops microsecs runsecs cycles/sec Time Results
100000000 0 38.703 2583751.0 real 0m38.708s user 0m38.705s sys 0m0.000s
No Sleep Speed Monitoring 13 Outputs No Sleep - around 67 Mbps
Loops microsecs runsecs cycles/sec ./incount cycles per second
100000000 0 38.782 2578491.8 2564947.90 ON and 2564948.00 OFF
100000000 0 38.844 2574369.0 2565509.59 ON and 2565509.59 OFF
100000000 0 38.780 2578670.5 2566093.49 ON and 2566093.49 OFF
----------------------------------------------------------------------------------
C Just sleep
Pi 400 1800 MHz Pi 400 600 MHz
Loops micro run cycles over run cycles total over
seconds seconds /second heads seconds /second CPS heads
100 100000 20.012 5.0 0.012 20.017 5.0 65 0.017
1000 10000 20.125 49.7 0.125 20.171 49.6 645 0.171
10000 1000 21.222 471.2 1.222 21.651 461.9 6005 1.651
100000 100 32.103 3114.9 12.103 36.302 2754.7 35811 16.302
1000000 10 20.003 49993.0 0.003 22.249 44945.1 584286 2.249
10000000 1 20.022 499459.8 0.022 42.535 235100.6 3056308 22.535
Maximum Speeds 10000000 Loops Just sleep
Python 1 1280.522 7809.3 1260.522 1776.560 5628.9 73176 1756.560
C 1 20.022 499459.8 0.022 42.535 235100.6 3056308 22.535
C/Python 64.0 64.0 57296 41.8 41.8 41.8 77.9
Output With No Sleeps - For the larger loop counts, running time is normally sufficient to produce consistent performance. Python results for the Pi 400, running at both 1800 and 600 MHz, are provided. Performance differences were nearly proportional to CPU MHz, and running time for 13 outputs was around 13 times longer than with one output.
Pico performance relationships were somewhat different, certainly not proportional to CPU MHz, said to be 125 MHz. Performance with one output was equivalent to that of a Pi 400 running at around 1200 MHz, then similar to that of a Pi 400 at 600 MHz driving 13 outputs. For this, the running time for 13 outputs was just over 20 times longer than for one output. An additional test was run, using 8 outputs, where the eight to one increase was around 13 times.
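The scaling factors quoted above can be checked directly from the Python no sleep running times at 10000000 loops. This is a small sketch of that arithmetic. The helper name is illustrative only.

```python
def slowdown(run_seconds_many_outputs, run_seconds_one_output):
    """How many times longer a multi-output run takes than one output."""
    return run_seconds_many_outputs / run_seconds_one_output


# Running times in seconds, taken from the 10000000 loop Python results
pico_13_outputs = slowdown(6246.46, 310.42)
pico_8_outputs = slowdown(4133.32, 310.42)
pi400_13_outputs = slowdown(2210.00, 167.93)

print(f"Pico 13 outputs  : {pico_13_outputs:.1f} times longer")
print(f"Pico 8 outputs   : {pico_8_outputs:.1f} times longer")
print(f"Pi 400 13 outputs: {pi400_13_outputs:.1f} times longer")
```

The Pico factors are well above the number of outputs driven, whereas the Pi 400 factor is close to 13.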
Sleep Only Tests - Results for the Pico and the 1800 MHz Pi 400 are provided. Because of the overheads, both fell short of the expected cycles per second, with the Pico timer suffering less and so showing higher apparent throughput.
Loops micro run cycles over run cycles over
seconds seconds /second heads seconds /second heads
Pico one output + sleep Pico 13 outputs + sleep
100 100000 20.01 5.0 0.01 20.07 5.0 0.07
1000 10000 20.12 49.7 0.12 20.76 48.2 0.76
10000 1000 21.19 472.0 1.19 27.56 362.8 7.56
100000 100 31.85 3139.5 11.85 95.63 1045.7 75.63
1000000 10 138.51 7219.9 118.51 776.34 1288.1 756.34
10000000 1 927.56 10781.0 907.56 7249.31 1379.4 7229.31
Pi 400 1800 MHz GPIO Python
10000 1000 21.51 464.8 1.51 23.451 426.4 3.45
10000000 1 1520.88 6575.2 1500.88 3428.312 2916.9 3408.31
Pico one output no sleeps Pico 13 outputs no sleeps
Loops micro run cycles microsecs run cycles Total microsecs
seconds seconds /second /Loop seconds /second CPS /Loop
100 0 0.00 25000 0.0 0.05 1887 24528 500.0
1000 0 0.03 32258 30.0 0.63 1600 20800 630.0
10000 0 0.31 32154 31.0 6.25 1601 20810 625.0
100000 0 3.11 32206 31.1 62.47 1601 20812 624.7
1000000 0 31.04 32213 31.0 624.65 1601 20812 624.7
10000000 0 310.42 32214 31.0 6246.46 1601 20812 624.6
10000000 0 8 outputs 4133.32 2419 19352 413.3
Pi 400 1800 and 600 MHz
MHz
10000000 1800 167.93 59549 16.8 2210.00 4525 58824 221.0
10000000 600 500.35 19986 50.0 6160.18 1623 21103 616.0
Ratios
Pi 400 MHz 3.0 2.98 2.79
Pico/Pi 1800 0.54 0.35
Pico/Pi 600 1.61 0.99
Pico Python Sleep only Pi 400 1800 MHz Python Expected
Loops micro run cycles over run cycles over cycles
seconds seconds /second heads seconds /second heads /second
100 100000 20.01 5.0 0.01 20.03 5.0 0.03 5.0
1000 10000 20.07 49.8 0.07 20.13 49.7 0.13 50.0
10000 1000 20.67 483.7 0.67 21.26 470.3 1.26 500.0
100000 100 26.75 3738.3 6.75 32.49 3077.7 12.49 5000.0
1000000 10 87.38 11444.3 67.38 144.32 6929.3 124.32 50000.0
10000000 1 677.01 14770.8 657.01 1280.52 7809.3 1260.52 500000.0
As indicated earlier, for the Pi 400 tests, the sleep timer produced weird timing variations at the mid-range delays but came good with the shorter delays. The results indicate that maximum performance, in the range down to one microsecond, was effectively the same from a Pi 400 GPIO and a Pico. For these tests, with sleeping, the Pico C compilations were up to 362.5 times faster than the Python versions.
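The C/Python gains for the tests with sleeping are ratios of the cycles per second measured with each language. A brief sketch of the calculation follows, using the Pico figures for 10 and 1 microsecond sleeps. The function name is an assumption for illustration.

```python
def c_over_python(c_cycles_per_second, python_cycles_per_second):
    """Speed gain of the C version over the Python version."""
    return c_cycles_per_second / python_cycles_per_second


# (Pico C cycles/second, Pico Python cycles/second) from the results tables
gains = {
    "one output, 10 us": c_over_python(50000.0, 7219.9),
    "one output, 1 us": c_over_python(500000.0, 10781.0),
    "13 outputs, 10 us": c_over_python(50000.0, 1288.1),
    "13 outputs, 1 us": c_over_python(499999.9, 1379.4),
}

for test, gain in gains.items():
    print(f"{test:18s} {gain:7.1f}")
```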
Output With No Sleeps - As indicated earlier, these represent maximum data transfer speeds, where cycles per second can be converted to megabits per second, in this case with Pico C achieving up to 51.6 Mbps. Comparisons for this area show that the Pico performed at up to 77% of an 1800 MHz Pi 400, equivalent to a Pi 4 at 1386 MHz. The C version was up to 1239.3 times faster than the Python variety.
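The comparisons in this paragraph can be reproduced from the 13 output total cycles per second at 10000000 loops. The sketch below assumes, as elsewhere in this report, two bit changes per output per cycle. The variable names are illustrative.

```python
# Total cycles per second (13 outputs, no sleeps, 10000000 loops)
pico_c_cps = 25793651
pi400_c_cps = 33537325
pico_python_cps = 20812

pico_mbps = pico_c_cps * 2 / 1e6              # ON and OFF writes per cycle
pico_vs_pi400 = pico_c_cps / pi400_c_cps      # relative to Pi 400 at 1800 MHz
equivalent_mhz = round(pico_vs_pi400, 2) * 1800
c_vs_python = pico_c_cps / pico_python_cps

print(f"Pico C maximum      : {pico_mbps:.1f} Mbps")
print(f"Pico C / Pi 400 C   : {pico_vs_pi400:.2f}")
print(f"Equivalent Pi MHz   : {equivalent_mhz:.0f}")
print(f"Pico C / Pico Python: {c_vs_python:.0f}")
```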
Sleep Only Tests - As maximum performance was the same in all areas as in the full tests, which use busy_wait_us(microsecs), new comparisons are unnecessary. Results using the updated sleep_us(microsecs) are provided, showing slightly less accuracy with 1 microsecond sleeps but not with 2 microsecond sleeps.
Results from my Pi 4 based input frequency monitor are provided.
One output + sleep 13 Outputs + Sleep
Loops micro run cycles over run cycles over
seconds seconds /second heads seconds /second heads
100 100000 20.00 5.0 0.00 20.00 5.0 0.00
1000 10000 20.00 50.0 0.00 20.00 50.0 0.00
10000 1000 20.00 500.0 0.00 20.00 500.0 0.00
100000 100 20.00 5000.0 0.00 20.00 5000.0 0.00
1000000 10 20.00 50000.0 0.00 20.00 50000.0 0.00
10000000 1 20.00 500000.0 0.00 20.00 499999.9 0.00
Pi 400 1800 MHz C
10000 1000 21.20 471.6 1.204 21.22 471.2 1.22
10000000 1 20.03 499255.9 0.030 20.02 499475.6 0.02
Pico Python
1000000 10 138.51 7219.9 118.51 776.34 1288.1 756.34
10000000 1 927.56 10781.0 907.56 7249.31 1379.4 7229.31
C/Python 10 6.9 38.8
C/Python 1 46.4 362.5
One Output No Sleep 13 Outputs No Sleep
Loops micro run cycles microsecs run cycles Total microsecs
seconds seconds /second /loop seconds /second CPS /loop
100 0 0.000 11111111 0.000 1470588 19117647
1000 0 0.000 20408164 0.001 1984127 25793651
10000 0 0.000 20833334 0.005 1984127 25793650 0.500
100000 0 0.005 20833332 0.050 0.050 1984127 25793651 0.500
1000000 0 0.048 20833334 0.048 0.504 1984127 25793651 0.504
10000000 0 0.480 20833334 0.048 5.040 1984127 25793651 0.504
Maximum Mbps 41.66 51.60
Pi 400 1800 MHz
10000000 0 0.314 31823078 0.031 3.876 2579794 33537325 0.388
Pico Python
10000000 0 310.42 32214 31.0 6246.46 1601 20812 624.6
C Pico / Pi 400 0.65 0.77
Pico C / Python 646.72 1239.30
Just Sleep using busy_wait_us(microsecs) using sleep_us(microsecs)
Loops micro run cycles over run cycles over
seconds seconds /second heads seconds /second heads
100 100000 20.000 5.0 0.000 20.000 5.0 0.000
1000 10000 20.000 50.0 0.000 20.000 50.0 0.000
10000 1000 20.000 500.0 0.000 20.000 500.0 0.000
100000 100 20.000 5000.0 0.000 20.000 5000.0 0.000
1000000 10 20.000 50000.0 0.000 20.000 50000.0 0.000
10000000 1 20.000 500000.0 0.000 27.619 362068.9 7.619
5000000 2 20.000 250000.0 0.000
./incount for Raspberry Pi GPIO Frequency
10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
10.00 Seconds for Cycles Per Second 50.10 ON and 50.00 OFF
10.00 Seconds for Cycles Per Second 500.00 ON and 500.10 OFF
10.00 Seconds for Cycles Per Second 4995.67 ON and 4995.67 OFF
10.00 Seconds for Cycles Per Second 49988.64 ON and 49988.74 OFF
10.00 Seconds for Cycles Per Second 499324.08 ON and 499324.08 OFF
USB Current C 13 Outputs
  50.0 mA - C Continuous output ON, no sleeps
  28.2 mA - C Program inactive
Current to ground - on breadboard
  32.3 mA - C Continuous On output
  16.5 mA - C Continuous On/Off output
USB Current MicroPython 13 outputs
  19.0 mA - Thonny Python open
  35.3 mA - output ON/OFF no delays
  20 to 45 mA - output 13 flashing
USB Current CPU C Benchmarks - see later
  7.9 mA - Waiting to copy uf2 file
  19.2 to 20.4 mA - Whetstone
  20.2 to 20.3 mA - Dhrystone
  19.0 to 20.5 mA - MemSpeed
  17.9 mA - Finished
Longer Test - I also ran a continuous two hour C output test, drawing 32.3 mA, measuring temperatures with an infrared thermometer. At a room temperature of 21°C, maximum Pico board readings increased from 25°C to only 27°C. Meanwhile, the effectively inactive Pi 4 CPU was at 47°C.
The simple diagrams below show which of the Pi 4/400 and Pico physical pins are used. As shown later, I included these physical pin numbers in the program pin names to help in understanding the different program structures. The names are allocated to the corresponding logical pin numbers in the programs, in this case the standard ones for the Pico and those required by WiringPi for the Pi computers.
The top three Pico connections to the Pi 4 are for serial I/O, to allow program printed output to be displayed in a Pi 4 or Pi 400 Terminal window, after executing the appropriate minicom command.
Pi 4 or Pi 400 Pico Top
_________________
| USB
1 2| Pi 4 10< 1 40
3 4| Pi 4 8< 2 39
5 6| >GROUND PI 4 14< 3 38
7 8| < Pico 2 4 37
9 10| < Pico 1 5 36
11 12| LED< 6 35
Pi 4/400 INPUT> 13 14| < Pico 3 LED< 7 34
LED< 15 16| >LED 8 33
17 18| >LED LED< 9 32
19 20| LED< 10 31
21 22| >INPUT LED< 11 30
23 24| LED< 12 29
25 26| 13 28
27 28| LED< 14 27
1kR< 29 30| LED< 15 26
LED< 31 32| >LED LED< 16 25
LED< 33 34| 1kR< 17 24
LED< 35 36| >LED 18 23 >GROUND
LED< 37 38| >LED LED< 19 22
39 40| >LED LED< 20 21 >Pi 4/400 INPUT
|
| < Pi 400 bottom
|
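The relationship between physical pins, program pin names and logical pin numbers can be summarised as lookup tables. The sketch below collects the values used in the 13 output listings later in this report. The dictionary and function names are hypothetical and do not appear in the benchmark programs.

```python
# Pi 4/400: physical pin -> (BCM GPIO number used by gpiozero, WiringPi number)
PI_OUTPUT_PINS = {
    40: (21, 29), 38: (20, 28), 36: (16, 27), 32: (12, 26),
    37: (26, 25), 35: (19, 24), 33: (13, 23), 31: (6, 22),
    29: (5, 21), 22: (25, 6), 18: (24, 5), 16: (23, 4), 15: (22, 3),
}

# Pico: physical pin -> GP number used by MicroPython and the C SDK
PICO_OUTPUT_PINS = {
    20: 15, 19: 14, 17: 13, 16: 12, 15: 11, 14: 10, 12: 9,
    11: 8, 10: 7, 9: 6, 7: 5, 6: 4, 21: 16,
}


def pin_name(prefix, physical_pin):
    """Build a program pin name such as P4Pin40 or PicoPin20."""
    return f"{prefix}{physical_pin}"


for physical, (bcm, wiring_pi) in PI_OUTPUT_PINS.items():
    print(f"{pin_name('P4Pin', physical):9s} BCM {bcm:2d}  WiringPi {wiring_pi:2d}")
for physical, gp in PICO_OUTPUT_PINS.items():
    print(f"{pin_name('PicoPin', physical):9s} GP{gp}")
```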
One Output - Following are two Python and two C program listings for tests driving one output with sleep delays, firstly the Pi 4 versions, followed by those for the Pico. These differ in their pin allocation and pin use functions, in their timing procedures and, particularly, in print formatting. To ease wiring, the programs use common pin names, P4Pin40 and PicoPin20, that include the physical pin numbers.
These programs can be temporarily modified for the “Output With No Sleeps” or “Sleep Only” tests, by changing the printed title and commenting out either the sleep or the output functions.
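As an alternative to commenting lines out, the three test variants can be expressed with one timing loop that takes the output and sleep actions as optional arguments. This is only a sketch and not how the benchmark programs are written: run_test and its parameters are hypothetical, and the extra checks inside the loop add overhead that the original programs avoid.

```python
import time


def run_test(loops, microsecs, set_output=None, sleep=None):
    """Time on/off cycles; pass None to leave out outputs or sleeps."""
    delay = microsecs / 1_000_000
    start = time.perf_counter()
    for _ in range(loops):
        if set_output:
            set_output(1)
        if sleep:
            sleep(delay)
        if set_output:
            set_output(0)
        if sleep:
            sleep(delay)
    run_time = time.perf_counter() - start
    cps = loops / run_time if run_time > 0 else float("inf")
    return run_time, cps


# Stand-in for a GPIO write, so the sketch runs without Pi hardware
recorded = []

both = run_test(5, 1000, set_output=recorded.append, sleep=time.sleep)
no_sleeps = run_test(5, 0, set_output=recorded.append)
sleep_only = run_test(5, 1000, sleep=time.sleep)

for title, (run_time, cps) in (("Output + Sleep", both),
                               ("Output With No Sleeps", no_sleeps),
                               ("Sleep Only", sleep_only)):
    print(f"{title:22s} {run_time:9.4f} {cps:12.1f}")
```

On a Pi, set_output could be replaced by a small function that calls the on() and off() methods of a gpiozero LED object.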
Pi 4 Python Operation - Assuming the Thonny Python IDE is installed, clicking on the .py program loads it. It can then be executed by clicking on the Run button, with the output displayed by the IDE.
Pi 4 C Operation - In the supplied format, the programs require the installation of WiringPi. For compilation and running, normal Terminal commands are used, as in the following example. On a Pi 400, the program failed to run properly if the recommended sudo was included.
  gcc -O3 -o Pi4OneOut Pi4OneOut.c -lwiringPi
  sudo ./Pi4OneOut
Pico Python - This requires installation of the Raspberry Pi Pico Python SDK and copying the MicroPython UF2 file to the Pico. This is too complicated to explain here, but instructions are easily found by searching the Internet. With this UF2 file installed, Thonny Python can be loaded to create or copy a new file, save it on the Pico and run it, with data displayed by Thonny. For opening an existing Python file, a choice is provided to access it from the computer or the MicroPython device.
Pico C - Pico SDK installation is required for this. The end process leads to a folder with the C source code files installed, along with CMakeLists.txt, identifying project name and source and destination file names, plus a standard pico_sdk_import.cmake file. Then the following commands are used, from a normal Terminal, to install the required software and compile the program as a UF2 file.
  mkdir build
  cd build
  export PICO_SDK_PATH=../../pico-sdk
  cmake ..
  make
Then, this has to be copied to the Pico, as for the MicroPython UF2 file above, and it immediately begins execution. Beforehand, a new Terminal should be opened to start minicom, as shown below, where the output will be displayed. If necessary, following changes, the program can normally be recompiled by just executing the make command, and the copy to Pico repeated.
  minicom -b 115200 -o -D /dev/serial0
13 Outputs - Following the four short program listings are details of the changes that were made to drive thirteen outputs, if anything, to emphasise the different program structures used.
Pi4OneOut.py
import time
from gpiozero import LED
from time import sleep

loops = 100
microsecs = 100000
P4Pin40 = LED(21)    # BCM GPIO 21 is physical pin 40

print("Python One Output + Sleep\n")
print(" Loops microsecs runsecs cycles/sec")
for m in range(6):
    startTime = time.perf_counter()
    for i in range(loops):
        P4Pin40.on()
        sleep(microsecs/1000000)
        P4Pin40.off()
        sleep(microsecs/1000000)
    endTime = time.perf_counter()
    runTime = endTime - startTime
    cps = loops/runTime
    print(f"{loops:10d}{microsecs:10.0f}{runTime:10.3f}{cps:12.1f}")
    # Ten times more loops with ten times shorter sleeps each pass
    loops = loops * 10
    microsecs = microsecs / 10
print ("End\n")
Pi4OneOut.c
#include "stdio.h"
#include "wiringPi.h"
#include "time.h"

#define P4Pin40 29    // WiringPi number for physical pin 40

int loops = 100;
unsigned int microsecs = 100000;
float cps;
double runSecs = 0;
double startSecs;
double theseSecs;
double endSecs;
struct timespec tp1;

// Wall clock time in seconds
double getSecs()
{
    clock_gettime(CLOCK_REALTIME, &tp1);
    theseSecs = tp1.tv_sec + tp1.tv_nsec / 1e9;
    return theseSecs;
}

int main(int argc, char *argv[])
{
    if (wiringPiSetup () == -1) return 1;
    pinMode (P4Pin40, OUTPUT);
    printf("One Output + Sleep\n\n");
    printf(" Loops microsecs runsecs cycles/sec\n");
    for (int r = 0; r < 6; r++)
    {
        startSecs = getSecs();
        for (int i = 0; i < loops; i++)
        {
            digitalWrite (P4Pin40, 1);
            delayMicroseconds(microsecs);
            digitalWrite (P4Pin40, 0);
            delayMicroseconds(microsecs);
        }
        endSecs = getSecs();
        runSecs = endSecs - startSecs;
        cps = (double)loops / runSecs;
        // microsecs is unsigned int, so print it with %u
        printf("%10d %9u %9.3f %10.1f \n", loops, microsecs, runSecs, cps);
        loops = loops * 10;
        microsecs = microsecs / 10;
    }
    printf(" End\n\n");
    return 0;
}
PicoOneOut.py
import machine
import utime

loops = 100
microsecs = 100000
PicoPin20 = machine.Pin(15, machine.Pin.OUT)    # GP15 is physical pin 20

print(' Pico Python One Output + Sleep')
print(' Loops microsecs runsecs cycles/sec')
for j in range (6):
    startTime = utime.ticks_ms()
    for i in range(loops):
        PicoPin20.value(1)
        utime.sleep_us(int(microsecs))
        PicoPin20.value(0)
        utime.sleep_us(int(microsecs))
    endTime = utime.ticks_ms()
    runTime = utime.ticks_diff(endTime,startTime)/1000
    cps = loops/runTime
    print('{:10d} {:9.0f} {:9.2f} {:11.1f}'
          .format(loops, microsecs, runTime, cps))
    # Ten times more loops with ten times shorter sleeps each pass
    loops = loops * 10
    microsecs = microsecs / 10
print ("End")
PicoOneOut.c
#include "stdio.h"
#include "pico/stdlib.h"
#include "hardware/gpio.h"

const uint PicoPin20 = 15;    // GP15 is physical pin 20
uint loops = 100;
uint64_t microsecs = 100000;
uint64_t startTime;
uint64_t endTime;
float runSecs;
float cps;

int main()
{
    setup_default_uart();
    gpio_init(PicoPin20);
    gpio_set_dir(PicoPin20, GPIO_OUT);
    printf("One Output + Sleep\n\n");
    printf("Just Sleep\n\n");
    printf(" Loops microsecs runsecs cycles/sec\n");
    for (int r = 0; r < 6; r++)
    {
        startTime = time_us_64 ();
        for (uint i = 0; i < loops; i++)
        {
            gpio_put(PicoPin20, 1);
            busy_wait_us(microsecs);
            gpio_put(PicoPin20, 0);
            busy_wait_us(microsecs);
        }
        endTime = time_us_64 ();
        runSecs = (float)(endTime - startTime) / 1000000.0;
        cps = (float)loops / runSecs;
        // loops is unsigned; microsecs is 64 bit, so cast it to match %lu
        printf("%10u %9lu %9.3f %10.1f \n", loops, (unsigned long)microsecs, runSecs, cps);
        loops = loops * 10;
        microsecs = microsecs / 10;
    }
    printf(" End\n\n");
}
Pi4ThirteenOut.py
P4Pin40 = LED(21)
P4Pin38 = LED(20)
P4Pin36 = LED(16)
P4Pin32 = LED(12)
P4Pin37 = LED(26)
P4Pin35 = LED(19)
P4Pin33 = LED(13)
P4Pin31 = LED(6)
P4Pin29 = LED(5)
P4Pin22 = LED(25)
P4Pin18 = LED(24)
P4Pin16 = LED(23)
P4Pin15 = LED(22)
P4Pin40.on()
P4Pin38.on()
P4Pin36.on()
P4Pin32.on()
P4Pin37.on()
P4Pin35.on()
P4Pin33.on()
P4Pin31.on()
P4Pin29.on()
P4Pin22.on()
P4Pin18.on()
P4Pin16.on()
P4Pin15.on()
P4Pin40.off()
P4Pin38.off()
P4Pin36.off()
P4Pin32.off()
P4Pin37.off()
P4Pin35.off()
P4Pin33.off()
P4Pin31.off()
P4Pin29.off()
P4Pin22.off()
P4Pin18.off()
P4Pin16.off()
P4Pin15.off()
Pi4ThirteenOut.c
#define P4Pin40 29
#define P4Pin38 28
#define P4Pin36 27
#define P4Pin32 26
#define P4Pin37 25
#define P4Pin35 24
#define P4Pin33 23
#define P4Pin31 22
#define P4Pin29 21
#define P4Pin22 6
#define P4Pin18 5
#define P4Pin16 4
#define P4Pin15 3
pinMode (P4Pin40, OUTPUT);
pinMode (P4Pin38, OUTPUT);
pinMode (P4Pin36, OUTPUT);
pinMode (P4Pin32, OUTPUT);
pinMode (P4Pin37, OUTPUT);
pinMode (P4Pin35, OUTPUT);
pinMode (P4Pin33, OUTPUT);
pinMode (P4Pin31, OUTPUT);
pinMode (P4Pin29, OUTPUT);
pinMode (P4Pin22, OUTPUT);
pinMode (P4Pin18, OUTPUT);
pinMode (P4Pin16, OUTPUT);
pinMode (P4Pin15, OUTPUT);
digitalWrite (P4Pin40, 1);
digitalWrite (P4Pin38, 1);
digitalWrite (P4Pin36, 1);
digitalWrite (P4Pin32, 1);
digitalWrite (P4Pin37, 1);
digitalWrite (P4Pin35, 1);
digitalWrite (P4Pin33, 1);
digitalWrite (P4Pin31, 1);
digitalWrite (P4Pin29, 1);
digitalWrite (P4Pin22, 1);
digitalWrite (P4Pin18, 1);
digitalWrite (P4Pin16, 1);
digitalWrite (P4Pin15, 1);
digitalWrite (P4Pin40, 0);
digitalWrite (P4Pin38, 0);
digitalWrite (P4Pin36, 0);
digitalWrite (P4Pin32, 0);
digitalWrite (P4Pin37, 0);
digitalWrite (P4Pin35, 0);
digitalWrite (P4Pin33, 0);
digitalWrite (P4Pin31, 0);
digitalWrite (P4Pin29, 0);
digitalWrite (P4Pin22, 0);
digitalWrite (P4Pin18, 0);
digitalWrite (P4Pin16, 0);
digitalWrite (P4Pin15, 0);
delayMicroseconds(microsecs);
PicoThirteenOut.py
Starts below
PicoPin20 = machine.Pin(15, machine.Pin.OUT)
PicoPin19 = machine.Pin(14, machine.Pin.OUT)
PicoPin17 = machine.Pin(13, machine.Pin.OUT)
PicoPin16 = machine.Pin(12, machine.Pin.OUT)
PicoPin15 = machine.Pin(11, machine.Pin.OUT)
PicoPin14 = machine.Pin(10, machine.Pin.OUT)
PicoPin12 = machine.Pin(9, machine.Pin.OUT)
PicoPin11 = machine.Pin(8, machine.Pin.OUT)
PicoPin10 = machine.Pin(7, machine.Pin.OUT)
PicoPin9 = machine.Pin(6, machine.Pin.OUT)
PicoPin7 = machine.Pin(5, machine.Pin.OUT)
PicoPin6 = machine.Pin(4, machine.Pin.OUT)
PicoPin21 = machine.Pin(16, machine.Pin.OUT)
PicoPin20.value(1)
PicoPin19.value(1)
PicoPin17.value(1)
PicoPin16.value(1)
PicoPin15.value(1)
PicoPin14.value(1)
PicoPin12.value(1)
PicoPin11.value(1)
PicoPin10.value(1)
PicoPin9.value(1)
PicoPin7.value(1)
PicoPin6.value(1)
PicoPin21.value(1)
PicoPin20.value(0)
PicoPin19.value(0)
PicoPin17.value(0)
PicoPin16.value(0)
PicoPin15.value(0)
PicoPin14.value(0)
PicoPin12.value(0)
PicoPin11.value(0)
PicoPin10.value(0)
PicoPin9.value(0)
PicoPin7.value(0)
PicoPin6.value(0)
PicoPin21.value(0)
PicoThirteenOut.c
const uint PicoPin20 = 15;
const uint PicoPin19 = 14;
const uint PicoPin17 = 13;
const uint PicoPin16 = 12;
const uint PicoPin15 = 11;
const uint PicoPin14 = 10;
const uint PicoPin12 = 9;
const uint PicoPin11 = 8;
const uint PicoPin10 = 7;
const uint PicoPin9 = 6;
const uint PicoPin7 = 5;
const uint PicoPin6 = 4;
const uint PicoPin21 = 16;
gpio_init(PicoPin20);
gpio_init(PicoPin19);
gpio_init(PicoPin17);
gpio_init(PicoPin16);
gpio_init(PicoPin15);
gpio_init(PicoPin14);
gpio_init(PicoPin12);
gpio_init(PicoPin11);
gpio_init(PicoPin10);
gpio_init(PicoPin9);
gpio_init(PicoPin7);
gpio_init(PicoPin6);
gpio_init(PicoPin21);
gpio_set_dir(PicoPin20, GPIO_OUT);
gpio_set_dir(PicoPin19, GPIO_OUT);
gpio_set_dir(PicoPin17, GPIO_OUT);
gpio_set_dir(PicoPin16, GPIO_OUT);
gpio_set_dir(PicoPin15, GPIO_OUT);
gpio_set_dir(PicoPin14, GPIO_OUT);
gpio_set_dir(PicoPin12, GPIO_OUT);
gpio_set_dir(PicoPin11, GPIO_OUT);
gpio_set_dir(PicoPin10, GPIO_OUT);
gpio_set_dir(PicoPin9, GPIO_OUT);
gpio_set_dir(PicoPin7, GPIO_OUT);
gpio_set_dir(PicoPin6, GPIO_OUT);
gpio_set_dir(PicoPin21, GPIO_OUT);
gpio_put(PicoPin20, 1);
gpio_put(PicoPin19, 1);
gpio_put(PicoPin17, 1);
gpio_put(PicoPin16, 1);
gpio_put(PicoPin15, 1);
gpio_put(PicoPin14, 1);
gpio_put(PicoPin12, 1);
gpio_put(PicoPin11, 1);
gpio_put(PicoPin10, 1);
gpio_put(PicoPin9, 1);
gpio_put(PicoPin7, 1);
gpio_put(PicoPin6, 1);
gpio_put(PicoPin21, 1);
gpio_put(PicoPin20, 0);
gpio_put(PicoPin19, 0);
gpio_put(PicoPin17, 0);
gpio_put(PicoPin16, 0);
gpio_put(PicoPin15, 0);
gpio_put(PicoPin14, 0);
gpio_put(PicoPin12, 0);
gpio_put(PicoPin11, 0);
gpio_put(PicoPin10, 0);
gpio_put(PicoPin9, 0);
gpio_put(PicoPin7, 0);
gpio_put(PicoPin6, 0);
gpio_put(PicoPin21, 0);
/*
 gcc -O3 -o incount incount.c -lwiringPi
 sudo ./incount
*/
#include "stdio.h"
#include "wiringPi.h"
#include "time.h"

#define P4Pin13 2 // WiringPi pin address

double startSecs;
double theseSecs;
struct timespec tp1;
double minTime = 10.0;

double getSecs()
{
    clock_gettime(CLOCK_REALTIME, &tp1);
    theseSecs = tp1.tv_sec + tp1.tv_nsec / 1e9;
    return theseSecs;
}

int main (void)
{
    int i;
    double count1 = 1;
    double count2 = 1;
    double cycles1 = 0;
    double cycles0 = 0;
    double runTime = 0;

    printf ("Raspberry Pi GPIO Frequency\n");
    if (wiringPiSetup () == -1) return 1;
    pinMode (P4Pin13, INPUT);
    startSecs = getSecs();
    while (runTime < minTime)
    {
        for (i=0; i < 1000; i++)
        {
            if (digitalRead(P4Pin13))
            {
                if (count1 == 1)
                {
                    cycles1 = cycles1 + 1;
                    count1 = 0;
                    count2 = 1;
                }
            }
            else
            {
                if (count2 == 1)
                {
                    cycles0 = cycles0 + 1;
                    count1 = 1;
                    count2 = 0;
                }
            }
        }
        runTime = getSecs() - startSecs;
    }
    if (cycles1 == 0)
    {
        printf (" No cycles recorded\n");
    }
    else
    {
        printf (" %6.2f Seconds for Cycles Per Second "
                "%.2f ON and %.2f OFF\n", runTime, cycles1/runTime, cycles0/runTime);
    }
    return 0;
}
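The counting logic of the input monitor can be followed more easily on a fixed list of samples. The sketch below mirrors the two flag approach of the C program in Python, using a hypothetical count_cycles function with simulated input levels in place of digitalRead.

```python
def count_cycles(samples):
    """Count OFF to ON and ON to OFF changes in a sequence of pin levels."""
    on_cycles = 0
    off_cycles = 0
    waiting_for_on = True     # plays the role of count1 in the C program
    waiting_for_off = True    # plays the role of count2 in the C program
    for level in samples:
        if level:
            if waiting_for_on:
                on_cycles += 1
                waiting_for_on = False
                waiting_for_off = True
        else:
            if waiting_for_off:
                off_cycles += 1
                waiting_for_on = True
                waiting_for_off = False
    return on_cycles, off_cycles


# Simulated readings: repeated identical levels are only counted once
samples = [0, 0, 1, 1, 1, 0, 1, 0, 0]
on_count, off_count = count_cycles(samples)
print(f"{on_count} ON and {off_count} OFF")
```

Because both flags start set, the first sample is always counted, so the ON and OFF totals can differ by one. That is consistent with the 0.1 cycles per second differences seen in some of the 10 second results.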
Pi 4 performance
0.0 ARM MHz=1500, core volt=0.8625V, CPU temp=56.0'C, pmic temp=51.4'C
Pico
13 Outputs + Sleep using busy_wait_us(microsecs)
Loops microsecs runsecs cycles/sec
100 100000 20.000 5.0
1000 10000 20.000 50.0
10000 1000 20.000 500.0
100000 100 20.000 5000.0
1000000 10 20.000 50000.0
10000000 1 20.000 499999.7
End
PI 4
pi@raspberrypi:~/picoME/picoc $ ./incount
10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
10.00 Seconds for Cycles Per Second 50.00 ON and 50.10 OFF
10.00 Seconds for Cycles Per Second 500.10 ON and 500.00 OFF
10.00 Seconds for Cycles Per Second 4999.78 ON and 4999.78 OFF
10.00 Seconds for Cycles Per Second 49997.47 ON and 49997.37 OFF
10.00 Seconds for Cycles Per Second 499560.23 ON and 499560.23 OFF
pi 4 powersave
0.0 ARM MHz= 600, core volt=0.8625V, CPU temp=54.5'C, pmic temp=51.4'C
Pico
13 Outputs + Sleep using busy_wait_us(microsecs)
Loops microsecs runsecs cycles/sec
100 100000 20.000 5.0
1000 10000 20.000 50.0
10000 1000 20.000 500.0
100000 100 20.000 5000.0
1000000 10 20.000 50000.0
10000000 1 20.000 499999.6
End
pi 4
pi@raspberrypi:~/picoME/picoc $ ./incount
10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
10.00 Seconds for Cycles Per Second 50.10 ON and 50.00 OFF
10.00 Seconds for Cycles Per Second 500.09 ON and 499.99 OFF
10.00 Seconds for Cycles Per Second 4999.18 ON and 4999.18 OFF
10.00 Seconds for Cycles Per Second 49894.83 ON and 49894.83 OFF
10.00 Seconds for Cycles Per Second 496711.77 ON and 496711.87 OFF
The execution times of the benchmark programs are calibrated to run for an approximate, reasonable finite time: 10 seconds for Whetstone and Dhrystone, and a minimum of 0.1 seconds for individual MemSpeed tests.
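One way such a calibration can work is to time a growing number of passes until the measurement is long enough to trust, then scale the count to the target time. The following is a sketch under stated assumptions, not the benchmark's own code: the function name, the times five growth step and the 2 second threshold are all illustrative.

```python
def calibrate_passes(time_for_passes, target_seconds=10.0,
                     min_measured_seconds=2.0):
    """Return a pass count expected to run for about target_seconds.

    time_for_passes(n) must return the seconds taken to run n passes,
    and must return more than zero for a large enough n.
    """
    passes = 1
    elapsed = time_for_passes(passes)
    while elapsed < min_measured_seconds:
        passes *= 5                       # assumed growth step
        elapsed = time_for_passes(passes)
    return max(1, int(target_seconds * passes / elapsed))


# Stand-in for a real timed run, at roughly the Pico Whetstone rate
def simulated_timer(passes):
    return 1.198 * passes


print("Use", calibrate_passes(simulated_timer), "passes")
```

With the simulated timer, the sketch measures 1 and then 5 passes before settling on a count, which resembles the calibration lines in the Pico Whetstone log shown later.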
The benchmarks were run on the Pico CPU, which operates at 125 MHz, and on a 1500 MHz Raspberry Pi 4B, with a clock speed twelve times faster. The Pi 4 measured 244 times faster with Whetstone, influenced by the lack of floating point hardware in the Pico, 38 times faster with Dhrystone and significantly higher using MemSpeed. Performance is often quoted on a per MHz basis, where the Pico comes out badly. A complete contrast was apparent when running the bit banging type tests.
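The ratios above, and the per MHz comparison, can be derived from the overall ratings in the result listings. A short sketch follows, where the function name and the dictionary layout are assumptions made for illustration.

```python
PICO_MHZ = 125
PI4_MHZ = 1500


def compare(pico_score, pi4_score):
    """Return (speed ratio, per MHz ratio) of the Pi 4 over the Pico."""
    speed_ratio = pi4_score / pico_score
    per_mhz_ratio = (pi4_score / PI4_MHZ) / (pico_score / PICO_MHZ)
    return speed_ratio, per_mhz_ratio


# (Pico rating, Pi 4 rating) from the benchmark listings
ratings = {
    "Whetstone MWIPS": (8.338, 2031.394),
    "Dhrystone VAX MIPS": (142.29, 5397.10),
}

for name, (pico, pi4) in ratings.items():
    speed, per_mhz = compare(pico, pi4)
    print(f"{name:20s} Pi 4/Pico {speed:6.1f}  per MHz {per_mhz:5.1f}")
```

Even after allowing for the twelve times higher clock speed, the Pi 4 remains well ahead on these CPU benchmarks.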
During the earlier tests, simply measuring pin output speeds, a Pi 400 was found to be capable of transferring a maximum of 67.1 megabits per second (Mbps), with a single CPU core running at 100% utilisation. That could be rated as 0.037 Bit Bangs per MHz (BB/MHz). The Pico achieved 51.6 Mbps or 0.41 BB/MHz, more than eleven times more efficient, and clearly not dependent on CPU MHz.
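The BB/MHz rating is simply the maximum Mbps divided by the CPU clock speed. A minimal sketch of the calculation, with an illustrative function name:

```python
def bit_bangs_per_mhz(mbps, cpu_mhz):
    """Maximum bit banging rate per MHz of CPU clock."""
    return mbps / cpu_mhz


pi400 = bit_bangs_per_mhz(67.1, 1800)
pico = bit_bangs_per_mhz(51.6, 125)

print(f"Pi 400 {pi400:.3f} BB/MHz")
print(f"Pico   {pico:.2f} BB/MHz")
print(f"Pico / Pi 400 efficiency {pico / pi400:.1f}")
```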
PicoBenchmarks.zip contains the C source codes and .uf2 Pico execution programs, along with the CMakeLists.txt file needed for compilation, plus example Pico results.
Note the differences in numerical results between the Pico and Pi 4 tests. The Pico numbers are of the right precision for 32 bit floating point numbers, and are rounded versions of those in the Pi 4 output. The differences might be due to processor hardware variations.
The Pi 4 produced an impossibly huge MOPS score for the IF test, caused by compiler optimisation (for example, deciding that the test loop only needs to be executed once). The time for this, when running as intended, is inevitably so short that it has no real influence on the MWIPS rating.
Pico 125 MHz
##########################################
Single Precision C Whetstone Benchmark
Calibrate
1.20 Seconds 1 Passes (x 100)
5.99 Seconds 5 Passes (x 100)
Use 8 passes (x 100)
Single Precision C/C++ Whetstone Benchmark
Loop content Result MFLOPS MOPS Seconds
N1 floating point -1.12475013700000000 1.493 0.103
N2 floating point -1.12274742100000000 1.495 0.719
N3 if then else 1.00000000000000000 93.729 0.009
N4 fixed point 12.00000000000000000 5.716 0.441
N5 sin,cos etc. 0.49911010300000000 0.160 4.171
N6 floating point 0.99999982100000000 1.531 2.819
N7 assignments 3.00000000000000000 53.567 0.028
N8 exp,sqrt etc. 0.75110864600000000 0.228 1.306
MWIPS 8.338 9.595
Pi 4B 1500 MHz
##########################################
Single Precision C/C++ Whetstone Benchmark
Loop content Result MFLOPS MOPS Seconds
N1 floating point -1.12475013732910156 524.661 0.074
N2 floating point -1.12274742126464844 533.855 0.511
N3 if then else 1.00000000000000000 N/A 0.000
N4 fixed point 12.00000000000000000 2497.509 0.256
N5 sin,cos etc. 0.49911010265350342 55.124 3.065
N6 floating point 0.99999982118606567 387.309 2.829
N7 assignments 3.00000000000000000 998.853 0.376
N8 exp,sqrt etc. 0.75110864639282227 26.174 2.887
MWIPS 2031.394 9.998
Pico 125 MHz
##########################################
Dhrystone Benchmark, Version 2.1 (Language: C or C++)
Register option not selected
10000 runs 0.04 seconds
100000 runs 0.40 seconds
200000 runs 0.80 seconds
400000 runs 1.60 seconds
800000 runs 3.20 seconds
Final values (* implementation-dependent):
Int_Glob: O.K. 5 Bool_Glob: O.K. 1
Ch_1_Glob: O.K. A Ch_2_Glob: O.K. B
Arr_1_Glob[8]: O.K. 7 Arr_2_Glob8/7: O.K. 800010
Ptr_Glob-> Ptr_Comp: * 536884992
Discr: O.K. 0 Enum_Comp: O.K. 2
Int_Comp: O.K. 17 Str_Comp: O.K. DHRYSTONE PROGRAM, SOME G
Next_Ptr_Glob-> Ptr_Comp: * 536884992 same as above
Discr: O.K. 0 Enum_Comp: O.K. 1
Int_Comp: O.K. 18 Str_Comp: O.K. DHRYSTONE PROGRAM, SOME G
Int_1_Loc: O.K. 5 Int_2_Loc: O.K. 13
Int_3_Loc: O.K. 7 Enum_Loc: O.K. 1
Str_1_Loc: O.K. DHRYSTONE PROGRAM, 1'ST G
Str_2_Loc: O.K. DHRYSTONE PROGRAM, 2'ND G
Nanoseconds one Dhrystone run: 4000.00
Dhrystones per Second: 250000
VAX MIPS rating = 142.29
Pi 4B 1500 MHz
##########################################
Nanoseconds one Dhrystone run: 105.46
Dhrystones per Second: 9482703
VAX MIPS rating = 5397.10
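The three summary lines of the Dhrystone output are related to each other. The sketch below assumes the conventional reference of 1757 Dhrystones per second for one VAX MIPS, which reproduces the ratings above. The function and key names are hypothetical.

```python
VAX_DHRYSTONES_PER_SECOND = 1757    # conventional 1 MIPS reference


def dhrystone_summary(runs, seconds):
    """Derive the summary figures from a timed number of runs."""
    per_second = runs / seconds
    return {
        "nanoseconds_per_run": 1e9 * seconds / runs,
        "dhrystones_per_second": per_second,
        "vax_mips": per_second / VAX_DHRYSTONES_PER_SECOND,
    }


# Final Pico measurement: 800000 runs in 3.20 seconds
pico = dhrystone_summary(800000, 3.20)
print(f"Nanoseconds one Dhrystone run: {pico['nanoseconds_per_run']:.2f}")
print(f"Dhrystones per Second:         {pico['dhrystones_per_second']:.0f}")
print(f"VAX MIPS rating =              {pico['vax_mips']:.2f}")
```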
The Pico is said to have 264 KB RAM. For the benchmark, K is 1024, where 256 KB is 262.144 decimal KB. The program had to be run with a maximum of two times 64 KB to fit.
Directly comparing these Pico and Pi 4 results is not really appropriate, the Pi 4 making use of advanced SIMD vector instructions, to say the least. Looking at those slow floating point speeds, 6 MBytes/second equates to 48 Mbits/second, and 97 MBytes/second for integer operations to 776 Mbits/second, much greater than the bit banging capabilities for the types of operation considered in this report.
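The conversions in this paragraph multiply MBytes per second by eight bits per byte. A brief sketch, using an illustrative function name and the Pico MemSpeed figures:

```python
def mbytes_to_mbits(mbytes_per_second):
    """Convert MBytes per second to Mbits per second (8 bits per byte)."""
    return mbytes_per_second * 8


pico_floating_point = mbytes_to_mbits(6)    # x[m]=x[m]+s*y[m], Dble and Sngl
pico_integer = mbytes_to_mbits(97)          # x[m]=x[m]+s*y[m], Int32

print(f"Floating point {pico_floating_point} Mbits/second")
print(f"Integer        {pico_integer} Mbits/second")
```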
Pico 125 MHz
##########################################
Memory Reading Speed Test Pico
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int32 Dble Sngl Int32 Dble Sngl Int32
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
8 6 6 97 18 11 88 107 95 95
16 6 6 97 18 11 88 108 95 95
32 6 6 97 18 11 88 108 95 95
64 6 6 97 18 11 88 108 95 95
128 6 6 97 18 11 88 108 95 95
End of test
Pi 4B 1500 MHz
##########################################
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int32 Dble Sngl Int32 Dble Sngl Int32
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
8 11761 8660 11894 11787 9516 11889 10318 5225 7796
16 11874 8690 11921 11886 9552 11919 10479 5118 7892
32 10592 8195 10732 10719 8832 10728 8853 4468 7360
64 10093 8361 10407 9996 9082 10400 8704 4632 7541
128 9997 8521 10535 9948 9309 10529 8143 4750 7491
256 9987 8536 10569 9956 9320 10568 7990 4928 7644
512 9124 8336 10168 9321 9085 10215 7992 4929 7681
1024 3736 6332 6594 3696 6424 6717 5179 3849 4296