Raspberry Pi Pico, Pi 4 and Pi 400 Python and C Basic Beginners Bit Banging Benchmarks
Roy Longbottom
These programs were compiled with gcc and use the WiringPi GPIO access library. WiringPi recommends running programs with sudo access, but on the Pi 400 the programs would only execute properly without sudo.
                          One output + sleep              13 Outputs + Sleep
    Loops   micro      run    cycles    over       run    cycles    Total    over
          seconds  seconds   /second   heads   seconds   /second      CPS   heads

 Pi 4 1500 MHz C
      100  100000   20.013       5.0   0.013    20.013       5.0       65   0.013
     1000   10000   20.125      49.7   0.125    20.127      49.7      646   0.127
    10000    1000   21.248     470.6   1.248    21.255     470.5     6117   1.255
   100000     100   32.273    3098.6  12.273    32.327    3093.4    40214  12.327
  1000000      10   20.010   49975.9   0.010    20.299   49264.4   640437   0.299
 10000000       1   20.021  499474.9   0.021    29.996  333383.2  4333982   9.996

 Pi 400 1800 MHz C
      100  100000   20.012       5.0   0.012    20.016       5.0       65   0.016
     1000   10000   20.122      49.7   0.122    20.125      49.7      646   0.125
    10000    1000   21.204     471.6   1.204    21.224     471.2     6126   1.224
   100000     100   31.940    3130.8  11.940    32.009    3124.1    40613  12.009
  1000000      10   20.031   49922.0   0.031    20.008   49980.9   649752   0.008
 10000000       1   20.030  499255.9   0.030    20.021  499475.6  6493183   0.021

 Pi 400 600 MHz C
      100  100000    20.02       5.0   0.020    20.018       5.0       65   0.018
     1000   10000   20.172      49.6   0.172    20.175      49.6      645   0.175
    10000    1000   21.638     462.1   1.638    21.631     462.3     6010   1.631
   100000     100   36.325    2752.9  16.325    36.232    2760.0    35880  16.232
  1000000      10   22.426   44591.6   2.426    23.678   42233.1   549030   3.678
 10000000       1   43.946  227552.0  23.946    55.172  181252.9  2356288  35.172

 Weird Results - Pi 400 1800 MHz, One output + sleep

    Loops microsecs  runsecs cycles/sec        real        user         sys
     1000      9938   20.000       50.0   0m20.005s    0m0.024s    0m0.000s
    10000      1000   21.222      471.2   0m21.226s    0m0.029s    0m0.172s
    10000       939   19.999      500.0   0m20.003s    0m0.005s    0m0.198s
   100000       100   32.111     3114.2   0m32.115s    0m0.109s    0m1.789s
   100000        99   19.802     5050.0   0m19.806s    0m6.349s   0m13.457s
  1000000        10   20.002    49994.5   0m20.006s    0m5.239s   0m14.767s

 Pi 400 1800 MHz Maximum Speeds 10000000 Loops

                          One output + sleep               13 Outputs + Sleep
  Program   micro      run    cycles      over       run    cycles    Total      over
          seconds  seconds   /second     heads   seconds   /second      CPS     heads
  Python        1 1520.876    6575.2  1500.876  3428.312    2916.9    37908  3408.352
  C             1   20.030  499255.9     0.030    20.021  499475.6  6493183     0.021
  C / Python Gain                 75                           171
No Sleeps - The sub-microsecond speeds of these output control operations were shown to be up to around 560 times faster than from the Python versions but, of course, they do not represent arithmetic calculation speeds. Bit banging speed was indicated as up to 67.1 Million bits per second.
Monitor Confirmation - A longer running 13 outputs, no sleep version was compiled to check with monitoring options on the Pi 400, as shown below. Using the time function confirmed the running time and indicated 100% CPU utilisation. Then my input speed monitoring program was run to confirm performance of around 2.58 million cycles per second from the input connection, also indicating 67 million bits per second overall, from 13 outputs.
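The conversion from cycles per second to bits per second can be sketched as follows. This is a minimal illustration, assuming that each cycle counts as two bit writes per output (one ON and one OFF), an assumption that reproduces the published 63.6 and 67.1 Mbps figures from the 10 million loop Pi 400 1800 MHz no sleep results. The function name `bit_bang_mbps` is hypothetical.

```python
def bit_bang_mbps(cycles_per_second: float, outputs: int) -> float:
    """Convert ON/OFF cycles per second to millions of bits per second.

    Assumption: every cycle writes each output twice (ON, then OFF).
    """
    return cycles_per_second * outputs * 2 / 1_000_000


# Pi 400 at 1800 MHz, C, 10,000,000 loops, no sleeps (from the results table)
one_output_mbps = bit_bang_mbps(31_823_078, outputs=1)
thirteen_outputs_mbps = bit_bang_mbps(2_579_794, outputs=13)

print(f"One output  {one_output_mbps:.1f} Mbps")
print(f"13 outputs  {thirteen_outputs_mbps:.1f} Mbps")
```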
Pi 400 1800 Versus 600 MHz - The no sleep results confirmed that performance was proportional to CPU MHz.
Sleep Only Tests - As implied by the sub-microsecond output speeds indicated above, these results were almost identical to those from the tests with both output and sleeping, with the same weird running times.
Speed gains over Python were not as high as in the no sleep tests, because every run includes a constant 20 seconds of sleeping time.
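The effect of the constant sleeping time can be worked through with a short sketch. It assumes, as the tables suggest, that each loop sleeps twice for the given number of microseconds, so every test sleeps for 20 seconds in total and the "overheads" column is the run time minus that sleeping time. The function names are hypothetical.

```python
def sleeping_seconds(loops: int, microsecs: float) -> float:
    """Total requested sleep time: two sleeps per loop."""
    return loops * 2 * microsecs / 1_000_000


def expected_cps(microsecs: float) -> float:
    """Cycles per second if output calls and timer overheads took no time."""
    return 1_000_000 / (2 * microsecs)


def overhead_seconds(loops: int, microsecs: float, run_seconds: float) -> float:
    """Measured run time minus the requested sleeping time."""
    return run_seconds - sleeping_seconds(loops, microsecs)


# Pi 4 1500 MHz C, one output: 100000 loops of 100 microseconds ran for 32.273 s
loops, microsecs, run_seconds = 100_000, 100, 32.273
print(f"sleeping  {sleeping_seconds(loops, microsecs):.1f} s")
print(f"overhead  {overhead_seconds(loops, microsecs, run_seconds):.3f} s")
print(f"expected  {expected_cps(microsecs):.1f} cycles/second")
print(f"measured  {loops / run_seconds:.1f} cycles/second")
```

Because the 20 seconds of sleeping is common to both languages, the C/Python ratio of run times is much smaller than in the no sleep tests.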
                  C One Output no Sleep                C 13 Outputs No Sleep
    Loops   micro      run    cycles microsecs       run    cycles     total microsecs
          seconds  seconds   /second     /loop   seconds   /second       CPS     /loop

 Pi 400 1800 MHz
      100       0    0.000  20971520               0.000   2452809  31886520
     1000       0    0.000  30840470               0.000   2584291  33595780
    10000       0    0.000  28747800               0.004   2568151  33385960     0.400
   100000       0    0.003  31581236     0.030     0.039   2547669  33119697     0.390
  1000000       0    0.031  31863136     0.031     0.388   2578077  33514998     0.388
 10000000       0    0.314  31823078     0.031     3.876   2579794  33537325     0.388
 C 10M Mbps                     63.6                                    67.1
 Python 10M Mbps                0.12                                    0.12
 C/Python                        530                                     559

 Pi 400 600 MHz
      100       0    0.000   7231559               0.000    830555  10797218
     1000       0    0.000  10618491               0.001    858257  11157346
    10000       0    0.001  11087243               0.012    846308  11002008     1.200
   100000       0    0.009  10648415     0.090     0.117    855015  11115200     1.170
  1000000       0    0.097  10357508     0.097     1.165    858464  11160027     1.165
 10000000       0    0.943  10606199     0.094    11.650    858359  11158668     1.165
 C 10M Mbps                     21.2                                    22.3
 Python 10M Mbps               0.040                                   0.042
 C/Python                        530                                     531

----------------------------------------------------------------------------------

 No Sleep Time Monitoring

 13 Outputs No Sleep - around 67 Mbps

     Loops microsecs  runsecs cycles/sec   Time Results
 100000000         0   38.703  2583751.0   real 0m38.708s
                                           user 0m38.705s
                                           sys  0m0.000s

 No Sleep Speed Monitoring

 13 Outputs No Sleep - around 67 Mbps

     Loops microsecs  runsecs cycles/sec   ./incount cycles per second
 100000000         0   38.782  2578491.8   2564947.90 ON and 2564948.00 OFF
 100000000         0   38.844  2574369.0   2565509.59 ON and 2565509.59 OFF
 100000000         0   38.780  2578670.5   2566093.49 ON and 2566093.49 OFF

----------------------------------------------------------------------------------

 C Just sleep

                        Pi 400 1800 MHz                    Pi 400 600 MHz
    Loops   micro      run    cycles      over       run    cycles    total      over
          seconds  seconds   /second     heads   seconds   /second      CPS     heads
      100  100000   20.012       5.0     0.012    20.017       5.0       65     0.017
     1000   10000   20.125      49.7     0.125    20.171      49.6      645     0.171
    10000    1000   21.222     471.2     1.222    21.651     461.9     6005     1.651
   100000     100   32.103    3114.9    12.103    36.302    2754.7    35811    16.302
  1000000      10   20.003   49993.0     0.003    22.249   44945.1   584286     2.249
 10000000       1   20.022  499459.8     0.022    42.535  235100.6  3056308    22.535

 Maximum Speeds 10000000 Loops Just sleep
  Python        1 1280.522    7809.3  1260.522  1776.560    5628.9    73176  1756.560
  C             1   20.022  499459.8     0.022    42.535  235100.6  3056308    22.535
  C/Python            64.0      64.0     57296      41.8      41.8     41.8      77.9
Output With No Sleeps - For the larger loop counts, running time is normally sufficient to produce consistent performance. Python results for the Pi 400, running at both 1800 and 600 MHz, are provided. Here, performance differences were nearly proportional to CPU MHz, and the running time for 13 outputs was around 13 times longer than with one output.
Pico performance relationships were somewhat different, and certainly not proportional to CPU MHz, said to be 125 MHz. Performance with one output was equivalent to that of a Pi 400 running at around 1200 MHz but, driving 13 outputs, it was similar to that of a Pi 400 at 600 MHz. On the Pico, the running time for 13 outputs was just over 20 times longer than for one output. An additional test was run using 8 outputs, where the eight to one increase was around 13 times.
Sleep Only Tests - Results for the Pico and the 1800 MHz Pi 400 are provided. Because of the overheads, both fell short of the cycles per second that might be expected, the Pico timer suffering less and so showing higher apparent throughput.
    Loops   micro      run    cycles     over       run    cycles     over
          seconds  seconds   /second    heads   seconds   /second    heads

             Pico one output + sleep          Pico 13 outputs + sleep
      100  100000    20.01       5.0     0.01     20.07       5.0     0.07
     1000   10000    20.12      49.7     0.12     20.76      48.2     0.76
    10000    1000    21.19     472.0     1.19     27.56     362.8     7.56
   100000     100    31.85    3139.5    11.85     95.63    1045.7    75.63
  1000000      10   138.51    7219.9   118.51    776.34    1288.1   756.34
 10000000       1   927.56   10781.0   907.56   7249.31    1379.4  7229.31

 Pi 400 1800 MHz GPIO Python
    10000    1000    21.51     464.8     1.51    23.451     426.4     3.45
 10000000       1  1520.88    6575.2  1500.88  3428.312    2916.9  3408.31

             Pico one output no sleeps        Pico 13 outputs no sleeps
    Loops   micro      run    cycles microsecs      run    cycles    Total microsecs
          seconds  seconds   /second     /Loop  seconds   /second      CPS     /Loop
      100       0     0.00     25000       0.0     0.05      1887    24528     500.0
     1000       0     0.03     32258      30.0     0.63      1600    20800     630.0
    10000       0     0.31     32154      31.0     6.25      1601    20810     625.0
   100000       0     3.11     32206      31.1    62.47      1601    20812     624.7
  1000000       0    31.04     32213      31.0   624.65      1601    20812     624.7
 10000000       0   310.42     32214      31.0  6246.46      1601    20812     624.6
 10000000       0                  8 outputs    4133.32      2419    19352     413.3

 Pi 400 1800 and 600 MHz
              MHz
 10000000    1800   167.93     59549      16.8  2210.00      4525    58824     221.0
 10000000     600   500.35     19986      50.0  6160.18      1623    21103     616.0

 Ratios
 Pi 400 MHz   3.0               2.98                         2.79
 Pico/Pi 1800                   0.54                         0.35
 Pico/Pi 600                    1.61                         0.99

            Pico Python Sleep only         Pi 400 1800 MHz Python    Expected
    Loops   micro      run    cycles     over       run    cycles     over    cycles
          seconds  seconds   /second    heads   seconds   /second    heads   /second
      100  100000    20.01       5.0     0.01     20.03       5.0     0.03       5.0
     1000   10000    20.07      49.8     0.07     20.13      49.7     0.13      50.0
    10000    1000    20.67     483.7     0.67     21.26     470.3     1.26     500.0
   100000     100    26.75    3738.3     6.75     32.49    3077.7    12.49    5000.0
  1000000      10    87.38   11444.3    67.38    144.32    6929.3   124.32   50000.0
 10000000       1   677.01   14770.8   657.01   1280.52    7809.3  1260.52  500000.0
As indicated earlier, for the Pi 400 tests, the sleep timer produced weird timing variations at the mid point but came good with the shorter delays. The results indicate that maximum performance, in the range down to one microsecond, was effectively the same from a Pi 400 GPIO and a Pico. For these tests, with sleeping, the Pico C compilations were up to 362.5 times faster than those from Python.
Output With No Sleeps - As indicated earlier, these represent maximum data transfer speeds, where cycles per second can be converted to Mega bits per second, in this case with Pico C achieving up to 51.6 Mbps. Comparisons in this area show that the Pico performed at up to 77% of the speed of an 1800 MHz Pi 400, equivalent to a Pi 4 at 1386 MHz. The C version was up to 1239.3 times faster than the Python variety.
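The ratios quoted above can be reproduced from the 10 million loop, 13 output, no sleep results. This is only a sketch of the arithmetic, assuming two bit writes per output per cycle and assuming that the equivalent MHz figure was obtained by multiplying the rounded ratio by 1800. The variable names are hypothetical.

```python
# 13 outputs, no sleeps, 10,000,000 loops: cycles per second from the tables
PICO_C_CPS = 1_984_127
PI400_1800_C_CPS = 2_579_794
PICO_PYTHON_CPS = 1_601

pico_mbps = PICO_C_CPS * 13 * 2 / 1_000_000       # assumed ON + OFF per output
pico_over_pi400 = PICO_C_CPS / PI400_1800_C_CPS
equivalent_mhz = round(pico_over_pi400, 2) * 1800  # ratio rounded first
c_over_python = PICO_C_CPS / PICO_PYTHON_CPS

print(f"Pico C            {pico_mbps:.1f} Mbps")
print(f"Pico C / Pi 400   {pico_over_pi400:.2f}")
print(f"Equivalent MHz    {equivalent_mhz:.0f}")
print(f"Pico C / Python   {c_over_python:.1f}")
```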
Sleep Only Tests - Using busy_wait_us(microsecs), maximum performance in all areas was the same as in the full tests, so new comparisons are unnecessary. Results using sleep_us(microsecs) instead are also provided, showing less accuracy with 1 microsecond sleeps but not with 2 microsecond sleeps.
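The size of the sleep_us inaccuracy can be estimated from the 1 microsecond result. The sketch below assumes that all of the extra running time is attributable to the sleep calls themselves and is spread evenly over them, which the measurements do not prove. The function name is hypothetical.

```python
def extra_microsecs_per_sleep(run_seconds: float, loops: int, microsecs: float) -> float:
    """Average extra time per sleep call, with two sleeps per loop."""
    requested = loops * 2 * microsecs / 1_000_000
    return (run_seconds - requested) / (loops * 2) * 1_000_000


# Pico C, Just Sleep, sleep_us(1): 10,000,000 loops took 27.619 seconds
extra = extra_microsecs_per_sleep(27.619, 10_000_000, 1)
print(f"{extra:.3f} extra microseconds per sleep_us(1) call")
```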
Results from my Pi 4 based input frequency monitor are provided.
                One output + sleep            13 Outputs + Sleep
    Loops   micro      run    cycles     over       run    cycles     over
          seconds  seconds   /second    heads   seconds   /second    heads
      100  100000    20.00       5.0     0.00     20.00       5.0     0.00
     1000   10000    20.00      50.0     0.00     20.00      50.0     0.00
    10000    1000    20.00     500.0     0.00     20.00     500.0     0.00
   100000     100    20.00    5000.0     0.00     20.00    5000.0     0.00
  1000000      10    20.00   50000.0     0.00     20.00   50000.0     0.00
 10000000       1    20.00  500000.0     0.00     20.00  499999.9     0.00

 Pi 400 1800 MHz C
    10000    1000    21.20     471.6    1.204     21.22     471.2     1.22
 10000000       1    20.03  499255.9    0.030     20.02  499475.6     0.02

 Pico Python
  1000000      10   138.51    7219.9   118.51    776.34    1288.1   756.34
 10000000       1   927.56   10781.0   907.56   7249.31    1379.4  7229.31
 C/Python      10                6.9                         38.8
 C/Python       1               46.4                        362.5

                One Output No Sleep              13 Outputs No Sleep
    Loops   micro      run    cycles microsecs      run    cycles     Total microsecs
          seconds  seconds   /second     /loop  seconds   /second       CPS     /loop
      100       0    0.000  11111111              0.000   1470588  19117647
     1000       0    0.000  20408164              0.001   1984127  25793651
    10000       0    0.000  20833334              0.005   1984127  25793650     0.500
   100000       0    0.005  20833332     0.050    0.050   1984127  25793651     0.500
  1000000       0    0.048  20833334     0.048    0.504   1984127  25793651     0.504
 10000000       0    0.480  20833334     0.048    5.040   1984127  25793651     0.504
 Maximum Mbps                  41.66                                  51.60

 Pi 400 1800 MHz
 10000000       0    0.314  31823078     0.031    3.876   2579794  33537325     0.388

 Pico Python
 10000000       0   310.42     32214      31.0  6246.46      1601     20812     624.6
 C Pico / Pi 400                0.65                         0.77
 Pico C / Python              646.72                      1239.30

 Just Sleep
           using busy_wait_us(microsecs)      using sleep_us(microsecs)
    Loops   micro      run    cycles     over       run    cycles     over
          seconds  seconds   /second    heads   seconds   /second    heads
      100  100000   20.000       5.0    0.000    20.000       5.0    0.000
     1000   10000   20.000      50.0    0.000    20.000      50.0    0.000
    10000    1000   20.000     500.0    0.000    20.000     500.0    0.000
   100000     100   20.000    5000.0    0.000    20.000    5000.0    0.000
  1000000      10   20.000   50000.0    0.000    20.000   50000.0    0.000
 10000000       1   20.000  500000.0    0.000    27.619  362068.9    7.619
  5000000       2                                20.000  250000.0    0.000

 ./incount for Raspberry Pi GPIO Frequency
   10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
   10.00 Seconds for Cycles Per Second 50.10 ON and 50.00 OFF
   10.00 Seconds for Cycles Per Second 500.00 ON and 500.10 OFF
   10.00 Seconds for Cycles Per Second 4995.67 ON and 4995.67 OFF
   10.00 Seconds for Cycles Per Second 49988.64 ON and 49988.74 OFF
   10.00 Seconds for Cycles Per Second 499324.08 ON and 499324.08 OFF
USB Current C 13 Outputs
  50.0 mA - C Continuous output ON, no sleeps
  28.2 mA - C Program inactive

Current to ground - on breadboard
  32.3 mA - C Continuous On output
  16.5 mA - C Continuous On/Off output

USB Current MicroPython 13 outputs
  19.0 mA - Thonny Python open
  35.3 mA - output ON/OFF no delays
  20 to 45 mA - output 13 flashing

USB Current CPU C Benchmarks - see later
  7.9 mA - Waiting to copy uf2 file
  19.2 to 20.4 mA - Whetstone
  20.2 to 20.3 mA - Dhrystone
  19.0 to 20.5 mA - MemSpeed
  17.9 mA - Finished
Longer Test - I also ran a two hour continuous C output test, at 32.3 mA, measuring temperatures with an infrared thermometer. At a room temperature of 21°C, maximum Pico board readings increased from 25°C to only 27°C. Meanwhile, the effectively inactive Pi 4 CPU was at 47°C.
The simple diagrams below show which of the Pi 4/400 and Pico physical pins are used. As shown later, I included these physical pin numbers in the program pin names to help in understanding the different program structures. The names are allocated to the corresponding logical pin numbers in the programs, in this case the standard ones for the Pico and those required by WiringPi for the Pi computers.
The top three Pico connections to the Pi 4 are for serial I/O, to allow program printed output to be displayed in a Pi 4 or Pi 400 Terminal window, after executing the appropriate minicom command.
          Pi 4 or Pi 400                            Pico

 Top _________________
                      |                              USB
                  1  2|                  Pi 4 10<   1  40
                  3  4|                  Pi 4  8<   2  39
                  5  6| >GROUND          Pi 4 14<   3  38
                  7  8| < Pico 2                    4  37
                  9 10| < Pico 1                    5  36
                 11 12|                      LED<   6  35
 Pi 4/400 INPUT> 13 14| < Pico 3             LED<   7  34
            LED< 15 16| >LED                        8  33
                 17 18| >LED                 LED<   9  32
                 19 20|                      LED<  10  31
                 21 22| >INPUT               LED<  11  30
                 23 24|                      LED<  12  29
                 25 26|                            13  28
                 27 28|                      LED<  14  27
            1kR< 29 30|                      LED<  15  26
            LED< 31 32| >LED                 LED<  16  25
            LED< 33 34|                      1kR<  17  24
            LED< 35 36| >LED                       18  23 >GROUND
            LED< 37 38| >LED                 LED<  19  22
                 39 40| >LED                 LED<  20  21 >Pi 4/400 INPUT
                      |
                      | < Pi 400 bottom
One Output - Following are two Python and two C program listings for tests driving one output with sleep delays, firstly the Pi 4 versions, followed by those for the Pico. These have differing pin allocation and pin use functions, along with variations in timing procedures and, particularly, print formatting. To ease wiring, the pin names used in the programs, P4Pin40 and PicoPin20, include the physical pin numbers.
These programs can be temporarily modified, by changing the printed title and commenting out either the sleep or the output functions, for the “Output With No Sleeps” or “Sleep Only” tests.
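As an alternative to commenting lines out, the three test variants can be sketched with one timing loop that is handed optional output and sleep functions. This is not how the benchmark programs are written, and the extra `if` checks add overhead that would distort no sleep timings on real hardware. The function name `run_test` and its parameters are hypothetical, and a plain list stands in for a GPIO pin so that the sketch runs anywhere.

```python
import time


def run_test(loops, microsecs, set_output=None, sleep=None):
    """Time `loops` ON/OFF cycles and return (run seconds, cycles/second).

    set_output=None gives a "Sleep Only" test,
    sleep=None gives an "Output With No Sleeps" test.
    """
    delay = microsecs / 1_000_000
    start = time.perf_counter()
    for _ in range(loops):
        if set_output:
            set_output(1)
        if sleep:
            sleep(delay)
        if set_output:
            set_output(0)
        if sleep:
            sleep(delay)
    run_secs = time.perf_counter() - start
    cps = loops / run_secs if run_secs > 0 else float("inf")
    return run_secs, cps


# Stand-in for a GPIO pin: record every value written
writes = []
run_secs, cps = run_test(1000, 0, set_output=writes.append)
print(f"Output With No Sleeps: {len(writes)} writes, {cps:.0f} cycles/sec")

# Sleep Only: 5 loops of 2 x 1000 microseconds
sleep_secs, sleep_cps = run_test(5, 1000, sleep=time.sleep)
print(f"Sleep Only: {sleep_secs:.3f} seconds")
```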
Pi 4 Python Operation - Assuming the Thonny Python IDE is installed, clicking on the .py program loads it. It can then be executed by clicking on the Run button, with the output displayed by the IDE.
Pi 4 C Operation - In the supplied format, the programs require the installation of WiringPi. For compilation and running, normal Terminal commands are used, as in the following example. The program failed to run properly on a Pi 400 when the recommended sudo was included.

    gcc -O3 -o Pi4OneOut Pi4OneOut.c -lwiringPi
    sudo ./Pi4OneOut

Pico Python - This requires installation of the Raspberry Pi Pico Python SDK and copying the MicroPython UF2 file to the Pico. This is too complicated to explain here, but details are easily found by searching the Internet. With this UF2 file installed, Thonny Python can be loaded to create or copy a new file, save it on the Pico and run it, with data displayed by Thonny. For opening an existing Python file, a choice is provided to access it from the computer or the MicroPython device.

Pico C - Pico SDK installation is required for this. The end process leads to a folder with the C source code files installed, along with CMakeLists.txt, identifying the project name and the source and destination file names, plus a standard pico_sdk_import.cmake file. Then the following commands are used, from a normal Terminal, to install the required software and compile the program as a UF2 file.

    mkdir build
    cd build
    export PICO_SDK_PATH=../../pico-sdk
    cmake ..
    make

Then, this has to be copied to the Pico, in the same way as the MicroPython UF2 file above, to immediately begin execution. Beforehand, a new Terminal should be opened to start minicom, as shown below, where the output will be displayed. If necessary, following changes, the program can normally be recompiled by just executing the make command and repeating the copy to the Pico.

    minicom -b 115200 -o -D /dev/serial0

13 Outputs - Following the four short program listings are details of the changes that were made to drive thirteen outputs, if anything, to emphasise the different program structures used.
Pi4OneOut.py

import time
from gpiozero import LED
from time import sleep

loops = 100
microsecs = 100000
P4Pin40 = LED(21)          # physical pin 40 is GPIO 21

print("Python One Output + Sleep\n")
print(" Loops microsecs runsecs cycles/sec")
for m in range(6):
    startTime = time.perf_counter()
    for i in range(loops):
        P4Pin40.on()
        sleep(microsecs/1000000)
        P4Pin40.off()
        sleep(microsecs/1000000)
    endTime = time.perf_counter()
    runTime = endTime - startTime
    cps = loops/runTime
    print(f"{loops:10d}{microsecs:10.0f}{runTime:10.3f}{cps:12.1f}")
    # next pass: ten times the loops, one tenth of the delay
    loops = loops * 10
    microsecs = microsecs / 10
print ("End\n")


Pi4OneOut.c

#include <stdio.h>
#include <wiringPi.h>
#include <time.h>

#define P4Pin40 29              // WiringPi number for physical pin 40

int loops = 100;
unsigned int microsecs = 100000;
float cps;
double runSecs = 0;
double startSecs;
double theseSecs;
double endSecs;
struct timespec tp1;

// Current time in seconds
double getSecs()
{
    clock_gettime(CLOCK_REALTIME, &tp1);
    theseSecs = tp1.tv_sec + tp1.tv_nsec / 1e9;
    return theseSecs;
}

int main(int argc, char *argv[])
{
    if (wiringPiSetup () == -1) return 1;
    pinMode (P4Pin40, OUTPUT);

    printf("One Output + Sleep\n\n");
    printf(" Loops microsecs runsecs cycles/sec\n");
    for (int r = 0; r < 6; r++)
    {
        startSecs = getSecs();
        for (int i = 0; i < loops; i++)
        {
            digitalWrite (P4Pin40, 1);
            delayMicroseconds(microsecs);
            digitalWrite (P4Pin40, 0);
            delayMicroseconds(microsecs);
        }
        endSecs = getSecs();
        runSecs = endSecs - startSecs;
        cps = (double)loops / runSecs;
        // %u matches the unsigned int microsecs
        printf("%10d %9u %9.3f %10.1f \n", loops, microsecs, runSecs, cps);
        loops = loops * 10;
        microsecs = microsecs / 10;
    }
    printf(" End\n\n");
    return 0;
}
PicoOneOut.py

import machine             # needed for machine.Pin
import utime

loops = 100
microsecs = 100000
PicoPin20 = machine.Pin(15, machine.Pin.OUT)   # physical pin 20 is GP15

print(' Pico Python One Output + Sleep')
print(' Loops microsecs runsecs cycles/sec')
for j in range (6):
    startTime = utime.ticks_ms()
    for i in range(loops):
        PicoPin20.value(1)
        utime.sleep_us(int(microsecs))
        PicoPin20.value(0)
        utime.sleep_us(int(microsecs))
    endTime = utime.ticks_ms()
    runTime = utime.ticks_diff(endTime,startTime)/1000
    cps = loops/runTime
    print('{:10d} {:9.0f} {:9.2f} {:11.1f}' .format(loops, microsecs, runTime, cps))
    loops = loops * 10
    microsecs = microsecs / 10
print ("End")


PicoOneOut.c

#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/gpio.h"

const uint PicoPin20 = 15;      // physical pin 20 is GP15
uint loops = 100;
uint64_t microsecs = 100000;
uint64_t startTime;
uint64_t endTime;
float runSecs;
float cps;

int main()
{
    setup_default_uart();
    gpio_init(PicoPin20);
    gpio_set_dir(PicoPin20, GPIO_OUT);

    printf("One Output + Sleep\n\n");
    // printf("Just Sleep\n\n");     // alternative title for the Sleep Only test
    printf(" Loops microsecs runsecs cycles/sec\n");
    for (int r = 0; r < 6; r++)
    {
        startTime = time_us_64 ();
        for (uint i = 0; i < loops; i++)
        {
            gpio_put(PicoPin20, 1);
            busy_wait_us(microsecs);
            gpio_put(PicoPin20, 0);
            busy_wait_us(microsecs);
        }
        endTime = time_us_64 ();
        runSecs = (float)(endTime - startTime) / 1000000.0;
        cps = (float)loops / runSecs;
        // cast so that the 64 bit microsecs matches the %lu format
        printf("%10u %9lu %9.3f %10.1f \n",
               loops, (unsigned long)microsecs, runSecs, cps);
        loops = loops * 10;
        microsecs = microsecs / 10;
    }
    printf(" End\n\n");
    return 0;
}
Pi4ThirteenOut.py

P4Pin40 = LED(21)
P4Pin38 = LED(20)
P4Pin36 = LED(16)
P4Pin32 = LED(12)
P4Pin37 = LED(26)
P4Pin35 = LED(19)
P4Pin33 = LED(13)
P4Pin31 = LED(6)
P4Pin29 = LED(5)
P4Pin22 = LED(25)
P4Pin18 = LED(24)
P4Pin16 = LED(23)
P4Pin15 = LED(22)

P4Pin40.on()
P4Pin38.on()
P4Pin36.on()
P4Pin32.on()
P4Pin37.on()
P4Pin35.on()
P4Pin33.on()
P4Pin31.on()
P4Pin29.on()
P4Pin22.on()
P4Pin18.on()
P4Pin16.on()
P4Pin15.on()

P4Pin40.off()
P4Pin38.off()
P4Pin36.off()
P4Pin32.off()
P4Pin37.off()
P4Pin35.off()
P4Pin33.off()
P4Pin31.off()
P4Pin29.off()
P4Pin22.off()
P4Pin18.off()
P4Pin16.off()
P4Pin15.off()
Pi4ThirteenOut.c

#define P4Pin40 29
#define P4Pin38 28
#define P4Pin36 27
#define P4Pin32 26
#define P4Pin37 25
#define P4Pin35 24
#define P4Pin33 23
#define P4Pin31 22
#define P4Pin29 21
#define P4Pin22 6
#define P4Pin18 5
#define P4Pin16 4
#define P4Pin15 3

pinMode (P4Pin40, OUTPUT);
pinMode (P4Pin38, OUTPUT);
pinMode (P4Pin36, OUTPUT);
pinMode (P4Pin32, OUTPUT);
pinMode (P4Pin37, OUTPUT);
pinMode (P4Pin35, OUTPUT);
pinMode (P4Pin33, OUTPUT);
pinMode (P4Pin31, OUTPUT);
pinMode (P4Pin29, OUTPUT);
pinMode (P4Pin22, OUTPUT);
pinMode (P4Pin18, OUTPUT);
pinMode (P4Pin16, OUTPUT);
pinMode (P4Pin15, OUTPUT);

digitalWrite (P4Pin40, 1);
digitalWrite (P4Pin38, 1);
digitalWrite (P4Pin36, 1);
digitalWrite (P4Pin32, 1);
digitalWrite (P4Pin37, 1);
digitalWrite (P4Pin35, 1);
digitalWrite (P4Pin33, 1);
digitalWrite (P4Pin31, 1);
digitalWrite (P4Pin29, 1);
digitalWrite (P4Pin22, 1);
digitalWrite (P4Pin18, 1);
digitalWrite (P4Pin16, 1);
digitalWrite (P4Pin15, 1);

digitalWrite (P4Pin40, 0);
digitalWrite (P4Pin38, 0);
digitalWrite (P4Pin36, 0);
digitalWrite (P4Pin32, 0);
digitalWrite (P4Pin37, 0);
digitalWrite (P4Pin35, 0);
digitalWrite (P4Pin33, 0);
digitalWrite (P4Pin31, 0);
digitalWrite (P4Pin29, 0);
digitalWrite (P4Pin22, 0);
digitalWrite (P4Pin18, 0);
digitalWrite (P4Pin16, 0);
digitalWrite (P4Pin15, 0);
delayMicroseconds(microsecs);
PicoThirteenOut.py

PicoPin20 = machine.Pin(15, machine.Pin.OUT)
PicoPin19 = machine.Pin(14, machine.Pin.OUT)
PicoPin17 = machine.Pin(13, machine.Pin.OUT)
PicoPin16 = machine.Pin(12, machine.Pin.OUT)
PicoPin15 = machine.Pin(11, machine.Pin.OUT)
PicoPin14 = machine.Pin(10, machine.Pin.OUT)
PicoPin12 = machine.Pin(9, machine.Pin.OUT)
PicoPin11 = machine.Pin(8, machine.Pin.OUT)
PicoPin10 = machine.Pin(7, machine.Pin.OUT)
PicoPin9 = machine.Pin(6, machine.Pin.OUT)
PicoPin7 = machine.Pin(5, machine.Pin.OUT)
PicoPin6 = machine.Pin(4, machine.Pin.OUT)
PicoPin21 = machine.Pin(16, machine.Pin.OUT)

PicoPin20.value(1)
PicoPin19.value(1)
PicoPin17.value(1)
PicoPin16.value(1)
PicoPin15.value(1)
PicoPin14.value(1)
PicoPin12.value(1)
PicoPin11.value(1)
PicoPin10.value(1)
PicoPin9.value(1)
PicoPin7.value(1)
PicoPin6.value(1)
PicoPin21.value(1)

PicoPin20.value(0)
PicoPin19.value(0)
PicoPin17.value(0)
PicoPin16.value(0)
PicoPin15.value(0)
PicoPin14.value(0)
PicoPin12.value(0)
PicoPin11.value(0)
PicoPin10.value(0)
PicoPin9.value(0)
PicoPin7.value(0)
PicoPin6.value(0)
PicoPin21.value(0)
PicoThirteenOut.c

const uint PicoPin20 = 15;
const uint PicoPin19 = 14;
const uint PicoPin17 = 13;
const uint PicoPin16 = 12;
const uint PicoPin15 = 11;
const uint PicoPin14 = 10;
const uint PicoPin12 = 9;
const uint PicoPin11 = 8;
const uint PicoPin10 = 7;
const uint PicoPin9 = 6;
const uint PicoPin7 = 5;
const uint PicoPin6 = 4;
const uint PicoPin21 = 16;

gpio_init(PicoPin20);
gpio_init(PicoPin19);
gpio_init(PicoPin17);
gpio_init(PicoPin16);
gpio_init(PicoPin15);
gpio_init(PicoPin14);
gpio_init(PicoPin12);
gpio_init(PicoPin11);
gpio_init(PicoPin10);
gpio_init(PicoPin9);
gpio_init(PicoPin7);
gpio_init(PicoPin6);
gpio_init(PicoPin21);

gpio_set_dir(PicoPin20, GPIO_OUT);
gpio_set_dir(PicoPin19, GPIO_OUT);
gpio_set_dir(PicoPin17, GPIO_OUT);
gpio_set_dir(PicoPin16, GPIO_OUT);
gpio_set_dir(PicoPin15, GPIO_OUT);
gpio_set_dir(PicoPin14, GPIO_OUT);
gpio_set_dir(PicoPin12, GPIO_OUT);
gpio_set_dir(PicoPin11, GPIO_OUT);
gpio_set_dir(PicoPin10, GPIO_OUT);
gpio_set_dir(PicoPin9, GPIO_OUT);
gpio_set_dir(PicoPin7, GPIO_OUT);
gpio_set_dir(PicoPin6, GPIO_OUT);
gpio_set_dir(PicoPin21, GPIO_OUT);

gpio_put(PicoPin20, 1);
gpio_put(PicoPin19, 1);
gpio_put(PicoPin17, 1);
gpio_put(PicoPin16, 1);
gpio_put(PicoPin15, 1);
gpio_put(PicoPin14, 1);
gpio_put(PicoPin12, 1);
gpio_put(PicoPin11, 1);
gpio_put(PicoPin10, 1);
gpio_put(PicoPin9, 1);
gpio_put(PicoPin7, 1);
gpio_put(PicoPin6, 1);
gpio_put(PicoPin21, 1);

gpio_put(PicoPin20, 0);
gpio_put(PicoPin19, 0);
gpio_put(PicoPin17, 0);
gpio_put(PicoPin16, 0);
gpio_put(PicoPin15, 0);
gpio_put(PicoPin14, 0);
gpio_put(PicoPin12, 0);
gpio_put(PicoPin11, 0);
gpio_put(PicoPin10, 0);
gpio_put(PicoPin9, 0);
gpio_put(PicoPin7, 0);
gpio_put(PicoPin6, 0);
gpio_put(PicoPin21, 0);
/*
  gcc -O3 -o incount incount.c -lwiringPi
  sudo ./incount
*/

#include <stdio.h>
#include <wiringPi.h>
#include <time.h>

#define P4Pin13 2               // WiringPi pin address

double startSecs;
double theseSecs;
struct timespec tp1;
double minTime = 10.0;          // minimum measuring time in seconds

// Current time in seconds
double getSecs()
{
    clock_gettime(CLOCK_REALTIME, &tp1);
    theseSecs = tp1.tv_sec + tp1.tv_nsec / 1e9;
    return theseSecs;
}

int main (void)
{
    int i;
    double count1 = 1;          // 1 = next ON reading counts as a new cycle
    double count2 = 1;          // 1 = next OFF reading counts as a new cycle
    double cycles1 = 0;         // number of ON periods seen
    double cycles0 = 0;         // number of OFF periods seen
    double runTime = 0;

    printf ("Raspberry Pi GPIO Frequency\n");
    if (wiringPiSetup () == -1) return 1;
    pinMode (P4Pin13, INPUT);

    startSecs = getSecs();
    while (runTime < minTime)
    {
        // poll the input 1000 times between checks of the clock
        for (i = 0; i < 1000; i++)
        {
            if (digitalRead(P4Pin13))
            {
                if (count1 == 1)
                {
                    cycles1 = cycles1 + 1;
                    count1 = 0;
                    count2 = 1;
                }
            }
            else
            {
                if (count2 == 1)
                {
                    cycles0 = cycles0 + 1;
                    count1 = 1;
                    count2 = 0;
                }
            }
        }
        runTime = getSecs() - startSecs;
    }
    if (cycles1 == 0)
    {
        printf (" No cycles recorded\n");
    }
    else
    {
        printf (" %6.2f Seconds for Cycles Per Second "
                "%.2f ON and %.2f OFF\n",
                runTime, cycles1/runTime, cycles0/runTime);
    }
    return 0;
}
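The counting logic of the monitor program can be followed without any hardware. The sketch below restates it in Python over a list of sampled pin levels, where `count_cycles` and `samples` are hypothetical names and the two flags play the roles of count1 and count2. A run of identical readings is counted only once, at its first sample.

```python
def count_cycles(samples):
    """Count ON periods and OFF periods in a sequence of 0/1 pin readings."""
    on_cycles = 0
    off_cycles = 0
    want_on = True    # like count1: the next ON reading starts a new ON period
    want_off = True   # like count2: the next OFF reading starts a new OFF period
    for level in samples:
        if level:
            if want_on:
                on_cycles += 1
                want_on = False
                want_off = True
        else:
            if want_off:
                off_cycles += 1
                want_on = True
                want_off = False
    return on_cycles, off_cycles


samples = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
print(count_cycles(samples))      # three ON periods and three OFF periods
print(count_cycles([0, 1, 0]))    # OFF counted twice, ON once
```

The second call shows why the ON and OFF counts can differ by one, depending on the pin state when sampling starts and stops, which is consistent with results such as 5.00 ON and 5.10 OFF over 10 seconds.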
 Pi 4 performance

   0.0 ARM MHz=1500, core volt=0.8625V, CPU temp=56.0'C, pmic temp=51.4'C

 Pico 13 Outputs + Sleep using busy_wait_us(microsecs)

     Loops microsecs   runsecs cycles/sec
       100    100000    20.000        5.0
      1000     10000    20.000       50.0
     10000      1000    20.000      500.0
    100000       100    20.000     5000.0
   1000000        10    20.000    50000.0
  10000000         1    20.000   499999.7
  End

 Pi 4

 pi@raspberrypi:~/picoME/picoc $ ./incount
   10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
   10.00 Seconds for Cycles Per Second 50.00 ON and 50.10 OFF
   10.00 Seconds for Cycles Per Second 500.10 ON and 500.00 OFF
   10.00 Seconds for Cycles Per Second 4999.78 ON and 4999.78 OFF
   10.00 Seconds for Cycles Per Second 49997.47 ON and 49997.37 OFF
   10.00 Seconds for Cycles Per Second 499560.23 ON and 499560.23 OFF

 Pi 4 powersave

   0.0 ARM MHz= 600, core volt=0.8625V, CPU temp=54.5'C, pmic temp=51.4'C

 Pico 13 Outputs + Sleep using busy_wait_us(microsecs)

     Loops microsecs   runsecs cycles/sec
       100    100000    20.000        5.0
      1000     10000    20.000       50.0
     10000      1000    20.000      500.0
    100000       100    20.000     5000.0
   1000000        10    20.000    50000.0
  10000000         1    20.000   499999.6
  End

 Pi 4

 pi@raspberrypi:~/picoME/picoc $ ./incount
   10.00 Seconds for Cycles Per Second 5.00 ON and 5.10 OFF
   10.00 Seconds for Cycles Per Second 50.10 ON and 50.00 OFF
   10.00 Seconds for Cycles Per Second 500.09 ON and 499.99 OFF
   10.00 Seconds for Cycles Per Second 4999.18 ON and 4999.18 OFF
   10.00 Seconds for Cycles Per Second 49894.83 ON and 49894.83 OFF
   10.00 Seconds for Cycles Per Second 496711.77 ON and 496711.87 OFF
The benchmark programs calibrate their execution to run for a reasonable finite time, approximately 10 seconds for Whetstone and Dhrystone and a minimum of 0.1 seconds for individual MemSpeed tests.
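One way such a calibration can work is to time a trial run and scale the pass count to the target time. The sketch below is an assumption about the general approach, not taken from the benchmark source code, although it does reproduce the "Use 8 passes" line of the Pico Whetstone log, where 5 passes took 5.99 seconds. The function name `passes_for_target` is hypothetical.

```python
def passes_for_target(trial_passes: int, trial_seconds: float,
                      target_seconds: float = 10.0) -> int:
    """Scale a timed trial to roughly the target running time.

    Assumption: running time grows linearly with the number of passes.
    """
    passes = int(target_seconds * trial_passes / trial_seconds)
    return max(passes, 1)


# Pico Whetstone calibration log: 5 passes (x 100) took 5.99 seconds
print(passes_for_target(5, 5.99))
```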
The benchmarks were run on the Pico CPU, which operates at 125 MHz, and on a 1500 MHz Raspberry Pi 4B, whose clock is twelve times faster. The Pi 4 measured 244 times faster with Whetstone, influenced by the lack of floating point hardware in the Pico, 38 times faster with Dhrystone and significantly more using MemSpeed. Performance is often quoted on a per MHz basis, where the Pico comes out badly. A complete contrast was apparent when running the bit banging type tests.
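These ratios follow directly from the MWIPS and VAX MIPS ratings in the benchmark logs. The short sketch below repeats the arithmetic and adds the per MHz comparison. The variable names are hypothetical.

```python
PICO_MHZ, PI4_MHZ = 125, 1500

# Ratings from the Whetstone and Dhrystone result logs
PICO_MWIPS, PI4_MWIPS = 8.338, 2031.394
PICO_VAX_MIPS, PI4_VAX_MIPS = 142.29, 5397.10

clock_ratio = PI4_MHZ / PICO_MHZ
whetstone_ratio = PI4_MWIPS / PICO_MWIPS
dhrystone_ratio = PI4_VAX_MIPS / PICO_VAX_MIPS

print(f"Clock     Pi 4 / Pico {clock_ratio:.0f}")
print(f"Whetstone Pi 4 / Pico {whetstone_ratio:.0f}")
print(f"Dhrystone Pi 4 / Pico {dhrystone_ratio:.0f}")

# Per MHz ratings, where the Pico comes out badly
print(f"MWIPS/MHz    Pico {PICO_MWIPS / PICO_MHZ:.3f}  Pi 4 {PI4_MWIPS / PI4_MHZ:.3f}")
print(f"VAX MIPS/MHz Pico {PICO_VAX_MIPS / PICO_MHZ:.2f}  Pi 4 {PI4_VAX_MIPS / PI4_MHZ:.2f}")
```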
During the earlier tests, simply measuring pin output speeds, a Pi 400 was found to be capable of transferring a maximum of 67.1 Mega bits/second (Mbps), with a single CPU core running at 100% utilisation. That could be rated as 0.037 Bit Bangs per MHz (BB/MHz). The Pico achieved 51.6 Mbps or 0.41 BB/MHz, more than eleven times more efficient, clearly not dependent on CPU MHz.
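The BB/MHz ratings are simply the maximum Mbps divided by the CPU MHz, as this small sketch shows. The function name `bit_bangs_per_mhz` is hypothetical.

```python
def bit_bangs_per_mhz(mbps: float, cpu_mhz: float) -> float:
    """Maximum bit banging speed in Mbps per MHz of CPU clock."""
    return mbps / cpu_mhz


pi400 = bit_bangs_per_mhz(67.1, 1800)
pico = bit_bangs_per_mhz(51.6, 125)

print(f"Pi 400 {pi400:.3f} BB/MHz")
print(f"Pico   {pico:.2f} BB/MHz")
print(f"Pico / Pi 400 {pico / pi400:.1f}")
```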
The benchmarks are available for downloading in PicoBenchmarks.zip, which contains the C source codes and .uf2 Pico execution programs, along with the CMakeLists.txt file needed for compilation, plus example Pico results.
Note the difference in the numerical results between the Pico and Pi 4 tests. However, the Pico numbers are of the right precision for 32 bit floating point numbers, and are rounded versions of those in the Pi 4 output. The differences might be due to processor hardware variations.
The Pi 4 produced an impossibly huge MOPS score for the IF test, caused by compiler optimisation (for example, deciding that the test loop only needs to be executed once). The time for this test, when running as intended, is inevitably so short that it has no real influence on the MWIPS rating.
 Pico 125 MHz

 ##########################################
 Single Precision C Whetstone Benchmark

 Calibrate
       1.20 Seconds          1 Passes (x 100)
       5.99 Seconds          5 Passes (x 100)

 Use 8 passes (x 100)

 Single Precision C/C++ Whetstone Benchmark

 Loop content                  Result              MFLOPS      MOPS   Seconds

 N1 floating point     -1.12475013700000000         1.493               0.103
 N2 floating point     -1.12274742100000000         1.495               0.719
 N3 if then else        1.00000000000000000                  93.729     0.009
 N4 fixed point        12.00000000000000000                   5.716     0.441
 N5 sin,cos etc.        0.49911010300000000                   0.160     4.171
 N6 floating point      0.99999982100000000         1.531               2.819
 N7 assignments         3.00000000000000000                  53.567     0.028
 N8 exp,sqrt etc.       0.75110864600000000                   0.228     1.306

 MWIPS                                              8.338               9.595


 Pi 4B 1500 MHz

 ##########################################
 Single Precision C/C++ Whetstone Benchmark

 Loop content                  Result              MFLOPS      MOPS   Seconds

 N1 floating point     -1.12475013732910156       524.661               0.074
 N2 floating point     -1.12274742126464844       533.855               0.511
 N3 if then else        1.00000000000000000                     N/A     0.000
 N4 fixed point        12.00000000000000000                2497.509     0.256
 N5 sin,cos etc.        0.49911010265350342                  55.124     3.065
 N6 floating point      0.99999982118606567       387.309               2.829
 N7 assignments         3.00000000000000000                 998.853     0.376
 N8 exp,sqrt etc.       0.75110864639282227                  26.174     2.887

 MWIPS                                           2031.394               9.998
 Pico 125 MHz

 ##########################################

 Dhrystone Benchmark, Version 2.1 (Language: C or C++)

 Register option not selected

      10000 runs   0.04 seconds
     100000 runs   0.40 seconds
     200000 runs   0.80 seconds
     400000 runs   1.60 seconds
     800000 runs   3.20 seconds

 Final values (* implementation-dependent):

 Int_Glob:      O.K.  5           Bool_Glob:     O.K.  1
 Ch_1_Glob:     O.K.  A           Ch_2_Glob:     O.K.  B
 Arr_1_Glob[8]: O.K.  7           Arr_2_Glob8/7: O.K.  800010
 Ptr_Glob->
   Ptr_Comp:    *  536884992
   Discr:       O.K.  0           Enum_Comp:     O.K.  2
   Int_Comp:    O.K.  17          Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME G
 Next_Ptr_Glob->
   Ptr_Comp:    *  536884992 same as above
   Discr:       O.K.  0           Enum_Comp:     O.K.  1
   Int_Comp:    O.K.  18          Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME G
 Int_1_Loc:     O.K.  5           Int_2_Loc:     O.K.  13
 Int_3_Loc:     O.K.  7           Enum_Loc:      O.K.  1
 Str_1_Loc:     O.K.  DHRYSTONE PROGRAM, 1'ST G
 Str_2_Loc:     O.K.  DHRYSTONE PROGRAM, 2'ND G

 Nanoseconds one Dhrystone run:       4000.00
 Dhrystones per Second:                250000
 VAX MIPS rating =                     142.29


 Pi 4B 1500 MHz

 ##########################################

 Nanoseconds one Dhrystone run:        105.46
 Dhrystones per Second:               9482703
 VAX MIPS rating =                    5397.10
The Pico is said to have 264 KB of RAM. For the benchmark, K is 1024, where 256 KB is 262.144 decimal KB. The program had to be run with a maximum of two times 64 KB (128 KB in total) to fit.
Directly comparing these Pico and Pi 4 results is not really appropriate, as the Pi 4 makes use of advanced SIMD vector instructions, among other advantages. Looking at the slow Pico floating point speeds, 6 MBytes/second equates to 48 Mbits/second, and the 97 MBytes/second of the integer operations equates to 776 Mbits/second, much greater than the bit banging capabilities for the types of operation considered in this report.
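The conversion used in this comparison is just eight bits per byte. The sketch below repeats it for the Pico MemSpeed figures and compares the results with the 51.6 Mbps maximum bit banging speed measured earlier. The function name is hypothetical.

```python
def mbytes_to_mbits(mbytes_per_second: float) -> float:
    """Convert MBytes/second to Mbits/second (8 bits per byte)."""
    return mbytes_per_second * 8


PICO_BIT_BANG_MBPS = 51.6   # Pico C, 13 outputs, no sleeps

for label, mb_per_sec in (("floating point", 6), ("integer", 97)):
    mbits = mbytes_to_mbits(mb_per_sec)
    print(f"Pico {label:14s} {mb_per_sec:3d} MB/s = {mbits:5.0f} Mbps, "
          f"{mbits / PICO_BIT_BANG_MBPS:.1f} x bit banging")
```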
 Pico 125 MHz

 ##########################################

 Memory Reading Speed Test Pico

  Memory   x[m]=x[m]+s*y[m] Int+    x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32    Dble   Sngl  Int32    Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S    MB/S   MB/S   MB/S    MB/S   MB/S   MB/S

       8       6      6     97      18     11     88     107     95     95
      16       6      6     97      18     11     88     108     95     95
      32       6      6     97      18     11     88     108     95     95
      64       6      6     97      18     11     88     108     95     95
     128       6      6     97      18     11     88     108     95     95

 End of test


 Pi 4B 1500 MHz

 ##########################################

  Memory   x[m]=x[m]+s*y[m] Int+    x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32    Dble   Sngl  Int32    Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S    MB/S   MB/S   MB/S    MB/S   MB/S   MB/S

       8   11761   8660  11894   11787   9516  11889   10318   5225   7796
      16   11874   8690  11921   11886   9552  11919   10479   5118   7892
      32   10592   8195  10732   10719   8832  10728    8853   4468   7360
      64   10093   8361  10407    9996   9082  10400    8704   4632   7541
     128    9997   8521  10535    9948   9309  10529    8143   4750   7491
     256    9987   8536  10569    9956   9320  10568    7990   4928   7644
     512    9124   8336  10168    9321   9085  10215    7992   4929   7681
    1024    3736   6332   6594    3696   6424   6717    5179   3849   4296