My implementation of an API function that does a simple SPI transfer is shown below. The API offers a void *intfPtr parameter for passing a "device descriptor", which I use to pass the I/O port and pin for the SPI chip select:
#include <stdint.h>

typedef struct {
    volatile uint8_t *port;
    uint8_t pin;
} Intf;

static volatile uint8_t PORTD_OUT = 4;

static uint8_t transmit(uint8_t data) {
    return 0x00;
}

static uint8_t bme68xWrite(uint8_t reg,
                           const uint8_t *data,
                           uint32_t len,
                           void *intfPtr) {
    const Intf intf = *((Intf *)intfPtr);
    *intf.port &= ~(1u << intf.pin);
    transmit(reg);
    for (uint32_t i = 0; i < len; i++) {
        transmit(data[i]);
    }
    *intf.port |= (1u << intf.pin);
    return 0;
}
I was wondering how efficient (in terms of number of instructions) this implementation is. If I picked the correct part, this is the code generated for the two lines before transmit(reg):
00002cdc <.Loc.97>:
const Intf intf = *((Intf *)intfPtr);
2cdc: 00 81 ld r16, Z
2cde: 11 81 ldd r17, Z+1 ; 0x01
00002ce0 <.LVL44>:
*intf.port &= ~(1u << intf.pin);
2ce0: d8 01 movw r26, r16
2ce2: 2c 91 ld r18, X
00002ce4 <.Loc.100>:
2ce4: 92 81 ldd r25, Z+2 ; 0x02
00002ce6 <.Loc.101>:
2ce6: 41 e0 ldi r20, 0x01 ; 1
2ce8: 50 e0 ldi r21, 0x00 ; 0
2cea: 5a 01 movw r10, r20
2cec: 01 c0 rjmp .+2 ; 0x2cf0 <.L2^B2>
00002cee <.L1^B6>:
2cee: aa 0c add r10, r10
00002cf0 <.L2^B2>:
2cf0: 9a 95 dec r25
2cf2: ea f7 brpl .-6 ; 0x2cee <.L1^B6>
00002cf4 <.Loc.102>:
2cf4: 9a 2d mov r25, r10
2cf6: 90 95 com r25
2cf8: 92 23 and r25, r18
2cfa: 9c 93 st X, r25
Not surprisingly, simply hardcoding port and pin, as in PORTD_OUT &= ~(1u << BME_CS_PD4);, yields far fewer instructions:
00002cd0 <.Loc.97>:
PORTD_OUT &= ~(1u << BME_CS_PD4);
2cd0: 90 91 64 04 lds r25, 0x0464 ; 0x800464 <__TEXT_REGION_LENGTH__+0x7f0464>
00002cd4 <.Loc.98>:
2cd4: 9f 7e andi r25, 0xEF ; 239
2cd6: 90 93 64 04 sts 0x0464, r25 ; 0x800464 <__TEXT_REGION_LENGTH__+0x7f0464>
Counting all instructions in both implementations, it is 76 vs. 53. This is with avr-gcc (GCC) 14.2.0 and -O2, by the way.
So even if passing the port and pin as a parameter is perhaps more elegant than hardcoding them, it seems to be an expensive trade-off, especially considering that the function is called very often?
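A possible middle ground, sketched here without having been measured: the loop at .L1^B6/.L2^B2 in the first listing is the run-time shift 1u << intf.pin, which AVR has to do one bit at a time because it has no multi-bit shift instruction. Storing the ready-made mask in the descriptor instead of the pin number should remove that loop while keeping the void *intfPtr interface. The names IntfMask, fakePort, transmitStub and bme68xWriteMask are hypothetical and only serve this illustration; fakePort stands in for the real port register so the sketch also builds on a host.

```c
#include <stdint.h>

/* Hypothetical variant of Intf: holds the bit mask instead of the pin number. */
typedef struct {
    volatile uint8_t *port;
    uint8_t mask;               /* e.g. (uint8_t)(1u << 4), computed once at setup */
} IntfMask;

/* Stand-in for the real port register (assumption, for host builds only). */
static volatile uint8_t fakePort = 0xFF;

/* Bookkeeping so the behaviour of the sketch can be checked. */
static uint8_t portSeenByTransmit = 0;
static uint8_t bytesSent = 0;

/* Stub for the SPI transfer: records the port state and counts bytes. */
static uint8_t transmitStub(uint8_t data) {
    (void)data;
    portSeenByTransmit = fakePort;
    bytesSent++;
    return 0x00;
}

static uint8_t bme68xWriteMask(uint8_t reg,
                               const uint8_t *data,
                               uint32_t len,
                               void *intfPtr) {
    const IntfMask *intf = (const IntfMask *)intfPtr;

    *intf->port &= (uint8_t)~intf->mask;    /* chip select low, no shift needed */
    transmitStub(reg);
    for (uint32_t i = 0; i < len; i++) {
        transmitStub(data[i]);
    }
    *intf->port |= intf->mask;              /* chip select high */
    return 0;
}
```

The descriptor would then be set up once, for example as IntfMask intf = { &PORTD_OUT, (uint8_t)(1u << 4) };. The pointer load and the indirect read-modify-write remain, so this is still expected to cost more than the hardcoded lds/andi/sts sequence.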
uint32_t len? Using uint16_t will shave off some more bytes.
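If the uint32_t len parameter has to stay because the callback signature is fixed, the count could still be narrowed inside the function. This is only a sketch: it assumes callers never pass more than 65535 bytes, and sendPayload, transmitByte and sentCount are made-up names for illustration. On an 8-bit AVR a 16-bit counter occupies two registers instead of four, but whether that actually saves bytes here would need to be checked in the listing.

```c
#include <stdint.h>

/* Counts transmitted bytes so the sketch can be checked. */
static uint16_t sentCount = 0;

/* Stub for the SPI transfer (hypothetical, mirrors transmit() above). */
static uint8_t transmitByte(uint8_t data) {
    (void)data;
    sentCount++;
    return 0x00;
}

/* Keeps the 32-bit parameter but loops with a 16-bit counter. */
static void sendPayload(const uint8_t *data, uint32_t len) {
    /* Assumption: len never exceeds UINT16_MAX, so the cast loses nothing. */
    const uint16_t n = (uint16_t)len;
    for (uint16_t i = 0; i < n; i++) {
        transmitByte(data[i]);
    }
}
```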