
Let's take for example the following two 1-byte variables:

uint8_t x1 = 0x00;
uint8_t x2 = 0xFF;

When printing the bitwise complement, the result is printed as a 4-byte value:

printf("%02X -> %02X; %02X -> %02X\n", x1, ~x1, x2, ~x2);
00 -> FFFFFFFF; FF -> FFFFFF00

I know this can be "solved" using casting or masking:

printf("%02X -> %02X; %02X -> %02X\n", x1, (uint8_t) ~x1, x2, (uint8_t) ~x2);
00 -> FF; FF -> 00
printf("%02X -> %02X; %02X -> %02X\n", x1, ~x1&0xFF, x2, ~x2&0xFF);
00 -> FF; FF -> 00

But why the non-intuitive behavior in the first place?

2 Comments

  • Because %X is for unsigned int. And no, uint8_t is not a 2-byte variable. Commented Dec 20, 2017 at 19:02
  • Look up "integer promotions in C" to learn what is going on. Commented Dec 20, 2017 at 19:06

2 Answers


Many computer processors have a “word” size for most of their operations. E.g., on a 32-bit machine, there may be an instruction that loads 32 bits, an instruction that stores 32 bits, an instruction that adds one 32-bit number to another, and so on.

On these processors, it may be a nuisance to work with other sizes. There may be no instruction for multiplying a 16-bit number by another 16-bit number. C grew up on these machines. It was designed so that int (or unsigned int) was “whatever size is good for the machine you are running on” and char or short were fine for storing things in memory, but, once they were loaded from memory into processor registers, C worked with them like they were int.

This simplified the development of early C compilers. The compiler did not have to implement your complement by doing a 32-bit complement instruction followed by an AND instruction to remove the unwanted high bits. It only did a plain 32-bit complement.
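
As an illustration (a minimal sketch, reusing x1 from the question and assuming a typical platform where int is 32 bits), the promotion can be observed directly with sizeof:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t x1 = 0x00;

    /* The operand of ~ is promoted to int before the complement is taken,
       so the result has the width of int, not of uint8_t. */
    printf("sizeof x1    = %zu\n", sizeof x1);    /* 1 */
    printf("sizeof (~x1) = %zu\n", sizeof(~x1));  /* typically 4 */
    return 0;
}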

We could develop languages differently today, but C is burdened with this legacy.


1 Comment

Even modern processors are not able to work on words of arbitrary sizes. AFAIK ARM has 16-bit multiplication (which is impossible to use from C), but has no 8-bit multiplication. The same goes for 16- and 8-bit addition, subtraction, and others. Everything has to be promoted to 32-bit values. Intel is the only architecture that deals with different sizes of values. So this is not a legacy; it's how processors are usually built.

When you apply the ~ operator to x1 and x2, the values are first subject to integer promotions because uint8_t is smaller than an int. The operator is then applied to the promoted value.

So ~x1 is really ~0x00000000 (i.e. 0xFFFFFFFF) and ~x2 is really ~0x000000FF (i.e. 0xFFFFFF00), assuming a 32-bit int. That's why you get the values you're getting.
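
A small self-contained check of those values (a sketch assuming a 32-bit two's-complement int, reusing the variables from the question):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint8_t x1 = 0x00;
    uint8_t x2 = 0xFF;

    /* Both operands are promoted to int, so the complement covers all 32 bits. */
    assert(~x1 == -1);    /* bit pattern 0xFFFFFFFF */
    assert(~x2 == -256);  /* bit pattern 0xFFFFFF00 */

    /* Masking (or casting) back to 8 bits recovers the expected byte values. */
    assert((~x1 & 0xFF) == 0xFF);
    assert((~x2 & 0xFF) == 0x00);
    return 0;
}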

Also, the %x format specifier expects an unsigned int argument and prints it as such.

You need to use %hhx for the format specifier. That signifies an unsigned char argument.

printf("%02hhX -> %02hhX; %02hhX -> %02hhX\n", x1, ~x1, x2, ~x2);

6 Comments

hhX prints 00 -> FFFF; FF -> FF00
Are you sure you're using hhX and not hX?
@chux It actually might not help in this case. My implementation defined PRIx8 as "x".
@dbush, you're right, apparently hhX on my system yields a too many arguments for format [-Wformat-extra-args] warning
PRIx8 as "x" is OK for the implementation. That is no problem when it prints a uint8_t. This answer is trying to do something different with printf("%02hhX\n", some_int_with_negative_value); which is UB, I think. Hmmmm.