
I am writing a socket client-server application where the server needs to send a large buffer to a client, and each buffer should be processed separately, so I want to put the buffer length at the start of the buffer so that the client can read the length of the data and process it accordingly.

To transmit the length value I need to split an integer into individual bytes and store them in the buffer that is sent over the socket. I am able to break the integer into four parts, but when joining them I am not able to recover the correct value. To demonstrate the problem I have written a sample program that splits an int into four char variables and then joins them back into another integer. After joining I should get the same value back.

Here is my small program.

#include <stdio.h>

int main ()
{
    int inVal = 0, outVal =0;
    char buf[5] = {0};

    inVal = 67502978;

    printf ("inVal: %d\n", inVal);

    buf[0] = inVal & 0xff;
    buf[1] = (inVal >> 8) & 0xff;
    buf[2] = (inVal >> 16) & 0xff;
    buf[3] = (inVal >> 24) & 0xff;

    outVal = buf[3];
    outVal = outVal << 8;
    outVal |= buf[2];
    outVal = outVal << 8;
    outVal |= buf[1];
    outVal = outVal << 8;
    outVal |= buf[0];

    printf ("outVal: %d\n",outVal);
    return 0;
}

Output

inVal: 67502978
outVal: -126

What am I doing wrong?


7 Answers


One problem is that you are using bit-wise operators on signed numbers. This is always a bad idea and almost always incorrect. Please note that char has implementation-defined signedness, unlike int which is always signed.

Therefore you should replace int with uint32_t and char with uint8_t. With such unsigned types you eliminate the possibility of using bit shifts on negative numbers, which would be a bug. Similarly, if you shift data into the sign bit of a signed number, you will get bugs.

And needless to say, the code will not work if integers are not 4 bytes large.
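As a minimal sketch of that suggestion (assuming <stdint.h> is available and keeping the little-endian byte order from the question), the pack/unpack could look like this:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t inVal = 67502978u, outVal;
    uint8_t buf[4] = {0};

    /* Pack: least significant byte first, as in the question. */
    buf[0] = inVal & 0xffu;
    buf[1] = (inVal >> 8) & 0xffu;
    buf[2] = (inVal >> 16) & 0xffu;
    buf[3] = (inVal >> 24) & 0xffu;

    /* Unpack: widen each byte to uint32_t before shifting, so no
       signed arithmetic (and no possible sign extension) is involved. */
    outVal = (uint32_t)buf[0]
           | ((uint32_t)buf[1] << 8)
           | ((uint32_t)buf[2] << 16)
           | ((uint32_t)buf[3] << 24);

    printf("inVal:  %u\noutVal: %u\n", inVal, outVal);
    return 0;
}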


4 Comments

@LPs My thinking too. The general algorithmic idea is, if I'm not mistaken, endianness-independent because the bit shift operator is endianness-ignorant (or -invariant, or whatever: left shift is always multiplying, right-shift dividing). This is the way to go in order to implement ntohl and friends without looking at the architecture's endianness.
I don't understand "Should you use bit shifts on a number that is negative, you might get bugs." How can he have negative numbers if he changes the types to uint32_t and uint8_t?
@gurka The point is to use unsigned types of specified width to prevent such bugs. Since you misunderstood, the answer is apparently not optimally worded.
Unfortunately, C is a very inconsistent and irrational language. What's not mentioned here is that even if you use uint8_t, you could end up with negative numbers, because of the integer promotion rules in C. For example ~uint8 << n always invokes undefined behavior, unlike ~uint32 << n which is always perfectly fine (on a 32-bit system). This is because the former is equivalent to ~(int)uint8 << n. The key to avoiding these subtle but disastrous bugs is to learn about the various forms of implicit type promotion in C.
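To make the promotion pitfall from the last comment concrete, here is a small sketch (the constant 0x82 is just an example; it mirrors the low byte of 67502978):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t b = 0x82;   /* bit 7 set, like the low byte of 67502978 */

    /* ~b first promotes b to (signed) int, so ~b is a negative int here;
       left-shifting that negative value would be undefined behavior.
       Casting back to an unsigned type before the shift keeps it defined. */
    uint32_t ok = (uint32_t)(uint8_t)~b << 8;

    printf("%u\n", ok);   /* 32000: (uint8_t)~0x82 is 0x7d, shifted left by 8 */
    return 0;
}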

Your method has potential implementation defined behavior as well as undefined behavior:

  • storing values into the array of type char beyond the range of type char has implementation defined behavior: buf[0] = inVal & 0xff; and the next 3 statements (inVal & 0xff might be larger than CHAR_MAX if char type is signed by default).

  • left shifting negative values invokes undefined behavior: if any of the 3 first bytes in the array becomes negative as the implementation defined result of storing a value larger than CHAR_MAX into it, the resulting outVal becomes negative, left shifting it is undefined.

In your specific example, your architecture uses 2's complement representation for negative values and the type char is signed. The value stored into buf[0] is 67502978 & 0xff = 130, which becomes -126. The last statement outVal |= buf[0]; sets bits 7 through 31 of outVal and the result is -126.

You can avoid these issues by using an array of unsigned char and values of type unsigned int:

#include <stdio.h>

int main(void) {
    unsigned int inVal = 0, outVal = 0;
    unsigned char buf[4] = { 0 };

    inVal = 67502978;

    printf("inVal: %u\n", inVal);

    buf[0] = inVal & 0xff;
    buf[1] = (inVal >> 8) & 0xff;
    buf[2] = (inVal >> 16) & 0xff;
    buf[3] = (inVal >> 24) & 0xff;

    outVal = buf[3];
    outVal <<= 8;
    outVal |= buf[2];
    outVal <<= 8;
    outVal |= buf[1];
    outVal <<= 8;
    outVal |= buf[0];

    printf("outVal: %u\n", outVal);
    return 0;
}

Note that the above code still assumes 32-bit ints.
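If the 32-bit assumption is a concern, the same logic can be written against exact-width types; a sketch (the helper names pack_u32 and unpack_u32 are made up for illustration, and the byte order is the little-endian layout used above):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers: little-endian byte order on the wire, as above. */
static void pack_u32(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v & 0xffu);
    buf[1] = (uint8_t)((v >> 8) & 0xffu);
    buf[2] = (uint8_t)((v >> 16) & 0xffu);
    buf[3] = (uint8_t)((v >> 24) & 0xffu);
}

static uint32_t unpack_u32(const uint8_t buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

int main(void)
{
    uint8_t buf[4];
    pack_u32(buf, 67502978u);
    printf("%u\n", unpack_u32(buf));   /* 67502978 */
    return 0;
}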

5 Comments

Actually, it doesn't assume 8-bit bytes. It assumes that unsigned char can hold at least 8 bits (which the standard guarantees). (For the calculation of outval it also assumes that buf only contains valid values in the range 0..255.)
The program does not have undefined behavior. It also does not have implementation defined behavior since all values stored in the chars are masked by 0xff and thus representable in a char which is guaranteed to have at least 8 bits.
@PeterA.Schneider: it does have implementation defined behavior in C: 6.3.1.3 Signed and unsigned integers: When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged. 2 Otherwise, if the new type is unsigned, ... 3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. The values masked by 0xff may exceed the range of type char, and 130 indeed does.
@PeterA.Schneider: The program does not invoke undefined behavior for the specific value inVal = 67502978, but the method used does for many other values, such as inVal = 32768.
For the avoidance of doubt, use std types with fixed widths uint8_t and uint32_t for example. As an embedded coder I only ever use these types (never just int for example) as I've learned that assumption will bite you eventually.

While bit shifts of signed values can be a problem, this is not the case here (all left hand values are positive, and all results are within the range of a 32 bit unsigned int).

The problematic expression with somewhat unintuitive semantics is the last bitwise OR:

outVal |= buf[0];

buf[0] is a (on your and my architecture) signed char with the value -126, simply because the most significant bit in the least significant byte of 67502978 is set. In C all operands in an arithmetic expression are subject to the arithmetic conversions. Specifically, they undergo integer promotion which states: "If an int can represent all values of the original type [...], the value is converted to an int". Accordingly, the signed character buf[0] is converted to a (signed) int, preserving its value of -126. A negative signed int has the sign bit set. ORing that with another signed int sets the result's sign bit as well, making that value negative. That is exactly what we are seeing.

Making the bytes unsigned chars fixes the issue because the value of the temporary integer to which the unsigned char is converted is then a simple 8 bit value of 130.
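A tiny sketch of that promotion in isolation (it assumes a signed, 8-bit char and two's complement, as on the asker's platform):

#include <stdio.h>

int main(void)
{
    char c = (char)0x82;      /* -126 with a signed, 8-bit char        */
    int promoted = c;         /* integer promotion sign-extends: -126  */
    int masked = c & 0xff;    /* masking the promoted value gives 130  */

    printf("promoted: %d\n", promoted);   /* -126 */
    printf("masked:   %d\n", masked);     /* 130  */
    return 0;
}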



The C++ standard (N3936) says this about the shift operators:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

So, to avoid undefined behaviour, it is recommended to use unsigned data types and to make sure the data type is wide enough for the values involved (for example, a 64-bit type).
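For example (a sketch assuming a 32-bit int), shifting a bit into the sign position is only well defined on an unsigned operand:

#include <stdio.h>

int main(void)
{
    unsigned int u = 1u << 31;    /* well defined: 2147483648                 */
    /* int s = 1 << 31; */        /* undefined in C (and before C++14):
                                     1 * 2^31 is not representable in a
                                     32-bit signed int                        */

    printf("%u\n", u);
    return 0;
}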

1 Comment

Hm. I see the following values for outval before each shift (32-bit little-endian ints): 4; 1030; 263683. None of them are negative, and the results are all within range, so they are all well defined. The value after the last shift is 67502848. It's the last OR that creates the negative value.

Use unsigned char buf[5] = {0}; and unsigned int for inVal and outVal, and it should work.

When using signed integral types, there arise two sorts of problems:

First, if buf[3] is negative, then outVal = buf[3] makes outVal negative; the subsequent bit shifts on outVal are then undefined behaviour. From cppreference.com on the bit shift operators:

For signed and positive a, the value of a << b is a × 2^b if it is representable in the return type, otherwise the behavior is undefined (until C++14); the value of a << b is a × 2^b if it is representable in the unsigned version of the return type (which is then converted to signed: this makes it legal to create INT_MIN as 1 << 31), otherwise the behavior is undefined (since C++14).

For negative a, the behavior of a << b is undefined.

Note that with OP's inVal = 67502978 this does not occur, since buf[3] = 4; but for other inVal values it may occur and then cause problems due to undefined behaviour.

The second problem is the operation outVal |= buf[0] with buf[0] = -126: the value (char)-126, which is 10000010 in binary, is converted to (int)-126, which is 11111111111111111111111110000010 in binary, before operator |= is applied, and this fills outVal up with 1-bits. The reason for the conversion is the set of conversion rules for arithmetic operations (cppreference.com):

If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the greater integer conversion rank

So the problem in OP's case is actually not caused by any undefined behaviour, but by the character buf[0] holding a negative value, which is converted to int before the |= operation.

Note, however, that if either buf[2] or buf[1] had been negative, this would have made outVal negative and would have led to undefined behaviour on the subsequent shift operations, too.
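If changing the buffer's element type is not convenient (for example because the bytes arrive in a plain char array from recv()), masking each promoted byte also avoids both problems; a sketch under the asker's signed-char, two's complement assumptions:

#include <stdio.h>

int main(void)
{
    /* 67502978 split little-endian, as in the question: 0x82, 0x03, 0x06, 0x04 */
    char buf[4] = { (char)0x82, 0x03, 0x06, 0x04 };
    unsigned int outVal;

    /* & 0xff strips the sign extension that integer promotion introduces,
       and the unsigned accumulator keeps every left shift well defined.  */
    outVal = (unsigned int)(buf[3] & 0xff);
    outVal = (outVal << 8) | (unsigned int)(buf[2] & 0xff);
    outVal = (outVal << 8) | (unsigned int)(buf[1] & 0xff);
    outVal = (outVal << 8) | (unsigned int)(buf[0] & 0xff);

    printf("%u\n", outVal);   /* 67502978 */
    return 0;
}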

11 Comments

Where do you see an overflowing integer operation? All shifts are done on (afaics non-overflowing) ints.
This is not an integer overflow, it is a direct manipulation of the sign bits. It is undefined behavior per C11 6.5.7/4: "The result of E1 << E2 is E1 left-shifted E2 bit positions;" /--/ "If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined."
@Lundin: You are right, since there is a more specific specification of bit shift's behaviour. I actually thought it would also violate the more general overflow rule, but maybe I am wrong.
@Lundin All shifts are well defined, cf. my comment to Saurav's post. Stephan: The shifts are not the issue here.
@PeterA.Schneider - ignore my comment. It was wrong.

This may be a terrible idea but I'll post it here for interest - you can use a union:

union my_data
{
    uint32_t one_int;

    struct
    {
        uint8_t  byte3;
        uint8_t  byte2;
        uint8_t  byte1;
        uint8_t  byte0;
    }bytes;
};


// Your original code modified to use union my_data
#include <stdint.h>
#include <stdio.h>

int main(void) {
    union my_data data;
    uint32_t inVal = 0;
    uint8_t buf[4] = {0};

    inVal = 67502978;

    printf("inVal: %u\n", inVal);

    data.one_int = inVal;

    // Populate the bytes into buf
    buf[3] = data.bytes.byte3;
    buf[2] = data.bytes.byte2;
    buf[1] = data.bytes.byte1;
    buf[0] = data.bytes.byte0;

    return 0;
}

I don't know if this would also work, can't see why not:

union my_data
{
    uint32_t one_int;
    uint8_t  bytes[4];
};
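A quick usage sketch of that second union (with the caveat raised in the comments below: which byte comes out first depends on the host's endianness, so this is a host-side trick rather than a portable wire format):

#include <stdint.h>
#include <stdio.h>

union my_data
{
    uint32_t one_int;
    uint8_t  bytes[4];
};

int main(void)
{
    union my_data data;
    data.one_int = 67502978;   /* 0x04060382 */

    /* On a little-endian host this prints 82 03 06 04;
       on a big-endian host it prints 04 06 03 82.       */
    for (int i = 0; i < 4; i++)
        printf("%02x ", data.bytes[i]);
    printf("\n");
    return 0;
}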

5 Comments

+1 This is the best way of doing this. It is both non-portable and undefined behaviour, but so is anything that assumes you can translate between an int and four chars. I would combine with Jim D.'s answer to ensure endian-correctness, however.
Also, I'm pretty sure you've made an error in your code beginning with outVal = data.bytes.byte3, etc.
Thanks Jack, well spotted. I've updated code now to reflect the task at hand (getting bytes into buf from inVal).
That's precisely not what unions are for. Anyway, you forgot #pragma pack 1. Alternatively, uint8_t [4] is much safer than 4 independent fields because it avoids the packing issue.
I did say it's probably a terrible idea, so I was right.

Because of endian differences between architectures, it is best practice to convert numeric values to network order, which is big-endian. On receipt, they can then be converted to the native host order. We can do this in a portable way by using htonl() (host to network "long" = uint32_t), and convert to host order on receipt with ntohl(). Example:

#include <stdio.h>
#include <arpa/inet.h>

int main(int argc, char **argv) {
  uint32_t inval = 67502978, outval, backinval;

  outval = htonl(inval);                 /* byte-swap to network (big-endian) order */
  printf("outval: %d\n", outval);        /* %d reinterprets the swapped bits as a
                                            signed int, hence the negative output  */
  backinval = ntohl(outval);             /* swap back to host byte order */
  printf("backinval: %d\n", backinval);
  return 0;
}

This gives the following result on my 64 bit x86 which is little endian:

$ gcc -Wall example.c
$ ./a.out
outval: -2113731068
backinval: 67502978
$
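Tying this back to the question's framing goal, here is a sketch (the helper frame_message and its layout are just an illustration, not part of the answer above) of how the big-endian length prefix could be placed in front of the payload before sending; the receiver would read the first four bytes, ntohl() them, and then read exactly that many payload bytes:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a 4-byte big-endian length followed by the
   payload into out (assumed large enough) and returns the total size.   */
static size_t frame_message(unsigned char *out, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);          /* length prefix in network byte order */
    memcpy(out, &be_len, sizeof be_len);
    memcpy(out + sizeof be_len, payload, len);
    return sizeof be_len + len;
}

int main(void)
{
    const char payload[] = "hello";
    unsigned char out[64];
    size_t total = frame_message(out, payload, (uint32_t)(sizeof payload - 1));

    printf("framed %zu bytes (4-byte header + %zu payload bytes)\n",
           total, sizeof payload - 1);
    return 0;
}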

10 Comments

While your remark raises an important point for the binary exchange of values, it does not explain the OP's problem. Using a different byte order would not solve the problem either.
While @chqrlie has a point, +1 for the mandatory pointer to htonl and friends. I suspect that many spontaneous resets of the various embedded systems in my household could be avoided if people stopped attempts at re-implementing htonl etc. Disseminating best practice is a service to mankind. Especially because the attempts are usually not a sign of competence and thus bound to fail.
I really really don't like the interface of htonl et al. htonl should return an array of octets - not an int (I can imagine a platform where htonl of a valid integer ends up returning a trap value).
A little endian sign and magnitude or ones-complement machine where negative zero is a trap. A 0x00000080 would end up as a big-endian 0x80000000 which is the trap value.
@MartinBonner Although it is called htonl() it doesn't return an int, it returns a uint32_t. I would expect that 0x80000000 is a valid unsigned 32 bit integer on all architectures.
