char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);

So the memory looks like this: 0000 0000 0000 0000 0000 0000 0000 0001

The program outputs 0. How do I make it output the unsigned int value formed by all 32 bits?

  • Because you are only displaying the converted 0000 from the 1st byte? Commented Nov 19, 2016 at 0:05

5 Answers

3

Because *f has type char, dereferencing it reads a single byte. That char value is then converted (promoted) to an integer when you assign it. You'll get no diagnostic for this. So what you are doing is dereferencing the first element of your char array, which is 0, and assigning it to an unsigned int, which likewise ends up being 0.
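The following is a minimal sketch of what that line does. It uses a hypothetical helper name, first_byte_as_uint, which is not from the question. The dereference reads exactly one char, so only f[0] ever reaches j.

```c
/* Hypothetical helper mirroring the question's line. *f has type char,
   so it reads exactly one byte (f[0]); the assignment then converts
   that single char value to unsigned int. */
unsigned int first_byte_as_uint(const char *f) {
    unsigned int j = *f; /* same as: unsigned int j = f[0]; */
    return j;
}
```

With the question's bytes {0, 0, 0, 1} this returns 0. With {5, 0, 0, 0} it returns 5. The other three bytes are never read.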

What you want to do is technically undefined behavior but generally works. You want to do this:

unsigned int j = *reinterpret_cast<unsigned int*>(f); // C++
unsigned int j = *(unsigned int*)f;                    // the C equivalent

At this point you'll be dealing with undefined behavior and with the endianness of the platform. On a little-endian machine such as x86, the bytes 0, 0, 0, 1 are read as 0x01000000 (16777216), not 1. So you probably do not have the value you want recorded in your byte stream. You're treading in territory that requires intimate knowledge of your compiler and your target architecture.
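To see the endianness effect without the cast, here is a sketch with two hypothetical helpers. One checks the host's byte order. The other reads the same four bytes in native order through memcpy, which has no alignment or aliasing problem. It assumes the host is either little- or big-endian and that uint32_t is available.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one
   (assumes one of those two byte orders). */
int host_is_little_endian(void) {
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1); /* lowest-addressed byte of probe */
    return first == 1;
}

/* Reads four bytes in the host's native order, as the pointer cast
   would, but via memcpy so no alignment or aliasing rule is broken. */
uint32_t read_native_u32(const unsigned char bytes[4]) {
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```

For the bytes {0, 0, 0, 1}, read_native_u32 gives 1 on a big-endian host and 16777216 on a little-endian one.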


3

Supposing your platform provides a 32-bit integer type (uint32_t), you can do the following to get the kind of cast you want without undefined behavior:

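#include <inttypes.h> // uint32_t, PRIu32
#include <stdio.h>    // printf()
#include <stdlib.h>   // malloc(), free()
#include <string.h>   // memcpy()

int main(void) {
    char* f = (char*)malloc(4 * sizeof(char));
    if (f == NULL) return 1;
    f[0] = 0;
    f[1] = 0;
    f[2] = 0;
    f[3] = 1;
    uint32_t j;
    memcpy(&j, f, sizeof(j)); // copy the 4 bytes into j
    printf("%" PRIu32 "\n", j);
    free(f);
    return 0;
}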

Be aware of endianness in the integer representation. This prints 1 on a big-endian machine, but 16777216 on a little-endian one such as x86.

3 Comments

Would this ever produce different results to using "reinterpret_cast"?
@Nathaniel No. The result would be the same. Did you expect a different result?
Cool, I just wanted to make sure that I understood the code correctly.
2

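In order to ensure that your code works on both little-endian and big-endian systems, you could do the following. On POSIX systems ntohl is declared in <arpa/inet.h>; memcpy needs <string.h> and uint32_t needs <stdint.h>.

unsigned char f[4] = {0,0,0,1};
uint32_t j;
memcpy(&j, f, sizeof j);         // avoids the misaligned, aliasing-violating *((int32_t *)f)
j = ntohl(j);                    // network (big-endian) order -> host order
printf("%lu\n", (unsigned long)j);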

This will print 1 on both little-endian and big-endian systems. Without ntohl, 1 will only be printed on big-endian systems.

The code works because the bytes in f are stored the way a big-endian system would store the value 1, with the most significant byte first. Since network byte order is also big-endian, ntohl converts j correctly. If the host is big-endian, j remains unchanged. If the host is little-endian, the bytes in j are reversed.
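For illustration, here is a sketch of what ntohl does for this case, under a hypothetical name, be32_to_host. It is not the POSIX implementation. It assumes the host is either little- or big-endian.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ntohl(): network byte order is big-endian,
   so the value is returned unchanged on a big-endian host and with its
   four bytes reversed on a little-endian host. */
uint32_t be32_to_host(uint32_t x) {
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of probe */
    if (first == 0)              /* big-endian host: nothing to do */
        return x;
    /* little-endian host: reverse the byte order */
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

In real code, prefer ntohl itself where <arpa/inet.h> is available. The sketch only spells out the "unchanged or reversed" logic.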

2 Comments

Since "ntohl" is only used in the print call, wouldn't the actual value of the uint still be incorrect?
Yes, assigning the result of ntohl back to j is the better way to go.
1

What happens in the line:

unsigned int j = *f; 

is simply assigning the first element of f to the integer j. It is equivalent to:

unsigned int j = f[0];

and since f[0] is 0 it is really just assigning a 0 to the integer:

unsigned int j = 0;

You will have to convert the elements of f.

Reinterpretation will always cause undefined behavior. The following example shows such usage and it is always incorrect:

unsigned int j = *( unsigned int* )f;

Undefined behavior may produce any result, even apparently correct ones. Even if such code appears to produce correct results when you run it for the first time, this isn't proof that the program is defined. The program is still undefined, and may produce incorrect results at any time.

There is no such thing as "technically undefined behavior" that "generally works": the program is either undefined or it is not. Relying on such statements is dangerous and irresponsible.

Luckily we don't have to rely on such bad code.

All you need to do is choose the representation of the integer that will be stored in f, and then convert it. It appears you want to store it in big-endian order, with at most 8 bits per element. This doesn't mean that the machine must be big-endian, only the representation of the integer you're encoding in f. The representation of integers on the machine does not matter, so this method is completely portable.

This means the most significant byte will appear first. The most significant byte is f[0], and the least significant byte is f[3].
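Written out as arithmetic, with e_i denoting the i-th byte (f[i], or encoded[i] in the code), each in the range 0..255, the big-endian layout means:

$$
\text{value} = e_0 \cdot 2^{24} + e_1 \cdot 2^{16} + e_2 \cdot 2^{8} + e_3 \cdot 2^{0}
$$

For the question's bytes {0, 0, 0, 1} this gives 1. For {0x12, 0x34, 0x56, 0x78} it gives 0x12345678 = 305419896. Shifting an unsigned value left by 24, 16, 8 or 0 bits is exactly multiplication by these powers of two, which is what this answer's shift-and-or conversion does.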

We need an integer type capable of storing at least 32 bits. Type unsigned long is guaranteed to be at least that wide.

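Type char is intended for storing characters, not raw integer bytes. Plain char may also be signed, so a byte such as 0xFF could read as a negative value. An unsigned integer type such as unsigned char should be used instead.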
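Here is a small sketch of why the & 0xFF mask in the conversion matters when a byte comes from a signed char. The function names are hypothetical.

```c
#include <limits.h>

/* Widening a possibly negative byte value without a mask: converting
   -1 to unsigned long is defined to give ULONG_MAX, not 255. */
unsigned long widen_unmasked(signed char c) {
    return (unsigned long)c;
}

/* Masking keeps only the low 8 bits, so -1 (bit pattern 0xFF on
   two's-complement machines) becomes 255. */
unsigned long widen_masked(signed char c) {
    return (unsigned long)c & 0xFF;
}
```

With unsigned char storage and 8-bit bytes the mask is redundant, but it does no harm. It keeps the conversion correct if the element type is ever changed back to plain char.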

Then only the conversion from big-endian encoded in f must be done:

unsigned char encoded[4] = { 0 , 0 , 0 , 1 };
unsigned long value = 0;
value = value | ( ( ( unsigned long )encoded[0] & 0xFF ) << 24 );
value = value | ( ( ( unsigned long )encoded[1] & 0xFF ) << 16 );
value = value | ( ( ( unsigned long )encoded[2] & 0xFF ) << 8 );
value = value | ( ( ( unsigned long )encoded[3] & 0xFF ) << 0 );
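The decoding can be paired with its inverse, which covers the "choose the representation stored in f" step. Here is a sketch using the hypothetical names store_be32 and load_be32. It uses uint32_t instead of unsigned long, which assumes the platform provides uint32_t.

```c
#include <stdint.h>

/* Hypothetical helper: encodes value into buf most significant byte
   first (big-endian), independent of the host's byte order. */
void store_be32(unsigned char buf[4], uint32_t value) {
    buf[0] = (unsigned char)((value >> 24) & 0xFF);
    buf[1] = (unsigned char)((value >> 16) & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)(value & 0xFF);
}

/* Hypothetical helper: decodes the same layout; same logic as the
   shift-and-or conversion in this answer. */
uint32_t load_be32(const unsigned char buf[4]) {
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
           (uint32_t)buf[3];
}
```

Optimizing compilers such as GCC and Clang often recognize this shift-and-or pattern and emit a single load, plus a byte swap on little-endian targets. It is worth inspecting the generated code with optimization enabled before assuming it is slow.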

7 Comments

Surely all those bitshift operations will be slower than the memcpy or the reinterpret_cast, though? I wanted to avoid calculating the uint; I just wanted to read the uint straight from memory. I usually program in memory-managed languages, and now that I'm learning C, I was hoping to take advantage of all that control over memory.
@Nathaniel This code will be faster than memcpy on a modern machine. Using the cast is incorrect (and won't be faster anyway).
I tested it by timing it with this code: imgur.com/a/qMot0. Recast is the fastest, memcpy takes twice as long, and this bitwise code takes slightly longer than memcpy.
@Nathaniel Your test is invalid because it causes undefined behavior. But a test isn't needed. It can be seen from the generated assembly that my version executes fewer instructions.
Compiling with "x86-64 gcc 6.2" (godbolt.org) shows that if recast takes X instructions, memcpy takes X-1 instructions and the bitwise code takes X+13 instructions.
-2

Regarding the posted code:

char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
  1. In C, malloc() returns void*, which converts implicitly to any object pointer type. The cast just clutters the code and can hide problems during maintenance (in C89, for example, it can mask a missing #include <stdlib.h>).
  2. The C standard defines sizeof(char) as 1, so multiplying by it has no effect on the size passed to malloc().
  3. The size of an int is not necessarily 4. On many 16-bit microcontrollers it is 2, and the standard only guarantees at least 16 bits.
  4. The function calloc() sets all the allocated bytes to 0x00, so there is no need to zero them one by one.
  5. Which byte should be set to 0x01 depends on the endianness of the underlying architecture.

Let's assume, for now, that your computer has a little-endian architecture (e.g. Intel x86 or similar).

Then the code should look similar to the following:

#include <stdio.h>  // printf(), perror()
#include <stdlib.h> // calloc(), exit(), EXIT_FAILURE

int main( void )
{
    char *f = calloc( 1, sizeof(unsigned int) );
    if( !f )
    {
        perror( "calloc failed" );
        exit( EXIT_FAILURE );
    }

    // implied else, calloc successful

    // f[sizeof(unsigned int)-1] = 0x01; // if big Endian
    f[0] = 0x01;   // assume little Endian/Intel x86 or similar
    unsigned int j = *(unsigned int*)f;
    printf("%u\n", j);
    free( f );
}

When compiled and linked, this outputs the following:

1
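The commented-out big-endian line in that program suggests a variant that picks the byte at run time instead of assuming little-endian. Here is a sketch using a hypothetical function name, make_one. It also reads the value through memcpy rather than the pointer cast. It assumes the host is either little- or big-endian and that unsigned int has no padding bits.

```c
#include <string.h>

unsigned int make_one(void) {
    const unsigned int probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* 1 on little-endian, 0 on big-endian */

    unsigned char bytes[sizeof(unsigned int)] = {0};
    /* index of the least significant byte in memory */
    size_t low = first ? 0 : sizeof(unsigned int) - 1;
    bytes[low] = 0x01;

    unsigned int j;
    memcpy(&j, bytes, sizeof j); /* defined, unlike *(unsigned int *)bytes */
    return j;
}
```

On either byte order this returns 1, matching the output above without hard-coding the architecture.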
