3

In ANSI C, how do we convert a string into an array of binary bytes? All my googling and searching turns up answers for C++ and other languages, not C.

One idea I had was to convert the string into ASCII and then convert each ASCII value into its binary. (Duh!) I know it is the dumbest of ideas but I am not sure of any other option.

I've heard about the encoding functions in Java. I am not sure whether they serve the same purpose or can be adapted to C.

string = "Hello"
bytearr[] = 10100101... some byte array..

It would be great if someone can throw some light on this.

Thanks!

3
  • 2
    What do you mean by "array of binary bytes" ? A "String" in C is simply a chunk of memory (an array) containing values (bytes) that get mapped to ASCII characters. Commented Apr 14, 2011 at 17:08
  • Something similar to the byte array in Java, where you can also process the string in the form of a byte array. Commented Apr 14, 2011 at 17:10
  • 1
    You seem to be very confused about terminology. A string in C already is an array of binary bytes, more or less by definition. And it's probably also already ASCII (unless it's some other encoding of Unicode that supports characters outside U+0000 through U+007F). So please try again to explain what you want the contents of this "bytearr" to be. Commented Apr 14, 2011 at 17:11

6 Answers 6

10

Or did you mean how to convert a C string to its binary representation?

Here is one solution that prints the binary representation of each character in a string. It can easily be altered to save the binary strings into an array of strings.

#include <stdio.h>

int main(int argc, char *argv[])
{
    /* ANSI C (C89) requires declarations before statements */
    char *ptr;
    int i;

    if (argc < 2) return 0; /* no input string */

    for (ptr = argv[1]; *ptr != 0; ++ptr)
    {
        printf("%c => ", *ptr);

        /* test each bit with a bitwise AND, most significant bit first
           (assumes 8-bit chars) */
        for (i = 7; i >= 0; --i)
            putchar((*ptr & (1 << i)) ? '1' : '0');

        putchar('\n');
    }

    return 0;
}

Example input & output:

./ascii2bin hello

h => 01101000
e => 01100101
l => 01101100
l => 01101100
o => 01101111
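
The answer mentions that the program can be altered to save the binary strings instead of printing them. A minimal sketch of one way to do that follows. The names byte_to_bits and string_to_bits are hypothetical, and it uses CHAR_BIT from <limits.h> instead of assuming 8 bits per char:

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/*
 * Write the bits of byte c, most significant first, into out as a
 * NUL-terminated string. out must have room for CHAR_BIT + 1 chars.
 * (byte_to_bits is a hypothetical helper name, not from the answer.)
 */
void byte_to_bits(unsigned char c, char *out)
{
    int i;

    for (i = CHAR_BIT - 1; i >= 0; --i)
        *out++ = ((c >> i) & 1) ? '1' : '0';
    *out = '\0';
}

/*
 * Fill rows[0..n-1] with the bit strings of the first n bytes of str
 * (or fewer, if str is shorter). Returns the number of rows written.
 */
size_t string_to_bits(const char *str, char rows[][CHAR_BIT + 1], size_t n)
{
    size_t count = 0;

    while (*str != '\0' && count < n) {
        byte_to_bits((unsigned char)*str, rows[count]);
        ++str;
        ++count;
    }
    return count;
}
```

A caller would declare something like char rows[16][CHAR_BIT + 1], pass it with its row count, and then read rows[0] up to rows[count - 1] as ordinary C strings.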

3

There are no strings as such in C. Any string IS an array of bytes.
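
To illustrate the point, here is a minimal sketch showing that no conversion step is needed: the bytes of a string can be copied straight into an unsigned char buffer. The function name copy_bytes is hypothetical and exists only for this example.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen, memcpy */

/*
 * Copy the bytes of str (without the terminating NUL) into dst, which
 * holds at most cap bytes. Returns the number of bytes copied.
 * (copy_bytes is a hypothetical name used only for this sketch.)
 */
size_t copy_bytes(const char *str, unsigned char *dst, size_t cap)
{
    size_t len = strlen(str);

    if (len > cap)
        len = cap;          /* truncate rather than overflow dst */
    memcpy(dst, str, len);
    return len;
}
```

On an ASCII system, copying "Hello" this way yields the bytes 0x48 0x65 0x6C 0x6C 0x6F, which are exactly the values already stored in the string.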

1 Comment

I meant "Any string IS an array of bytes."
1

A string is an array of bytes.

If you want to display the ASCII value of each character in hex form, you would simply do something like:

while (*str != 0)
  printf("%02x ", (unsigned char) *str++);

1

On most of the systems I have worked on, the width of char is 1-byte and so a char[] or char* is a byte array.

In many other languages, such as Java, the string datatype takes care of concepts like encoding to a certain degree, by using an encoding like, say, UTF-8. In C this is not the case. If I were to read a UTF-8 string whose contents included multi-byte characters, each such character would occupy two buckets in the array (or potentially more).
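
As a rough illustration of the multi-byte point, the sketch below counts characters (code points) in a UTF-8 string by skipping continuation bytes. It assumes the input is valid UTF-8, and utf8_count is a hypothetical helper name.

```c
#include <stddef.h>  /* size_t */

/*
 * Count the code points in a UTF-8 encoded string by skipping
 * continuation bytes, which always have the bit pattern 10xxxxxx.
 * Assumes str is valid UTF-8; utf8_count is a hypothetical helper.
 */
size_t utf8_count(const char *str)
{
    size_t count = 0;

    for (; *str != '\0'; ++str) {
        if (((unsigned char)*str & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "caf\xC3\xA9" ("café" in UTF-8), strlen reports 5 bytes while utf8_count reports 4 characters, because the final character occupies two buckets in the array.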

To look at it from another point of view, consider that all types in C have a fixed width for your system (although they may vary between implementations).

So that string you're operating on is a byte array.

Next question I guess then is how do you display those bytes? That's pretty straightforward:

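/* needs <stdio.h> and <string.h> */
const char *x = "Hello"; /* some string; "Hello" is the question's example */
size_t xlen = strlen(x);
size_t i;

for ( i = 0; i < xlen; i++ )
{
    /* cast so that bytes above 0x7F are not sign-extended */
    printf("%02x ", (unsigned char) x[i]);
}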

I can't think of a reason why you'd want to convert that output to binary, but it could be done if you were so minded.

1 Comment

This is not quite the same as "the width of char is 1 byte", but it probably deserves being said again in this context: sizeof(char)==1 BY DEFINITION. It will never be anything else. (However, the value of CHAR_BIT is not necessarily 8.)
0

If you just want to iterate (or randomly access) individual bytes' numeric values, you don't have to do any conversion at all, because C strings are arrays already:

void dumpbytevals(const char *str)
{
    while (*str)
    {
        printf("%02x ", (unsigned char)*str);
        str++;
    }
    putchar('\n');
}

If you're not careful with this kind of code, though, you run the risk of being in a world of hurt when you need to support non-ASCII characters.

0

Since printf is slow when converting a huge binary array, here is another approach that does not use printf:

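typedef unsigned char uint8;    /* not defined in the original; assumed */
#define BASE16SYM               ("0123456789ABCDEF") /* not defined in the original; assumed */
#define BASE16VAL               ("\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9|||||||\xA\xB\xC\xD\xE\xF")
#define BASE16_ENCODELO(b)      (BASE16SYM[((uint8)(b)) >> 4])
#define BASE16_ENCODEHI(b)      (BASE16SYM[((uint8)(b)) & 0xF])
#define BASE16_DECODELO(b)      (BASE16VAL[Char_Upper(b) - '0'] << 4)
#define BASE16_DECODEHI(b)      (BASE16VAL[Char_Upper(b) - '0'])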

To convert a hex string to a byte array you would do the following:

while (*Source != 0)
    {
    Target[0]  = BASE16_DECODELO(Source[0]);
    Target[0] |= BASE16_DECODEHI(Source[1]);

    Target += 1;
    Source += 2;
    }

*Target = 0;

Source is a pointer to a char array that contains a hex string. Target is a pointer to a char array that will contain the byte array.

To convert a byte array to a hex string you would do the following:

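/* note: the loop stops at the first zero byte in Source */
while (*Source != 0)
    {
    Target[0] = BASE16_ENCODELO(*Source);
    Target[1] = BASE16_ENCODEHI(*Source);

    Target += 2;
    Source += 1;
    }

*Target = 0; /* terminate the hex string */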

Source is a pointer to a char array that contains the byte array. Target is a pointer to a char array that will contain the hex string.

Here are a few missing macros:

#define Char_IsLower(C)  ((uint8)((C) - 'a') < 26)
#define Char_IsUpper(C)  ((uint8)((C) - 'A') < 26)
#define Char_Upper(C)    (Char_IsLower(C) ? ((C) + ('A' - 'a')) : (C))
#define Char_Lower(C)    (Char_IsUpper(C) ? ((C) + ('a' - 'A')) : (C))
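
For comparison, here is a self-contained sketch of the same printf-free idea written as plain functions rather than macros. It is not the answer's code: the names hex_value, hex_encode and hex_decode are hypothetical. It takes an explicit length so that zero bytes inside the data are handled, and it assumes the letters a to f are contiguous in the character set, as in ASCII.

```c
#include <stddef.h>  /* size_t */

/* Convert one hex digit to its value, or return -1 if it is not one.
   Assumes 'a'..'f' and 'A'..'F' are contiguous, as in ASCII. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Encode len bytes from src as upper-case hex into dst.
 * dst must have room for 2 * len + 1 chars (including the NUL).
 */
void hex_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    for (i = 0; i < len; ++i) {
        *dst++ = digits[src[i] >> 4];    /* high nibble first */
        *dst++ = digits[src[i] & 0x0F];  /* then low nibble */
    }
    *dst = '\0';
}

/*
 * Decode the hex string src into dst (at most cap bytes).
 * Returns the number of bytes written, or -1 on a non-hex digit,
 * an odd number of digits, or insufficient space.
 */
long hex_decode(const char *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;

    while (src[0] != '\0') {
        int hi = hex_value(src[0]);
        int lo = (src[1] != '\0') ? hex_value(src[1]) : -1;

        if (hi < 0 || lo < 0 || n >= cap)
            return -1;
        dst[n++] = (unsigned char)((hi << 4) | lo);
        src += 2;
    }
    return (long)n;
}
```

Passing the length separately is a design choice: a loop that stops at the first zero byte works for text but silently truncates arbitrary binary data.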
