4

C23 added support for binary literals (0b10100110) and for grouping their digits for human readability using a separator character ('). This is great for input, but how does one change the printf grouping to sets of four bits (a hexadecimal "digit") instead of three bits (an octal digit)?

#include <stdio.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, "en_US");
    unsigned long long int i = 0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001;
    printf("binary: %'#0*llb\n", (int)sizeof(i) * 8, i); 
    return 0;
}

I would like the output to look like this:

0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001

not like this:

0b1,000,100,100,010,000,100,001,111,010,010,110,001,011,011,000,001,110,010,110,001

Update: From the answers given so far, it appears C (as of 2025) does not have a natural way to do what I'm asking. Most of the proposed answers require explicitly calling a special function to convert the number into a string. I'm hoping there exists a more transparent solution, even if that means using a GNU extension.

7
  • 1
    Btw. that's a GNU extension and not a C23 feature. Commented Aug 18, 2024 at 9:21
  • 1
    Looks like you want to change the .grouping of the locale from 3 to 4. Commented Aug 18, 2024 at 21:55
  • 1
    Neither of the versions are readable no matter what you do. This is why there was so much hesitation to include binary in the C standard to begin with. Hex however would be perfectly readable. Commented Aug 19, 2024 at 6:27
  • 1
    @cremno, nope, it's a C23 addition. You can read it in section 6.4.4.1 of ISO-IEC-C-9899-DRAFT-N3096-apr-2023. Personally, I prefer the Java _ approach, which is more readable. But this is C. Commented Aug 19, 2024 at 8:00
  • 1
    @LuisColorado: I'm not talking about integer constants. The ' printf flag isn't a C23 feature. Commented Aug 19, 2024 at 17:10

8 Answers

3

Digit grouping is not a standard feature in the printf family of functions in Standard C. It is a POSIX extension supported by the GNU libc and other unix libraries. It applies to decimal integer conversions (%i, %d and %u) and the integral portion of floating point conversions (%f, %g and %G).

The number of digits in each group cannot be specified per conversion, nor made to depend on the base or on the position of the group; it is specified in the locale.

The thousands separator is specified in the locale as well as the decimal separator that will be used for floating point conversions. The purpose is the conversion of currency amounts consistent with the local culture. Note that in some cases, the number of digits is not the same for all groups (eg: the Indian numbering system), so this feature is only half-baked.

This feature does not meet your goal as it does not apply to hexadecimal or binary conversions and modifying the locale definition is risky anyway.

Here is a simple function to format integers in different bases where you can specify both the grouping number and the separator.

#include <limits.h>
#include <stdio.h>

/* Convert an integer with parameterized grouping.
 * Returns -1 if base is invalid; otherwise returns the length
 * the output would have without truncation.
 */
int format_ull(char *dest,              /* destination array */
               size_t size,             /* array length */
               unsigned long long n,    /* value to convert */
               int base,                /* output radix, 0 or 2..36 */
               int mindigits,           /* minimum number of digits */
               int grouping,            /* group length, 0 for no groups */
               int sep)                 /* separator character */
{
    char buf[sizeof(n) * CHAR_BIT];
    char *p = buf + sizeof(buf);
    const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    int ndigits, nzeroes, phase;
    size_t pos, len;

    if (base < 2 || base > 36) {
        if (base == 0)
            base = 10;
        else
            return -1;
    }
    while (n) {
        *--p = digits[n % (unsigned)base];
        n = n / (unsigned)base;
    }
    ndigits = buf + sizeof(buf) - p;
    nzeroes = mindigits > ndigits ? mindigits - ndigits : 0;
    ndigits += nzeroes;
    if (grouping <= 0 || ndigits <= grouping) {
        len = phase = ndigits;
    } else {
        phase = (ndigits + grouping - 1) % grouping + 1;
        len = ndigits + (ndigits - 1) / grouping;
    }
    if (size > 0) {
        size--;
        if (size > len)
            size = len;
        for (pos = 0; pos < size; pos++) {
            if (phase-- > 0) {
                dest[pos] = (char)((nzeroes-- > 0) ? '0' : *p++);
            } else {
                phase = grouping;
                dest[pos] = (char)sep;
            }
        }
        dest[pos] = '\0';
    }
    return (int)len;
}

#define TEST(n)  test(n, #n)
void test(unsigned long long n, const char *source) {
    char buf[100];
    int nbits = sizeof(n) * CHAR_BIT;
    int len;
    printf("source:    %s\n", source);
    printf("normal:    %llu\n", n);
    format_ull(buf, sizeof buf, n, 10, 1, 3, ',');
    printf("base 10/3: %s\n", buf);
    len = format_ull(buf, sizeof buf, n, 8, 1, 3, '\'');
    printf("base 8/3:  %.*s%s\n", (*buf != '0') * (1 + (len % 4 == 3)), "0'", buf);
    format_ull(buf, sizeof buf, n, 16, nbits / 4, 4, '\'');
    printf("base 16/4: 0x%s\n", buf);
    format_ull(buf, sizeof buf, n, 2, nbits, 8, '\'');
    printf("base 2/8:  0b%s\n", buf);
    printf("\n");
}

int main(void) {
    TEST(0);
    TEST(ULLONG_MAX);
    TEST(0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001);
    return 0;
}

Output:

source:    0
normal:    0
base 10/3: 0
base 8/3:  0
base 16/4: 0x0000'0000'0000'0000
base 2/8:  0b00000000'00000000'00000000'00000000'00000000'00000000'00000000'00000000

source:    ULLONG_MAX
normal:    18446744073709551615
base 10/3: 18,446,744,073,709,551,615
base 8/3:  01'777'777'777'777'777'777'777
base 16/4: 0xffff'ffff'ffff'ffff
base 2/8:  0b11111111'11111111'11111111'11111111'11111111'11111111'11111111'11111111

source:    0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001
normal:    1234567890987654321
base 10/3: 1,234,567,890,987,654,321
base 8/3:  0'104'420'417'226'133'016'261
base 16/4: 0x1122'10f4'b16c'1cb1
base 2/8:  0b00010001'00100010'00010000'11110100'10110001'01101100'00011100'10110001

For your specific purpose, here is a simpler version for binary output grouped in sets of 4 bits. It uses a static buffer so it is not reentrant and can only be used once per printf call:

#include <limits.h>
#include <stdio.h>

/* Convert an integer to binary, grouping digits in sets of 4
 */
const char *format_bin4(int prefix, int min_digits, unsigned long long n) {
    static char buf[2 + sizeof(n) * CHAR_BIT * 5 / 4 + 1];
    char *p = buf + sizeof(buf);
    int group = 4;
    int i;

    if (n == 0) prefix--;
    *--p = '\0';
    for (i = 0; p > buf + 2 && (i < min_digits || n != 0); i++) {
        if (!group--) {
            *--p = '\'';
            group = 3;
        }
        *--p = '0' + (n & 1);
        n >>= 1;
    }
    if (prefix > 0) {
        *--p = 'b';
        *--p = '0';
    }
    return p;
}

int main(void) {
    unsigned long long x =  0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001;
    const char *x_source = "0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001";
    printf("         0,  0: %s\n", format_bin4(1, 0, 0));
    printf("         0,  1: %s\n", format_bin4(1, 1, 0));
    printf("         0,  2: %s\n", format_bin4(1, 2, 0));
    printf("         0, 64: %s\n", format_bin4(1, 64, 0));
    printf("ULLONG_MAX, 64: %s\n", format_bin4(1, 64, ULLONG_MAX));
    printf("  x source, 64: %s\n", x_source);
    printf("  x format, 64: %s\n", format_bin4(1, 64, x));
    return 0;
}

Output:

         0,  0:
         0,  1: 0
         0,  2: 00
         0, 64: 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000
ULLONG_MAX, 64: 0b1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111
  x source, 64: 0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001
  x format, 64: 0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001

3 Comments

Oh, I had thought digit grouping was part of POSIX. Is it not? If it is just a GNU extension, maybe I should go full GNU and call register_printf_function so I can use %B to print formatted binary.
@hackerb9: you are correct, the ' flag is specified in the Single UNIX Specification
@hackerb9: register_printf_function OTOH is a GNU extension and it seems you can redefine the behavior of standard conversions: %#B performs a binary conversion with a 0B prefix for non zero arguments, probably not a problem for you.
2
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbit.h>
/**
nibbles(str, u)
writes u to str in binary, like sprintf(str, "%lb", u),
but groups the digits into nibbles
str : where to store the result
u : number to convert to a binary string
returns str
*/
char *
nibbles(char *str, unsigned long u)
{
    // case u = 0
    if(u == 0) {*str='0'; str[1] = '\0'; return str; }
    /*
    compute the bit width of u (the length in bits of an unsigned long
    minus the count of leading zeros), minus 1
    */
    //unsigned int len = sizeof(u)*8 - __builtin_clzl(u) - 1; 
    unsigned int len = stdc_bit_width(u) - 1; 
    int i = 4 - len % 4;  // before the first quote
    char *p = str;
    for(unsigned long v = 1UL << len; v; v >>= 1)
    {
        *p++ = v & u ? '1' : '0';  // bit value
        if(v > 1UL && i++ % 4 == 0) *p++ = '\'';  // add '\''
    }
    *p = '\0'; // end str                             
    return str;
}

int main() 
{
    unsigned long x;
    char s[81];
    puts("input the unsigned long to convert\nq to quit");
    while(printf("> ") && scanf("%lu", &x) == 1) 
    {
        printf("standard C23 output:%32lb\n", x);
        printf("grouping nibbles:   %32s\n", nibbles(s, x)); 
    }
}

Outputs :

input the unsigned long to convert
q to quit
> 123
standard C23 output:                         1111011
grouping nibbles:                           111'1011
> 65231
standard C23 output:                1111111011001111
grouping nibbles:                1111'1110'1100'1111
> q

Thank you chux for the comment. I hope I improved my source.

1 Comment

Thank you chux. I updated my source using stdc_bit_width
2

Simplest possible, standard-compliant code:

#include <stdio.h>
#include <stdint.h>

void print_nibbles (uint32_t n, char delim)
{
  for(size_t i=0; i<8; i++)
  {
    uint32_t masked = n >> (32-4 - i*4) & 0xFu;
    printf("%.4b%c", masked, i+1==8 ? '\0' : delim);
  }
}


int main() 
{
  uint32_t binary_goo = 0b0001'1010'0101'1111'1100'0011'1110'0001;
  print_nibbles(binary_goo, '\'');
}

Output

0001'1010'0101'1111'1100'0011'1110'0001
  • This shifts down each nibble starting at msb into the 4 least significant bits, then masks away everything else.
  • %.4b gives printf a precision of 4, meaning a minimum of four digits; in this case the masked value never needs more than four.
  • With an integer conversion, a precision pads the result with leading zeroes rather than spaces.
  • For a generic version swap magic number 8 with sizeof(the int)*2 and swap the magic number 32 with sizeof(the int) * CHAR_BITS.

5 Comments

Whilst probably the best answer here, I don't think this is quite as simple as it could be. I think that for (size_t shift = 32; shift; shift -= 4) makes for a simpler loop. Or perhaps for (size_t shift = 32; shift; ) { shift -= 4; … } - then we get masked = (n >> shift) & 0xFu and shift ? delim : '\0' with no additions or multiplications inside the loop.
Generic version of magic number 8 looks wrong - shouldn't it be ceil(sizeof n * CHAR_BIT / 4)?
@TobySpeight Yeah it might be possible to make it even more readable (defining all the magic numbers as constants in particular). As for the generic version, I didn't consider portability to exotic systems. If bytes aren't 8 bits then is the term nibble even meaningful?
You were inconsistent, because you used CHAR_BIT for the other constant (sizeof n * CHAR_BIT - I assume the S was a typo). I've not used any systems with 16-bit char, but I think formatting in 4-bit chunks would make sense for those. Perhaps less so when CHAR_BIT isn't a multiple of 4, though.
@TobySpeight Perhaps - what I mainly had in mind for a generic version was something that could handle uint8_t to uint64_t on the same mainstream target, rather than something fully portable to PDP-11 and DSPs :)
2

how does one change the printf grouping to sets of four bits (a hexadecimal "digit") instead of three bits (an octal digit)?

printf() lacks a direct convenient solution.

Alternatively we can code a helper function:

// Something like
char *ull2str(unsigned long long)

Yet since this is a printf()-like problem, how do we easily handle the buffer/space management?

printf("1st:%s 2nd:%s 3rd:%s\n", 
    ull2str(0), ull2str(42), ull2str(ULLONG_MAX));

Consider using a compound literal.

With the buffer management issue addressed, we can proceed to the details of the helper function.

#include <assert.h>
#include <limits.h>
#include <stdio.h>

// Compound literal C99 or later
#define ULL2STR_SIZE (sizeof(unsigned long long)*CHAR_BIT*5/4 + 2 + 2)
#define ULL2STR(x) ull2str(ULL2STR_SIZE, (char [ULL2STR_SIZE]){""}, (x))
//                                       ^----compound literal---^ 

// Use dest[] to form the answer.  Return a pointer to somewhere in dest[].
char* ull2str(size_t sz, char dest[sz], unsigned long long x) {
  assert(sz > 0 && dest != NULL);
  char *s = dest + sz;
  *--s = '\0';
  unsigned count = 0;
  do {
    assert(s - dest >= 4);  // room for a digit, a separator and the "0b" prefix
    *--s = "01"[x & 1];
    x >>= 1;
    count++;
    if (count % 4 == 0 && x) {
      *--s = '\'';
    }
  } while (x > 0);
  *--s = 'b';
  *--s = '0';
  return s;
}

int main() {
  printf("1st:%s 2nd:%s 3rd:%s\n", ULL2STR(0), ULL2STR(42),
      ULL2STR(ULLONG_MAX));
}

Output

1st:0b0 2nd:0b10'1010 3rd:0b1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111

Ref: itostr()

1

Probably not the answer you are looking for but did you consider this:

#define number 0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001
unsigned long long int i = number;
puts(STR(number));

Where STR is your average stringification macro:

#define STR_(s) #s
#define STR(s) STR_(s)

Now why does this make sense? Because ' is there solely for the benefit of the programmer. It is not for the compiler, not for the program and not for the end user. So since you then by definition already have a binary literal in your source for sure, why not simply print that one?

2 Comments

No, not the answer that would solve my problem. As you guessed, I do not want to print constants. Still, here's an upvote for being technically correct.
I guess the motivation is to produce results that can be inserted into program source - meaning the "end user" is a programmer.
1

Alas, programmatically changing [stuff] with C setlocale() is possible, but requires modification of system resources. Boo.

So you are stuck writing code to do this yourself. Fortunately, it is very easy:

#include <stdio.h>
#include <locale.h>

size_t format_binary_required_size( size_t nbits )
{
    return nbits ? (nbits + (nbits-1)/4 + 3) : 4;
}

char * format_binary( char * s, size_t n, unsigned long long int value )
{
    static const char * digits = "01";
    s[--n] = 0;
    int sep = -1;
    while (n --> 2)
    {
        if (sep++ == 3)
        {
            sep = 0;
            s[n--] = '\'';
        }
        s[n] = digits[value & 1];
        value >>= 1;
    }
    s[  n] = 'b';
    s[--n] = '0';
    return s;
}

int main() {
  setlocale(LC_ALL, "en_US");
  unsigned long long int i =
    0b0001'0001'0010'0010'0001'0000'1111'0100'1011'0001'0110'1100'0001'1100'1011'0001;
  //printf("binary: %'#0*llb\n", (int)sizeof(i)*8, i); 
  size_t n = format_binary_required_size(sizeof(i)*8);
  char s[n];
  printf("binary: %s\n", format_binary(s,n,i));
  return 0;
}

Compile with C23 to make that binary literal work, but the code will otherwise work with older standards. Modify it as per your needs.

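Oh, for documentation's sake: the formatting function takes a string buffer to use, which must be n characters long, where 4 ≤ n. The formatted binary value will fill the string entirely, prefixing it with “0b” and properly terminating it with a nul byte. Take n from format_binary_required_size(): with a size where n % 5 == 3 (8, 13, 18, ...), a separator lands where the prefix belongs and the function writes outside the buffer.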

Use the first function if you wish to know how big the string must be in order to output N bits. The result will always be for a minimum of 1 bit, even if nbits is zero.

1

Another solution using C23 printf and memmove

/* vim: et:ts=4:sw=4 
:w | !gcc -std=gnu23 % -o %< -Wall
!./%<
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for memmove

char * 
nibbles(char *str, long x)
{
    int len, quotes;
    char *src, *dst;
    len = sprintf(str, "%lb", x); // len = strlen(str)
    quotes = (len - 1) / 4; // one quote between adjacent nibbles
    src = str + len; // end of the ungrouped digits
    dst = src + quotes; // end of the grouped result
    *dst = '\0'; // null byte to end str
    for(int n = quotes; n > 0; n--) // loop over the low nibbles
    {
        src -= 4;
        dst -= 4;
        memmove(dst, src, 4); // translate the nibble in place
        *--dst = '\''; // put the quote before it
    }
    return str;
}

int main(int argc, char *argv[]) 
{
    long x;
    char str[81]; // 64 bits + at most 16 quotes + '\0'
    puts("Enter a long int or not a digit to quit");
    while(printf("> ") && scanf("%ld", &x) == 1)
    {
        printf("%lb\n", x);
        printf("nibbles -> %s\n",nibbles(str, x));
    }
}

Yet another solution that doesn't use <string.h> and interprets the hexadecimal form as a sequence of quartets. The switch structure is chosen in the hope of improving speed.

char *
nibbles(char *str, unsigned long x)
{
    char hexa[17]; // at most 16 digits
    sprintf(hexa, "%lx", x);
    char *p = hexa; // ptr in hexa
    char *q = str; // ptr in the result
    for(; *p; p++)
    {
        switch (*p)
        {
            case '0' : q += sprintf(q, "%s", "0000'"); break;
            case '1' : q += sprintf(q, "%s", "0001'"); break;
            case '2' : q += sprintf(q, "%s", "0010'"); break;
            case '3' : q += sprintf(q, "%s", "0011'"); break;
            case '4' : q += sprintf(q, "%s", "0100'"); break;
            case '5' : q += sprintf(q, "%s", "0101'"); break;
            case '6' : q += sprintf(q, "%s", "0110'"); break;
            case '7' : q += sprintf(q, "%s", "0111'"); break;
            case '8' : q += sprintf(q, "%s", "1000'"); break;
            case '9' : q += sprintf(q, "%s", "1001'"); break;
            case 'a' : q += sprintf(q, "%s", "1010'"); break;
            case 'b' : q += sprintf(q, "%s", "1011'"); break;
            case 'c' : q += sprintf(q, "%s", "1100'"); break;
            case 'd' : q += sprintf(q, "%s", "1101'"); break;
            case 'e' : q += sprintf(q, "%s", "1110'"); break;
            case 'f' : q += sprintf(q, "%s", "1111'"); break;
        }
    }
    *(q - 1) = '\0'; // remove  the last quote
    return str;
}

@chux I thank you for scrutinizing my code and I'm grateful for your comments and good advice.

4 Comments

Since OP's example was unsigned long long, unclear why this answer uses the signed long. Note that %b is for unsigned types, else UB.
Minor points: Instead of 81, consider values based upon C23's _WIDTH. "at most 16 quotes" seems off-by-1. Is it not one less as there is no leading/trailing quote?
@chux You're right. I thought we could replace “long” with int or unsigned int or any other type of signed or unsigned integer.
You're right again chux, about the byte count. I should be ashamed, because 25 years ago I was a math teacher. But ever since I was a kid I've always had trouble with counting. N.B. Sorry, I don't speak English, but a little. Fortunately deepl is my friend.
1

The locale specifies how numbers are printed. This is normally used for monetary information, and both the separator character (a comma in your case) and the number of digits in a group are defined and fixed by the locale you use.

The use of tick marks (') to separate digit groups is an unrelated feature of the C language, meant to make program listings more readable, so it applies only to the source code, where you can place them according to your preference. One is for your own reading convenience; the other is mandated by the locale for printing monetary information (some locales use spaces, others commas, and others tick marks, but that is fixed by the locale definition).

printf() is a library function, so its implementation may or may not be coherent with earlier standards, and the formatting of source code is unrelated to how it prints numbers.

As you have been told in some of the comments, the group separator character (, in your locale) and its positions (every three decimal digits) don't map well onto the feature you want to reproduce, all the more because the ticks in the source are there for your convenience: you are free to group the digits the way you prefer (e.g. D. Knuth, in his listing of constants in "The Art of Computer Programming", uses groups of three digits before the decimal point but groups of five after it, e.g. 1,000 * PI = 3,141.59265 35897 93238 46264 33832 79502 8841).

If you read the standard (cf. 6.4.4.1) you will see that the separator is there for the programmer's convenience, to add readability, and is ignored by the compiler.

7 Comments

Here's a proposal I wrote for how to best use C23 digit separators in engineering: Where to place digit separators in C23? Donald Knuth ought to study engineering notation I guess :)
Donald Knuth is a teacher of mathematics at Stanford U. He is the author of TeX, and many other things... he has also been honoured with the Turing Award. He even has a pipe organ at home (he made it himself)
Good for him! Did he also create a rationale for why to use 5 digits when all engineers use 3? Given that engineering notation is based on multiples of 10^-3, 10^-6, 10^-9 and so on. Milli, micro, nano.
you cannot use it with floating point numbers, only with integer literals: I am afraid you are mistaken: n3096 specifies in 6.4.4.2 Floating constants that a digit-sequence: is either digit or digit-sequence ’opt digit so you can use ' separators in both parts of the mantissa and in the exponent.
You are right... sorry for the mistake. Thanks for the hint. Answer corrected.