
I'm reading this guide about network programming, which I'm liking a lot: https://beej.us/guide/bgnet/html/split/slightly-advanced-techniques.html#serialization

I'm confused about something though. In this section about serialization, he talks about serializing ints for byte-ordering reasons, which makes sense to me, but he also includes these two functions pack754 and unpack754 for serializing floats in IEEE-754 format.

#include <stdint.h>

uint64_t pack754(long double f, unsigned bits, unsigned expbits)
{
    long double fnorm;
    int shift;
    long long sign, exp, significand;
    unsigned significandbits = bits - expbits - 1; // -1 for sign bit

    if (f == 0.0) return 0; // get this special case out of the way

    // check sign and begin normalization
    if (f < 0) { sign = 1; fnorm = -f; }
    else { sign = 0; fnorm = f; }

    // get the normalized form of f and track the exponent
    shift = 0;
    while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
    fnorm = fnorm - 1.0;

    // calculate the binary form (non-float) of the significand data
    significand = fnorm * ((1LL<<significandbits) + 0.5f);

    // get the biased exponent
    exp = shift + ((1<<(expbits-1)) - 1); // shift + bias

    // return the final answer
    return (sign<<(bits-1)) | (exp<<(bits-expbits-1)) | significand;
}

long double unpack754(uint64_t i, unsigned bits, unsigned expbits)
{
    long double result;
    long long shift;
    unsigned bias;
    unsigned significandbits = bits - expbits - 1; // -1 for sign bit

    if (i == 0) return 0.0;

    // pull the significand
    result = (i&((1LL<<significandbits)-1)); // mask
    result /= (1LL<<significandbits); // convert back to float
    result += 1.0f; // add the one back on

    // deal with the exponent
    bias = (1<<(expbits-1)) - 1;
    shift = ((i>>significandbits)&((1LL<<expbits)-1)) - bias;
    while(shift > 0) { result *= 2.0; shift--; }
    while(shift < 0) { result /= 2.0; shift++; }

    // sign it
    result *= (i>>(bits-1))&1? -1.0: 1.0;

    return result;
}

What I'm confused about is that these functions work by looking at the first bit for the sign, then the next X bits for the exponent, then the next Y bits for the mantissa. So doesn't that mean the float has to already be in IEEE-754 format on the host machine for this to work?

Is this just here to explain the format, or is this something you would actually do in real life?

  • IEEE-754 specifies the encoding of floating-point values into bit strings and the reverse decoding. It does not specify how those bits are ordered in bytes. If you just sent the raw bytes that encode an IEEE-754 binary64 (“double precision”) on one system and reinterpreted them on another system, you would get a different value if the systems used different endianness. Commented Sep 24, 2024 at 19:01
  • Just as you have to worry about big-endian vs. little-endian representations for multi-byte integer types, you also have to worry about them for (necessarily multi-byte) floating-point numbers. Commented Sep 24, 2024 at 19:04
  • That said, this is a crude and inefficient routine. It looks like it does not handle NaNs, infinities, −0, or subnormal values. There is no need for loops to normalize or to reconstruct the exponent because C has frexp and ldexp for that. Commented Sep 24, 2024 at 19:04
  • Note how both functions only touch the floating-point number with floating-point operations, so the operation is agnostic to the internal representation of floats as long as the basic sign + significand + exponent model holds, which it does on basically all platforms, even those that do not follow IEEE-754.
  • IEEE-754 defines the exact bit order, from MSB to LSB, in what it calls the "binary interchange format". If we assume that your hardware uses IEEE-754 for its internal representation and that the endianness of the integer unit matches the endianness of the floating-point unit, then treating the encoding/decoding as a byte-order swap of a 32- or 64-bit integer is acceptable. I guess those are assumptions that the author does not want to make. Commented Sep 24, 2024 at 19:40

1 Answer


Is Serializing Floats Necessary for Cross-Platform Network Code?

Yes. FP encoding varies across implementations in size, endianness, precision, exponent range, subnormal support (and possibly even base).
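To see which of these variations apply to a given compiler and platform, <float.h> exposes most of them. The sketch below collects a few into a hypothetical struct fp_traits. The sign_byte_first check assumes a sign-magnitude format such as IEEE-754, where -0.0 has a single nonzero bit, and is not meaningful otherwise.

```c
#include <float.h>
#include <string.h>

/* Hypothetical summary of FP properties reported by this C implementation. */
struct fp_traits {
    int radix;           /* FLT_RADIX: exponent base, 2 on nearly all current hosts */
    int double_bytes;    /* sizeof(double) */
    int double_digits;   /* DBL_MANT_DIG: significand digits, 53 for binary64 */
    int double_max_exp;  /* DBL_MAX_EXP: 1024 for binary64 */
    int ldouble_bytes;   /* sizeof(long double): commonly 8, 12, or 16 */
    int ldouble_digits;  /* LDBL_MANT_DIG: commonly 53, 64 (x87), or 113 (binary128) */
    int claims_iec559;   /* 1 if __STDC_IEC_559__ (IEEE-754 conformance) is defined */
    int sign_byte_first; /* 1 if the byte holding the sign of -0.0 is stored first */
};

struct fp_traits query_fp_traits(void)
{
    struct fp_traits t;
    unsigned char bytes[sizeof(double)];
    double negzero = -0.0;

    t.radix = FLT_RADIX;
    t.double_bytes = (int)sizeof(double);
    t.double_digits = DBL_MANT_DIG;
    t.double_max_exp = DBL_MAX_EXP;
    t.ldouble_bytes = (int)sizeof(long double);
    t.ldouble_digits = LDBL_MANT_DIG;
#ifdef __STDC_IEC_559__
    t.claims_iec559 = 1;
#else
    t.claims_iec559 = 0;
#endif
    /* ASSUMPTION: -0.0 differs from +0.0 only in the sign bit (IEEE-style). */
    memcpy(bytes, &negzero, sizeof bytes);
    t.sign_byte_first = bytes[0] != 0;
    return t;
}
```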

So doesn't that mean the float has to already be in IEEE-754 format on the host machine for this to work?

No, the pack/unpack will "work" (subject to the problems below) even if long double is not IEEE, because both functions only apply arithmetic to the value and never look at its bits.

Is this just here to explain the format, or is this something you would actually do in real life?

Looks like learner code. I would not use the provided pack/unpack code, given its weaknesses (below) and especially the two very inefficient while loops, which run once per power of two in the exponent. With binary128, whose exponent reaches about ±16383, they may iterate thousands of times.
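To make that cost concrete, the sketch below counts the passes made by the question's two normalization loops and compares them with a single frexpl() call. normalize_loop_passes and exponent_via_frexpl are hypothetical names. For 2^-1000 the loops run 1000 times; for x87 extended or binary128 long double, whose smallest normal value is 2^-16382, they can run more than 16,000 times.

```c
#include <math.h>

/* Hypothetical: count the passes of the question's loops for f > 0. */
long normalize_loop_passes(long double f)
{
    long passes = 0;
    long double fnorm = f;
    while (fnorm >= 2.0L) { fnorm /= 2.0L; passes++; }
    while (fnorm < 1.0L)  { fnorm *= 2.0L; passes++; }
    return passes;
}

/* Hypothetical: the same exponent from one library call.
   frexpl gives f == m * 2^e with m in [0.5, 1); the question's
   normalized form 1.xxx * 2^shift therefore has shift == e - 1. */
int exponent_via_frexpl(long double f)
{
    int e;
    (void)frexpl(f, &e);
    return e - 1;
}
```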

The code is a hole-riddled attempt to pack an arbitrarily encoded long double into an IEEE binary64 bit pattern. It does not handle values near 0.0, rounding, overflow, or infinity/NaN well.


pack754() has at least these shortcomings:

  • if (f == 0.0) return 0; loses information during serialization, as it returns 0 for both +0.0 and -0.0. When testing the FP sign, do not use if (f < 0) but if (signbit(f)), which extracts the sign bit correctly even when f is zero or NaN.

  • long double may be wider than 64 bits, so uint64_t pack754(long double f, unsigned bits, unsigned expbits) loses information when packing into 64 bits. I suppose OP is tolerating this loss.

  • 1LL<<significandbits is undefined behavior (UB) when significandbits >= 63, because the result overflows a 64-bit long long. 1ULL<<significandbits has some advantage, yet a shift of 64 or more is still UB.

  • Mixing float arithmetic (the 0.5f constant) into otherwise long double math is short-sighted. ((1LL<<significandbits) + 0.5L) makes a little more sense.

  • Rather than code like while(fnorm >= 2.0), use long double frexpl(long double value, int *p) to extract a normalized fraction (in [0.5, 1)) and exponent, and long double ldexpl(long double x, int p) to recombine them. while(fnorm >= 2.0) { fnorm /= 2.0; shift++; } also risks an infinite loop when fnorm is infinity, since infinity / 2.0 is still infinity.

  • + 0.5f for rounding has many corner-case issues. Better to use lround() and friends.

  • ...


For simple cross-platform exchange of FP values, I'd consider sprintf(buf, "%La", x) as a first step to pack and strtold() to unpack.
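A minimal sketch of that text-based approach, assuming C99 snprintf() and strtold(); pack_text and unpack_text are hypothetical names. "%La" writes a hexadecimal significand and a binary exponent, so no decimal rounding is involved, and the round trip is exact as long as the receiver's long double can represent the value. Infinities and NaNs come out as text such as inf and nan, which strtold() also accepts.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: format x as hex-float text; returns its length or -1. */
int pack_text(char *buf, size_t size, long double x)
{
    int n = snprintf(buf, size, "%La", x);
    return (n >= 0 && (size_t)n < size) ? n : -1; /* -1: error or truncated */
}

/* Hypothetical: parse the text back; rounds if the receiver is narrower. */
long double unpack_text(const char *buf)
{
    return strtold(buf, NULL);
}
```

The price is size: the text is typically longer than the 8 bytes of a packed binary64.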

Packing an FP value into a tight intN_t and maintaining precision/range faithfulness across many computer implementations are competing goals.
Which is more important: faithful conversions or small packet size?
Most systems I've worked with prize faithful conversions over small packet size.

Packing a long double into a 64-bit integer, for portability, is simply an unwise design.
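As a quick check of that point, the hypothetical helper below reports whether squeezing a given long double through a 64-bit double loses information. Where long double is wider than double (LDBL_MANT_DIG > DBL_MANT_DIG, as with x87 extended or binary128), a value such as 1.0L + LDBL_EPSILON does not survive; where both types share one format, nothing is lost.

```c
#include <float.h>
#include <math.h>

/* Hypothetical: 1 if converting x to double and back changes its value. */
int narrowing_loses_info(long double x)
{
    if (isnan(x) || isinf(x))
        return 0;                       /* both have double counterparts */
    if (fabsl(x) > DBL_MAX)
        return 1;                       /* finite but beyond double's range */
    return (long double)(double)x != x; /* precision or underflow loss */
}
```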


2 Comments

"Looks like learner code": +1 for wading through all that muck. At least he didn't literally go to 11 on his levels of nesting, like he did here. Ouch.
A very thorough explanation, tyvm :)
