C++ Best Practices: Serialization and Abstract Classes without library access

Question

Preface: Serialization libraries cannot be used due to restrictions in the development environment, and the latest version usable is C++ 11.

I have various structs that need to be serialized and deserialized so that they may be broadcast and received over UDP.

I would like to define an abstract class for serializable types that has serialization and deserialization methods to be implemented by various message types. This is what I've been working with, but it "smells" to me. Ideally it would be nice to make Deserialize static and have it return a concrete implementation of Serializable, but virtual methods cannot be static:

class Serializable {
public:
    virtual ~Serializable() {} // allows safe deletion through a Serializable pointer
    virtual std::size_t Serialize(char* buffer, const unsigned int max_message_size) = 0;
    virtual bool Deserialize(const char* buffer, const unsigned int max_message_size) = 0;
};

An abstract UDP specific subclass of this may be as follows:

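template<typename T>
class UdpMessageBase : public Serializable {
protected:
    UDPHeader header; // struct that contains primitive types for metadata about message
    T data; // struct that contains message data, subclasses will specialize, see below
    bool populated = false;
public:
    virtual ~UdpMessageBase() = 0; // pure virtual to mark the class abstract; still needs the definition below
    T getData() const {return data;}
    UDPHeader getHeader() const {return header;}
};

// A pure virtual destructor must still be defined, because derived destructors call it.
template<typename T>
UdpMessageBase<T>::~UdpMessageBase() {}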

And finally a specific implementation of this for a specific message:

class MySpecificUdpMessage : public UdpMessageBase<structForMySpecificUdpMessage> {
public:
    MySpecificUdpMessage() {
        initHeader();
    }

    MySpecificUdpMessage(const structForMySpecificUdpMessage& data) {
        this->data = data; // a base-class member cannot be set in the derived initializer list
        initHeader();
        populated = true;
    }
    
    std::size_t Serialize(char* buffer, const unsigned int max_message_size) {
        if (populated) {
            // serialize the header and data into buffer to be used by caller
            // return total size of serialized data
        }
        else { /* throw error */ }
    }

    bool Deserialize(const char* buffer, const unsigned int max_message_size) {
        if (populated) {
            // throw error
        }
        else {
            // deserialize buffer into header and data of this instance of the object
            populated = true;
            // return true or false based on success
        }
    }

private:
    void initHeader() {
        //set header values specific to this message type
    }
};

Example usage:

// for outgoing udp message where we have the data
structForMySpecificUdpMessage someDataToSend; // assume this was passed into this method
char buffer[MAX_MESSAGE_SIZE]; // buffer we want to serialize data into; MAX_MESSAGE_SIZE is a placeholder constant
MySpecificUdpMessage messageToSend(someDataToSend);
messageToSend.Serialize(buffer, sizeof(buffer));
SendUDPMessage(buffer); //arbitrary interface that sends the buffer over UDP to client

//for incoming UDP message
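const char* buffer; //incoming populated data buffer
unsigned int bufferSize; //number of bytes received (placeholder, set by the receiving code)
MySpecificUdpMessage incomingMessage;
incomingMessage.Deserialize(buffer, bufferSize);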
messageProcessor(incomingMessage); //arbitrary message processor
// or
dataProcessor(incomingMessage.getData()); //arbitrary data process for the struct

One alternative approach I've thought of is a Serializer class that has a bunch of static overloaded Serialize methods with different implementations based on the passed-in struct. They would simply populate the header based on that info, and deserialize data based on the deserialized header, but that class would grow with more messages and doesn't seem very decoupled. I just feel like I'm missing some obvious better practice.

Any tips greatly appreciated.

I'd leave UdpMessageBase out of this. A class that serializes to a generic stream of bytes will prove more useful in the long run. What if you find you have to serialize a class to UDP and to a file? Inheritance gets messy at that point. But if you just get an array of bytes, you can pass that to pretty much any media handler and let the handler do the job of inserting any extra protocol wrappings needed by that media.
– user4581301, Commented Jul 11, 2024 at 20:46
@user4581301 So my thinking with UdpMessageBase was to enforce that the UDP-specific message header metadata is packaged with the data structure itself upon serialization or deserialization. The header data may be different or unneeded for non-UDP variants of the message that serialize the data, and I don't want an implementer to leave out the header metadata for any UDP serializations and only serialize the data in ignorance. The header data is also message specific, not protocol specific.
– cma0014, Commented Jul 11, 2024 at 20:52
In my solution, the UDP-specific header information would be applied as a protocol wrapper around the message (or fragments of message if the UDP handler had to break the message up to ensure it fit the smallest MTU). That way you have the data and the media protocols totally separate. Makes the code more reusable and generally makes for easier debugging.
– user4581301, Commented Jul 11, 2024 at 21:31
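
The separation suggested in these comments could be sketched roughly as follows. This is only an illustration written to compile as C++11: `appendU16`, `wrapForUdp`, and the two-field header (a 16-bit message type followed by a 16-bit payload size, both big-endian) are assumptions made for the example, not the layout of the question's `UDPHeader`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 16-bit value in big-endian (network) byte order.
inline void appendU16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

// The transport layer wraps an already-serialized payload. The code that
// produced `payload` knows nothing about UDP or about this header.
inline std::vector<std::uint8_t> wrapForUdp(std::uint16_t messageType,
                                            const std::vector<std::uint8_t>& payload) {
    // Assumed header layout: the size field is 16 bits wide.
    assert(payload.size() <= 0xFFFF);

    std::vector<std::uint8_t> packet;
    packet.reserve(4 + payload.size());
    appendU16(packet, messageType);
    appendU16(packet, static_cast<std::uint16_t>(payload.size()));
    packet.insert(packet.end(), payload.begin(), payload.end());
    return packet;
}
```

With this split, the same payload bytes could be handed to a file writer or another transport that adds its own wrapper or none at all.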

Wutz · Accepted Answer · 2024-07-12 00:10:01Z

The question is a little open, so I'll just give my preferred (and so far successful) way of implementing serialization/deserialization, along with the reasons why I like it. The idea is heavily inspired by how fmt (now std::format) does formatters.

Looking at serialize first:

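// Headers used by the snippets in this answer (C++20):
#include <algorithm>  // std::copy_n
#include <array>      // std::array
#include <concepts>   // std::integral
#include <cstddef>    // std::byte
#include <cstring>    // std::memcpy
#include <iterator>   // std::back_inserter, iterator concepts
#include <memory>     // std::to_address
#include <ranges>     // std::ranges::contiguous_range
#include <stdexcept>  // std::runtime_error
#include <utility>    // std::pair, std::move
#include <vector>     // std::vector

// The customization point to add serializer implementations for your custom structs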
template<typename T>
struct Serializer;

// Example: Serialization for integral types
template<std::integral Integer>
struct Serializer<Integer> {
    auto operator()(Integer value, auto out) const {
        // If you need certain endianness, take care of that here
        return std::copy_n(reinterpret_cast<const std::byte*>(&value), sizeof(value), out);
    }
};

// Returns the iterator past the last written byte
template<typename T, std::output_iterator<std::byte> Iterator>
Iterator serialize(const T& value, Iterator target) {
    return Serializer<T>{}(value, target);
}
// Helper overload that returns a vector, but can be more generalized to support other containers
template<typename T>
std::vector<std::byte> serialize(const T& value) {
    std::vector<std::byte> bytes{};
    serialize(value, std::back_inserter(bytes));
    return bytes;
}
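
The endianness comment in the integral serializer can be handled without checking the host byte order by emitting bytes with shifts. The following is a minimal sketch written to compile as C++11; `writeBigEndian` is a hypothetical helper name, and big-endian is only an assumed choice of wire order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Writes an unsigned integer to `out`, most significant byte first,
// regardless of the host's native byte order.
template <typename UInt, typename OutputIt>
OutputIt writeBigEndian(UInt value, OutputIt out) {
    static_assert(std::is_unsigned<UInt>::value, "use an unsigned type");
    for (std::size_t i = sizeof(UInt); i-- > 0;) {
        *out++ = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
    }
    return out;
}

// Usage example: 0x01020304 is written as the bytes 01 02 03 04.
inline void writeBigEndianExample() {
    std::vector<std::uint8_t> bytes;
    writeBigEndian<std::uint32_t>(0x01020304u, std::back_inserter(bytes));
    assert(bytes.size() == 4 && bytes[0] == 0x01 && bytes[3] == 0x04);
}
```

Because the bytes are produced arithmetically, the same code gives the same output on little-endian and big-endian hosts.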

Implementing a serializer for your own struct:

// MyType.hpp contains:

// ... your struct
struct MyType {
    int number;
    long long bignumber;
};

// ... and then its serializer, so you always have it available
template<>
struct Serializer<MyType> {
    auto operator()(const MyType& value, auto out) const {
        out = serialize(value.number, out);
        return serialize(value.bignumber, out);
    }
};

The plan for deserialize is similar. I went for contiguous_iterator in the following code, because I realized you probably don't need your serialization to work for non-contiguous containers.

template<typename T>
struct Deserializer;

template<std::integral Integer>
struct Deserializer<Integer> {
    template<std::contiguous_iterator Iterator, std::sized_sentinel_for<Iterator> Sentinel>
    auto operator()(Iterator begin, Sentinel end) const {
        if (static_cast<std::size_t>(end - begin) < sizeof(Integer)) {
            throw std::runtime_error{"not enough data"};
        }
        Integer value;
        std::memcpy(&value, std::to_address(begin), sizeof(Integer));
        // You'll need certain endianness; if native != desired you'll need to byteswap
        return std::make_pair(value, begin + sizeof(Integer));
    }
};

template<typename T, std::contiguous_iterator Iterator, std::sized_sentinel_for<Iterator> Sentinel>
std::pair<T, Iterator> deserialize(Iterator begin, Sentinel end) {
    return Deserializer<T>{}(begin, end);
}
// Helper taking a range, guarantees all bytes are parsed, otherwise throws
template<typename T, std::ranges::contiguous_range Range>
T deserialize(const Range& range) {
    auto [value, pos] = deserialize<T>(std::begin(range), std::end(range));
    if (pos != std::end(range)) throw std::runtime_error{"Not all bytes were deserialized"};
    // move the returnvalue, because it's a structured binding
    // not sure this is still necessary in newer standards?
    return std::move(value);
}
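
For readers limited to older standards, the same bounds-checked read can be sketched with plain pointers. This is an illustrative C++11 version, not part of the answer's API: `readBigEndian` is a hypothetical name, and it assumes the bytes were written most significant byte first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Reads an unsigned integer stored most significant byte first.
// Advances `pos` past the bytes consumed and throws if fewer than
// sizeof(UInt) bytes remain before `end`.
template <typename UInt>
UInt readBigEndian(const std::uint8_t*& pos, const std::uint8_t* end) {
    static_assert(std::is_unsigned<UInt>::value, "use an unsigned type");
    assert(pos <= end);
    if (static_cast<std::size_t>(end - pos) < sizeof(UInt)) {
        throw std::runtime_error("not enough data");
    }
    UInt value = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        value = static_cast<UInt>((value << 8) | *pos++);
    }
    return value;
}
```

Passing the position by reference plays the same role as returning the advanced iterator in the pair above: the caller can chain reads and then check that `pos == end` to confirm every byte was consumed.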

Usage

MyRequest request{...};

// Creates a vector, which is not the most efficient thing
udpSocket.send(serialize(request));

// Alternatively: serialize to a buffer if the size is known ahead of time
// This can easily be made into a helper method alongside serialize
std::array<std::byte, sizeof(MyRequest)> buffer;
serialize(request, buffer.begin());
udpSocket.send(buffer);

// complexities of UDP are not part of this sample, just assume we received a full response struct...
auto bytes = udpSocket.receive();

auto response = deserialize<MyResponse>(bytes);

// here we know response was successfully parsed and used all input bytes

Advantages

Can add (de)serialization for any type, even ones you don't control (std::string, int, ...)
Can implement only the direction you need; no need to implement serialization for a type you only deserialize and vice versa
Can keep the types very simple - no virtual dtors etc
Composable; for example serialization of complex types can be implemented by calling serialize with simple types that it's made up of
Extra helper overloads of serialize/deserialize can easily be added as long as they're somehow implementable with iterators
Extra features, like sizes known at compile time, can be implemented to conditionally optimize implementations of serialize/deserialize for types that support them
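
The last point, sizes known at compile time, could be approached with a small trait next to the serializer. The sketch below is one possible shape, written as C++11; `SerializedSize`, `Sample`, and `SampleBuffer` are hypothetical names introduced only for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Left undefined on purpose: asking for the size of an unsupported
// type fails to compile instead of silently returning a wrong value.
template <typename T>
struct SerializedSize;

template <> struct SerializedSize<std::uint16_t> { static constexpr std::size_t value = 2; };
template <> struct SerializedSize<std::uint32_t> { static constexpr std::size_t value = 4; };

// Hypothetical message type.
struct Sample {
    std::uint16_t id;
    std::uint32_t payload;
};

// The size of a composite type is the sum of its fields' sizes.
template <>
struct SerializedSize<Sample> {
    static constexpr std::size_t value =
        SerializedSize<std::uint16_t>::value + SerializedSize<std::uint32_t>::value;
};

// A buffer sized for the wire format rather than for sizeof(Sample),
// which may include padding.
typedef std::array<std::uint8_t, SerializedSize<Sample>::value> SampleBuffer;

static_assert(SerializedSize<Sample>::value == 6, "2 + 4 bytes on the wire");
```

A helper overload of serialize could then use the trait to write into a fixed-size array instead of allocating a vector.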

If the concrete class of the message is not known at runtime, or there are possibly multiple output formats to serialize to, the visitor pattern could be helpful as well.
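
A rough idea of how the visitor pattern might fit here, written as C++11. Everything in this sketch (`Message`, `Ping`, `Status`, `MessageVisitor`, `BinaryWriter`, `toBytes`) is a hypothetical name chosen for illustration, and the one-byte fields only keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Ping;
struct Status;

// One visit() overload per concrete message type.
struct MessageVisitor {
    virtual ~MessageVisitor() {}
    virtual void visit(const Ping& msg) = 0;
    virtual void visit(const Status& msg) = 0;
};

struct Message {
    virtual ~Message() {}
    virtual void accept(MessageVisitor& visitor) const = 0;
};

struct Ping : Message {
    std::uint8_t sequence;
    explicit Ping(std::uint8_t s) : sequence(s) {}
    void accept(MessageVisitor& visitor) const override { visitor.visit(*this); }
};

struct Status : Message {
    std::uint8_t code;
    std::uint8_t flags;
    Status(std::uint8_t c, std::uint8_t f) : code(c), flags(f) {}
    void accept(MessageVisitor& visitor) const override { visitor.visit(*this); }
};

// One output format is one visitor; a second format would be another visitor.
struct BinaryWriter : MessageVisitor {
    std::vector<std::uint8_t> bytes;
    void visit(const Ping& msg) override { bytes.push_back(msg.sequence); }
    void visit(const Status& msg) override {
        bytes.push_back(msg.code);
        bytes.push_back(msg.flags);
    }
};

inline std::vector<std::uint8_t> toBytes(const Message& msg) {
    BinaryWriter writer;
    msg.accept(writer);
    return writer.bytes;
}

// Usage example: the static type hides the concrete class.
inline void visitorExample() {
    Status status(3, 4);
    const Message& unknown = status;
    assert(toBytes(unknown).size() == 2);
}
```

The trade-off is that the message types need a virtual accept(), which gives up some of the simplicity listed under Advantages.
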
I appreciate the detailed response and it makes a lot of sense, but I foolishly left out another important caveat: I'm restricted to C++11. This uses a lot of newer std features that I'm not familiar with at all, so it took some reading to follow. I'll edit the question, and I appreciate your time. I think I can adapt the concept in general.
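
One way the concept might be adapted to C++11 is to replace the concepts with std::enable_if and the abbreviated `auto` parameters with ordinary member templates. The sketch below is an assumption-laden adaptation, not the answer's code: it uses `unsigned char` in place of std::byte (which is C++17), and `Position` is a hypothetical message type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Customization point. The second parameter exists only so that partial
// specializations can be enabled with std::enable_if.
template <typename T, typename Enable = void>
struct Serializer;

// Integral types: copy the object representation in native byte order.
template <typename T>
struct Serializer<T, typename std::enable_if<std::is_integral<T>::value>::type> {
    template <typename OutputIt>
    OutputIt operator()(T value, OutputIt out) const {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            *out++ = p[i];
        }
        return out;
    }
};

// Returns the iterator past the last written byte.
template <typename T, typename OutputIt>
OutputIt serialize(const T& value, OutputIt out) {
    return Serializer<T>()(value, out);
}

// Helper overload that collects the bytes in a vector.
template <typename T>
std::vector<unsigned char> serialize(const T& value) {
    std::vector<unsigned char> bytes;
    serialize(value, std::back_inserter(bytes));
    return bytes;
}

// Hypothetical message type and its serializer.
struct Position {
    std::int32_t x;
    std::int32_t y;
};

template <>
struct Serializer<Position> {
    template <typename OutputIt>
    OutputIt operator()(const Position& value, OutputIt out) const {
        out = serialize(value.x, out);
        return serialize(value.y, out);
    }
};

// Usage example: two 32-bit fields produce eight bytes.
inline void serializeExample() {
    Position p = {1, 2};
    std::vector<unsigned char> bytes = serialize(p);
    assert(bytes.size() == 2 * sizeof(std::int32_t));
}
```

A Deserializer can follow the same pattern, taking a pair of `const unsigned char*` pointers instead of constrained iterators.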
