6

Here is a contrived example of how many of our classes return binary representations of themselves (to be read by C++).

def to_binary(self):
    'Return the binary representation as a string.'
    data = []

    # Binary version number.
    data.append(struct.pack('<I', 2))

    # Image size.
    data.append(struct.pack('<II', *self.image.size))

    # Attribute count.
    data.append(struct.pack('<I', len(self.attributes)))

    # Attributes.
    for attribute in self.attributes:

        # Id.
        data.append(struct.pack('<I', attribute.id))

        # Type.
        data.append(struct.pack('<H', attribute.type))

        # Extra Type.        
        if attribute.type == 0:
            data.append(struct.pack('<I', attribute.typeEx))

    return ''.join(data)

What I dislike:

  • Every line starts with data.append(struct.pack(, distracting from the unique part of the line.
  • The byte order ('<') is repeated over and over again.
  • You have to remember to return the boilerplate ''.join(data).

What I like:

  • The format specifiers appear near the attribute name. E.g., it's easy to see that self.image.size is written out as two unsigned ints.
  • The lines are (mostly) independent. E.g., to remove the Id field from an 'attribute', you don't have to touch more than one line of code.

Is there a more readable/pythonic way to do this?

8 Answers

4
from StringIO import StringIO
import struct

class BinaryIO(StringIO):
    def writepack(self, fmt, *values):
        self.write(struct.pack('<' + fmt, *values))

def to_binary_example():
    data = BinaryIO()
    data.writepack('I', 42)
    data.writepack('II', 1, 2)
    return data.getvalue()
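
On Python 3 the StringIO module no longer exists and struct.pack returns bytes, so the same idea might be sketched with io.BytesIO instead. This is an assumed port rather than part of the answer itself; the class and method names simply mirror the ones above.

```python
import io
import struct


class BinaryIO(io.BytesIO):
    """In-memory byte buffer that packs values as little-endian."""

    def writepack(self, fmt, *values):
        # Prepend the byte order once so callers only give field codes.
        self.write(struct.pack('<' + fmt, *values))


def to_binary_example():
    data = BinaryIO()
    data.writepack('I', 42)      # one unsigned 32-bit int
    data.writepack('II', 1, 2)   # two unsigned 32-bit ints
    return data.getvalue()
```

The byte order lives in one place, and getvalue() replaces the explicit join.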

4

You can try to implement some sort of declarative syntax for your data.

Which may result in something like:

class Image(SomeClassWithMetamagic):
    type = PackedValue(2)
    attribute = PackedValue('attributes') # accessed via self.__dict__

# or using decorators
    @pack("<II")
    def get_size(self):
        pass

# and a generic function in the superclass
    def get_packed(self):
        pass  # pack every declared field here

etc...

Other examples of this declarative style are SQLAlchemy's declarative_base, ToscaWidgets, and sprox.
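
One way the declarative idea could look is the minimal Python 3 sketch below, which avoids metaclasses: each class lists its fields as (format, getter) pairs and a shared base class does the packing. Packable, FIELDS, and Size are hypothetical names invented for this illustration; they do not come from any of the libraries mentioned.

```python
import struct


class Packable:
    """Base class: packs whatever the subclass declares in FIELDS."""

    BYTE_ORDER = '<'
    FIELDS = ()  # sequence of (format, getter) pairs

    def get_packed(self):
        chunks = []
        for fmt, getter in self.FIELDS:
            value = getter(self)
            # A getter may return one value or a tuple of values.
            values = value if isinstance(value, tuple) else (value,)
            chunks.append(struct.pack(self.BYTE_ORDER + fmt, *values))
        return b''.join(chunks)


class Size(Packable):
    FIELDS = (
        ('I', lambda self: 2),                           # binary version number
        ('II', lambda self: (self.width, self.height)),  # image size
    )

    def __init__(self, width, height):
        self.width = width
        self.height = height
```

Here Size(640, 480).get_packed() yields the version followed by the two dimensions. A flat field list like this has no natural place for loops or conditionals, such as the attribute list in the question.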

1 Comment

The declarative syntax is good if you don't need complex programmatic logic to build the serialization (i.e. lots of ifs and fors). I have used the declarative approach to specify the serialization, deserialization, and automatically generated documentation in one go for a binary file format.
2

How about Protocol Buffers, Google's extensive cross-language format and protocol for sharing data?

2

If you just want nicer syntax, you can abuse generators/decorators:

from functools import wraps
import struct

def packed(g):
  '''a decorator that packs the list data items
     that is generated by the decorated function
  '''
  @wraps(g)
  def wrapper(*p, **kw):
    data = []
    for params in g(*p, **kw):
      fmt = params[0]
      fields = params[1:]
      data.append(struct.pack('<'+fmt, *fields))
    return ''.join(data)    
  return wrapper

@packed
def as_binary(self):
  '''just |yield|s the data items that should be packed
     by the decorator
  '''
  yield 'I', 2
  yield 'II', self.image.size[0], self.image.size[1]
  yield 'I', len(self.attributes)

  for attribute in self.attributes:
    yield 'I', attribute.id
    yield 'H', attribute.type
    if attribute.type == 0:
      yield 'I', attribute.typeEx

Basically this uses the generator to implement a "monad", an abstraction usually found in functional languages like Haskell. It separates the generation of some values from the code that decides how to combine these values together. It's more of a functional-programming approach than a "pythonic" one, but I think it improves readability.

1 Comment

+1. I was literally just seconds away from posting the exact same solution. One small improvement to enhance readability would be to encapsulate the datatype string in a function, so yield 'I', attribute.id becomes yield UInt(attribute.id).
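
That suggestion could be sketched as follows in Python 3. UInt and UShort are hypothetical helper names; each returns a (format, value) pair, so the decorator needs no change beyond joining bytes instead of str. The generator takes plain arguments here only to keep the sketch self-contained.

```python
import struct
from functools import wraps


def UInt(value):
    return 'I', value    # unsigned 32-bit int


def UShort(value):
    return 'H', value    # unsigned 16-bit int


def packed(g):
    """Pack every (format, values...) tuple yielded by g, little-endian."""
    @wraps(g)
    def wrapper(*p, **kw):
        return b''.join(
            struct.pack('<' + item[0], *item[1:]) for item in g(*p, **kw)
        )
    return wrapper


@packed
def attribute_as_binary(attribute_id, attribute_type, type_ex):
    yield UInt(attribute_id)
    yield UShort(attribute_type)
    if attribute_type == 0:
        # The extra type is only written for type 0.
        yield UInt(type_ex)
```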
1
def to_binary(self):
    struct_i_pack = struct.Struct('<I').pack
    struct_ii_pack = struct.Struct('<II').pack
    struct_h_pack = struct.Struct('<H').pack
    struct_ih_pack = struct.Struct('<IH').pack
    struct_ihi_pack = struct.Struct('<IHI').pack

    return ''.join([
        struct_i_pack(2),
        struct_ii_pack(*self.image.size),
        struct_i_pack(len(self.attributes)),
        ''.join([
            struct_ih_pack(a.id, a.type) if a.type else struct_ihi_pack(a.id, a.type, a.typeEx)
            for a in self.attributes
        ])
    ])

0

You could refactor your code to wrap boilerplate in a class. Something like:

def to_binary(self):
    'Return the binary representation as a string.'
    binary = BinaryWrapper()

    # Binary version number.
    binary.pack('<I', 2)

    # alternatively, you can pass a list of (format, values...) tuples
    stuff = [
        ('<II',) + tuple(self.image.size),  # Image size.
        ('<I', len(self.attributes)),       # Attribute count.
    ]
    binary.pack_all(stuff)

    return binary.get_packed()
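
BinaryWrapper is not defined in this answer. A minimal Python 3 sketch of what it might look like follows; the method names match the calls above, but the implementation is an assumption.

```python
import struct


class BinaryWrapper:
    """Collects packed chunks and joins them on request."""

    def __init__(self):
        self._chunks = []

    def pack(self, fmt, *values):
        self._chunks.append(struct.pack(fmt, *values))

    def pack_all(self, items):
        # Each item is a (format, value, value, ...) tuple.
        for item in items:
            self.pack(item[0], *item[1:])

    def get_packed(self):
        return b''.join(self._chunks)
```

The join is hidden inside get_packed(), which removes one of the pieces of boilerplate the question complains about.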

0

The worst problem is that you need corresponding code in C++ to read the output. Can you reasonably arrange to have both the reading and writing code mechanically derive from or use a common specification? How to go about that depends on your C++ needs as much as Python.
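
As a rough illustration of a common specification, the Python 3 sketch below drives both the packing code and a generated C++ declaration from one field list. HEADER_SPEC, CPP_TYPES, pack_header, and cpp_struct are all hypothetical names, and only the fixed-size header from the question is covered, not the variable-length attribute list.

```python
import struct

# Hypothetical shared description of the header: (field name, struct code).
HEADER_SPEC = [
    ('version', 'I'),
    ('width', 'I'),
    ('height', 'I'),
    ('attribute_count', 'I'),
]

# struct format code -> C++ fixed-width type from <cstdint>
CPP_TYPES = {'I': 'std::uint32_t', 'H': 'std::uint16_t'}


def pack_header(values):
    """Pack a dict of field values in spec order, little-endian."""
    fmt = '<' + ''.join(code for _, code in HEADER_SPEC)
    return struct.pack(fmt, *(values[name] for name, _ in HEADER_SPEC))


def cpp_struct(name):
    """Emit a C++ struct declaration with one member per spec field."""
    lines = ['struct %s {' % name]
    lines += ['    %s %s;' % (CPP_TYPES[code], field)
              for field, code in HEADER_SPEC]
    lines.append('};')
    return '\n'.join(lines)
```

The generated struct only mirrors the packed bytes when the C++ compiler inserts no padding and the reader is little-endian; with mixed field sizes it would be safer to generate code that reads one field at a time.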

0

You can easily get rid of the repetition while keeping the code just as readable, like this:

def to_binary(self):     
    output = struct.pack(
        '<IIII', 2, self.image.size[0], self.image.size[1], len(self.attributes)
    )
    return output + ''.join(
        struct.pack('<IHI', attribute.id, attribute.type, attribute.typeEx)
        for attribute in self.attributes
    )

1 Comment

I think you missed "if attribute.type == 0:"
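
One way to restore that condition while keeping the compact shape is sketched below in Python 3. It is written as a free function over assumed inputs: size is a (width, height) pair, and the Attribute namedtuple is a stand-in for the question's attribute objects.

```python
import struct
from collections import namedtuple

# Stand-in for the question's attribute objects (an assumption).
Attribute = namedtuple('Attribute', 'id type typeEx')


def pack_attribute(attribute):
    if attribute.type == 0:
        # typeEx is only written for type 0.
        return struct.pack('<IHI', attribute.id, attribute.type,
                           attribute.typeEx)
    return struct.pack('<IH', attribute.id, attribute.type)


def to_binary(size, attributes):
    header = struct.pack('<IIII', 2, size[0], size[1], len(attributes))
    return header + b''.join(pack_attribute(a) for a in attributes)
```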
