So file systems deal with bytes but I'm looking to read/write data to a file in bits.

I have a file that is ~850 MB and the goal is to get it under 100 MB. I used delta + Huffman encoding to generate a "code table" of binary codes. When you add up all the "bits" (i.e. the total number of 0s and 1s in the file) you get about 781,000,000 bits, so theoretically I should be able to store them in about 98 MB (roughly 93 MiB). This is where I'm running into a problem.

Based on other answers I've seen around SO, this is the closest I've gotten:

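import struct

with open(r'encoded_file.bin', 'wb') as f:
    for val in filedict:
        # Each val is a string of '0'/'1' characters, e.g. '10'.
        int_val = int(val[::-1], base=2)
        # 'i' packs a full native int (typically 4 bytes) for every code.
        bin_array = struct.pack('i', int_val)
        f.write(bin_array)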

The val passed along on each iteration is the binary code to be written. The codes do not have a fixed length; they range from 10 for the most common to 111011001111001100 for the longest. The average code length is 5 bits. The above code generates a file of about 600 MB, still way off the target.
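As a rough check on those numbers, the sketch below estimates both sizes from the figures quoted above. The number of codes is not stated in the question, so approx_codes is an assumption derived from the total bit count and the 5-bit average; the variable names are illustrative only.

```python
import struct

total_bits = 781000000  # total number of 0s and 1s, as quoted in the question
avg_code_len = 5        # average code length in bits, as quoted in the question

# Assumption: the number of codes is roughly total bits / average length.
approx_codes = total_bits // avg_code_len

# struct.pack('i', ...) emits one native int per code (typically 4 bytes),
# no matter how short the code is.
bytes_per_code = struct.calcsize('i')
fixed_width_size = approx_codes * bytes_per_code

# Packing the bits back to back needs one byte per 8 bits, rounded up.
packed_size = (total_bits + 7) // 8

print("one int per code: %.1f MB" % (fixed_width_size / 1e6))
print("bit-packed:       %.1f MB" % (packed_size / 1e6))
```

Under these assumptions the fixed-width output lands in the same range as the ~600 MB file observed, which suggests the size comes from spending a whole int on every code rather than from the encoding itself.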

Currently I am using Python 2.7, but I can move to Python 3.x if I absolutely have to. Is this even possible in Python? Would a language like C or C++ make it easier?

  • Keep in mind that a file must contain a whole number of bytes, so you cannot actually write just 2 or 18 bits to a file. This is not directly possible, but it can be accomplished with some intermediate buffers. Commented May 13, 2016 at 19:06
  • So create buffers of 8 bits then go through some writing process? While padding the last one or something to that effect? Commented May 13, 2016 at 19:08
  • Are the values already in a binary format where you can just concatenate them together and later separate them unambiguously? Or do you have to encode extra data to indicate the boundaries? Commented May 13, 2016 at 19:20
  • @AlexHall Yes, it's a prefix-free encoding, so as long as the decoder starts reading from the beginning it'll be able to reconstruct the data unambiguously. Commented May 13, 2016 at 19:25
  • OK, so yes, AFAIK your first comment has the right idea. You manipulate the values with <</>>/& to pack pieces into 8-bit slots. I'm surprised I can't find any libraries that do this. It's probably rarely done in Python because it's the kind of thing that C[++] would do much faster. But I'm not an expert on these matters, so maybe wait for some more input. Commented May 13, 2016 at 19:39
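
The shift-and-mask approach described in the comments above could be sketched roughly as follows. This is an illustration only, not code from the thread: pack_bit_strings is a hypothetical name, and it assumes the codes arrive as strings of '0' and '1' characters, are packed most significant bit first, and that the last byte is zero-padded on the right.

```python
def pack_bit_strings(codes):
    """Pack variable-length bit strings back to back into a bytearray.

    Returns (packed bytes, number of padding bits added to the last byte).
    """
    out = bytearray()
    acc = 0    # bits collected so far for the current byte
    nbits = 0  # how many bits of acc are in use
    for code in codes:
        for ch in code:
            # Shift the accumulator left and append the new bit on the right.
            acc = (acc << 1) | (1 if ch == "1" else 0)
            nbits += 1
            if nbits == 8:  # a full byte is ready
                out.append(acc)
                acc = 0
                nbits = 0
    pad = (8 - nbits) % 8
    if nbits:
        # Left-align the remaining bits and zero-fill the rest of the byte.
        out.append(acc << pad)
    return out, pad


packed, pad = pack_bit_strings(["10", "111011001111001100"])
print(list(packed), pad)
```

A bytearray is used because it behaves the same way on Python 2.7 and 3.x, and it can be passed straight to write() on a file opened in 'wb' mode. The padding count has to be recorded somewhere (a header byte, for instance) if the decoder must not interpret the filler bits as codes.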

1 Answer

Note: because bytes is just an alias for str in Python 2, I wasn't able to find a decent way of writing the following that works in both versions without branching on a USING_VS_3 flag (true when running under Python 3).

As a minimal interface to go from a string of bits to bytes that can be written to a file you can use something like this:

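import sys

USING_VS_3 = sys.version_info[0] >= 3  # True when running under Python 3

def _gen_parts(bits):
    # bits must have a length that is a multiple of 8
    for start in range(0, len(bits), 8):
        b = int(bits[start:start+8], base=2)
        if USING_VS_3:
            yield b  # bytes takes an iterable of ints
        else:
            yield chr(b)

def bits_to_bytes(bits):  # -> (bytes, "leftover")
    # Count the bits that fill whole bytes. A negative index such as
    # -(len(bits) % 8) breaks when the length is a multiple of 8,
    # because -0 == 0 and bits[:0] is empty.
    split_i = len(bits) - len(bits) % 8
    byte_gen = _gen_parts(bits[:split_i])
    if USING_VS_3:
        whole = bytes(byte_gen)
    else:
        whole = "".join(byte_gen)
    return whole, bits[split_i:]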

So giving a string of binary data like '111011001111001100' to bits_to_bytes will return a 2-item tuple of (byte data to write to file) and (leftover bits). For that 18-bit string, the result is two bytes plus the leftover '00'.

Then a simple and un-optimized file interface to handle the partial-byte-buffer could be like this:

class Bit_writer:
    def __init__(self, file):
        self.file = file  # must be opened in binary mode ('wb')
        self.buffer = ""  # bits that do not yet fill a whole byte

    def write(self, bits):
        byte_data, self.buffer = bits_to_bytes(self.buffer + bits)
        self.file.write(byte_data)

    def close(self):
        # you may want to handle the padding differently?
        if self.buffer:
            # pad the final partial byte with zeros on the right
            byte_data, _ = bits_to_bytes("{0.buffer:0<8}".format(self))
            self.file.write(byte_data)
            self.buffer = ""
        self.file.close()

    def __enter__(self):  # This will let you use a 'with' block
        return self

    def __exit__(self, *unused):
        self.close()  # flush the padded final byte before closing the file
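
For the reading side, which the question also asks about, a minimal sketch might look like the following. It is not part of the original answer: iter_bits, decode, table, and pad_bits are hypothetical names, and it assumes bits were written most significant bit first, that the code table is prefix-free (as stated in the comments), and that the number of padding bits in the last byte is known to the reader.

```python
def iter_bits(data, pad_bits=0):
    """Yield '0'/'1' characters from bytes, MSB first, skipping final padding."""
    data = bytearray(data)  # yields ints on both Python 2 and Python 3
    last = len(data) - 1
    for index, byte in enumerate(data):
        # The last byte carries only 8 - pad_bits meaningful bits.
        nbits = 8 - pad_bits if index == last else 8
        for shift in range(7, 7 - nbits, -1):
            yield "1" if (byte >> shift) & 1 else "0"


def decode(data, table, pad_bits=0):
    """Decode a prefix-free bit stream; table maps bit strings to symbols."""
    symbols = []
    current = ""
    for bit in iter_bits(data, pad_bits):
        current += bit
        if current in table:  # prefix-free: the first match is the only match
            symbols.append(table[current])
            current = ""
    return symbols


# Hypothetical three-symbol table, for illustration only.
table = {"10": "a", "110": "b", "0": "c"}
print(decode(bytearray([0xB2]), table))              # bits 10 110 0 10
print(decode(bytearray([0xB0]), table, pad_bits=3))  # bits 10 110 + 3 pad bits
```

Without pad_bits, the zero filler in the second call would be decoded as three extra "c" symbols, which is why the padding length (or the symbol count) usually needs to be stored alongside the data.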