File systems deal with bytes, but I'm looking to read/write data to a file in bits.
I have a file that is ~850 MB and the goal is to get it under 100 MB. I used delta + Huffman encoding to generate a "code table" of binary strings. When you add up all the bits (the total number of 0s and 1s in the encoded output) you get about 781,000,000, so theoretically I should be able to store them in about 90 MB. This is where I'm running into a problem.
Based on other answers I've seen around SO, this is the closest I've gotten:
import struct

with open('encoded_file.bin', 'wb') as f:
    for val in filedict:
        int_val = int(val[::-1], base=2)   # was int_value: NameError
        bin_array = struct.pack('i', int_val)
        f.write(bin_array)
The val passed in each iteration is the binary string to be written. The codes do not have a fixed length: they range from 10 for the most common symbol to 111011001111001100 for the longest, with an average length of 5 bits. The above code generates a file of about 600 MB, still way off the target.
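The gap is explained by struct itself: 'i' packs a native C int, which is 4 bytes on common platforms, so every code costs 32 bits on disk regardless of its length. At an average of 5 bits per code, 781,000,000 bits is roughly 156 million codes, and 156 million times 4 bytes is about 625 MB, right where the output lands. A quick demonstration:

```python
import struct

# 'i' packs a native C int (4 bytes on common platforms), so a 2-bit
# code and an 18-bit code cost exactly the same 32 bits on disk.
short_code = struct.pack('i', int('10', 2))
long_code = struct.pack('i', int('111011001111001100', 2))
print(len(short_code), len(long_code))  # 4 4 on typical platforms
```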
Currently I am using Python 2.7; I can move to Python 3.x if I absolutely have to. Is this even possible in Python, or would a language like C or C++ do it more easily?
You'll need to pack the pieces into 8-bit slots yourself. I'm surprised I can't find any libraries that do this. It's probably rarely done in Python because it's the kind of thing that C[++] would do much faster. But I'm not an expert on these matters, so maybe wait for some more input.
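For reference, here is a minimal sketch of that byte-packing, assuming the codes are available as strings of '0'/'1' characters as in the question (the function name pack_bits and the sample codes below are mine, not from the question):

```python
def pack_bits(codes):
    """Pack an iterable of bit-strings (e.g. '10110') into bytes."""
    buf = bytearray()
    acc = 0        # bit accumulator
    nbits = 0      # number of bits currently held in acc
    for code in codes:
        for ch in code:
            acc = (acc << 1) | (ch == '1')
            nbits += 1
            if nbits == 8:         # a full byte: flush it
                buf.append(acc)
                acc = 0
                nbits = 0
    if nbits:                      # pad the final partial byte with zeros
        buf.append(acc << (8 - nbits))
    return bytes(buf)

# Example: five variable-length codes totalling 12 bits -> 2 bytes.
data = pack_bits(['10', '110', '0', '111', '001'])  # data == b'\xb3\x90'
```

Note that the final byte is zero-padded, so the decoder needs to know the total bit count (or the stream needs an end-of-data symbol), otherwise the padding bits may be decoded as extra codes.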