
The following code does not seem to read and write binary data correctly. It should read a binary file, XOR the data bitwise with a key, and write the result back to a file. There are no syntax errors, but the output does not verify, and I have tested the source data with another tool to confirm the XOR key.

Update: per feedback in the comments, this is most likely due to the endianness of the system I was testing on.

xortools.py:

import struct

def four_byte_xor(buf, key):
    out = ''
    # Only whole 4-byte groups are processed; "=I" uses the native byte order.
    for i in range(0,len(buf)//4):
        c = struct.unpack("=I", buf[(i*4):(i*4)+4])[0]
        c ^= key
        out += struct.pack("=I", c)
    return out

Call to xortools.py:

from xortools import four_byte_xor
in_buf = open('infile.bin','rb').read()
out_buf = open('outfile.bin','wb')
out_buf.write(four_byte_xor(in_buf, 0x01010101))
out_buf.close()

Per one answer, it appears that I need to read the file byte by byte. How would the function above fit into the following loop, given that it operates on multiple bytes at a time? Or does it not matter? Do I need to use struct?

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        # Do stuff with byte.
        byte = f.read(1)

For example, the following file has 4 repeating bytes, 01020304:

before XOR

The data is XOR'd with a key of 01020304 which zeros the original bytes:

after XOR

Here is an attempt with the original function; in this case the result is 05010501, which is incorrect:

incorrect XOR attempt

  • Meant there are not any syntax errors. Question updated. Commented Jul 13, 2012 at 0:48
  • The problem is that the four_byte_xor() function doesn't xor the part of the buffer, if any, that's not a multiple of four bytes (hence its name). What would you like to do with any modulo 4 bytes in the buffer with respect to the key which it apparently expects to also be exactly four bytes long? Commented Jul 13, 2012 at 1:04

2 Answers


Here's a relatively easy solution (tested):

import struct
import sys
from xortools import four_byte_xor

in_buf = open('infile.bin','rb').read()
orig_len = len(in_buf)
new_len = ((orig_len+3)//4)*4  # round up to a whole multiple of 4
if new_len > orig_len:
    in_buf += b'\x00' * (new_len - orig_len)  # pad with NUL bytes
key = 0x01020304
if sys.byteorder == "little":  # adjust for endianness of processor
    key = struct.unpack(">I", struct.pack("<I", key))[0]
out_buf = four_byte_xor(in_buf, key)
f = open('outfile.bin','wb')
f.write(out_buf[:orig_len]) # only write bytes that were part of orig
f.close()

It pads the data up to a whole multiple of 4 bytes, XORs it with the four-byte key, and then writes out only as many bytes as the original data contained.

This problem was a little tricky because an integer key is always written with its high byte first in source code, while the order of its bytes in memory depends on your processor. The bytes of a string or bytearray, by contrast, are always in file order, lowest offset first, as shown in your hex dumps. To allow the key to be specified as a hex integer, it was necessary to add code that conditionally compensates for the differing representations, so that the key's bytes can be written in the same order as the bytes appearing in the hex dumps.
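The byte-order effect described above can be checked directly with struct. This is only a minimal sketch, assuming Python 3 and the 01 02 03 04 sample data from the question; the helper name xor_with_format and the variable names are made up for illustration:

```python
import struct

def xor_with_format(data, key, fmt):
    """XOR one 4-byte chunk with an integer key, using the given struct format."""
    (value,) = struct.unpack(fmt, data)
    return struct.pack(fmt, value ^ key)

data = bytes([0x01, 0x02, 0x03, 0x04])  # bytes in file (hex dump) order
key = 0x01020304                        # key written high byte first

# Big-endian: the integer's bytes line up with the file order, so the data cancels.
big = xor_with_format(data, key, ">I")
# Little-endian: the data is read as 0x04030201, so the key is effectively reversed.
little = xor_with_format(data, key, "<I")

print(big.hex())     # 00000000
print(little.hex())  # 05010105
```

Using an explicit ">I" for both unpack and pack, instead of the native "=I", should make the result independent of the machine the script runs on.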


9 Comments

I received several syntax errors when trying to run this code.
@Astron: I won't be surprised, but suspect they're all trivial. I'll fix them when I have a chance a little later.
@Astron: Syntax errors are fixed now, but I won't have a chance to test it until later.
Some additional feedback: NameError: name 'out_buf' is not defined
@Astron: Ah, I was just able to reproduce the 05010501 output you were getting with 01020304 repeated data and a 01020304 key. The problem has to do with endianness. Considering the first 4 bytes of infile.bin as a 4-byte integer would result in a value of 0x04030201 on a little-endian processor, which would need an integer key of that value in order to produce the 00000000 you were expecting after XORing. Otherwise you end up with 05010105s in outfile.bin.

Try this function:

import struct

def four_byte_xor(buf, key):
    outl = []
    for i in range(0, len(buf), 4):
        chunk = buf[i:i+4]  # must be exactly 4 bytes for "=I"
        v = struct.unpack(b"=I", chunk)[0]
        v ^= key
        outl.append(struct.pack(b"=I", v))
    return b"".join(outl)

I'm not sure your version actually takes the input 4 bytes at a time, but I didn't try to decipher it. This one assumes the length of your input is divisible by 4.

Edit, new function based on the new input:

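import struct

def four_byte_xor(buf, key):
    # Pack the key big-endian so its bytes match the order shown in a hex dump.
    key = bytearray(struct.pack(">I", key))
    buf = bytearray(buf)
    for offset in range(0, len(buf), 4):
        for i, key_byte in enumerate(key):
            if offset + i < len(buf):  # tolerate a trailing partial chunk
                buf[offset + i] ^= key_byte
    return bytes(buf)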

This could probably be improved, but it does provide the proper output.
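The same byte-wise idea can be applied while reading a file in pieces rather than all at once, which is what the question's read loop was getting at. A rough sketch, assuming Python 3 and a stream whose read() returns full chunks until end of file (true for regular files and io.BytesIO); the name xor_stream and the chunk_size parameter are hypothetical:

```python
import io
import struct

def xor_stream(src, dst, key, chunk_size=4096):
    """Copy src to dst, XOR-ing each byte with a repeating 4-byte big-endian key."""
    if chunk_size % 4:
        raise ValueError("chunk_size must be a multiple of 4 to keep the key aligned")
    key_bytes = struct.pack(">I", key)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # zip stops at the shorter input, so a short final chunk is handled.
        repeated_key = key_bytes * (len(chunk) // 4 + 1)
        dst.write(bytes(b ^ k for b, k in zip(chunk, repeated_key)))

# In-memory demonstration; files opened with 'rb' and 'wb' work the same way.
src = io.BytesIO(bytes([1, 2, 3, 4, 1, 2, 3, 4, 1, 2]))
dst = io.BytesIO()
xor_stream(src, dst, 0x01020304, chunk_size=4)
result = dst.getvalue()
print(result.hex())  # 00000000000000000000
```

Keeping the chunk size a multiple of 4 means every chunk starts at the first key byte, so no key offset has to be carried between reads.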

4 Comments

I replaced the def with yours and also tried the original function, but it appears that I am getting similar results.
Could you edit your question to specify more precisely what you are after? Perhaps some example data, input and output?
Is it possible for bytearray(buf) to accept binary data? I have asked a new question based on this function, and I am attempting to feed new data for every iteration. That said, I removed the struct.pack() portion in an attempt to feed binary data. It works on the first iteration and then dies for additional data.
Both str and bytearray work fine with binary data. They are strings of binary data. The bytearray is mutable and allows in-place modification. They are byte streams. Your mask is a binary number (more than 8 bits), so it has to be converted into a byte sequence to properly align the XOR operation.
