3

OK, so I thought it would be a good idea to get familiar with Python. (I have experience with Java, PHP, Perl, VB, etc.; not a master of any, but I have intermediate knowledge.)

So I am attempting to write a script that will take the data from a socket and translate it to the screen. Rough beginning code to follow:

My code seems to correctly read the binary data from the socket, but I can't unpack it since I don't have access to the original structure.

I have the output for this stream from a different program (which is terribly written, which is why I am tackling this).

When I print out the result of recv, it looks like this:

b'L\x00k\x07vQ\n\x01\xffh\x00\x04NGIN\x04MAIN6Product XX finished reprocessing cdc XXXXX at jesadr 0c\x00k\x07vQ\n\x01\xffF\x00\x06CSSPRD\x0cliab_checkerCCheckpointed to XXXXXXXXXXXXXXXX:XXXXXXX.XXX at jesadr 0 (serial 0)[\x00l\x07vQ\n\x00\xff\x01\x00\x05MLIFE\x06dayendBdayend 1 Copyright XXXX XXXXXXX XXXXXXX XXXXX XXX XXXXXX XXXXXXXX.

From looking at this and comparing it to the output of the other program, I would surmise that it should be broken up like this:

b'L\x00k\x07vQ\n\x01\xffh\x00\x04NGIN\x04MAIN6Product XX finished reprocessing cdc XXXXX at jesadr 0'

with the corresponding info:

04-23
00:00:43
10
1
NGIN
MAIN
255
104
Product XX finished reprocessing cdc XXXXX at jesadr 0

Now, based on my research, it looks like I need to use the "struct" module to unpack it. However, I have no idea what the original structure is; I only know what info is available from it, and to be honest, I'm having a hell of a time figuring this out.

I have used the Python interpreter to attempt to unpack bits and pieces of the line, but it is an exercise in frustration.

If anyone can at least help me get started, I would very much appreciate it.

Thanks

13
  • Where is this stream coming from? If it's another program running on a system within your control, what do you know about it? Commented Apr 25, 2013 at 18:50
  • You can always start by calling .split('\n') on it to get multiple lines. You would really need to get the format for the rest. Commented Apr 25, 2013 at 18:52
  • @Aya It is coming from another system within my organization. However, I have zero control over that system; I can only see the socket stream and get the output from the program that is provided (which is terrible). That's why I thought this would be a great project for me to get started with Python. Commented Apr 25, 2013 at 19:00
  • Well, I can see some Pascal-style strings there, but looking at the struct module, it doesn't support them very well, so you'll probably need to do it manually. Commented Apr 25, 2013 at 19:02
  • @Aya Manually? Do you mean looping through the text and writing a parsing routine to parse the different "pieces" of the line? That was starting to be the line of thought I suspected I might have to follow. Commented Apr 25, 2013 at 19:05

4 Answers

2

Okay. I think I've managed to decode it, although I'm not sure about the intermediate 16-bit value.

This Python 2.7 code...

from cStringIO import StringIO
import struct
import time

def decode(f):

    def read_le16(f):
        return struct.unpack('<h', f.read(2))[0]

    def read_timestamp(f):
        ts = struct.unpack('<l', f.read(4))[0]
        return time.ctime(ts)

    def read_byte(f):
        return ord(f.read(1))

    def read_pascal(f):
        l = ord(f.read(1))
        return f.read(l)

    result = []

    # Read total length
    result.append('Total message length is %d bytes' % read_le16(f))

    # Read timestamp
    result.append(read_timestamp(f))

    # Read 3 x byte
    result.append(read_byte(f))
    result.append(read_byte(f))
    result.append(read_byte(f))

    # Read 1 x LE16
    result.append(read_le16(f))

    # Read 3 x pascal string
    result.append(read_pascal(f))
    result.append(read_pascal(f))
    result.append(read_pascal(f))

    return result

s = 'L\x00k\x07vQ\n\x01\xffh\x00\x04NGIN\x04MAIN6Product XX finished reprocessing cdc XXXXX at jesadr 0c\x00k\x07vQ\n\x01\xffF\x00\x06CSSPRD\x0cliab_checkerCCheckpointed to XXXXXXXXXXXXXXXX:XXXXXXX.XXX at jesadr 0 (serial 0)[\x00l\x07vQ\n\x00\xff\x01\x00\x05MLIFE\x06dayendBdayend 1 Copyright XXXX XXXXXXX XXXXXXX XXXXX XXX XXXXXX XXXXXXXX.'

f = StringIO(s)
print decode(f)
print decode(f)
print decode(f)

...yields...

['Total message length is 76 bytes', 'Tue Apr 23 05:00:43 2013', 10, 1, 255, 104, 'NGIN', 'MAIN', 'Product XX finished reprocessing cdc XXXXX at jesadr 0']
['Total message length is 99 bytes', 'Tue Apr 23 05:00:43 2013', 10, 1, 255, 70, 'CSSPRD', 'liab_checker', 'Checkpointed to XXXXXXXXXXXXXXXX:XXXXXXX.XXX at jesadr 0 (serial 0)']
['Total message length is 91 bytes', 'Tue Apr 23 05:00:44 2013', 10, 0, 255, 1, 'MLIFE', 'dayend', 'dayend 1 Copyright XXXX XXXXXXX XXXXXXX XXXXX XXX XXXXXX XXXXXXXX.']

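The timestamps are out by 5 hours compared with your expected output. time.ctime() formats the value in the local timezone of the machine running the code, so I'm assuming it's a timezone thing.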
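Since the question prints a bytes literal (b'...'), the asker is presumably on Python 3, where cStringIO and the print statement are gone. Below is a minimal Python 3 sketch of the same layout, not a definitive implementation. It assumes the header fields are unsigned, that the strings are plain ASCII, and it renders the timestamp in UTC so the result does not depend on the machine's timezone. The name HEADER is mine, and the meaning of the three bytes and the 16-bit value is still unknown.

```python
import io
import struct
from datetime import datetime, timezone

# Assumed fixed header, little-endian and unpadded: uint16 total length,
# uint32 Unix timestamp, three single bytes, one uint16 of unknown meaning.
HEADER = struct.Struct('<HIBBBH')


def read_pascal(f):
    """Read a string prefixed by a one-byte length (assumed ASCII)."""
    (length,) = f.read(1)
    return f.read(length).decode('ascii')


def decode(f):
    """Decode one record from a binary file-like object."""
    total, ts, b1, b2, b3, value = HEADER.unpack(f.read(HEADER.size))
    when = datetime.fromtimestamp(ts, tz=timezone.utc)
    strings = [read_pascal(f) for _ in range(3)]
    return [total, when, b1, b2, b3, value] + strings


# First record from the question.
first = (b'L\x00k\x07vQ\n\x01\xffh\x00\x04NGIN\x04MAIN'
         b'6Product XX finished reprocessing cdc XXXXX at jesadr 0')
record = decode(io.BytesIO(first))
print(record)
```

In UTC the first record's timestamp comes out as 04:00:43 on 23 April 2013, which is consistent with both the 05:00:43 shown above (UK summer time) and the 00:00:43 in the question (US Eastern daylight time).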


6 Comments

Awesome, this gives me a super starting point. I suspect the first 6 bytes have to do with date/time. I'll see if I can figure those out. Thanks so much!!!!!
@EagleKen Did you not notice I already updated the answer to deal with those bytes?
LOL, I did not!!! Sorry, trying to do too much. I am eternally grateful!!! This is just awesome!!!
And I do believe it's a timezone thing; when I ran it on my data here, I got the correct times. So you have given me lots!! Now to work out buffering, and more Python goodies!! Anytime you want me to purchase a beverage for you, just let me know, 1st one is on me. :)
@EagleKen Well, based on the 5 hour difference, I'm guessing you're in the EST timezone, which is a bit of a trek from the UK just for a beverage, but I appreciate the offer. :)
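On the buffering mentioned in the comments above: a single recv() call may return part of a record or several records at once, so the bytes need to be accumulated and cut at record boundaries. The sketch below assumes that the leading 16-bit value is the total record length including its own two bytes (which matches the sample, where the first record declares 76 bytes and is 76 bytes long). The names split_messages and demo are hypothetical, and the chunks list stands in for successive recv() results.

```python
import struct


def split_messages(buffer):
    """Return (complete_messages, leftover_bytes) from a receive buffer."""
    messages = []
    while len(buffer) >= 2:
        # Assumption: little-endian uint16 length that counts itself.
        (total,) = struct.unpack_from('<H', buffer)
        if total < 2:
            raise ValueError('implausible message length %d' % total)
        if len(buffer) < total:
            break  # incomplete record, wait for more data
        messages.append(bytes(buffer[:total]))
        buffer = buffer[total:]
    return messages, bytes(buffer)


def demo():
    """Feed one tiny record in awkward chunks, as a socket might deliver it."""
    msg = b'\x05\x00abc'  # 5 bytes total, including the length field
    chunks = [msg[:1], msg[1:4], msg[4:] + msg + b'\x07']
    pending = b''
    out = []
    for chunk in chunks:  # stands in for successive sock.recv() results
        found, pending = split_messages(pending + chunk)
        out.extend(found)
    return out, pending


print(demo())
```

In a real receive loop, each complete message returned here could be wrapped in io.BytesIO and handed to a decoder, while the leftover bytes are kept and prepended to the next chunk.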
0

I'd say you're right to use struct, but what sucks about struct is that, AFAIK, you always have to know the original structure.

Maybe reading the TCP specs and ISOs will help, although it's still going to be a hell of a time figuring it out :/

1 Comment

Ah lovely, the answer that I didn't want, but expected... :/ At least some of it is ASCII, which does give me a starting point of sorts... kinda, maybe.. :)
0

Without knowing the structure of the binary stream, it's difficult to parse, though given enough time to reverse engineer it, you might get close or get lucky.

Though if the client program used the pickle protocol, you are in luck.


0

So far I have only reverse-engineered code, not binary streams, so I'm far from an expert on your challenge. However, I'd like to share my thoughts on how I would try to tackle your problem. Maybe someone out there will find that useful (maybe even me at some point).

TL;DR related educational video: Harald Welte at 27C3

Roadmap

1. Context

Get as much information as you can about the program (programming language, known serializers/serialization formats, known quirks etc.), the domain, any specifications in that area,...

2. Collection

Collect a long enough part of the stream or, if you know what a message looks like (any begin-of-message/end-of-message markers), a decent number of messages. Also collect the corresponding output of your reference program.

3. Low-hanging fruits

Try to identify strings and numbers that can easily be spotted both in the wire protocol and in the corresponding output. Note down which parts are not well understood.

4. Keep your eyes open

Extend your knowledge by looking for repetitions and differences across the messages of the wire protocol. Try to match those "interesting spots" to repetitions and differences in the recorded output.

5. Hypothesis

Form one or more hypotheses about the wire format based on what you know. In particular, take into account information from step #1 that might help you understand what is going on. Also, think about what could be a timestamp, sequence number, checksum, message header, metadata, stuff like that.

6. Validation

Implement your hypothesis in code. Run it against your recorded data set to test that it works as you expect. Then run it against a (maybe longer) bunch of fresh samples that even you haven't seen before, to support your hypothesis. If something fails, go back to step #5.

7. Do it all over

Loop through the above steps as required until you can extract all the information you need and maybe a little more.
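Steps 3 to 5 can be sketched in a few lines, applied here to the first record from the question. This is only an illustration under my own assumptions: find_strings is a hypothetical helper, "string" means a run of at least four printable ASCII bytes, and the hypothesis being checked is that the byte just before each string holds its length.

```python
import re

# Runs of at least four printable ASCII bytes (space through tilde).
PRINTABLE_RUN = re.compile(rb'[\x20-\x7e]{4,}')


def find_strings(data):
    """Return (offset, text) for each printable run found in data."""
    return [(m.start(), m.group().decode('ascii'))
            for m in PRINTABLE_RUN.finditer(data)]


sample = (b'L\x00k\x07vQ\n\x01\xffh\x00\x04NGIN\x04MAIN'
          b'6Product XX finished reprocessing cdc XXXXX at jesadr 0')

for offset, text in find_strings(sample):
    # Hypothesis to test: the byte just before a string holds its length.
    prefix = sample[offset - 1] if offset else None
    print(offset, prefix, len(text), repr(text))
```

On this sample the first run, NGIN, is preceded by a byte equal to its length, which supports the length-prefix idea. The second run comes out as MAIN6Product..., because the length byte 54 is itself the printable character 6. That is exactly the kind of "not well understood" spot worth noting down before refining the hypothesis.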

Closing remarks

I think it will be a good idea to test intensively in order to gain confidence that your understanding of the wire protocol is correct. This includes A) unit tests to make sure that you didn't break things in your fragile code while testing a new hypothesis and B) as mentioned in the validation step, throwing new samples at the code and checking that your expectations still hold.

But even then, you might be wrong. Even thorough testing is no guarantee that your assumptions are correct. Always be prepared to think in new ways, as it might turn out that what seemed so clear is actually completely different.
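As a rough illustration of such a test, here is a sketch that checks one hypothesis about the stream from the question against a recorded sample. The hypothesis (an 11-byte header followed by three length-prefixed strings, with a leading 16-bit total length) is taken from the decoded sample in the other answer. The names check_record and WireFormatHypothesis are made up for this example, and in practice SAMPLES would hold many more captured records.

```python
import struct
import unittest

# Recorded samples; only the first record from the question is listed here.
SAMPLES = [
    b'L\x00k\x07vQ\n\x01\xffh\x00\x04NGIN\x04MAIN'
    b'6Product XX finished reprocessing cdc XXXXX at jesadr 0',
]


def check_record(raw):
    """Return a list of ways in which raw contradicts the hypothesis."""
    if len(raw) < 11:
        return ['shorter than the 11-byte header']
    problems = []
    (declared,) = struct.unpack_from('<H', raw)
    if declared != len(raw):
        problems.append('declared length %d, actual %d' % (declared, len(raw)))
    offset = 11
    for index in range(3):
        if offset >= len(raw):
            problems.append('string %d starts past the end' % index)
            return problems
        offset += 1 + raw[offset]  # skip length byte plus string body
    if offset != len(raw):
        problems.append('strings end at %d, record is %d bytes'
                        % (offset, len(raw)))
    return problems


class WireFormatHypothesis(unittest.TestCase):
    def test_samples_fit_hypothesis(self):
        for raw in SAMPLES:
            self.assertEqual(check_record(raw), [])
```

Saved in a module, the test case can be run with python -m unittest. Returning a list of problems rather than a bare True/False makes it easier to see which part of the hypothesis a fresh sample breaks.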

If necessary, go wild about what else could be hidden in the wire format or how things are arranged and/or related.

After this probably useless text of mine, let me finish with a video: Harald Welte at 27C3 about reverse engineering a real-world RFID payment system.

