I need to extract financial price data from a binary file. This price data is normally extracted by a piece of C# code. The biggest problem I'm having is getting a meaningful datetime.

The binary data looks like this:

'\x14\x11\x00\x00{\x14\xaeG\xe1z(@\x9a\x99\x99\x99\x99\x99(@q=\n\xd7\xa3p(@\x9a\x99\x99\x99\x99\x99(@\xac\x00\x19\x00\x00\x00\x00\x00\x08\x01\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'

The C# code that extracts it correctly is:

StockID = reader.ReadInt32();
Open = reader.ReadDouble();
High = reader.ReadDouble();
Low = reader.ReadDouble();
Close = reader.ReadDouble();
Volume = reader.ReadInt64();
TotalTrades = reader.ReadInt32();
Timestamp = reader.ReadDateTime();

This is how far I've gotten in Python:

In [1]: barlength = 56; barformat = 'i4dqiq'
In [2]: pricebar = f.read(barlength)
In [3]: pricebar
Out[3]: '\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'
In [4]: struct.unpack(barformat, pricebar)
Out[4]: 
(19605,                # stock id
 61.57,                # open
 61.579800000000006,   # high
 61.3795,              # low
 61.43,                # close
 187382,               # volume -- seems reasonable
 842,                  # TotalTrades -- seems reasonable
 634124502600000000L   # datetime -- no idea what this means!
)

I used Python's built-in struct module, but I have some concerns about it:

  1. I'm not sure what format characters correspond to Int32 vs Int64 in the C# code, though several different tries returned the same python tuple.

  2. I'm concerned though since the output for some of the fields doesn't seem to be very sensitive to the format I specify: For example, the TotalTrades field returns the same amount if i specify it as either signed or unsigned int OR signed or unsigned long (l, L, i, or I)

  3. I can't make any sense of the date return field. This is actually my biggest problem.

  • Can you post the source code of the C# reader class? Commented Jul 2, 2010 at 23:04

2 Answers

As far as I know, .NET timestamps are stored as a 64-bit value: the lower 62 bits hold the number of ticks since 0001-01-01T00:00:00, where a tick is 100 nanoseconds, and the upper 2 bits indicate whether the timestamp is UTC, Local, or unspecified. So:

>>> x = 634124502600000000
>>> x = x & 0x3FFFFFFFFFFFFFFF
>>> secs = x / 10.0 ** 7
>>> secs
63412450260.0
>>> import datetime
>>> delta = datetime.timedelta(seconds=secs)
>>> delta
datetime.timedelta(733940, 34260)
>>> ts = datetime.datetime(1,1,1) + delta
>>> ts
datetime.datetime(2010, 6, 18, 9, 31)
>>>

The date part is 2010-06-18. Are you in a timezone that's 9.5 hours away from UTC? It would be rather useful in verifying this calculation if you were to supply TWO timestamp values together with the expected answers.
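The conversion above can be wrapped in a small helper. This is only a sketch: the name `from_dotnet_binary` is made up for illustration, and it assumes the C# `ReadDateTime` stores the raw 64-bit value described above (62 bits of ticks plus 2 kind bits), which the question does not confirm. It uses integer microseconds instead of float seconds, so no rounding creeps in, and it returns the two kind bits as a plain integer rather than guessing how they map to a `DateTimeKind`.

```python
from datetime import datetime, timedelta

TICKS_MASK = 0x3FFFFFFFFFFFFFFF  # lower 62 bits: tick count


def from_dotnet_binary(value):
    """Split a 64-bit .NET DateTime value into (naive datetime, kind bits).

    Assumption: the value is the raw 64-bit field, with the tick count in
    the lower 62 bits and the kind flag in the upper 2 bits.
    """
    value &= 0xFFFFFFFFFFFFFFFF  # reinterpret a signed 'q' result as unsigned
    kind_bits = value >> 62      # 0 means no kind flag is set
    ticks = value & TICKS_MASK
    # One tick is 100 ns, so 10 ticks make one microsecond. Integer division
    # drops any sub-microsecond remainder, which datetime cannot store anyway.
    moment = datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)
    return moment, kind_bits


moment, kind_bits = from_dotnet_binary(634124502600000000)
print(moment, kind_bits)  # 2010-06-18 09:31:00 0
```

Masking before the conversion matters: with either kind bit set, the unmasked value is far larger than any date `datetime` can represent.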

Addressing your concern that "the output for some of the fields doesn't seem to be very sensitive to the format I specify" (TotalTrades comes out the same for l, L, i, or I): the output is not sensitive because (1) "long" and "int" are the same size (32 bits) on your platform, and always are in struct's standard-size modes (a format string prefixed with <, >, ! or =), and (2) the smaller half of all possible unsigned numbers have the same representation as the non-negative signed numbers. For example, in 8-bit numbers, the numbers 0 to 127 inclusive have the same bit pattern whether signed or unsigned.
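A quick way to see both points is to decode the same four bytes with each format code. The snippet below is only an illustration. It uses the "<" prefix (little-endian, standard sizes) so that the results do not depend on the platform, and the all-ones byte string is an invented example, not data from the file.

```python
import struct

# 842 fits in the non-negative range of a signed 32-bit int, so all four
# standard-size codes decode the same four bytes to the same value.
raw = struct.pack("<i", 842)
decoded = {code: struct.unpack("<" + code, raw)[0] for code in "iIlL"}
print(decoded)  # {'i': 842, 'I': 842, 'l': 842, 'L': 842}

# Once the top bit is set, the signed and unsigned readings diverge.
top_bit_set = b"\xff\xff\xff\xff"
as_signed = struct.unpack("<i", top_bit_set)[0]
as_unsigned = struct.unpack("<I", top_bit_set)[0]
print(as_signed, as_unsigned)  # -1 4294967295

# Standard sizes are fixed at 4 bytes for both i and l.
print(struct.calcsize("<i"), struct.calcsize("<l"))  # 4 4
# Native sizes (no prefix) depend on the platform's C compiler,
# so "l" may be 4 or 8 bytes here.
print(struct.calcsize("i"), struct.calcsize("l"))
```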

2 Comments

Thank you for the explanation about signed/unsigned integers. I didn't really know, but now I'm fairly certain that I should be using the unsigned ones, since total trades should never be negative.
The sample code provided has a bug that will cause the Python datetime conversion to throw an OverflowError. The issue is that while .NET System.DateTime objects take up 64 bits, the upper 2 bits specify whether the timestamp is relative to the UTC or Local timezone, or unspecified. The value for the number of ticks is only the lower 62 bits. Failing to mask out the upper two bits causes an overflow error whenever the upper 2-bit DateTimeKind is anything other than unspecified.

Without seeing the C# source containing the ReadInt32, ReadDouble, ReadDateTime etc methods it will be impossible to give a definitive answer, but...

  1. I'm not really sure what the difference is between the i and l format characters, but I think you're correct in using i/l for Int32 and q for Int64.

  2. Again, I don't know the difference between the i/l or I/L format characters, but since they all represent 32-bit integers then their binary representation should be the same for all values between 0 and 2147483647 inclusive. If it's possible for TotalTrades to be negative, or exceed 2147483647, then you should investigate further. If not then don't worry about it.

  3. It looks to me like your serialized date field is probably equivalent to DateTime.Ticks.

    If that's the case then the serialized value will be the number of ticks -- that is, the number of 100 nanosecond intervals -- since 00:00:00 on 1 January 0001.

    By that reckoning, the value shown in your question -- 634124502600000000 -- would represent 09:31:00 on 18 June 2010.
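Putting the three points together, one record could be decoded roughly as follows. Treat this as a sketch under stated assumptions: the names `PriceBar` and `parse_bar` are invented here, the "<" prefix assumes the file is little-endian with no padding between fields (which matches the 56-byte record length in the question), and the timestamp is assumed to be a raw 64-bit tick value whose upper 2 bits may carry a kind flag. The sample bytes are the ones shown in the question.

```python
import struct
from collections import namedtuple
from datetime import datetime, timedelta

# "<" means little-endian, standard sizes, no alignment padding:
# 4 (Int32) + 4*8 (doubles) + 8 (Int64) + 4 (Int32) + 8 (DateTime) = 56 bytes.
BAR_STRUCT = struct.Struct("<i4dqiq")
PriceBar = namedtuple(
    "PriceBar", "stock_id open high low close volume total_trades timestamp"
)
TICKS_MASK = 0x3FFFFFFFFFFFFFFF  # lower 62 bits hold the tick count


def parse_bar(raw):
    """Decode one 56-byte record into a PriceBar (hypothetical helper)."""
    *fields, raw_ts = BAR_STRUCT.unpack(raw)
    ticks = raw_ts & TICKS_MASK  # drop the 2 kind bits, if any are set
    # 10 ticks of 100 ns each make one microsecond
    timestamp = datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)
    return PriceBar(*fields, timestamp)


# The record from the question, written as a bytes literal.
RAW = (b'\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@'
       b'\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@'
       b'\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00'
       b'\x00"\xd8\x18\xe0\xdc\xcc\x08')

bar = parse_bar(RAW)
print(bar.stock_id, bar.volume, bar.total_trades, bar.timestamp)
```

Without the "<" prefix, struct uses the platform's native alignment, and on many 64-bit systems `struct.calcsize('i4dqiq')` is then 64 rather than 56, so the explicit prefix keeps the record length stable across machines.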

2 Comments

i/I and l/L are for signed/unsigned int and long respectively. Thanks for the response.
@Arthur: I meant that I wasn't sure of the difference between i and l (both described as signed 32-bit integers) or between I and L (both described as unsigned 32-bit integers). I guess the naming is a throwback to C/C++ where the size of ints and longs is implementation-dependent, although as far as the struct module is concerned they appear to be exactly the same.
