I have a bz2 compressed binary (big endian) file containing an array of data. Uncompressing it with external tools and then reading the file in to Numpy works:

import numpy as np
dim = 3
rows = 1000
cols = 2000
mydata = np.fromfile('myfile.bin').reshape(dim,rows,cols)

However, since there are plenty of other files like this I cannot extract each one individually beforehand. Thus, I found the bz2 module in Python which might be able to directly decompress it in Python. However I get an error message:

import bz2
dfile = bz2.BZ2File('myfile.bz2').read()
mydata = np.fromfile(dfile).reshape(dim,rows,cols)

>>IOError: first argument must be an open file

Obviously, the BZ2File function does not return a file object. Do you know what the correct way to read the compressed file is?

1 Answer

BZ2File does return a file-like object (although not an actual file). The problem is that you're calling read() on it:

dfile = bz2.BZ2File('myfile.bz2').read()

This reads the entire decompressed file into memory as one big string, which you then pass to fromfile. fromfile expects an open file object, not the file's contents, hence the error.

Depending on your versions of numpy and python and your platform, reading from a file-like object that isn't an actual file may not work. In that case, you can use the buffer you read in with frombuffer.

So, either this:

dfile = bz2.BZ2File('myfile.bz2')
mydata = np.fromfile(dfile).reshape(dim,rows,cols)

… or this:

dbuf = bz2.BZ2File('myfile.bz2').read()
mydata = np.frombuffer(dbuf).reshape(dim,rows,cols)
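
Since the question says the data is big endian, it may be worth passing an explicit dtype rather than relying on the default (native-order float64). The following is only a sketch: the helper name read_bz2_array, the '>f8' dtype (big-endian 64-bit float), and the small demo file are assumptions for illustration, not something taken from the question.

```python
import bz2
import os
import tempfile

import numpy as np


def read_bz2_array(path, shape, dtype=">f8"):
    """Decompress a bz2 file in memory and view its bytes as a NumPy array.

    dtype=">f8" (big-endian float64) is an assumption; use whatever
    type the file was actually written with.
    """
    with bz2.BZ2File(path) as f:
        raw = f.read()  # the whole decompressed payload as bytes
    return np.frombuffer(raw, dtype=dtype).reshape(shape)


# Round-trip demo on a small hypothetical file.
dim, rows, cols = 3, 4, 5
expected = np.arange(dim * rows * cols, dtype=">f8").reshape(dim, rows, cols)

demo_path = os.path.join(tempfile.mkdtemp(), "demo.bz2")
with bz2.BZ2File(demo_path, "w") as f:
    f.write(expected.tobytes())

mydata = read_bz2_array(demo_path, (dim, rows, cols))
```

Note that an array created by frombuffer from a bytes object is read-only. If you need to modify it, or want native byte order, something like mydata.astype(np.float64) gives you a writable copy.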

(Needless to say, there are a slew of other alternatives that might be better than reading the whole buffer into memory. But if your file isn't too huge, this will work.)
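
One such alternative, sketched here under assumptions, is to decompress one (rows, cols) plane at a time so that only a single plane is held in memory. The names iter_planes and _read_exact are hypothetical helpers, and the '>f8' dtype is again an assumption based on the question's "big endian" description.

```python
import bz2
import os
import tempfile

import numpy as np


def _read_exact(f, n):
    """Read up to n bytes, looping in case read() returns short pieces."""
    parts = []
    while n > 0:
        piece = f.read(n)
        if not piece:  # end of file
            break
        parts.append(piece)
        n -= len(piece)
    return b"".join(parts)


def iter_planes(path, rows, cols, dtype=">f8"):
    """Yield one (rows, cols) array per plane from a bz2-compressed file."""
    plane_bytes = rows * cols * np.dtype(dtype).itemsize
    with bz2.BZ2File(path) as f:
        while True:
            chunk = _read_exact(f, plane_bytes)
            if not chunk:
                break
            if len(chunk) != plane_bytes:
                raise ValueError("file ends with an incomplete plane")
            yield np.frombuffer(chunk, dtype=dtype).reshape(rows, cols)


# Demo on a small hypothetical file with 3 planes of 4 x 5 values.
dim, rows, cols = 3, 4, 5
expected = np.arange(dim * rows * cols, dtype=">f8").reshape(dim, rows, cols)

demo_path = os.path.join(tempfile.mkdtemp(), "demo_planes.bz2")
with bz2.BZ2File(demo_path, "w") as f:
    f.write(expected.tobytes())

planes = list(iter_planes(demo_path, rows, cols))
```

In real use you would process each plane inside the loop instead of collecting them in a list, which is done here only to keep the demo short.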

3 Comments

frombuffer() doesn't seem to work in Python 2.7. It fails with AttributeError: 'ExFileObject' object has no attribute '__buffer__'. Any idea why?
Never mind, I was using zipfl = bz2.BZ2File('myfile.bz2').open('file_memver_in_archive') because I wanted to get dbuf.name and other attributes of the member. However, if one actually needs zipfl to be an ExFileObject, like I do, one can simply do mydata = np.frombuffer(zipfl.read()) and have the best of both worlds.
BTW, using np.fromfile directly on the bz2 file does not work for me, but np.frombuffer works fine.