6

I have a memory map that contains a 2D array, and I would like to make a NumPy array from it. Ideally, I would like to avoid copying, since the array involved can be big.

My code looks like this:

import mmap
import numpy as np

n_bytes = 10000
tagname = "Some Tag from external System"  # tagname is a Windows-only mmap argument
mm = mmap.mmap(-1, n_bytes, tagname)  # named "mm" to avoid shadowing the builtin map()
offsets = [0, 5000]

columns = []
for offset in offsets:
    # Type and count vary in the real code; they are made up for this dummy code.
    # The count and type are known for every column.
    np_type = np.dtype('f4')
    column_data = np.frombuffer(mm, np_type, count=500, offset=offset)
    columns.append(column_data)

# this line seems to copy the data, which I would like to avoid
data = np.array(columns).T
4
  • Have you tried reading the whole file as a big 1D array, and then reshape it to a 2D array? Commented Aug 23, 2016 at 5:37
  • Do you know in advance the size of your final array? Commented Aug 23, 2016 at 5:38
  • @kennytm The data can have different dtypes per column (e.g. the first block is a float, the second an int), which I cannot express in the buffer method Commented Aug 23, 2016 at 5:51
  • @Julien Bernu Yes, I know how many columns, rows and bytes there are. Commented Aug 23, 2016 at 5:51

2 Answers

8

Assuming you have a byte array and you know its dimensions, the answer is very simple. Imagine you have the raw RGB data of an image (24 bits per pixel) in a buffer named 'buff', with dimensions 1024x768:

import numpy

# read the buffer into a 1D byte array (buff holds the raw image bytes)
arr = numpy.frombuffer(buff, dtype=numpy.uint8)
# now shape the array as you please: (rows, columns, channels)
arr.shape = (768, 1024, 3)
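As a self-contained sketch of the same idea, the snippet below uses a zero-filled bytearray as a stand-in for the real image buffer (that stand-in and the variable names are assumptions for illustration only). It uses reshape rather than assigning to shape, and checks that the 3D array still reads from the original buffer.

```python
import numpy as np

width, height = 1024, 768
# Hypothetical stand-in for the real image bytes: 3 bytes (R, G, B) per pixel.
buff = bytearray(width * height * 3)

# 1D view over the buffer; no bytes are copied here.
arr = np.frombuffer(buff, dtype=np.uint8)
# reshape returns a new view with a different shape over the same memory.
img = arr.reshape(height, width, 3)

# Writing to the buffer is visible through the array, since they share memory.
buff[0] = 255
print(img[0, 0, 0])
print(np.shares_memory(arr, img))
```

The shape is (rows, columns, channels), so the 1024x768 image becomes a (768, 1024, 3) array.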

1

I haven't used frombuffer much, but I think np.array works with those arrays as it does with conventionally constructed ones.

Each column_data array will have its own data buffer - the mmap you assigned it. But np.array(columns) reads the values from each array in the list, and constructs a new array from them, with its own data buffer.

I like to use x.__array_interface__ to look at the data buffer location (and to see other key attributes). Compare that dictionary for each element of columns and for data.
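A minimal sketch of that comparison, using an anonymous mmap in place of the tagged one from the question (the anonymous map and the helper name data_address are assumptions made for illustration):

```python
import mmap
import numpy as np

def data_address(a):
    """Return the address of the first byte of the array's data buffer."""
    return a.__array_interface__['data'][0]

# Anonymous memory map, standing in for the externally tagged one.
mm = mmap.mmap(-1, 10000)
columns = [np.frombuffer(mm, dtype='f4', count=500, offset=off)
           for off in (0, 5000)]

# np.array builds a new array with its own buffer from the list of columns.
data = np.array(columns).T

# Both columns point into the mmap, 5000 bytes apart.
print(data_address(columns[1]) - data_address(columns[0]))  # 5000
# The combined array lives somewhere else entirely.
print(np.shares_memory(data, columns[0]))  # False
```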

You can construct a 2d array from a mmap - using a contiguous block. Just make the 1d frombuffer array, and reshape it. Even transpose will continue to use that buffer (with F order). Slices and views also use it.
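As a rough sketch of that approach, assuming the two columns sit back to back in one contiguous block of f4 values (unlike the question's layout with a gap between offsets; the sizes and names here are made up):

```python
import mmap
import struct
import numpy as np

n_rows, n_cols = 500, 2
# Anonymous map holding n_cols blocks of n_rows float32 values each.
mm = mmap.mmap(-1, n_rows * n_cols * 4)

# One 1D view over the whole map, then reshape so each block is a row.
flat = np.frombuffer(mm, dtype='f4')
table = flat.reshape(n_cols, n_rows).T  # transpose: each block becomes a column

# Write a float directly into the map; the 2D array sees it without a copy.
mm[0:4] = struct.pack('f', 1.5)
print(table.shape)
print(table[0, 0])
print(table.flags['F_CONTIGUOUS'], np.shares_memory(flat, table))
```

The transposed view is Fortran-ordered, which is why no copy is needed to present the blocks as columns.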

But unless you are really careful you'll quickly get copies that put the data elsewhere. Simply data1 = data+1 makes a new array, as does advanced indexing such as data[[1,3,5],:]. The same goes for any concatenation.
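One way to check which operations keep the original buffer is np.shares_memory. The sketch below uses a small bytearray as a made-up buffer; the variable names are for illustration only.

```python
import numpy as np

# Small stand-in buffer holding the byte values 0..23.
buf = bytearray(range(24))
data = np.frombuffer(buf, dtype=np.uint8).reshape(6, 4)

sliced = data[1:4, :]                  # basic slicing: a view on the same buffer
added = data + 1                       # arithmetic: allocates a new array
fancy = data[[1, 3, 5], :]             # advanced indexing: a copy
joined = np.concatenate([data, data])  # concatenation: a copy

for name, arr in [('sliced', sliced), ('added', added),
                  ('fancy', fancy), ('joined', joined)]:
    print(name, np.shares_memory(data, arr))
```

Only the slice reports shared memory; the other three results own their data.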

2 arrays from bytestring buffers:

In [534]: x=np.frombuffer(b'abcdef',np.uint8)
In [535]: y=np.frombuffer(b'ghijkl',np.uint8)

a new array by joining them

In [536]: z=np.array((x,y))

In [538]: x.__array_interface__
Out[538]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (6,),
 'strides': None,
 'typestr': '|u1',
 'version': 3}
In [539]: y.__array_interface__['data']
Out[539]: (3013089608, True)
In [540]: z.__array_interface__['data']
Out[540]: (180817384, False)

the data buffer locations for x,y,z are totally different

But the data for reshaped x doesn't change

In [541]: x.reshape(2,3).__array_interface__['data']
Out[541]: (3013090040, True)

nor does the 2d transpose

In [542]: x.reshape(2,3).T.__array_interface__
Out[542]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (3, 2),
 'strides': (1, 3),
 'typestr': '|u1',
 'version': 3}

Same data, different view

In [544]: x
Out[544]: array([ 97,  98,  99, 100, 101, 102], dtype=uint8)
In [545]: x.reshape(2,3).T
Out[545]: 
array([[ 97, 100],
       [ 98, 101],
       [ 99, 102]], dtype=uint8)
In [546]: x.reshape(2,3).T.view('S1')
Out[546]: 
array([[b'a', b'd'],
       [b'b', b'e'],
       [b'c', b'f']], 
      dtype='|S1')

2 Comments

Thank you for the great answer! Do you know how I can use frombuffer method when the column sizes vary? e.g. my first block contains f4, but the second f8 - I would have to do some reshaping after building the 2d array?
Structured arrays allow different dtypes in fields. But in such an array an f4 element will sit next to an f8, etc., in records, not as separate columns (blocks of f4, separate blocks of f8). I don't know of a way of mixing columns and dtypes.
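To illustrate the record layout described in that comment, here is a small sketch with a made-up two-field dtype over a zero-filled bytearray (the field names 'a' and 'b' are hypothetical). It shows that the fields are interleaved per record rather than stored as separate blocks.

```python
import numpy as np

# Each record is one f4 followed by one f8, packed without padding.
rec_dtype = np.dtype([('a', '<f4'), ('b', '<f8')])
buf = bytearray(rec_dtype.itemsize * 5)  # room for 5 records

records = np.frombuffer(buf, dtype=rec_dtype)
col_a = records['a']  # field access is a view, not a copy

print(rec_dtype.itemsize)                # 12 bytes per record
print(col_a.strides)                     # (12,): one f4 every 12 bytes
print(np.shares_memory(col_a, records))  # True
```

The 12-byte stride on the 'a' field is what "an f4 next to an f8" means in memory, so this layout does not match a buffer that stores all f4 values first and all f8 values afterwards.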
