
I have a memory usage problem in Python that I haven't been able to solve satisfactorily yet.

The problem is quite simple: I have a collection of images as a NumPy array of shape (n_samples, size_image). I need to slice each image in the same way and feed these slices to a classification algorithm all at once. How do you take NumPy array slices without duplicating data in memory? Since slices are simple "views" of the original data, I assume there must be a way to do the slicing without copying anything. The problem becomes critical when dealing with large datasets such as the MNIST handwritten digits dataset.

I have tried to find a solution using numpy.lib.stride_tricks.as_strided but struggle to get it to work on collections of images.

A similar toy problem would be to slice the scikit-learn handwritten digits in a memory-friendly way.

from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data

X has shape (1797, 64), i.e. each picture is 8x8 pixels. With a window size of 6x6 this gives (8-6+1)*(8-6+1) = 9 slices of size 36 per image, resulting in an array sliced_X of shape (16173, 36).

Now the question is: how do you get from X to sliced_X without using too much memory?
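To make the target concrete, here is a naive way to build sliced_X (a sketch only, reusing the X defined above): every 6x6 window is copied out and flattened, which is exactly the memory blow-up I want to avoid.

import numpy as np

images = X.reshape(-1, 8, 8)      # this reshape is itself a view, no copy
window = 6
n_pos = 8 - window + 1            # 3 window positions along each axis

# Naive, copying construction of sliced_X for comparison.
sliced_X = np.array([
    img[i:i + window, j:j + window].ravel()
    for img in images
    for i in range(n_pos)
    for j in range(n_pos)
])
print(sliced_X.shape)             # (16173, 36), all newly allocated memory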

  • Would you be okay with an output of shape (1797, 3, 3, 6, 6)? Commented May 6, 2017 at 16:00
  • If the images are in separate arrays, you can't make a new array that contains their slices without copying values. If they are already collected into one large array, then it is possible to take views of various sorts without copying. That includes strided views. But beware that multi-dimensional striding is tricky, and can produce an explosion in memory use. Practice with small arrays first. Commented May 6, 2017 at 18:54
  • @Divakar: I don't think the OP has any choice. That is the only way to construct the strided view, else the strides between windows would have to vary Commented May 6, 2017 at 20:51
  • @Divakar I'd be interested to know how you get an output of shape (1797, 3, 3, 6, 6) Commented May 7, 2017 at 9:57
  • @hpaulj Typically the input is a big 2D array containing the flattened images. For the specific scikit-learn digits toy model it is 1797 pictures, each originally of size 8x8. Then it should be possible to take views, right? Commented May 7, 2017 at 10:02

1 Answer


I would start off assuming that the input array is (M,n1,n2) (if it's not, we can always reshape it). Here's an implementation to have a sliding windowed view into it, with an output array of shape (M, n1-b1+1, n2-b2+1, b1, b2) for a block size of (b1,b2) -

import numpy as np

def strided_lastaxis(a, blocksize):
    # a: (M, n1, n2) stack of images; blocksize: (b1, b2) window size
    d0, d1, d2 = a.shape
    s0, s1, s2 = a.strides

    strided = np.lib.stride_tricks.as_strided

    # The window axes reuse the image-axis strides, so the result is a
    # view of shape (M, n1-b1+1, n2-b2+1, b1, b2) -- no data is copied.
    out_shp = (d0,) + tuple(np.array([d1, d2]) - blocksize + 1) + blocksize
    return strided(a, out_shp, (s0, s1, s2, s1, s2))

Being a view, it won't occupy any additional memory, so we are doing okay on that front. But keep in mind that we shouldn't reshape the output, as that would force a memory copy.
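Applied to the digits data from the question, a minimal sketch (assuming X is the (1797, 64) array from load_digits) would be -

# Sketch, assuming X is the (1797, 64) array from sklearn's load_digits.
# The reshape of a C-contiguous array to (1797, 8, 8) is itself a view.
images = X.reshape(-1, 8, 8)
windows = strided_lastaxis(images, blocksize=(6, 6))

print(windows.shape)                      # (1797, 3, 3, 6, 6)
print(np.may_share_memory(X, windows))    # True -> no data was copied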

Here's a sample run to make things clear, with a manual check -

Setup input and get output :

In [72]: a = np.random.randint(0,9,(2, 6, 6))

In [73]: out = strided_lastaxis(a, blocksize=(4,4))

In [74]: np.may_share_memory(a, out) # Verify this is a view
Out[74]: True

In [75]: a
Out[75]: 
array([[[1, 7, 3, 5, 6, 3],
        [3, 2, 3, 0, 1, 5],
        [6, 3, 5, 5, 3, 5],
        [0, 7, 0, 8, 2, 4],
        [0, 3, 7, 3, 4, 4],
        [0, 1, 0, 8, 8, 1]],

       [[4, 1, 4, 5, 0, 8],
        [0, 6, 5, 6, 6, 7],
        [6, 3, 1, 8, 6, 0],
        [0, 1, 1, 7, 6, 8],
        [6, 3, 3, 1, 6, 1],
        [0, 0, 2, 4, 8, 3]]])

In [76]: out.shape
Out[76]: (2, 3, 3, 4, 4)

Output values :

In [77]: out[0,0,0]
Out[77]: 
array([[1, 7, 3, 5],
       [3, 2, 3, 0],
       [6, 3, 5, 5],
       [0, 7, 0, 8]])

In [78]: out[0,0,1]
Out[78]: 
array([[7, 3, 5, 6],
       [2, 3, 0, 1],
       [3, 5, 5, 3],
       [7, 0, 8, 2]])

In [79]: out[0,0,2]
Out[79]: 
array([[3, 5, 6, 3],
       [3, 0, 1, 5],
       [5, 5, 3, 5],
       [0, 8, 2, 4]]) # ............

In [80]: out[1,2,2] # last block
Out[80]: 
array([[1, 8, 6, 0],
       [1, 7, 6, 8],
       [3, 1, 6, 1],
       [2, 4, 8, 3]])
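As a side note, on NumPy 1.20+ the same windowed view can be obtained without spelling out the strides by hand, via numpy.lib.stride_tricks.sliding_window_view (a sketch, assuming that NumPy version is available):

# Sketch, assuming NumPy >= 1.20 where sliding_window_view exists.
# It returns the same (read-only) windowed view, with no copying.
from numpy.lib.stride_tricks import sliding_window_view

windows = sliding_window_view(a, window_shape=(4, 4), axis=(1, 2))
print(windows.shape)                      # (2, 3, 3, 4, 4)
print(np.may_share_memory(a, windows))    # True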

5 Comments

"we shouldn't ... transpose/permute axes of the output" - why? That doesn't sound like it would incur a copy to me
I have been able to get a quite similar result using a list comprehension and as_strided, like this: sliced_X = [as_strided(img, (3,3,6,6), (64,8,64,8)) for img in X], but the problem is that it gives a list, and reshaping it will copy the data.
@Pierre-YvesLablanche Why would you use a list comprehension? Is X an array? If so, you don't need to use a list comprehension. If not, your question as it stands is different.
@Divakar X is indeed an array. I used a list comprehension out of habit, but it ended up not being such a great idea. The question is still the same :) I am actually thinking that your solution should be enough and I might not need to reshape the data before feeding it to the classifier. I'll keep you posted on that.
@Divakar I have done more investigation on NumPy memory usage and tried to find a solution to my problem, and apparently there is none: as I am using the scikit-learn random forest implementation, the input must be of shape [n_samples, size_sample], and thus the as_strided solution is not enough. It is apparently impossible to reshape without copying, as the new array would be non-contiguous. I'll write a longer answer to my own question. Thanks for your help!
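For completeness, the limitation raised in the last comment can be checked directly: flattening the windowed view back to the 2D (n_samples, size_sample) layout that scikit-learn expects cannot remain a view, because the overlapping windows are not laid out contiguously, so reshape silently falls back to a copy (a sketch, reusing the windows view built from the digits data above):

# Sketch, reusing the (1797, 3, 3, 6, 6) view of the digits data.
# reshape() cannot express the overlapping windows as a strided view of
# the original buffer, so it allocates new memory and copies.
flat = windows.reshape(-1, 36)           # shape (16173, 36)
print(np.may_share_memory(X, flat))      # False -> a full copy was made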
