4

I have a big NumPy array that I want to divide into many subarrays by sliding a window of a fixed size over it. Here is my code for subarrays of size 11:

import numpy as np

x = np.arange(10000)
T = np.array([])

for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s), axis=0)

But it is very slow for arrays with more than 1 million entries. Is there any way to make it faster?

3
  • What is your x variable? Commented Dec 11, 2019 at 2:23
  • I had a small error, but I corrected it; x is the big array. Commented Dec 11, 2019 at 2:28
  • I don't know what your overall objective is, but you should probably start with numpy.asarray and then, if you can, use numpy.split (if you want sub-arrays) or numpy.reshape instead of the repeated concatenation you're doing. Commented Dec 11, 2019 at 2:34

2 Answers

3

Actually, this is a case for as_strided:

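import numpy as np
from numpy.lib.stride_tricks import as_strided

# set up
x = np.arange(1000000)
windows = 11

# byte step between consecutive elements of x (x.strides is a 1-tuple)
stride = x.strides[0]

# each row starts one element after the previous one; no data is copied
T = as_strided(x, shape=(len(x)-windows+1, windows), strides=(stride, stride))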

Output:

array([[     0,      1,      2, ...,      8,      9,     10],
       [     1,      2,      3, ...,      9,     10,     11],
       [     2,      3,      4, ...,     10,     11,     12],
       ...,
       [999987, 999988, 999989, ..., 999995, 999996, 999997],
       [999988, 999989, 999990, ..., 999996, 999997, 999998],
       [999989, 999990, 999991, ..., 999997, 999998, 999999]])

Performance:

5.88 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
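
For readers on NumPy 1.20 or later, the same windowing can probably be written with sliding_window_view, which works out the shape and strides itself and returns a read-only view. The sketch below is only an illustration: the small array length and the variable names are assumptions, not part of the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.arange(20)   # small illustrative array (assumption)
window = 11

# Manual strides, as in the answer above; writeable=False guards
# against accidental writes through the overlapping view
step = x.strides[0]
T_strided = as_strided(x, shape=(len(x) - window + 1, window),
                       strides=(step, step), writeable=False)

# Bounds-checked equivalent (requires NumPy >= 1.20)
T_view = sliding_window_view(x, window)

print(T_view.shape)                        # (10, 11)
print(np.array_equal(T_strided, T_view))   # True
print(np.shares_memory(T_view, x))         # True: no data is copied
```

Because both results are views onto x, they are cheap to create, but any operation that needs a contiguous copy (for example T_view.copy()) will allocate the full windows-by-rows array.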

2

I think your current method does not produce what you are describing. Here is a faster method that splits a long array into many sub-arrays using a list comprehension:

Code Fix:

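import numpy as np

x = np.arange(10000)

# One row per window start i. range(len(x)-11) mirrors the question's loop;
# use range(len(x)-11+1) to also include the final window.
T = np.array([x[i:i+11] for i in range(len(x)-11)])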

Speed Comparison:

sample_1 = '''
import numpy as np 

x = np.arange(10000)
T = np.array([])

for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s),axis=0)

'''    

sample_2 = '''
import numpy as np 

x = np.arange(10000)
T = np.array([])

T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])
'''

# Testing the times
import timeit
print(timeit.timeit(sample_1, number=1))
print(timeit.timeit(sample_2, number=1))

Speed Comparison Output:

5.839815437000652   # Your method
0.11047088200211874 # List Comprehension

I only checked 1 iteration as the difference is quite significant and many iterations would not change the overall outcome.

Output Comparison:

# Your method:
[  0.00000000e+00   1.00000000e+00   2.00000000e+00 ...,   9.99600000e+03
   9.99700000e+03   9.99800000e+03]

# Using List Comprehension:
[[   0    1    2 ...,    8    9   10]
 [   1    2    3 ...,    9   10   11]
 [   2    3    4 ...,   10   11   12]
 ..., 
 [9986 9987 9988 ..., 9994 9995 9996]
 [9987 9988 9989 ..., 9995 9996 9997]
 [9988 9989 9990 ..., 9996 9997 9998]]

You can see that my method actually produces sub-arrays, unlike the code you provided.

Note:

These tests were carried out on x, which was just an array of ordered integers from 0 to 9999.
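
To make the shape difference concrete, here is a small sketch showing that the flat result of repeated concatenation can be reshaped into the 2D result of the list comprehension. The array length of 20 is an assumption chosen so that it runs instantly; the names flat and rows are illustrative only.

```python
import numpy as np

x = np.arange(20)   # small illustrative array (assumption)
window = 11

# Question's approach: repeated concatenation yields one flat float array
flat = np.array([])
for i in range(len(x) - window):
    flat = np.concatenate((flat, x[i:i + window]), axis=0)

# List-comprehension approach: one row per window
rows = np.array([x[i:i + window] for i in range(len(x) - window)])

print(flat.shape)   # (99,)
print(rows.shape)   # (9, 11)
print(np.array_equal(flat.reshape(-1, window), rows))  # True
```

Note that flat is a float array because it starts from the empty float array np.array([]), while rows keeps the integer dtype of x.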

2 Comments

Also note that range() starts at 0 by default, so there is no need to specify it. Furthermore, your code produced a 1D array because of the concatenation, rather than a 2D array (an array of arrays).
Thank you, it worked well and is faster. I was reshaping my output to get the same result as yours.
