2

I have an n-row, m-column NumPy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following Python function to do this, but would like to implement something faster:

import random
import numpy as np

def sample_array_cols(MyMatrix, nelements):
    vmat = []
    TempMat = MyMatrix.T
    for v in TempMat:
        v = np.ndarray.tolist(v)
        subv = random.sample(v, nelements)
        vmat = vmat + [subv]
    return(np.array(vmat).T)

One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?

3 Answers

1

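One alternative is to generate the random indices first, and then use take_along_axis to map them onto the original array. Note that np.random.randint draws with replacement, so the sample for a column may contain the same row more than once: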

import numpy as np

arr = np.random.randn(1000, 5000)  # arbitrary
k = 10  # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))  # k row indices per column
new = np.take_along_axis(arr, idx, axis=0)

Output (shape):

In [215]: new.shape
Out[215]: (10, 5000)  # (k x m)
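
For readers unfamiliar with take_along_axis, here is a minimal sketch of what the call does, written with the newer Generator API. The small array size and the seed are assumptions chosen only to keep the example reproducible. Each entry idx[i, j] picks a row of column j, which is the same as fancy indexing with the column numbers broadcast across rows:

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen only for reproducibility
arr = rng.standard_normal((6, 4))  # small stand-in for the real array
k = 3
n, m = arr.shape

# One row index per output cell; indices may repeat within a column
idx = rng.integers(0, n, size=(k, m))

# new[i, j] == arr[idx[i, j], j]
new = np.take_along_axis(arr, idx, axis=0)

# Equivalent fancy indexing: row idx[i, j] paired with column j
same = arr[idx, np.arange(m)]
```

Both expressions should give the same k x m result; take_along_axis just saves building the column index by hand.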

1

To sample each column without replacement, just like your original solution, argsort a matrix of random numbers along each column and keep the first k row indices:

import numpy as np

matrix = np.arange(4*3).reshape(4,3)
matrix

Output

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
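k = 2
# argsort of uniform random values yields an independent random permutation of row indices per column
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)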

Output

array([[ 9,  1,  2],
       [ 3,  4, 11]])


0

I would

  1. Pre-allocate the result array and fill in its columns, and
  2. Use NumPy integer-array indexing

import numpy

def sample_array_cols(matrix, n_result):
    n, m = matrix.shape
    # Allocate the (n_result x m) output up front
    vmat = numpy.empty((n_result, m), dtype=matrix.dtype)
    for c in range(m):
        # randint draws with replacement, so a row can be picked twice
        random_indices = numpy.random.randint(0, n, n_result)
        vmat[:, c] = matrix[random_indices, c]
    return vmat

Not quite fully vectorized, but better than building up a list, and the code scans just like your description.
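
If the per-column loop is kept but duplicates within a column are not wanted (as with random.sample in the question), one option is to draw the row indices with Generator.choice and replace=False. This is only a sketch under that assumption; the name sample_array_cols_no_repeat is hypothetical:

```python
import numpy as np

def sample_array_cols_no_repeat(matrix, n_result, rng=None):
    """Per-column sampling without replacement, filling a pre-allocated array."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = matrix.shape
    out = np.empty((n_result, m), dtype=matrix.dtype)
    for c in range(m):
        # n_result distinct row indices for this column
        rows = rng.choice(n, size=n_result, replace=False)
        out[:, c] = matrix[rows, c]
    return out

matrix = np.arange(4 * 3).reshape(4, 3)
result = sample_array_cols_no_repeat(matrix, 2, np.random.default_rng(1))
```

This requires n_result <= n, since choice raises a ValueError when asked for more distinct indices than there are rows.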
