Randomly select rows from numpy array based on a condition

Question

Let's say I have two arrays of arrays: each array in labels is 1D and each array in data is 5D. Corresponding arrays in both have the same first dimension.

To simplify things, let's say labels contains only 3 arrays:

labels = np.array([[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]], dtype=object)  # dtype=object is needed for ragged rows in recent NumPy

And let's say I have a datalist of data arrays (length=3), where each array is 5D and its first dimension equals the length of the corresponding array in labels.

In this example, datalist has 3 arrays of shapes (8,3,100,10,1), (5,3,100,10,1) and (10,3,100,10,1) respectively. The first dimensions (8, 5 and 10) match the lengths of the arrays in labels.

Now I want to reduce the number of zeros in each array of labels and keep the other values. Let's say I want to keep only 3 zeros per array. The lengths of the arrays in labels, as well as the first dimensions of the arrays in data, will then be 6, 4 and 8.

To reduce the number of zeros in each array of labels, I want to randomly select 3 of them and keep only those. The same randomly selected indexes will then be used to select the corresponding rows from data.

For this example, the new_labels array will be something like this :

new_labels = np.array([[0,0,1,1,2,0],[4,0,0,0],[0,3,2,1,0,1,7,0]], dtype=object)

Here's what I have tried so far :

all_ind=[] #to store indexes where value=0 for all arrays
indexes_to_keep=[] #to store the random selected indexes
new_labels=[] #to store the final results

for i in range(len(labels)):
    ind=[] #to store indexes where value=0 for one array
    for j in range(len(labels[i])):
        if (labels[i][j]==0):
            ind.append(j)
    all_ind.append(ind)

for k in range(len(labels)):
    indexes_to_keep.append(np.random.choice(all_ind[k], 3, replace=False)) #replace=False avoids picking the same index twice
    aux= np.zeros(len(labels[k]) - len(all_ind[k]) + 3)
    ....
    .... 
    Here, how can I fill **aux** with the values ?
    ....
    .... 
    new_labels.append(aux)

Any suggestions ?

mathfux · Accepted Answer · 2020-11-10 14:06:38Z


Working with NumPy arrays of different lengths is not a good idea, so you have to iterate over the items and process each one individually. Assuming you only want to optimize that per-item step, masking works well here:

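import numpy as np

def specific_choice(x, n):
    '''Keep all nonzero entries of x plus n randomly chosen zeros.'''
    x = np.array(x)
    mask = x != 0                # True where the value must be kept
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffles idx in place, quite fast
    idx = idx[:n]                # first n shuffled positions = n random zeros
    mask[idx] = True
    return x[mask] # or mask if you need it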

Iterating over a list is faster than iterating over an array, so an effective usage would be:

labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [specific_choice(n, 3) for n in labels]

Output (one possible result, since the selection is random):

[array([0, 1, 1, 2, 0, 0]), array([0, 4, 0, 0]), array([0, 3, 0, 2, 1, 1, 7, 0])]
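
Because the selection is random, the output above is just one possible result. If repeatable results are needed, a minimal sketch of a seeded variant could look like the following. It assumes NumPy's newer `Generator` API is acceptable and swaps the in-place shuffle for `Generator.choice` with `replace=False`; the name `keep_n_zeros` and its `seed` parameter are hypothetical and not part of the original answer.

```python
import numpy as np

def keep_n_zeros(x, n, seed=None):
    """Keep every nonzero entry of x plus at most n randomly chosen zeros."""
    x = np.asarray(x)
    rng = np.random.default_rng(seed)  # seeded generator for repeatable draws
    zero_idx = np.flatnonzero(x == 0)  # positions of the zeros
    # Sample without replacement; if there are fewer than n zeros, keep them all.
    keep = rng.choice(zero_idx, size=min(n, zero_idx.size), replace=False)
    mask = x != 0
    mask[keep] = True
    return x[mask]

labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [keep_n_zeros(row, 3, seed=0) for row in labels]
print([len(a) for a in output])  # lengths do not depend on the seed: [6, 4, 8]
```

Which zeros survive depends on the seed, but the resulting lengths and the order of the nonzero values do not.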

edited Nov 10, 2020 at 14:06

answered Nov 10, 2020 at 11:10


5 Comments

mathfux Over a year ago

Glad to hear it. np.random.shuffle is really a fast option.

Mejdi Dallel Over a year ago

How can I use those exact random indexes to select the corresponding rows from data, given that the first dimension of each array in data matches the length of the corresponding array in labels?

mathfux Over a year ago

@MejdiDallel It seems you could modify the function to return the masks instead, something like: output_of_masks = [specific_choice_masks(n, 3) for n in labels], and then apply each mask to its matching data array in a comprehension: [d[mask] for d, mask in zip(data, output_of_masks)].
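
The suggestion in this comment could be sketched roughly as follows. The function name `specific_choice_masks` comes from the comment, but its body is an assumption (the answer's function changed to return the mask), and `datalist` is filled with placeholder zeros using the shapes from the question.

```python
import numpy as np

def specific_choice_masks(x, n):
    """Boolean mask that keeps all nonzero entries of x and n random zeros."""
    x = np.asarray(x)
    mask = x != 0
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffle in place
    mask[idx[:n]] = True         # re-enable n randomly chosen zeros
    return mask

labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
# Placeholder data with the shapes from the question: (len(row), 3, 100, 10, 1)
datalist = [np.zeros((len(row), 3, 100, 10, 1)) for row in labels]

masks = [specific_choice_masks(row, 3) for row in labels]
# The same mask filters both the labels and the first axis of the data.
new_labels = [np.asarray(row)[m] for row, m in zip(labels, masks)]
new_data = [d[m] for d, m in zip(datalist, masks)]
print([d.shape for d in new_data])
```

Pairing each data array with its own mask via `zip` keeps the labels and the data rows aligned, since both are filtered by the same boolean array.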

Mejdi Dallel Over a year ago

Excellent ! I will try that ! And sorry I'm new in Python x)

mathfux Over a year ago

@MejdiDallel Alright. One more thing: Python's built-in lists and other iterables don't support this kind of indexing. It's a feature of the NumPy library, which acts as an interface that lets you write Python while the operations run at the C level.
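
A small illustration of that difference, using made-up values: indexing a NumPy array with a boolean mask selects elements, while the same expression on a plain list raises a `TypeError`.

```python
import numpy as np

values = [0, 4, 0, 0, 0]
mask = np.array([False, True, True, False, True])

# NumPy array: the boolean mask selects the elements where it is True.
selected = np.array(values)[mask]
print(selected.tolist())  # [4, 0, 0]

# Plain list: a boolean array is not a valid index.
try:
    values[mask]
    list_indexing_works = True
except TypeError:
    list_indexing_works = False
print(list_indexing_works)  # False
```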
