Let's say I have two arrays of arrays: labels is 1D and data is 5D. Note that both have the same first dimension.

To simplify things, let's say labels contains only 3 arrays:

labels=np.array([[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]], dtype=object)

And let's say I have a list of data arrays, datalist (length 3), where each array is 5D and its first dimension equals the length of the corresponding array in labels.

In this example, datalist has 3 arrays of shapes (8,3,100,10,1), (5,3,100,10,1) and (10,3,100,10,1) respectively. The first dimension of each of these arrays equals the length of the corresponding array in labels.

Now I want to reduce the number of zeros in each array of labels and keep the other values. Let's say I want to keep only 3 zeros per array. The length of each array in labels, as well as the first dimension of each array in datalist, will then be 6, 4 and 8 respectively.

To reduce the number of zeros in each array of labels, I want to randomly select 3 of them and keep only those. The same randomly selected indexes should then be used to select the corresponding rows from data.

For this example, the new_labels array will be something like this:

new_labels=np.array([[0,0,1,1,2,0],[4,0,0,0],[0,3,2,1,0,1,7,0]], dtype=object)

Here's what I have tried so far:

all_ind=[] #to store indexes where value=0 for all arrays
indexes_to_keep=[] #to store the random selected indexes
new_labels=[] #to store the final results

for i in range(len(labels)):
    ind=[] #to store indexes where value=0 for one array
    for j in range(len(labels[i])):
        if (labels[i][j]==0):
            ind.append(j)
    all_ind.append(ind)

for k in range(len(labels)):
    # pick 3 distinct zero positions to keep for array k
    indexes_to_keep.append(np.random.choice(all_ind[k], 3, replace=False))
    aux= np.zeros(len(labels[k]) - len(all_ind[k]) + 3)
    # ....
    # Here, how can I fill aux with the values?
    # ....
    new_labels.append(aux)

Any suggestions?

1 Answer


Working with NumPy arrays of different lengths is not a good idea, so you have to iterate over the items and apply a function to each one. Assuming you only want to optimize that function, masking might work pretty well here:

import numpy as np

def specific_choice(x, n):
    '''Keep all non-zero values of x and n randomly chosen zeros.'''
    x = np.array(x)
    mask = x != 0                # True where the value must be kept
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffles idx in place, quite fast
    idx = idx[:n]                # the n zeros to keep
    mask[idx] = True
    return x[mask] # or mask if you need it

Iterating over a list is faster than iterating over an array, so an efficient usage would be:

labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [specific_choice(n, 3) for n in labels]

Output (one possible result, since the selection is random):

[array([0, 1, 1, 2, 0, 0]), array([0, 4, 0, 0]), array([0, 3, 0, 2, 1, 1, 7, 0])]
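
A variant of the same idea, as a minimal sketch rather than a drop-in replacement: instead of shuffling all zero positions, it draws n of them without replacement from a seeded generator, so a run can be reproduced. The name keep_n_zeros, the rng parameter and the seed value are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def keep_n_zeros(x, n, rng):
    """Keep all non-zero values of x and at most n randomly chosen zeros."""
    x = np.asarray(x)
    mask = x != 0                      # True where the value must be kept
    zero_idx = np.flatnonzero(~mask)   # positions of the zeros
    # Draw without replacement so the same zero is never picked twice.
    keep = rng.choice(zero_idx, size=min(n, zero_idx.size), replace=False)
    mask[keep] = True
    return x[mask]

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [keep_n_zeros(row, 3, rng) for row in labels]
print([len(a) for a in output])  # [6, 4, 8]
```

Which zeros survive still depends on the seed; only the lengths and the non-zero values are fixed.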

5 Comments

Glad to hear it. np.random.shuffle is really a fast option.
How can I use those exact random indexes to select the corresponding rows from data, since the first dimension of each array in data is the same as the length of the corresponding array in labels?
@MejdiDallel It seems you could modify the function to return the masks instead, something like: output_of_masks = [specific_choice_masks(n, 3) for n in labels], and then apply each mask to its data array with a comprehension like: [d[mask] for d, mask in zip(datalist, output_of_masks)].
Excellent! I will try that! And sorry, I'm new to Python x)
@MejdiDallel Alright. One more thing: plain Python lists and other built-in iterables don't support this kind of indexing. It's a feature of the NumPy library, which acts as an interface that lets you write Python while the operations run at the C level.
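
The suggestion in the comments can be sketched as follows, assuming datalist is a list of arrays whose first dimension matches the length of the corresponding label array. specific_choice_mask is a hypothetical variant of the answer's function that returns the boolean mask instead of the filtered labels, and the random datalist is placeholder data with the shapes from the question.

```python
import numpy as np

def specific_choice_mask(x, n):
    """Return a boolean mask keeping all non-zeros of x and n random zeros."""
    x = np.asarray(x)
    mask = x != 0
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffle in place, then keep the first n
    mask[idx[:n]] = True
    return mask

labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
# Placeholder data: one 5D array per label array, same first dimension.
datalist = [np.random.rand(len(row), 3, 100, 10, 1) for row in labels]

masks = [specific_choice_mask(row, 3) for row in labels]
# The same mask filters both the labels and the rows of the data array.
new_labels = [np.asarray(row)[m] for row, m in zip(labels, masks)]
new_data = [d[m] for d, m in zip(datalist, masks)]

print([d.shape for d in new_data])
# [(6, 3, 100, 10, 1), (4, 3, 100, 10, 1), (8, 3, 100, 10, 1)]
```

A 1D boolean mask applied to a 5D array selects along the first axis only, which is why the remaining four dimensions are unchanged.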
