I have the given dataset:

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2 ,3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

I want to return columns from data where cols_idx == 1. For that I used:

data[:, np.nonzero(cols_idx)]

But it returns a 3D array instead of a 2D array:

data[:, np.nonzero(cols_idx)]
array([[[1, 1]],    
       [[1, 2]],
       [[2, 1]],    
       [[2, 1]]])

data[:, np.nonzero(cols_idx)].shape
(4, 1, 2)

I would like the output to be:

data[:, np.nonzero(cols_idx)]
array([[1, 1],    
       [1, 2],
       [2, 1],    
       [2, 1]])

data[:, np.nonzero(cols_idx)].shape
(4, 2)

How can I achieve that?

  • Look at np.nonzero(cols_idx) by itself. Commented Nov 24, 2019 at 5:59
  • I found np.flatnonzero(), which fixes the "issue". Commented Nov 24, 2019 at 6:00
  • flatnonzero extracts the array from the nonzero tuple with a [0] index. Look at its code! You want to index with the array that nonzero produces, not the tuple it is wrapped in. That tuple is great for indexing by itself. Commented Nov 24, 2019 at 6:02
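
The point made in these comments can be checked directly. The following is a minimal sketch using the question's cols_idx; the variable names idx_tuple and idx_flat are only illustrative.

```python
import numpy as np

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# np.nonzero returns a tuple holding one index array per dimension.
idx_tuple = np.nonzero(cols_idx)

# np.flatnonzero returns a plain 1-D index array.
idx_flat = np.flatnonzero(cols_idx)

print(type(idx_tuple), len(idx_tuple))
print(idx_flat)

# For a 1-D input, the single array inside the tuple matches flatnonzero.
print(np.array_equal(idx_tuple[0], idx_flat))
```

Either idx_tuple[0] or idx_flat can then be used as the column index in data[:, ...].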

2 Answers

print(np.nonzero(cols_idx)) gives (array([2, 4]),), which is a tuple containing an array rather than just an array.

So you should use np.nonzero(cols_idx)[0], which gives [2 4], to get what you want.

Full code:

import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# Take the index array out of the tuple before using it to select columns.
new_data = data[:, np.nonzero(cols_idx)[0]]
print(new_data)
# [[1 1]
#  [1 2]
#  [2 1]
#  [2 1]]
print(new_data.shape)  # (4, 2)
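
To see where the extra axis comes from, here is a small sketch comparing the two index forms. It assumes, as far as I understand NumPy's indexing, that the one-element tuple placed after the slice is treated as an array-like of shape (1, 2), which is what adds the middle dimension.

```python
import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])
cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

idx = np.nonzero(cols_idx)

# Converted to an array, the tuple has a leading axis of length 1.
print(np.asarray(idx).shape)

# Indexing with the tuple keeps that extra axis in the result.
shape_with_tuple = data[:, idx].shape
print(shape_with_tuple)

# Indexing with the inner array gives the 2D result the question asks for.
shape_with_array = data[:, idx[0]].shape
print(shape_with_array)
```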

From the NumPy documentation:

While the nonzero values can be obtained with a[nonzero(a)], it is recommended to use x[x.astype(bool)] or x[x != 0] instead, which will correctly handle 0-d arrays.

So it's better to use:

data[:, cols_idx.astype(bool)]

or

data[:, cols_idx != 0]
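
As a quick check, the sketch below applies both boolean-mask forms to the question's data and compares them with integer indexing. The names by_astype, by_compare, and by_index are only illustrative.

```python
import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])
cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# Boolean masks select the columns where cols_idx is nonzero.
by_astype = data[:, cols_idx.astype(bool)]
by_compare = data[:, cols_idx != 0]

# Integer indexing with the positions of the nonzero entries, for comparison.
by_index = data[:, np.flatnonzero(cols_idx)]

print(by_astype)
print(by_astype.shape)
print(np.array_equal(by_astype, by_compare), np.array_equal(by_astype, by_index))
```

All three forms return the same 2D array here, so the choice between them is mostly one of readability.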
