Returning columns in a numpy array given a boolean index

Question

I have the following dataset:

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

I want to return columns from data where cols_idx == 1. For that I used:

data[:, np.nonzero(cols_idx)]

But it returns a 3D array instead of a 2D array:

data[:, np.nonzero(cols_idx)]
array([[[1, 1]],    
       [[1, 2]],
       [[2, 1]],    
       [[2, 1]]])

data[:, np.nonzero(cols_idx)].shape
(4, 1, 2)

I would like the output to be:

data[:, np.nonzero(cols_idx)]
array([[1, 1],    
       [1, 2],
       [2, 1],    
       [2, 1]])

data[:, np.nonzero(cols_idx)].shape
(4, 2)

How can I achieve that?

flatnonzero extracts the array from the nonzero tuple with a [0] index. Look at its code! You want to index with the array that nonzero produces, not the tuple it is wrapped in. That tuple is great for indexing by itself.
– hpaulj, Commented Nov 24, 2019 at 6:02
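
As a minimal sketch of the flatnonzero route mentioned in this comment, the snippet below reuses the question's data and cols_idx; the names col_positions and selected are only illustrative and do not come from the thread.

```python
import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# flatnonzero returns a plain 1-D index array instead of a tuple of arrays
col_positions = np.flatnonzero(cols_idx)

# Indexing the column axis with a 1-D integer array keeps the result 2-D
selected = data[:, col_positions]

print(col_positions)   # [2 4]
print(selected.shape)  # (4, 2)
```

Because the index is a 1-D array rather than a tuple wrapping one, no extra axis is added.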

lbragile · Accepted Answer · 2019-11-24 06:13:51Z

print(np.nonzero(cols_idx)) gives (array([2, 4]),), which is a tuple containing one array rather than the array itself.

So use np.nonzero(cols_idx)[0], which gives [2 4], to get what you want:

Full code:

import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# Take the index array out of the tuple before indexing the column axis
new_data = data[:, np.nonzero(cols_idx)[0]]
print(new_data)
# [[1 1]
#  [1 2]
#  [2 1]
#  [2 1]]
print(new_data.shape)  # (4, 2)
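
To see where the extra axis comes from, here is a small sketch using the same data. As far as I understand the indexing rules, NumPy converts the one-element tuple into an index array of shape (1, 2), and that shape replaces the column axis in the result. The names idx_tuple and idx_as_array are only illustrative.

```python
import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

idx_tuple = np.nonzero(cols_idx)      # one-element tuple holding the index array
idx_as_array = np.asarray(idx_tuple)  # the same tuple viewed as an array

shape_of_index = idx_as_array.shape       # shape of the index itself
shape_with_tuple = data[:, idx_tuple].shape      # indexing with the whole tuple
shape_with_array = data[:, idx_tuple[0]].shape   # indexing with the inner array

print(shape_of_index)    # (1, 2)
print(shape_with_tuple)  # (4, 1, 2)
print(shape_with_array)  # (4, 2)
```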

Mykola Zotko · Accepted Answer · 2019-11-24 08:37:35Z


From the NumPy documentation:

While the nonzero values can be obtained with a[nonzero(a)], it is recommended to use x[x.astype(bool)] or x[x != 0] instead, which will correctly handle 0-d arrays.

So it's better to use:

data[:, cols_idx.astype(bool)]

or

data[:, cols_idx != 0]
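
A short sketch of the boolean-mask approach on the question's data follows. It also shows why the conversion matters: indexing with the raw integer array is treated as a list of column positions, not as a mask. The names mask, selected and positional are only illustrative.

```python
import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])

cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# Convert the 0/1 integers to a boolean mask with one entry per column
mask = cols_idx.astype(bool)

# A boolean mask on the column axis keeps only the columns marked True
selected = data[:, mask]
print(selected.shape)  # (4, 2)

# Without the conversion, the integers act as column positions 0 and 1,
# so this picks columns 0, 0, 1, 0, 1, 0, 0 instead of filtering
positional = data[:, cols_idx]
print(positional.shape)  # (4, 7)
```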
