How to split numpy array into numpy arrays based on columns?

Question

I want to split a numpy array into several arrays at the columns whose values are all zero. If a sequence of columns contains only zeros, like the first two columns of the sample array, that group should be discarded.

Is there an efficient solution?

Sample input numpy array:

[[0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   255. 0.   0.  ]
 [0.   0.   255. 255.   0.   0.     0.   0.   255.]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
]

Expected output numpy array:

[[
  [0.    255.]
  [0.    255.]
  [0.    255.]
  [255.  255.]
  [0.    255.]
  [0.    255.]
  [0.    255.]
 ]
 [
  [255. 0.  ]
  [255. 0.  ]
  [255. 255.]
  [0.   0.  ]
  [255. 0.  ]
  [255. 0.  ]
  [255. 0.  ]
 ]
 [
  [0.  ]
  [0.  ]
  [0.  ] 
  [255.] 
  [0.  ]
  [0.  ]
  [0.  ]
 ]
]

Do you want the output in a numpy array? If so, what would its shape be? – gnodab, Commented Apr 10, 2020 at 20:15

Andy L. · Accepted Answer · 2020-04-10 20:47:20Z

2

You may use itertools.groupby and a list comprehension:

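import numpy as np
from itertools import groupby

m = a.any(0)  # True for every column with at least one non-zero value
# group consecutive column indices by their mask value and keep only the True groups
out = [a[:,[*g]] for k, g in groupby(np.arange(len(m)), lambda x: m[x] != 0) if k]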

Out[180]:
[array([[  0, 255],
        [  0, 255],
        [  0, 255],
        [255, 255],
        [  0, 255],
        [  0, 255],
        [  0, 255]]),
 array([[255,   0],
        [255,   0],
        [255, 255],
        [  0,   0],
        [255,   0],
        [255,   0],
        [255,   0]]),
 array([[  0],
        [  0],
        [  0],
        [255],
        [  0],
        [  0],
        [  0]])]

Note: a is your sample array
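The groupby call above walks the column mask and collects each run of consecutive True columns. Since the asker later asks for a solution without scipy, the same grouping can also be sketched with plain NumPy: find where the mask flips with np.diff, split the column indices there with np.split, and keep the runs whose mask is True. This is a minimal sketch, not part of the original answer; the helper name split_nonzero_column_runs is hypothetical.

```python
import numpy as np

def split_nonzero_column_runs(a):
    """Split `a` into runs of adjacent columns that contain any non-zero value.

    Hypothetical helper name; a pure-NumPy variant of the groupby approach.
    """
    mask = (a != 0).any(axis=0)  # True for columns with a non-zero value
    # positions where the mask flips between False and True
    boundaries = np.flatnonzero(np.diff(mask.astype(np.int8))) + 1
    # runs of consecutive column indices that share the same mask value
    col_groups = np.split(np.arange(a.shape[1]), boundaries)
    # all columns in a run share one mask value, so checking the first is enough
    return [a[:, g] for g in col_groups if len(g) and mask[g[0]]]

a = np.array([[0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 255, 0, 0],
              [0, 0, 255, 255, 0, 0, 0, 0, 255],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0]], dtype=float)

out = split_nonzero_column_runs(a)
print([g.shape for g in out])  # [(7, 2), (7, 2), (7, 1)]
```

The Python-level work is limited to the final list comprehension over the groups, so the cost is dominated by vectorized NumPy calls rather than a per-column loop.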

As requested in the comments, if you also want to discard columns that have only one non-zero value, you only need to change m to a mask that excludes both all-zero columns and columns with a single non-zero value:

m = (a != 0).sum(0) > 1  # keep columns with more than one non-zero value
out = [a[:,[*g]] for k, g in groupby(np.arange(len(m)), lambda x: m[x] != 0) if k]

Out[204]:
[array([[255],
        [255],
        [255],
        [255],
        [255],
        [255],
        [255]]),
 array([[255],
        [255],
        [255],
        [  0],
        [255],
        [255],
        [255]])]
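Both masks in this answer are special cases of "keep columns with at least k non-zero entries": a.any(0) corresponds to k = 1 and (a != 0).sum(0) > 1 to k = 2. A minimal sketch that exposes that threshold as a parameter, still using itertools.groupby as the answer does; the helper name split_column_runs and the parameter min_nonzero are hypothetical.

```python
from itertools import groupby
import numpy as np

def split_column_runs(a, min_nonzero=1):
    """Group adjacent columns that have at least `min_nonzero` non-zero entries.

    Hypothetical helper: min_nonzero=1 gives the same mask as a.any(0),
    min_nonzero=2 gives the same mask as (a != 0).sum(0) > 1.
    """
    m = (a != 0).sum(axis=0) >= min_nonzero
    # groupby yields (mask value, iterator over consecutive column indices)
    return [a[:, list(g)] for k, g in groupby(range(a.shape[1]), key=lambda i: m[i]) if k]

a = np.array([[0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 255, 0, 0],
              [0, 0, 255, 255, 0, 0, 0, 0, 255],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0]], dtype=float)

print([g.shape for g in split_column_runs(a)])                 # [(7, 2), (7, 2), (7, 1)]
print([g.shape for g in split_column_runs(a, min_nonzero=2)])  # [(7, 1), (7, 1)]
```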

edited Apr 10, 2020 at 20:47

answered Apr 10, 2020 at 20:24

Andy L.

7 Comments

user13051721 Over a year ago

It works as I expected, thanks! If only one element of a column is non-zero, how can I discard that column too?

Andy L. Over a year ago

Like columns 2, 6 and 8 (counting from 0)? You want to discard them?

user13051721 Over a year ago

Yes, if just one value is different from 0.

Andy L. Over a year ago

You don't actually need an additional mask; just change the mask m to handle both the all-zero case and the single non-zero case. Check my updated answer.

user13051721 Over a year ago

@AndyL, thanks! That is what I want.

Mister Brainley · Accepted Answer · 2020-04-10 18:34:59Z

2

One solution is to use the scipy.ndimage library to label runs of adjacent columns that contain any non-zero elements, then split your array using those labels.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.label.html

For example ...

import numpy as np
from scipy import ndimage

# convert input string to numpy array
s = '''[[0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   255. 0.   0.  ]
 [0.   0.   255. 255.   0.   0.     0.   0.   255.]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
 [0.   0.   0.   255.   0.   255.   0.   0.   0.  ]
]'''
X = eval("np.array(%s)" % s.replace('.', ',').replace('\n', ',')).astype(float)

# label runs of adjacent columns that contain any non-zero value
labels, N = ndimage.label((X != 0).any(0))

# display the column splits
for n in range(1, N+1):
    mask = labels == n
    print(X[:, mask])

Should display ...

[[  0. 255.]
 [  0. 255.]
 [  0. 255.]
 [255. 255.]
 [  0. 255.]
 [  0. 255.]
 [  0. 255.]]
[[255.   0.]
 [255.   0.]
 [255. 255.]
 [  0.   0.]
 [255.   0.]
 [255.   0.]
 [255.   0.]]
[[  0.]
 [  0.]
 [  0.]
 [255.]
 [  0.]
 [  0.]
 [  0.]]
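For a 1-D boolean mask, ndimage.label numbers each run of consecutive True values 1, 2, ..., N and sets False entries to 0, which is why labels == n selects exactly one group of columns. Since the asker later mentions needing a solution without scipy, here is a minimal sketch of the same 1-D labeling using np.cumsum; it should produce the same labels as ndimage.label for this case, but the helper name label_runs_1d is hypothetical and it only handles 1-D masks.

```python
import numpy as np

def label_runs_1d(mask):
    """Number runs of consecutive True values 1, 2, 3, ... and set False entries to 0.

    Hypothetical helper; intended to mimic ndimage.label on a 1-D boolean mask.
    """
    mask = np.asarray(mask, dtype=bool)
    prev = np.concatenate(([False], mask[:-1]))  # value of the previous entry
    starts = mask & ~prev                        # True where a run begins
    labels = np.cumsum(starts) * mask            # running run count, zeroed outside runs
    return labels, int(starts.sum())

X = np.array([[0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 255, 0, 0],
              [0, 0, 255, 255, 0, 0, 0, 0, 255],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0, 255, 0, 0, 0]], dtype=float)

labels, N = label_runs_1d((X != 0).any(0))
print(labels)  # [0 0 1 1 0 2 2 0 3]

# same splitting loop as in the answer, using the scipy-free labels
for n in range(1, N + 1):
    print(X[:, labels == n])
```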

edited Apr 10, 2020 at 18:34

answered Apr 10, 2020 at 18:24

Mister Brainley

1 Comment

user13051721 Over a year ago

Thanks, and sorry, I forgot to say that I need a solution without scipy.

Ethan · Accepted Answer · 2020-04-10 19:18:33Z

1

If you can’t or don’t want to use scipy, this also works for the sample array:

import numpy as np

def twoCols():
    arr = np.array([[0., 0., 0., 255., 0., 255., 0., 0., 0.],
                    [0., 0., 0., 255., 0., 255., 0., 0., 0.],
                    [0., 0., 0., 255., 0., 255., 255., 0., 0.],
                    [0., 0., 255., 255., 0., 0., 0., 0., 255.],
                    [0., 0., 0., 255., 0., 255., 0., 0., 0.],
                    [0., 0., 0., 255., 0., 255., 0., 0., 0.],
                    [0., 0., 0., 255., 0., 255., 0., 0., 0.]], dtype=np.float64)
    arrs = []
    for c in range(arr.shape[1]):
        if c == arr.shape[1] - 1:
            # last column: keep it on its own if it has any non-zero value
            if sum(arr[:, c]) > 0:
                arrs.append(arr[:, c:])
        elif sum(arr[:, c]) > 0 and sum(arr[:, c + 1]) > 0:
            # two adjacent non-zero columns form one group
            arrs.append(arr[:, c:c + 2])
    return arrs

>>> twoCols()
[array([[   0.,  255.],
       [   0.,  255.],
       [   0.,  255.],
       [ 255.,  255.],
       [   0.,  255.],
       [   0.,  255.],
       [   0.,  255.]]), array([[ 255.,    0.],
       [ 255.,    0.],
       [ 255.,  255.],
       [   0.,    0.],
       [ 255.,    0.],
       [ 255.,    0.],
       [ 255.,    0.]]), array([[   0.],
       [   0.],
       [   0.],
       [ 255.],
       [   0.],
       [   0.],
       [   0.]])]
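twoCols relies on the layout of the sample: it appends every pair of adjacent non-zero columns, so a run of three non-zero columns would produce two overlapping pairs, and a lone non-zero column that is not the last one is skipped. A minimal loop-based sketch that still avoids scipy but handles runs of any length by remembering where the current run started; the helper name split_runs_loop is hypothetical.

```python
import numpy as np

def split_runs_loop(arr):
    """Hypothetical helper: split arr into runs of adjacent columns with any non-zero value."""
    groups = []
    start = None  # first column of the run currently being collected
    for c in range(arr.shape[1]):
        nonzero = bool(np.any(arr[:, c] != 0))
        if nonzero and start is None:
            start = c  # a new run begins at column c
        elif not nonzero and start is not None:
            groups.append(arr[:, start:c])  # the run ended at column c - 1
            start = None
    if start is not None:
        groups.append(arr[:, start:])  # a run that reaches the last column
    return groups

arr = np.array([[0, 0, 0, 255, 0, 255, 0, 0, 0],
                [0, 0, 0, 255, 0, 255, 0, 0, 0],
                [0, 0, 0, 255, 0, 255, 255, 0, 0],
                [0, 0, 255, 255, 0, 0, 0, 0, 255],
                [0, 0, 0, 255, 0, 255, 0, 0, 0],
                [0, 0, 0, 255, 0, 255, 0, 0, 0],
                [0, 0, 0, 255, 0, 255, 0, 0, 0]], dtype=float)

print([g.shape for g in split_runs_loop(arr)])  # [(7, 2), (7, 2), (7, 1)]

# a run of three non-zero columns stays one group instead of two overlapping pairs
print([g.shape for g in split_runs_loop(np.array([[1., 1., 1., 0., 2.]]))])  # [(1, 3), (1, 1)]
```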

edited Apr 10, 2020 at 19:18

answered Apr 10, 2020 at 18:52

Ethan

2 Comments

user13051721 Over a year ago

How can I access these arrays separately? 'for _ in arrs' seems to give me one-dimensional arrays instead of multi-dimensional arrays.

Ethan Over a year ago

arrs is just a regular Python list, so you can access each array with arrs[0], arrs[1], etc.
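A short sketch of the ways to get at the individual blocks of a list like the one twoCols() returns. The data here is a stand-in (np.arange reshaped to 2 x 9), not the sample array. Note that iterating over one block (for row in block) yields its 1-D rows, which may be where one-dimensional arrays come from; iterating over the list itself yields the 2-D blocks.

```python
import numpy as np

a = np.arange(18.).reshape(2, 9)          # stand-in data; any 2-D array works
arrs = [a[:, 2:4], a[:, 5:7], a[:, 8:9]]  # a list of 2-D blocks, like twoCols() returns

first = arrs[0]                 # index the list to get one 2-D array
print(first.shape)              # (2, 2)

for i, block in enumerate(arrs):  # iterating the list yields each 2-D block
    print(i, block.shape)

left, middle, right = arrs      # or unpack when the number of groups is known
print(right.shape)              # (2, 1)
```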

gnodab · Accepted Answer · 2020-04-10 20:09:42Z

This is not in the format you asked for (it removes the all-zero columns but does not split the remaining columns into groups), but I like its elegance:

import numpy as np
data = np.array([
 [0.,   0.,   0.,   255.,   0.,   255.,   0.,   0.,   0.  ],
 [0.,   0.,   0.,   255.,   0.,   255.,   0.,   0.,   0.  ],
 [0.,   0.,   0.,   255.,   0.,   255.,   255., 0.,   0.  ],
 [0.,   0.,   255., 255.,   0.,   0.,     0.,   0.,   255.],
 [0.,   0.,   0.,   255.,   0.,   255.,   0.,   0.,   0.  ],
 [0.,   0.,   0.,   255.,   0.,   255.,   0.,   0.,   0.  ],
 [0.,   0.,   0.,   255.,   0.,   255.,   0.,   0.,   0.  ],
])

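# start with a dummy column so column_stack has something to append to
a = np.zeros(data.shape[0])
# True for every column that has at least one non-zero value
cols = np.any(data != 0, axis=0)
for i, c in enumerate(cols):
    if c:
        a = np.column_stack((a, data[:, i]))
# drop the dummy column
a = a[:, 1:]
print(a)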

Output:


[[  0. 255. 255.   0.   0.]
 [  0. 255. 255.   0.   0.]
 [  0. 255. 255. 255.   0.]
 [255. 255.   0.   0. 255.]
 [  0. 255. 255.   0.   0.]
 [  0. 255. 255.   0.   0.]
 [  0. 255. 255.   0.   0.]]
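The column_stack loop above copies the growing array once per kept column. Boolean indexing with the same cols mask should give the same result in a single step; a minimal sketch:

```python
import numpy as np

data = np.array([[0., 0., 0., 255., 0., 255., 0., 0., 0.],
                 [0., 0., 0., 255., 0., 255., 0., 0., 0.],
                 [0., 0., 0., 255., 0., 255., 255., 0., 0.],
                 [0., 0., 255., 255., 0., 0., 0., 0., 255.],
                 [0., 0., 0., 255., 0., 255., 0., 0., 0.],
                 [0., 0., 0., 255., 0., 255., 0., 0., 0.],
                 [0., 0., 0., 255., 0., 255., 0., 0., 0.]])

# boolean mask of columns that contain at least one non-zero value
cols = np.any(data != 0, axis=0)
# boolean indexing selects those columns at once, no dummy column or loop needed
a = data[:, cols]
print(a.shape)  # (7, 5)
```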
