Python / numpy: Remove empty (zeroes) border of 3D array

Question

I have a 3D numpy array. It can be thought of as an image (to be exact, it holds values of field points). I want to remove the border (0 values; note that negative values are possible) in all dimensions. The restriction is that the dimensions must remain the same for all molecules, i.e. I only want to remove the border as far as the "largest" entry in each dimension still fits inside it. So the whole data set (it is small, so its size is not an issue) needs to be taken into account.

Example in 2D:

0  0  0  0  0
0  1  0  0  0
0  1  1  0  0
0  0  0  0  0
0  0  0  0  0

0  0  0  0  0
0  0  0  0  0
0  0  1  0  0
0  0  0  1  0
0  0  0  1  0

Here the top row and the leftmost and rightmost columns should be removed. Over the whole data set, they only contain 0 values.

The result would be:

1  0  0
1  1  0
0  0  0
0  0  0

0  0  0
0  1  0
0  0  1
0  0  1

Since I'm not a numpy expert, I'm having trouble defining an algorithm for this. I will need to find the min and max index of the non-zero values in each dimension and then use those to trim the array.

Similar to this but in 3D and the cropping must take into account the whole data set.

How can I achieve this?

UPDATE 13th Feb 2019:

So I have tried 3 answers here: Martin's, norok2's, and one using zip that seems to have been removed. The output dimensions are the same, so I assume all of them work.

I chose Martin's solution because I can easily extract the bounding box and apply it to the test set.

UPDATE Feb 25th:

If anyone is still following this, I would like further input. As said, these aren't actually images but "field values", meaning floats rather than greyscale (uint8) images, so I need to use at least float16, and this simply needs too much memory. (I have 48 GB available, but that's not enough even for 50% of the training set.)

Where should the smaller arrays be placed in relation to the largest one? What I mean is, in 1D, assuming that the largest object is e.g. [1, 0, 1, 1] and a smaller (reduced) one is [1, 1], should it become [0, 0, 1, 1] (end), [0, 1, 1, 0] (middle) or [1, 1, 0, 0] (beginning)?
– norok2, Commented Feb 7, 2019 at 9:50
Initially everything has the same size. In the end result, the "relative" coordinates of each remaining value/pixel should stay the same.
– beginner_, Commented Feb 7, 2019 at 10:07
@beginner_ Check my newest edit. It should work as you wished now.
– Martin, Commented Feb 7, 2019 at 17:10
@Martin it's busy here. I haven't had a chance yet to verify which answer works best.
– beginner_, Commented Feb 12, 2019 at 11:24

Martin · Accepted Answer · 2019-02-07 17:11:52Z

5

Try this; it is the main algorithm. I don't understand exactly which sides you want to extract in your examples, but the algorithm below should be easy to modify according to your needs.

Note: This algorithm extracts the CUBE (bounding box) from which all zero-valued borders are 'deleted', so each side of the cube touches at least one value != 0.

import numpy as np

# testing dataset
d = np.zeros(shape = [5,5,5]) 

# fill some values
d[3,2,1]=1
d[3,3,1]=1
d[1,3,1]=1
d[1,3,4]=1

# find the indexes of the non-zero values along every axis
xs,ys,zs = np.where(d!=0)
# for 4D object ('as' is a reserved keyword in Python, so use another name)
# xs,ys,zs,ws = np.where(d!=0)

# extract the cube bounded by the extreme indexes of the values != 0
result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1]
# for 4D object
# result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1,min(ws):max(ws)+1]

>>> result.shape
(3, 2, 4)

Case 1:

d = np.zeros(shape = [5,5,5])

d[3,2,1]=1
# ...  just one value

>>> result.shape # works
(1, 1, 1)

Case 2: # error case - only zeros - np.where finds no indexes, so min() has nothing to work with -> error

d = np.zeros(shape = [5,5,5]) # no values except zeros
>>> result.shape


Traceback (most recent call last):
  File "C:\Users\zzz\Desktop\py.py", line 7, in <module>
    result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1]
ValueError: min() arg is an empty sequence
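The all-zero case can be guarded against before taking min/max. The following is a minimal sketch, not part of the original script: the helper name crop_nonzero and the choice to return None for an empty array are assumptions made for illustration.

```python
import numpy as np

def crop_nonzero(arr):
    """Crop arr to the bounding box of its non-zero entries.

    Returns None when arr contains only zeros (an assumed convention;
    raising a clearer error would work just as well).
    """
    nonzero = np.nonzero(arr)
    if nonzero[0].size == 0:
        return None
    box = tuple(slice(idx.min(), idx.max() + 1) for idx in nonzero)
    return arr[box]

d = np.zeros((5, 5, 5))
print(crop_nonzero(d))        # None

d[3, 2, 1] = 1
print(crop_nonzero(d).shape)  # (1, 1, 1)
```

Because it iterates over whatever np.nonzero returns, the same helper works for any number of dimensions.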

EDIT: Because my solution didn't get enough love and understanding, I will provide an example with a 4-dimensional array, where 3 dimensions are used by each image and the remaining dimension (the first axis) indexes the stored images.

import numpy as np


class ImageContainer(object):
    def __init__(self,first_image):
        self.container =  np.uint8(np.expand_dims(np.array(first_image), axis=0))

    def add_image(self,image):
        #print(image.shape)
        temp = np.uint8(np.expand_dims(np.array(image), axis=0))
        #print(temp.shape)
        self.container  = np.concatenate((self.container,temp),axis = 0)
        print('container shape',self.container.shape)

# Create image container storage

image = np.zeros(shape = [5,5,3]) # some image
image[2,2,1]=1 # put something random in it
container = ImageContainer(image)
image = np.zeros(shape = [5,5,3]) # some image
image[2,2,2]=1
container.add_image(image)
image = np.zeros(shape = [5,5,3]) # some image
image[2,3,0]=1    # if we set [2,2,0] = 1, we can expect all images will have just 1x1 pixel size
container.add_image(image)
image = np.zeros(shape = [5,5,3]) # some image
image[2,2,1]=1
container.add_image(image)
>>> container.container.shape
(4, 5, 5, 3) # 4 images, size 5x5, 3 channels


# remove borders from all images at once (4D object)
xs,ys,zs,zzs = np.where(container.container!=0)

# extract the cube bounded by the extreme indexes of the values != 0
result = container.container[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1,min(zzs):max(zzs)+1]

>>> print('Final shape:',result.shape)
Final shape: (4, 1, 2, 3) # 4 images, size: 1x2, 3 channels
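Two caveats apply to the 4D example. It also crops the image axis, so an all-zero image at either end of the stack would be dropped, and the np.uint8 cast would not suit float field values. A possible variant, sketched under the assumption that images are stacked along axis 0, keeps every image and returns the bounding box as slices so it can be reused on another set. The name spatial_bounding_box and the train/test arrays are made up for illustration.

```python
import numpy as np

def spatial_bounding_box(stack):
    """Slices covering all non-zero values, leaving axis 0 (images) untouched.

    Assumes `stack` has shape (n_images, ...) and holds at least one
    non-zero value.
    """
    # Collapse the image axis: True wherever any image is non-zero.
    occupied = np.any(stack != 0, axis=0)
    spatial = tuple(slice(idx.min(), idx.max() + 1)
                    for idx in np.nonzero(occupied))
    return (slice(None),) + spatial

train = np.zeros((4, 5, 5, 3), dtype=np.float32)
train[0, 2, 2, 1] = -0.5
train[1, 2, 2, 2] = 1.0
train[2, 2, 3, 0] = 2.5
# train[3] is left entirely empty on purpose

box = spatial_bounding_box(train)
print(train[box].shape)  # (4, 1, 2, 3)

test = np.zeros((2, 5, 5, 3), dtype=np.float32)
print(test[box].shape)   # (2, 1, 2, 3)
```

Since the box is a plain tuple of slices, it can be stored and applied to any array with the same spatial shape.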

edited Feb 7, 2019 at 17:11

answered Feb 7, 2019 at 9:38


11 Comments

beginner_ Over a year ago

Bottom border should not be removed because the second image has a 1 in the bottom row. The cropping should take into account all images and "relative" coordinates of each pixel should stay the same.

Martin Over a year ago

OK then, I got confused. Now it should be correct. Put a 3D body into it, and the array in 'result' should be the minimal cube whose sides touch the extreme values != 0.

beginner_ Over a year ago

Either I'm missing something or this doesn't work for the whole data set. If I have 10 3D images I will need to find the bounding box that will include all non-zero values over all the 10 images.

Martin Over a year ago

If you have 10 3D images, then you just put that array into my script and it should work.

Martin Over a year ago

I thought I was clear with the testing dataset in my script.

norok2 · Answer · 2021-02-08 12:01:26Z

3

You could see your problem as trimming to a specific bounding box on the array formed by putting all the shapes you have together in one array.

Therefore, if you have an n-dimensional trimming function, the solution is just to apply that.

One way of implementing this would be:

import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

A slightly more flexible solution (where you could indicate which axis to act on) is available in FlyingCircus (Disclaimer: I am the main author of the package).

So, if you have your list of n-dim arrays (in arrs), you could first stack them using np.stack() and then trim the result:

import numpy as np

arr = np.stack(arrs, -1)
trimmed_arr = trim(arr, arr != 0)

which could then be separated back using np.split(), e.g.:

trimmed_list = np.split(trimmed_arr, arr.shape[-1], -1)
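Put together on the 2D example from the question, the steps could look like the sketch below. The arrays a and b and the list comprehension that drops the trailing axis are illustrative additions; trim() is repeated so the snippet runs on its own.

```python
import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

# The two 5x5 examples from the question.
a = np.zeros((5, 5), dtype=int)
a[1, 1] = a[2, 1] = a[2, 2] = 1
b = np.zeros((5, 5), dtype=int)
b[2, 2] = b[3, 3] = b[4, 3] = 1
arrs = [a, b]

arr = np.stack(arrs, -1)           # shape (5, 5, 2)
trimmed_arr = trim(arr, arr != 0)  # shape (4, 3, 2)

# np.split keeps a trailing axis of length 1 on each piece; index it away.
trimmed_list = [piece[..., 0]
                for piece in np.split(trimmed_arr, arr.shape[-1], -1)]
print(trimmed_list[0].shape)  # (4, 3)
```

Note that trim() also acts on the stacking axis, so if the first or last array in arrs were entirely zero it would be trimmed away and the split count would no longer match.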

EDIT:

I just realized that this is using substantially the same approach as the other answers, except that it looks much cleaner to me.
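If holding the whole stack in memory is the bottleneck, the same bounding box can be accumulated one array at a time and applied afterwards. This is only a sketch: accumulate_bounding_box and load_arrays are hypothetical names, and load_arrays stands in for however the fields are actually read.

```python
import numpy as np

def accumulate_bounding_box(arrays):
    """Bounding box of non-zero values across an iterable of same-shape arrays.

    Only one boolean mask of the common shape is kept in memory.
    Returns None if nothing is non-zero.
    """
    occupied = None
    for arr in arrays:
        mask = arr != 0
        occupied = mask if occupied is None else (occupied | mask)
    if occupied is None or not occupied.any():
        return None
    return tuple(slice(idx.min(), idx.max() + 1)
                 for idx in np.nonzero(occupied))

def load_arrays():
    # Hypothetical stand-in for reading one field from disk at a time.
    for position in [(1, 3, 1), (3, 2, 1), (1, 3, 4)]:
        arr = np.zeros((5, 5, 5), dtype=np.float32)
        arr[position] = -1.5
        yield arr

box = accumulate_bounding_box(load_arrays())
# .copy() detaches each crop from its full-size array so that can be freed.
cropped = [arr[box].copy() for arr in load_arrays()]
print(cropped[0].shape)  # (3, 2, 4)
```

This makes two passes over the data, trading some I/O for a much smaller peak memory footprint.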

edited Feb 8, 2021 at 12:01

answered Feb 7, 2019 at 11:01


1 Comment

Konchog Over a year ago

This is pretty cool. I prefer it as a one-liner for single arrays: return arr[tuple(slice(np.min(idx), np.max(idx) + 1) for idx in np.where(arr != 0))]

mbornstein · Answer · 2019-02-08 10:02:41Z

2

Update:

Based on Martin's solution using min/max and np.where, but generalizing it to any dimension, you can do it in this way:

import numpy as np

def bounds_per_dimension(ndarray):
    return map(
        lambda e: range(e.min(), e.max() + 1),
        np.where(ndarray != 0)
    )

def zero_trim_ndarray(ndarray):
    return ndarray[np.ix_(*bounds_per_dimension(ndarray))]

d = np.array([[
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
], [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]])

zero_trim_ndarray(d)
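For comparison, the sketch below builds the same crop once with np.ix_ and once with plain slices. The two results are equal, but indexing with np.ix_ is advanced indexing and returns a copy, whereas slicing returns a view of the original array. The variable names via_ix and via_slices are illustrative.

```python
import numpy as np

# Same data as the example above, built by assignment.
d = np.zeros((2, 5, 5), dtype=int)
d[0, 1, 1] = d[0, 2, 1] = d[0, 2, 2] = 1
d[1, 2, 2] = d[1, 3, 3] = d[1, 4, 3] = 1

nonzero = np.where(d != 0)
via_ix = d[np.ix_(*(range(e.min(), e.max() + 1) for e in nonzero))]
via_slices = d[tuple(slice(e.min(), e.max() + 1) for e in nonzero)]

print(via_ix.shape)                       # (2, 4, 3)
print(np.array_equal(via_ix, via_slices)) # True
print(np.shares_memory(d, via_slices))    # True: a view
print(np.shares_memory(d, via_ix))        # False: a copy
```

Which behaviour is preferable depends on whether the cropped result should stay linked to the original data.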

edited Feb 8, 2019 at 10:02

answered Feb 7, 2019 at 10:23


1 Comment

norok2 Over a year ago

Using range() and np.ix_() for this is going to be unnecessarily slow. If you time that code against the slice() / arr[] approach (as used in my answer), you get a ~2x speed difference even for this simple example using d as input.
