
I have a 3D numpy array. It can be thought of as an image (to be exact, it holds the values of field points). I want to remove the border (0 values; note that negative values are possible) in all dimensions. The restriction is that the resulting dimensions must be the same for all molecules. That is, I only want to remove the border so far that the "largest" entry in each dimension is still within it. So the whole data set needs to be taken into account (it is small, so its size is not an issue).

Example in 2D:

0  0  0  0  0
0  1  0  0  0
0  1  1  0  0
0  0  0  0  0
0  0  0  0  0

0  0  0  0  0
0  0  0  0  0
0  0  1  0  0
0  0  0  1  0
0  0  0  1  0

Here the top row and the leftmost and rightmost columns should be removed, because over the whole data set they contain only 0 values.

The result would be:

1  0  0
1  1  0
0  0  0
0  0  0

0  0  0
0  1  0
0  0  1
0  0  1

Since I'm not a numpy expert, I'm having trouble defining an algorithm for this. I need to find, in each dimension, the min and max index that holds a value other than 0, and then use those indexes to trim the array.

This is similar to this question, but in 3D, and the cropping must take the whole data set into account.

How can I achieve this?
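Below is a minimal sketch of that idea. It assumes all samples are stacked along axis 0 of one array, here called `data`; both the name and the layout are assumptions. The `!= 0` test collapses the sample axis with `np.any`, so negative values also count as content. For each remaining axis, the code then takes the first and last index that is nonzero in any sample. The hypothetical helper `common_bounding_box` returns one slice per spatial axis.

```python
import numpy as np

def common_bounding_box(data):
    """Return one slice per spatial axis covering every nonzero value
    in every sample. Assumes samples are stacked along axis 0."""
    # True wherever at least one sample is nonzero (negative values count too)
    mask = np.any(data != 0, axis=0)
    if not mask.any():
        raise ValueError("all samples are zero; no bounding box exists")
    slices = []
    for axis in range(mask.ndim):
        # Collapse every other axis, leaving a 1D profile along `axis`
        other_axes = tuple(a for a in range(mask.ndim) if a != axis)
        profile = np.any(mask, axis=other_axes)
        nonzero = np.flatnonzero(profile)
        slices.append(slice(nonzero[0], nonzero[-1] + 1))
    return tuple(slices)

# The two 5x5 samples from the example above
data = np.array([
    [[0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 1, 0]],
])
box = common_bounding_box(data)
trimmed = data[(slice(None),) + box]  # keep every sample, crop the spatial axes
print(trimmed.shape)  # (2, 4, 3)
```

Axis 0 is never trimmed, so every sample keeps its place, and all samples share the same relative coordinates.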

UPDATE 13th Feb 2019:

So I have tried three answers here: one that seems to have been removed (it used zip), Martin's and norok2's. The output dimensions are the same, so I assume all of them work.

I chose Martin's solution because I can easily extract the bounding box and apply it to the test set.
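That workflow can be sketched as follows. It assumes training and test samples are stacked along axis 0 of two arrays, `train` and `test`. These names, their shapes and the random data are hypothetical. The box is computed once, from the training set only. The same slices are then reused for the test set, so the relative coordinates match.

```python
import numpy as np

def bounding_box(data):
    """Slices covering every nonzero value across all samples (axis 0)."""
    mask = np.any(data != 0, axis=0)           # union over samples
    idx = np.nonzero(mask)                     # one index array per spatial axis
    return tuple(slice(i.min(), i.max() + 1) for i in idx)

rng = np.random.default_rng(0)
train = np.zeros((10, 8, 8, 8))
train[:, 2:6, 1:5, 3:7] = rng.normal(size=(10, 4, 4, 4))  # nonzero block
test = np.zeros((3, 8, 8, 8))
test[:, 3:5, 2:4, 4:6] = 1.0

box = bounding_box(train)                      # computed on training data only
train_cropped = train[(slice(None),) + box]
test_cropped = test[(slice(None),) + box]      # same crop for the test set
print(train_cropped.shape, test_cropped.shape)  # (10, 4, 4, 4) (3, 4, 4, 4)
```

Any nonzero test value that falls outside the training box is silently cut off by this crop.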

UPDATE Feb 25th:

If anyone is still following this, I would like further input. As mentioned, these aren't actually images but "field values". They are floats, not greyscale (uint8) images, so I need at least float16, and that simply needs too much memory: I have 48 GB available, but that is not enough for even 50% of the training set.
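One way to keep memory bounded is to compute the common bounding box in a streaming pass, one sample at a time. Only the current sample and a single boolean mask are kept in memory. A second pass then crops each sample as it is read, so only the cropped data is ever stored. This is a sketch under assumptions: `sample_generator` is a hypothetical stand-in for reading samples from disk. If the samples are stored as `.npy` files, `np.load(path, mmap_mode="r")` can map them without reading them fully into RAM.

```python
import numpy as np

def streaming_bounding_box(samples):
    """Accumulate the union of nonzero positions one sample at a time,
    so only one sample plus a boolean mask is in memory at once."""
    union = None
    for sample in samples:
        nonzero = sample != 0
        union = nonzero if union is None else (union | nonzero)
    if union is None or not union.any():
        raise ValueError("no nonzero values found")
    return tuple(slice(i.min(), i.max() + 1) for i in np.nonzero(union))

def sample_generator(n, shape, seed=0):
    """Hypothetical stand-in for loading samples from disk one by one."""
    rng = np.random.default_rng(seed)   # same seed -> same samples each pass
    for _ in range(n):
        s = np.zeros(shape, dtype=np.float32)
        # one nonzero value at a random position away from the edges
        x, y, z = rng.integers(2, shape[0] - 2, size=3)
        s[x, y, z] = rng.normal()
        yield s

# First pass: find the box shared by all samples
box = streaming_bounding_box(sample_generator(100, (32, 32, 32)))
# Second pass: crop each sample as it is read, keeping only the small part
cropped = np.stack([s[box] for s in sample_generator(100, (32, 32, 32))])
print(cropped.shape)
```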

  • Where should the smaller arrays be placed in relation to the largest one? What I mean is, in 1D, assuming that the largest object is e.g. [1, 0, 1, 1] and a smaller one (reduced) being [1, 1] should it become [0, 0, 1, 1] (end), [0, 1, 1, 0] (middle) or [1, 1, 0, 0] (beginning)? Commented Feb 7, 2019 at 9:50
  • Initially everything has the same size. In the end result, the "relative" coordinates of each remaining value/pixel should stay the same. Commented Feb 7, 2019 at 10:07
  • @beginner_ Check my newest edit. It should now work as you wished. Commented Feb 7, 2019 at 17:10
  • @beginner_ Has your question been answered? Commented Feb 11, 2019 at 13:45
  • @Martin It's busy here. I haven't had a chance yet to verify which answer works best. Commented Feb 12, 2019 at 11:24

3 Answers


Try this; it's the main algorithm. I don't understand exactly which sides you want to remove in your examples, but the algorithm below should be easy to modify according to your needs.

Note: this algorithm extracts the smallest box in which all zero-valued borders are 'deleted', so each side of the box touches some value != 0.

import numpy as np

# testing dataset
d = np.zeros(shape = [5,5,5]) 

# fill some values
d[3,2,1]=1
d[3,3,1]=1
d[1,3,1]=1
d[1,3,4]=1

# find indexes in all axis
xs,ys,zs = np.where(d!=0) 
# for 4D object
# xs,ys,zs,ws = np.where(d!=0)  # 'as' is a Python keyword, so use another name

# extract cube with extreme limits of where are the values != 0
result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1] 
# for 4D object
# result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1,min(ws):max(ws)+1]

>>> result.shape
(3, 2, 4)

Case 1:

d = np.zeros(shape = [5,5,5])

d[3,2,1]=1
# ...  just one value

>>> result.shape # works
(1, 1, 1)

Case 2 (error case): the array contains only zeros, so np.where returns empty index arrays and min() fails:

d = np.zeros(shape = [5,5,5]) # no values except zeros

Traceback (most recent call last):
  File "C:\Users\zzz\Desktop\py.py", line 7, in <module>
    result = d[min(xs):max(xs)+1,min(ys):max(ys)+1,min(zs):max(zs)+1]
ValueError: min() arg is an empty sequence
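One way to avoid the Case 2 error is to check whether any nonzero index was found before calling min()/max(). In this sketch, the hypothetical helper `crop_nonzero` returns `None` for an all-zero input. Returning an empty array or raising a custom error would be equally valid choices.

```python
import numpy as np

def crop_nonzero(d):
    """Crop to the nonzero bounding box; return None if d is all zeros."""
    idx = np.nonzero(d)
    if idx[0].size == 0:          # all-zero input: min()/max() would fail
        return None
    return d[tuple(slice(i.min(), i.max() + 1) for i in idx)]

d = np.zeros((5, 5, 5))
print(crop_nonzero(d))            # None
d[3, 2, 1] = 1
print(crop_nonzero(d).shape)      # (1, 1, 1)
```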

EDIT: Because my solution wasn't well understood, here is an example with a 4D body. Three dimensions hold each image, and the extra dimension (axis 0) is where the images are stored:

import numpy as np


class ImageContainer(object):
    def __init__(self, first_image):
        # keep the original dtype: casting float field values to uint8 would
        # truncate small values to 0 and wrap negative ones around
        self.container = np.expand_dims(np.asarray(first_image), axis=0)

    def add_image(self, image):
        temp = np.expand_dims(np.asarray(image), axis=0)
        self.container = np.concatenate((self.container, temp), axis=0)
        print('container shape', self.container.shape)

# Create image container storage

image = np.zeros(shape = [5,5,3]) # some image
image[2,2,1]=1 # put something random in it
container = ImageContainer(image)
image = np.zeros(shape = [5,5,3]) # some image
image[2,2,2]=1
container.add_image(image)
image = np.zeros(shape = [5,5,3]) # some image
image[2,3,0]=1    # if we set [2,2,0] = 1, we can expect all images will have just 1x1 pixel size
container.add_image(image)
image = np.zeros(shape = [5,5,3]) # some image
image[2,2,1]=1
container.add_image(image)
>>> container.container.shape
(4, 5, 5, 3) # 4 images, size 5x5, 3 channels


# remove borders from all images at once; axis 0 indexes the images and is
# not trimmed (otherwise an all-zero first or last image would be dropped)
_, ys, zs, zzs = np.where(container.container != 0)

# extract the box bounded by the extreme positions where values != 0
result = container.container[:, min(ys):max(ys)+1, min(zs):max(zs)+1, min(zzs):max(zzs)+1]

>>> result.shape
(4, 1, 2, 3) # 4 images, size 1x2, 3 channels
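Growing the container with `np.concatenate` copies the whole array on every `add_image` call. Below is a sketch of an alternative, assuming all images fit in a Python list first. It collects them, calls `np.stack` once, and trims only the image axes, so no image is ever dropped. It uses the same four images as the example above.

```python
import numpy as np

images = []
for idx in [(2, 2, 1), (2, 2, 2), (2, 3, 0), (2, 2, 1)]:
    image = np.zeros((5, 5, 3))   # same images as in the example above
    image[idx] = 1
    images.append(image)

container = np.stack(images)      # one allocation instead of one per image
# Skip axis 0 (the image index) so no image is ever dropped
_, ys, zs, cs = np.nonzero(container)
result = container[:, ys.min():ys.max() + 1, zs.min():zs.max() + 1, cs.min():cs.max() + 1]
print(result.shape)               # (4, 1, 2, 3)
```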

11 Comments

Bottom border should not be removed because the second image has a 1 in the bottom row. The cropping should take into account all images and "relative" coordinates of each pixel should stay the same.
OK then, I got confused. Now it should be correct. Put your 3D body into it, and the array in 'result' should be the minimal box whose sides touch the extreme values != 0.
Either I'm missing something or this doesn't work for the whole data set. If I have 10 3D images I will need to find the bounding box that will include all non-zero values over all the 10 images.
If you have 10 3D images, then just put that array into my script and it should work.
I thought I was clear about that with the testing dataset in my script.

You could see your problem as trimming, to its bounding box, the array formed by putting all your arrays together into one array.

Therefore, if you have an n-dimensional trimming function, the solution is just to apply that.

One way of implementing this would be:

import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]
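Because `trim` takes the mask separately from the array, the condition can be anything, not only `!= 0`. For float field values that carry tiny numerical noise, a tolerance-based mask may be more useful. The sketch below repeats `trim` so that it is self-contained. The tolerance `1e-6` and the example data are hypothetical choices.

```python
import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

field = np.zeros((6, 6))
field[2:4, 1:5] = [[0.5, -1.2, 0.0, 2.0], [1.0, 0.0, -0.3, 0.1]]
field[0, 0] = 1e-9   # tiny numerical noise in a corner

print(trim(field, field != 0).shape)            # (4, 5): the noise at [0, 0] is kept
print(trim(field, np.abs(field) > 1e-6).shape)  # (2, 4): the noise is ignored
```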

A slightly more flexible solution (where you could indicate which axis to act on) is available in FlyingCircus (Disclaimer: I am the main author of the package).

So, if you have your list of n-dim arrays (in arrs), you could first stack them using np.stack() and then trim the result:

import numpy as np

arr = np.stack(arrs, -1)
trimmed_arr = trim(arr, arr != 0)

which could then be split back using np.split(), e.g.:

trimmed_list = np.split(trimmed_arr, arr.shape[-1], -1)
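Two details are worth noting. First, `trim(arr, arr != 0)` also acts on the stacking axis, so an all-zero array at either end of the list would be dropped. Second, `np.split()` leaves a trailing axis of size 1 on each piece. The sketch below is one way around both, using the two 5x5 arrays from the question as `arrs`. It builds the mask from the union over the stacking axis, keeps that axis whole, and unpacks the pieces with plain indexing.

```python
import numpy as np

# The two 5x5 arrays from the question, as a list (`arrs` in the answer)
a = np.zeros((5, 5))
a[1, 1] = a[2, 1] = a[2, 2] = 1
b = np.zeros((5, 5))
b[2, 2] = b[3, 3] = b[4, 3] = 1
arrs = [a, b]

arr = np.stack(arrs, -1)                  # shape (5, 5, 2)
# Union over the stacked arrays: the mask has no stacking axis at all
mask = np.any(arr != 0, axis=-1)
box = tuple(slice(i.min(), i.max() + 1) for i in np.where(mask))
trimmed_arr = arr[box + (slice(None),)]   # crop the spatial axes only
# Plain indexing avoids the trailing size-1 axis that np.split() leaves
trimmed_list = [trimmed_arr[..., k] for k in range(trimmed_arr.shape[-1])]
print([t.shape for t in trimmed_list])    # [(4, 3), (4, 3)]
```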

EDIT:

I just realized that this is using substantially the same approach as the other answers, except that it looks much cleaner to me.

1 Comment

This is pretty cool. I prefer it as a one-liner for single arrays: return arr[tuple(slice(np.min(idx), np.max(idx) + 1) for idx in np.where(arr != 0))]

Update:

Based on Martin's solution using min/max and np.where, but generalizing it to any dimension, you can do it in this way:

import numpy as np

def bounds_per_dimension(ndarray):
    return map(
        lambda e: range(e.min(), e.max() + 1),
        np.where(ndarray != 0)
    )

def zero_trim_ndarray(ndarray):
    return ndarray[np.ix_(*bounds_per_dimension(ndarray))]

d = np.array([[
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
], [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]])

zero_trim_ndarray(d)

1 Comment

Using range() and np.ix_() for this is going to be unnecessarily slow. If you timed that code against the slice() / arr[] approach (as used in my answer), you would see a ~2x speed difference, even for this simple example with d as input.
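A sketch for checking that claim yourself follows; the timings depend on the machine, so no numbers are given here. It first confirms that both approaches return the same values. Part of the difference comes from how NumPy indexes. Tuples of `slice` objects use basic slicing and return a view without copying data. `np.ix_` uses advanced (fancy) indexing and always returns a copy. The random test array is hypothetical.

```python
import timeit
import numpy as np

def trim_slices(arr):
    return arr[tuple(slice(i.min(), i.max() + 1) for i in np.where(arr != 0))]

def trim_ix(arr):
    ranges = [range(i.min(), i.max() + 1) for i in np.where(arr != 0)]
    return arr[np.ix_(*ranges)]

rng = np.random.default_rng(1)
d = np.zeros((2, 50, 50))
d[:, 10:40, 5:45] = rng.normal(size=(2, 30, 40))

assert np.array_equal(trim_slices(d), trim_ix(d))
print(np.shares_memory(trim_slices(d), d))  # True: a view into d
print(np.shares_memory(trim_ix(d), d))      # False: a copy
for f in (trim_slices, trim_ix):
    t = timeit.timeit(lambda: f(d), number=1000)
    print(f"{f.__name__}: {t:.4f} s for 1000 calls")
```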
