I essentially want to crop an image with numpy. I have a 3-dimensional numpy.ndarray object, i.e.:

[ [ [0,0,0,0], [255,255,255,255], .... ],
  [ [0,0,0,0], [255,255,255,255], .... ] ]

where I want to remove whitespace, which, in context, is known to be either entire rows or entire columns of [0,0,0,0].

Letting each pixel just be a number for this example, I'm trying to essentially do this:

Given this: *EDIT: chose a slightly more complex example to clarify

[ [0,0,0,0,0,0], [0,0,1,1,1,0], [0,1,1,0,1,0], [0,0,0,1,1,0], [0,0,0,0,0,0] ]

I'm trying to create this:

[ [0,1,1,1], [1,1,0,1], [0,0,1,1] ]

I can brute force this with loops, but intuitively I feel like numpy has a better means of doing this.

3 Answers

In general, you'd want to look into scipy.ndimage.label and scipy.ndimage.find_objects to extract the bounding box of contiguous regions fulfilling a condition.
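
As a rough sketch of that more general route (assuming SciPy is installed, and using the 2D array from the question purely for illustration), label numbers each connected non-zero region and find_objects returns one bounding box per region as a tuple of slices:

```python
import numpy as np
from scipy import ndimage

im = np.array([[0, 0, 0, 0, 0, 0],
               [0, 0, 1, 1, 1, 0],
               [0, 1, 1, 0, 1, 0],
               [0, 0, 0, 1, 1, 0],
               [0, 0, 0, 0, 0, 0]])

# Number each connected non-zero region (default: 4-connectivity in 2D).
labeled, num_regions = ndimage.label(im != 0)

# One tuple of slices (a bounding box) per labeled region.
boxes = ndimage.find_objects(labeled)

# Crop every region out of the original array.
crops = [im[box] for box in boxes]
print(num_regions)
print(crops[0])
```

Here the non-zero pixels form a single connected region, so there is one box. With several separate regions you would get one crop per region rather than one crop around everything.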

However, in this case, you can do it fairly easily with "plain" numpy.

I'm going to assume you have an nrows x ncols x nbands array here. The other convention of nbands x nrows x ncols is also quite common, so have a look at the shape of your array.

With that in mind, you might do something similar to:

mask = (im == 0).all(axis=2)  # True where every band of a pixel is 0
rows = np.flatnonzero((~mask).sum(axis=1))
cols = np.flatnonzero((~mask).sum(axis=0))

crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1, :]
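
To try this on a multi-band image, here is a small self-contained sketch. The 4 x 5 RGBA array is made up for illustration, and a pixel is treated as blank only when all of its bands are 0 (the [0,0,0,0] case from the question):

```python
import numpy as np

# Hypothetical 4 x 5 RGBA image: blank everywhere except a 2 x 3 block.
im = np.zeros((4, 5, 4), dtype=np.uint8)
im[1:3, 1:4] = 255
im[1, 2] = [255, 0, 0, 255]  # some zero bands, but the pixel is not blank

# True where every band of a pixel is 0; shape (nrows, ncols).
blank = (im == 0).all(axis=2)

# Indices of rows / columns that contain at least one non-blank pixel.
rows = np.flatnonzero((~blank).any(axis=1))
cols = np.flatnonzero((~blank).any(axis=0))

crop = im[rows.min():rows.max() + 1, cols.min():cols.max() + 1, :]
print(crop.shape)  # (2, 3, 4)
```

If the image is entirely blank, rows and cols are empty and .min() raises a ValueError, so you may want to check for that case first.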

For your 2D example, it would look like:

import numpy as np

im = np.array([[0,0,0,0,0,0],
               [0,0,1,1,1,0],
               [0,1,1,0,1,0],
               [0,0,0,1,1,0],
               [0,0,0,0,0,0]])

mask = im == 0
rows = np.flatnonzero((~mask).sum(axis=1))
cols = np.flatnonzero((~mask).sum(axis=0))

crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1]
print(crop)

Let's break down the 2D example a bit.

In [1]: import numpy as np

In [2]: im = np.array([[0,0,0,0,0,0],
   ...:                [0,0,1,1,1,0],
   ...:                [0,1,1,0,1,0],
   ...:                [0,0,0,1,1,0],
   ...:                [0,0,0,0,0,0]])

Okay, now let's create a boolean array that meets our condition:

In [3]: mask = im == 0

In [4]: mask
Out[4]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True, False, False, False,  True],
       [ True, False, False,  True, False,  True],
       [ True,  True,  True, False, False,  True],
       [ True,  True,  True,  True,  True,  True]], dtype=bool)

Also, note that the ~ operator functions as logical_not on boolean arrays:

In [5]: ~mask
Out[5]:
array([[False, False, False, False, False, False],
       [False, False,  True,  True,  True, False],
       [False,  True,  True, False,  True, False],
       [False, False, False,  True,  True, False],
       [False, False, False, False, False, False]], dtype=bool)

With that in mind, to find rows where all elements are false, we can sum across columns:

In [6]: (~mask).sum(axis=1)
Out[6]: array([0, 3, 3, 2, 0])

If no elements are True, we'll get a 0.

And similarly to find columns where all elements are false, we can sum across rows:

In [7]: (~mask).sum(axis=0)
Out[7]: array([0, 1, 2, 2, 3, 0])

Now all we need to do is find the first and last of these that are not zero. np.flatnonzero is a bit easier than nonzero, in this case:

In [8]: np.flatnonzero((~mask).sum(axis=1))
Out[8]: array([1, 2, 3])

In [9]: np.flatnonzero((~mask).sum(axis=0))
Out[9]: array([1, 2, 3, 4])

Then, you can easily slice out the region based on min/max nonzero elements:

In [10]: rows = np.flatnonzero((~mask).sum(axis=1))

In [11]: cols = np.flatnonzero((~mask).sum(axis=0))

In [12]: im[rows.min():rows.max()+1, cols.min():cols.max()+1]
Out[12]:
array([[0, 1, 1, 1],
       [1, 1, 0, 1],
       [0, 0, 1, 1]])
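
Note that this slice only trims the outer border of zeros. If empty rows or columns in the interior should be dropped as well, one possible variation (not part of the approach above) is to index with boolean keep-masks through np.ix_. The array below is a made-up example with an empty row and an empty column in the middle:

```python
import numpy as np

im = np.array([[0, 0, 0, 0, 0],
               [0, 1, 0, 2, 0],
               [0, 0, 0, 0, 0],
               [0, 3, 0, 4, 0],
               [0, 0, 0, 0, 0]])

# True for every row / column that contains at least one non-zero value.
keep_rows = (im != 0).any(axis=1)
keep_cols = (im != 0).any(axis=0)

# np.ix_ builds an open mesh, so the result keeps only those rows and columns.
compact = im[np.ix_(keep_rows, keep_cols)]
print(compact)
# [[1 2]
#  [3 4]]
```

Unlike the min/max slice, this changes the spacing between the remaining pixels, so it is only appropriate when that is what you want.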

1 Comment

Is it possible to remove empty rows and columns if they are in between the data with this code? To me it looks as if the code is only trimming the outer 'rim' of zeros.

One way of implementing this for arbitrary dimensions would be:

import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]
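
As a quick check, here is the function applied to the 2D array from the question, with the mask built as arr != 0. The function is repeated so the snippet runs on its own:

```python
import numpy as np

def trim(arr, mask):
    # np.where(mask) yields one index array per dimension of mask; the
    # min/max of each gives the bounding box of the True entries on that axis.
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

arr = np.array([[0, 0, 0, 0, 0, 0],
                [0, 0, 1, 1, 1, 0],
                [0, 1, 1, 0, 1, 0],
                [0, 0, 0, 1, 1, 0],
                [0, 0, 0, 0, 0, 0]])

trimmed = trim(arr, arr != 0)
print(trimmed)
# [[0 1 1 1]
#  [1 1 0 1]
#  [0 0 1 1]]
```

For an nrows x ncols x nbands image, passing a 2D mask such as (arr != 0).any(axis=2) should produce only two slices, which leaves the band axis untouched.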

A slightly more flexible solution (where you could indicate which axis to act on) is available in FlyingCircus (Disclaimer: I am the main author of the package).

1 Comment

To make the mask, use: mask = arr != 0

You could use the np.nonzero function to find the indices of your non-zero values, then select those elements from your original array and reshape them to the shape you want:

import numpy as np
n = np.array([[0,0,0,0,0,0],
              [0,0,1,1,1,0],
              [0,0,1,1,1,0],
              [0,0,1,1,1,0],
              [0,0,0,0,0,0]])

elems = n[n.nonzero()]

In [415]: elems
Out[415]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])

In [416]: elems.reshape(3,3)
Out[416]: 
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
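
Note that this relies on the non-zero values filling a rectangle whose size you already know. As a small check, the same steps on the array from the question (which has zeros inside the region of interest) give 8 elements, which cannot be reshaped into the desired 3 x 4 result:

```python
import numpy as np

n = np.array([[0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 1, 1, 0, 1, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0]])

# Selecting by n.nonzero() drops the interior zeros along with the border.
elems = n[n.nonzero()]
print(elems.size)  # 8, but the desired crop has 3 * 4 = 12 elements

try:
    elems.reshape(3, 4)
    reshaped_ok = True
except ValueError as exc:
    reshaped_ok = False
    print("reshape failed:", exc)
```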

5 Comments

That assumes the region is a known rectangular shape.
what about columns with some zero entries?
Well it will be a rectangle insofar as I am only looking to remove entirely empty rows/columns.
@Jonline - Yes, but you may have some white regions inside the region you want to crop. This won't work in that case. You'd also have to know the size beforehand.
@JoeKington Do you mean I'd have to know the size of the region I intend to keep? Because that is indeed a dilemma.
