numpy setting multidimensional arrays with index arrays containing NaNs

Question

I have a problem that has been driving me insane for about a week. I am starting off with a large array of shape (2700, 1000, 3) called A and then have 2 arrays of shape (800, 600), called B and C. B and C are filled with indices of interest into the larger array, so that

A[B[i][j]][C[i][j]]

is a 1D array of 3 values such as [0, 0, 0], located at the indices given by B[i][j] and C[i][j]. Now I want to store these values in another array called D, of shape (800, 600, 3). This works if I use the following method:

D[:] = A[B, C]

However, I am now introducing NaN terms into B and C. This means that A[B, C] raises an error when one is encountered. I cannot simply do the following:

B = np.where(np.logical_or(B>0, C>0), B, 0)

as that would replace the NaN values with 0s. What I ultimately want, wherever the index in B or C is NaN, is:

D[i][j] = [0, 0, 0]

My most recent attempt was implementing something like this:

D = np.where(np.logical_or(np.isnan(A), np.isnan(B)), self.pix[A, B], [0,0,0])

However, the NaN indices are still passing through. Sorry if this post doesn't parse well; I am trying to explain it as well as I can.

Here is a simplified version of what I am trying to achieve, however it does not work yet:

import numpy as np

coords = np.array([[[3, 4, 2], [2, 1, np.nan]], [[2,3,2],[1, 0, 2]]])
x = np.divide(coords[0], 2)
y = np.divide(coords[1], 2)
a = np.array([1, 1, 1])
a1 = a*1
a2 = a*2
a3 = a*3
a4 = a *4

A =  np.array([[a1, a2, a2, a1], [a2, a3, a3, a4],  [a3, a4, a4, a1],  [a3, a4, a1, a1]])
D =  np.array([[a1, a2, a4], [a1, a3, a2]])

print(np.where(np.isnan(x)))
D = (np.where(x>0, A[x.astype(int), y.astype(int)], [0, 0, 0]))

I cannot even put a NaN in an index array, because NaN is a float, not an integer. For example, a = np.arange(10); a[0] = np.nan raises a ValueError.
– Paul Panzer, Commented Apr 3, 2018 at 21:09
Can't you keep the masking information separate? Then you could do something along the lines of D = np.zeros((*B.shape, 3), A.dtype); D[mask, :] = A[B[mask], C[mask]]
– Paul Panzer, Commented Apr 3, 2018 at 21:14
@PaulPanzer That's because arange(10) creates the array with dtype=int, since you pass it an int; mine is created as a float array, since it is declared with floats/np.nan.
– James Driver, Commented Apr 3, 2018 at 21:15
You are, of course, right. What I mean is that that's not an optimal representation: since floats are not allowed as indices, you'll ultimately have to strip the NaNs and convert to int. As this seems unnecessarily cumbersome, why not store the mask separately from the get-go?
– Paul Panzer, Commented Apr 3, 2018 at 21:24
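
A minimal sketch of the separate-mask idea from the comments above, assuming small made-up arrays. A, B, C and the boolean array valid are illustrative stand-ins, not the asker's data; B and C stay integer arrays, and validity is tracked separately instead of through NaNs:

```python
import numpy as np

# Hypothetical small stand-ins for the arrays in the question.
A = np.arange(6 * 5 * 3).reshape(6, 5, 3)   # lookup table, shape (6, 5, 3)
B = np.array([[0, 5, 2], [3, 1, 4]])        # row indices, integer dtype
C = np.array([[4, 0, 1], [2, 3, 0]])        # column indices, integer dtype

# True where the (B, C) pair is usable; this replaces the NaN markers.
valid = np.array([[True, False, True],
                  [True, True, False]])

# Start from zeros so invalid positions end up as [0, 0, 0].
D = np.zeros((*B.shape, 3), A.dtype)

# Only look up the valid index pairs and write them into the matching slots.
D[valid] = A[B[valid], C[valid]]
```

Because the index arrays never hold NaN, no float-to-int conversion is needed at lookup time.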

Divakar · Accepted Answer · 2018-04-03 21:33:38Z


You could use separate boolean masks for the NaNs in the indexing arrays and then extend the combined mask to 3D with a new axis using np.newaxis/None and use it with np.where -

B_nanmask = np.isnan(B)
C_nanmask = np.isnan(C)
BC_nanmask = B_nanmask | C_nanmask

# Replace NaNs with zeros to have a *valid* array w/o NaNs
B[B_nanmask] = 0
C[C_nanmask] = 0

out = np.where(BC_nanmask[...,None], 0, A[B.astype(int),C.astype(int)])

Alternatively, assign into indexed array -

out = A[B.astype(int),C.astype(int)]
out[BC_nanmask] = 0

If you don't want to disturb the indexing arrays, we could setup their integer versions separately -

B_int = np.where(B_nanmask, 0, B.astype(int))
C_int = np.where(C_nanmask, 0, C.astype(int))
out = np.where(BC_nanmask[...,None], 0, A[B_int, C_int])
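
For reference, here is a hedged, self-contained sketch that applies the same masking idea to the toy arrays from the question. The names nanmask, x_int and y_int are introduced here for illustration only. The NaNs are replaced with a placeholder index 0 before the cast, so no NaN is ever converted to an integer:

```python
import numpy as np

# Toy data from the question: float index arrays, one entry is NaN.
coords = np.array([[[3, 4, 2], [2, 1, np.nan]],
                   [[2, 3, 2], [1, 0, 2]]])
x = coords[0] / 2   # row indices (float, contains a NaN)
y = coords[1] / 2   # column indices (float)

a = np.array([1, 1, 1])
a1, a2, a3, a4 = a * 1, a * 2, a * 3, a * 4
A = np.array([[a1, a2, a2, a1],
              [a2, a3, a3, a4],
              [a3, a4, a4, a1],
              [a3, a4, a1, a1]])   # shape (4, 4, 3)

# Positions where either index is NaN.
nanmask = np.isnan(x) | np.isnan(y)

# Substitute a valid placeholder index (0) for NaNs, then truncate to int.
x_int = np.where(nanmask, 0, x).astype(int)
y_int = np.where(nanmask, 0, y).astype(int)

# Look everything up, then zero out the positions that came from NaNs.
# nanmask[..., None] has shape (2, 3, 1) and broadcasts over the last axis.
out = np.where(nanmask[..., None], 0, A[x_int, y_int])
print(out.shape)  # (2, 3, 3)
```

With this data, out[1, 2] is [0, 0, 0] because x[1, 2] is NaN, while every other entry is the corresponding row of A.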

edited Apr 3, 2018 at 21:33

answered Apr 3, 2018 at 21:16
