I'm assuming arr is sparse:
- you say the clusters are small, and 1000 clusters isn't going to tile an array that big
- you iterate over np.unique(arr)[1:], so I assume the first unique value is 0
In this case I would recommend using a scipy.sparse.csr_matrix:
from scipy.sparse import csr_matrix
sp_arr = csr_matrix(arr.reshape(1,-1))
This turns your big dense array into a one-row compressed sparse row matrix. Since scipy's sparse matrices only support 2 dimensions, reshaping to a single row tricks it into storing ravelled (flat) indices. Now sp_arr has data (the cluster labels), indices (the ravelled indices), and indptr (which is trivial here since we only have one row). So,
for i in np.unique(sp_arr.data): # as a bonus this `unique` call should be faster too
    x, y, z = np.unravel_index(sp_arr.indices[sp_arr.data == i], arr.shape)
should give the same coordinates, much more efficiently, as
for i in np.unique(arr)[1:]:
    x, y, z = np.nonzero(arr == i)
where x, y, z are the indices of the True values in the mask arr == i. From there you can either reconstruct mask or work off the indices directly (recommended).
You could also do this purely with numpy and still end up with a boolean mask, though it is a bit less memory-efficient:
all_mask = arr != 0 # points assigned to any cluster
data = arr[all_mask] # all cluster labels
for i in np.unique(data):
    mask = all_mask.copy()
    mask[mask] = data == i # now mask is the same as before
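If you want to convince yourself that the mask[mask] = ... trick reproduces arr == i, here is a quick check on a toy array. The shape, the labels, and the masks dictionary are assumptions for the sake of the example.

```python
import numpy as np

# Hypothetical toy volume with two labels.
arr = np.zeros((3, 4, 5), dtype=np.int32)
arr[0, 0, 0] = 1
arr[1, 2, 3] = 1
arr[2, 3, 4] = 2

all_mask = arr != 0   # points assigned to any cluster
data = arr[all_mask]  # labels of those points, in row-major order

masks = {}
for i in np.unique(data):
    mask = all_mask.copy()
    # Only the True positions are overwritten, so each comparison
    # touches len(data) elements instead of arr.size.
    mask[mask] = data == i
    masks[int(i)] = mask
    assert np.array_equal(mask, arr == i)
```

Each iteration still copies a full-size boolean array, which is where the extra memory cost comes from.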