7

given an np.array of shape (n_days, n_lat, n_lon), I'd like to compute a histogram with fixed bins for each lat-lon cell (ie the distribution of daily values).

A simple solution to the problem is to loop over the cells and invoke np.histogram for each cell::

bins = np.linspace(0, 1.0, 10)
B = np.rand(n_days, n_lat, n_lon)
H = np.zeros((n_bins, n_lat, n_lon), dtype=np.int32)
for lat in range(n_lat):
    for lon in range(n_lon):
        H[:, lat, lon] = np.histogram(A[:, lat, lon], bins=bins)[0]
# note: code not tested

but this is quite slow. Is there a more efficient solution that does not involve a loop?

I looked into np.searchsorted to get the bin indices for each value in B and then use fancy indexing to update H::

bin_indices = bins.searchsorted(B)
H[bin_indices.ravel(), idx[0], idx[1]] += 1  # where idx is a index grid given by np.indices
# note: code not tested

but this does not work because the in-place add operator (+=) doesn't seem to support multiple updates of the same cell.

thx, Peter

4
  • seems like github.com/numpy/numpy/pull/2821 addressed the fancy indexing and inplace issue. The reason why numpy doesn't allow multiple updates is that a[idx] += 1 would not be the same as a[idx] = a[idx] + 1 . Commented Sep 17, 2013 at 13:59
  • Use np.histogram2d with the weights keyword argument. Commented Sep 17, 2013 at 15:45
  • @Jaime how would I use the weights? I don't want to do a 2d histogram. Commented Sep 19, 2013 at 12:09
  • There's also a np.histogramdd function. Commented Sep 19, 2013 at 12:54

2 Answers 2

4

You can use numpy.apply_along_axis() to eliminate the loop.

import numpy as np

hist, bin_edges = np.apply_along_axis(lambda x: np.histogram(x, bins=bins), 0, B)
Sign up to request clarification or add additional context in comments.

1 Comment

@PeterPrettenhofer just fixed a typo. lambda had letters transposed. Hope this works for you.
0

Maybe this works?:

import numpy as np
n_days=31
n_lat=10
n_lon=10
n_bins=10
bins = np.linspace(0, 1.0, n_bins)
B = np.random.rand(n_days, n_lat, n_lon)


# flatten to 1D
C=np.reshape(B,n_days*n_lat*n_lon)
# use digitize to get the index of the bin to which the numbers belong
D=np.digitize(C,bins)-1
# reshape the results back to the original shape
result=np.reshape(D,(n_days, n_lat, n_lon))

1 Comment

this gives me basically the same as bins.searchsorted(B), an array of shape (n_days, n_lat, n_lon) but the tricky part is how to transform it into (n_bins, n_lat, n_lon).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.