
How can I get an element-wise count of occurrences in a NumPy array along given axes? By "element-wise," I mean that each value of the array should be replaced by the number of times it appears.

Simple 2D input:

[[1, 1, 1],
 [2, 2, 2],
 [3, 4, 5]]

Should output:

[[3, 3, 3],
 [3, 3, 3],
 [1, 1, 1]]

The solution also needs to work relative to particular axes (in my case, the last two). For example, if my input array a has shape (4, 2, 3, 3), which I think of as "a 4x2 matrix of 3x3 matrices," running solution(a) should return a (4, 2, 3, 3) array of the form above, where each 3x3 submatrix contains the counts of its elements within that submatrix alone, not within the entire array.

More complex example: suppose I take the 3x3 example input a above and call skimage.util.shape.view_as_windows(a, (2, 2)). This gives me an array b of shape (2, 2, 2, 2):

[[[[1 1]
   [2 2]]

  [[1 1]
   [2 2]]]


 [[[2 2]
   [3 4]]

  [[2 2]
   [4 5]]]]

Then solution(b) should output:

[[[[2 2]
   [2 2]]

  [[2 2]
   [2 2]]]


 [[[2 2]
   [1 1]]

  [[2 2]
   [1 1]]]]

So even though the value 1 occurs 3 times in a and 4 times in b, it occurs only twice in each 2x2 window that contains it, so those positions should get a count of 2.
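To experiment without scikit-image, NumPy 1.20+ provides numpy.lib.stride_tricks.sliding_window_view, which for this input should give the same (2, 2, 2, 2) array of windows as view_as_windows. This is a swapped-in NumPy function, not what the question used; a minimal sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 4, 5]])

# All 2x2 windows of a as a read-only view (no data is copied).
# b[i, j] is the window whose top-left corner is a[i, j].
b = sliding_window_view(a, (2, 2))

print(b.shape)   # (2, 2, 2, 2)
print(b[1, 1])   # [[2 2]
                 #  [4 5]]
```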

  • Could you elaborate on "element-wise count along axis of values in numpy array"? What exactly are you counting? Commented Nov 6, 2017 at 5:16
  • @Divakar I want to count the number of occurrences of each element. I'll edit the question to make it more clear. Related to the question you cleverly answered yesterday. Commented Nov 6, 2017 at 5:23
  • @CurtF. Looping along the relevant axes and constructing a new array using regular Python loops is fairly straightforward, but too slow. I looked at using np.histogram and np.bincount, but neither seems well-suited for the task, as they require flattened arrays. Commented Nov 6, 2017 at 5:28
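For reference, the straightforward loop approach mentioned in that comment might look like the sketch below. It is a reconstruction, not the asker's code, and count_occurrences_loop is a hypothetical name. It assumes counts are taken within the last two axes and that the input is at least 2D. It is slow when there are many submatrices, but handy as ground truth for checking faster versions.

```python
import numpy as np

def count_occurrences_loop(a):
    """Replace each element with its count inside its own last-two-axes submatrix."""
    a = np.asarray(a)
    out = np.empty(a.shape, dtype=np.intp)
    # np.ndindex walks every index of the leading axes, e.g. (i, j) for shape
    # (4, 2, 3, 3); for a plain 2D array it yields a single empty tuple ().
    for idx in np.ndindex(a.shape[:-2]):
        sub = a[idx]
        _, inv, counts = np.unique(sub, return_inverse=True, return_counts=True)
        # Look up each element's count; reshape since inv may be flat.
        out[idx] = counts[inv].reshape(sub.shape)
    return out


a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 4, 5]])
print(count_occurrences_loop(a))
# [[3 3 3]
#  [3 3 3]
#  [1 1 1]]
```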

1 Answer


Starting approach

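We can use np.unique with return_inverse=True and return_counts=True: it tags each element with the index (0 onwards) of its value among the sorted unique values, and also returns how often each unique value occurs. Indexing those counts with the tags gives the desired output, like so -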

In [43]: a
Out[43]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 4, 5]])

In [44]: _,ids,c = np.unique(a, return_counts=True, return_inverse=True)

In [45]: c[ids].reshape(a.shape)
Out[45]: 
array([[3, 3, 3],
       [3, 3, 3],
       [1, 1, 1]])
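The same np.unique steps can be wrapped as a self-contained function; count_occurrences_unique is a hypothetical name. Because np.unique only needs sortable values, this sketch should also handle floats and negative numbers, at the cost of sorting the whole array. Note that it counts over the entire array, not per submatrix.

```python
import numpy as np

def count_occurrences_unique(a):
    """Replace every element of a with the number of times its value occurs in a."""
    a = np.asarray(a)
    # inverse tags each element with the index (0, 1, 2, ...) of its value among
    # the sorted unique values; counts[m] is how often the m-th unique value occurs.
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    # Look up each element's count via its tag; reshape because, depending on the
    # NumPy version, inverse may come back flattened.
    return counts[inverse].reshape(a.shape)


a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 4, 5]])
print(count_occurrences_unique(a))
# [[3 3 3]
#  [3 3 3]
#  [1 1 1]]
```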

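For non-negative integers in the input array, we can also use np.bincount, which directly counts how often each value 0, 1, 2, ... occurs; indexing its output with the array itself gives the per-element counts -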

In [73]: c = np.bincount(a.ravel())

In [74]: c[a]
Out[74]: 
array([[3, 3, 3],
       [3, 3, 3],
       [1, 1, 1]])
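As a self-contained function (count_occurrences_bincount is a hypothetical name), the bincount version avoids the sort that np.unique does. One trade-off to keep in mind: the counts array has length a.max() + 1, so inputs containing very large values can cost a lot of memory.

```python
import numpy as np

def count_occurrences_bincount(a):
    """Per-element occurrence counts for an array of non-negative integers."""
    a = np.asarray(a)
    # counts[v] = number of times value v occurs; length is a.max() + 1.
    counts = np.bincount(a.ravel())
    # Fancy indexing with a itself looks up each element's count and keeps a's shape.
    return counts[a]


a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 4, 5]])
print(count_occurrences_bincount(a))
# [[3 3 3]
#  [3 3 3]
#  [1 1 1]]
```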

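For negative integers, simply offset the array by its minimum first (a - a.min()) so that all values are non-negative, and use the offset array both for the bincount and for the indexing.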
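A minimal sketch of that offset trick (count_occurrences_bincount_signed is a hypothetical name). Shifting every value by the same amount doesn't change which elements are equal, so the counts are unaffected:

```python
import numpy as np

def count_occurrences_bincount_signed(a):
    """Per-element occurrence counts for an integer array that may hold negatives."""
    a = np.asarray(a)
    # Shift so the smallest value becomes 0; np.bincount only accepts non-negative ints.
    shifted = a - a.min()
    counts = np.bincount(shifted.ravel())
    # Index with the shifted array, since that is what the counts were built from.
    return counts[shifted]


x = np.array([[-2, -2, 0],
              [ 3, -2, 0]])
print(count_occurrences_bincount_signed(x))
# [[3 3 2]
#  [1 3 2]]
```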

Extending to generic n-dims

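For the n-dim case, with counts taken within each submatrix spanned by the last two axes, let's use a row-wise (2D) bincount: reshape so that each submatrix becomes one row, count per row, then index the counts with the row numbers and the values. Like np.bincount, this assumes non-negative integers -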

In [107]: ar
Out[107]: 
array([[[1, 1, 1],
        [2, 2, 2],
        [3, 4, 5]],

       [[2, 3, 5],
        [4, 3, 4],
        [3, 1, 2]]])

In [104]: ar2D = ar.reshape(-1,ar.shape[-2]*ar.shape[-1])

# bincount2D_vectorized from https://stackoverflow.com/a/46256361/ @Divakar
In [105]: c = bincount2D_vectorized(ar2D)

In [106]: c[np.arange(ar2D.shape[0])[:,None], ar2D].reshape(ar.shape)
Out[106]: 
array([[[3, 3, 3],
        [3, 3, 3],
        [1, 1, 1]],

       [[2, 3, 1],
        [2, 3, 2],
        [3, 1, 2]]])
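bincount2D_vectorized itself lives in the linked answer and isn't reproduced here. Below is a self-contained sketch of the same idea under stated assumptions: bincount_rows and count_occurrences_last2 are hypothetical names; the row-wise bincount gives each row its own disjoint range of bins, which may differ in detail from the linked helper; and the input is shifted by its minimum so negative integers also work. It counts within the last two axes only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bincount_rows(x):
    """Row-wise bincount of a 2D array of non-negative ints (hypothetical helper).

    Returns an array of shape (x.shape[0], x.max() + 1), where counts[r, v]
    is how often value v occurs in row r.
    """
    n_bins = x.max() + 1
    # Row r gets its own block of bins [r * n_bins, (r + 1) * n_bins),
    # so a single np.bincount call counts every row separately.
    offsets = np.arange(x.shape[0])[:, None] * n_bins
    flat = np.bincount((x + offsets).ravel(), minlength=x.shape[0] * n_bins)
    return flat.reshape(x.shape[0], n_bins)

def count_occurrences_last2(ar):
    """Element-wise counts within each submatrix spanned by the last two axes."""
    ar = np.asarray(ar)
    # Shift to non-negative values (same offset trick as in the 2D case).
    shifted = ar - ar.min()
    # One row per submatrix, e.g. (4, 2, 3, 3) -> (8, 9).
    ar2D = shifted.reshape(-1, ar.shape[-2] * ar.shape[-1])
    c = bincount_rows(ar2D)
    rows = np.arange(ar2D.shape[0])[:, None]
    # Look up each element's count within its own row, then restore the shape.
    return c[rows, ar2D].reshape(ar.shape)


# The (2, 2, 2, 2) array of 2x2 windows from the question.
a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 4, 5]])
b = sliding_window_view(a, (2, 2))
print(count_occurrences_last2(b))
```

Applied to the windowed array b, this should reproduce the solution(b) output given in the question, since each 2x2 window occupies the last two axes.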

4 Comments

Awesome!! I just edited my post to give an example for the generic n-dims solution. I'm gonna play around with your "generic n-dims" solution for a few minutes and see if I can massage it to match my example.
@CaptainStiggz For performance, play around with other options as well to do binned counting at that post - stackoverflow.com/a/46256361.
This is brilliant! Can you make any reading recommendations for developing a better intuition for numpy basics? Some of the reshaping you're doing still feels like black magic to a numpy beginner like me. The docs are a bit sparse when it comes to more complex use cases. For example, the docs on np.shape don't cover negative axes, like you use in ar.reshape(-1,ar.shape[-2]*ar.shape[-1])
@CaptainStiggz Well, ar.shape is the shape tuple. ar.shape[-1] gets us the last element of that tuple, i.e. the length of the array along its last axis, and ar.shape[-2] is the second-last element, i.e. the length along the second-last axis. The idea is that we need the combined length along the last two axes for the reshaping. Also, the -1 in ar.reshape(-1,..) basically means "compute this length automatically from the remaining size," while keeping the reshaped array 2D. For reference, I think the official docs are pretty good.
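A small illustration of those indexing and reshape rules on an array shaped like the question's (4, 2, 3, 3) example; the values 0..71 are arbitrary, chosen so rows are easy to match to submatrices:

```python
import numpy as np

# "A 4x2 matrix of 3x3 matrices", filled with 0..71.
ar = np.arange(4 * 2 * 3 * 3).reshape(4, 2, 3, 3)

print(ar.shape)       # (4, 2, 3, 3)
print(ar.shape[-1])   # 3, length of the last axis
print(ar.shape[-2])   # 3, length of the second-last axis

# -1 lets reshape infer that dimension: 72 elements / 9 per row = 8 rows.
ar2D = ar.reshape(-1, ar.shape[-2] * ar.shape[-1])
print(ar2D.shape)     # (8, 9)

# In C order, row k holds submatrix ar[i, j] flattened, with k = i * 2 + j.
print((ar2D[3] == ar[1, 1].ravel()).all())   # True
```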
