Counting number of occurrences in numpy 2D array

Question

I have a 2D numpy array as follows:

import numpy as np

a=np.array([[1,2],[1,1], [2,1],[2,2],[3,2],[3,2], [3,1], [4,2],[4,1]])
print(a)

I need to count how many values of 1 or 2 occur in column 2 for each value in column 1. For example when x=3 in column 1, there are two instances of the value 2 and one instance of the value 1 in column 2.

Any direction on how to complete this would be appreciated! I think I could do some sort of for loop with np.unique but I am not sure...

Andy L. · Accepted Answer · 2020-10-12 12:23:34Z

1

As per your comment, if you want a list-of-lists format, try this:

out = [[k, *np.unique(a[a[:,0] == k,1], return_counts=True)[1]]
       for k in np.unique(a[:,0])]

Out[838]: [[1, 1, 1], [2, 1, 1], [3, 1, 2], [4, 1, 1]]

For a 2D array:

out = np.array([[k, *np.unique(a[a[:,0] == k,1], return_counts=True)[1]]
                for k in np.unique(a[:,0])])

Out[850]:
array([[1, 1, 1],
       [2, 1, 1],
       [3, 1, 2],
       [4, 1, 1]], dtype=int64)

A simple alternative is a dict comprehension with collections.Counter and np.unique:

from collections import Counter

out = {k: Counter(a[a[:,0] == k,1]) for k in np.unique(a[:,0])}

Out[821]:
{1: Counter({2: 1, 1: 1}),
 2: Counter({1: 1, 2: 1}),
 3: Counter({2: 2, 1: 1}),
 4: Counter({2: 1, 1: 1})}
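The list and array versions above produce rectangular output only when every first-column value is paired with every second-column value at least once; otherwise the rows have different lengths. A minimal sketch of one way to get a fixed-width table regardless, assuming the goal is one row per distinct first-column value followed by one count per distinct second-column value. The helper name count_table is hypothetical and not part of the original answer:

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

def count_table(a):
    """Hypothetical helper: one row per distinct first-column value,
    followed by the count of each distinct second-column value."""
    # Map the values of each column to consecutive indices 0..n-1
    keys, row_idx = np.unique(a[:, 0], return_inverse=True)
    vals, col_idx = np.unique(a[:, 1], return_inverse=True)
    counts = np.zeros((len(keys), len(vals)), dtype=np.int64)
    # Unbuffered add, so repeated (row, col) pairs are all counted
    np.add.at(counts, (row_idx, col_idx), 1)
    return keys, vals, np.column_stack([keys, counts])

keys, vals, table = count_table(a)
print(table)
```

Because the indices come from np.unique, the first column does not need to be incremental, and a combination that never occurs simply gets a count of 0.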

edited Oct 12, 2020 at 12:23

answered Oct 12, 2020 at 11:39

Andy L.


5 Comments

okvoyce Over a year ago

I am going to apply this to a bigger set of data, so do you know how to get this into an array format: x value, number of 1s, number of 2s?

Andy L. Over a year ago

Could you add your desired output to your question to make the array format you mention clearer?

okvoyce Over a year ago

Yes, sorry! [1 1 1] [2 1 1] [3 1 2] [4 1 1]. I'm not sure if it makes a difference, but in my "real" data set the steps in the first column aren't incremental.

Andy L. Over a year ago

@okvoyce: check my edited answer. It doesn't depend on the order of the first column, so you don't have to worry about whether the steps in the first column are incremental.

okvoyce Over a year ago

Thank you for this! Rather than having a list of lists, could it be a 2D array?

javidcf · Accepted Answer · 2020-10-12 12:08:58Z

1

Assuming your values in the first column go from 1 to N and in the second column from 1 to M, this is one very simple and fast way to do that:

import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
c = np.zeros(a.max(0), np.int32)
np.add.at(c, tuple(a.T - 1), 1)
# c[i, j] contains the number of times
# the second column value is j + 1 when
# the first column value is i + 1

# Print result
for i in range(c.shape[0]):
    print(f'Count result for {i + 1}')
    for j in range(c.shape[1]):
        print(f'    Number of {j + 1}s: {c[i, j]}')

Output:

Count result for 1
    Number of 1s: 1
    Number of 2s: 1
Count result for 2
    Number of 1s: 1
    Number of 2s: 1
Count result for 3
    Number of 1s: 1
    Number of 2s: 2
Count result for 4
    Number of 1s: 1
    Number of 2s: 1

This works by creating an array c of zeros and then adding one to the entry of c indexed by each row of a. Conceptually, it is equivalent to c[a[:, 0] - 1, a[:, 1] - 1] += 1. However, that does not count correctly here, because a contains repeated rows and in-place addition through fancy indexing is buffered, so each repeated position is incremented only once. To count correctly, use the at method of the np.add ufunc (this method is available in other ufuncs too, see Universal functions (ufuncs)). It adds the given value at each position (tuple(a.T - 1) makes a tuple with the row indices and the column indices), accumulating repeated positions correctly.
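The difference between the two forms is easy to see on a tiny array. This is only an illustrative sketch; the names idx, buffered and unbuffered are made up for the example:

```python
import numpy as np

idx = np.array([0, 0, 2])  # index 0 appears twice

buffered = np.zeros(3, dtype=np.int64)
buffered[idx] += 1  # repeated index 0 is incremented only once

unbuffered = np.zeros(3, dtype=np.int64)
np.add.at(unbuffered, idx, 1)  # every occurrence is accumulated

print(buffered)    # [1 0 1]
print(unbuffered)  # [2 0 1]
```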

edited Oct 12, 2020 at 12:08

answered Oct 12, 2020 at 11:50

javidcf


3 Comments

javidcf Over a year ago

@okvoyce You should just need to concatenate the initial column like: result = np.concatenate([np.arange(1, len(c) + 1)[:, np.newaxis], c], axis=1).

okvoyce Over a year ago

That's great, thank you! Could you please explain what each bit of the code is doing? I want to make sure I understand it :)

javidcf Over a year ago

@okvoyce I added an explanation, hope that helps.

Chetan Ameta · Accepted Answer · 2020-10-12 11:22:59Z

0

You can filter the NumPy array with a condition, then use np.unique to get the counts.

Try the solution below:

import numpy as np

a = np.array(
    [[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

b = a[np.any(a == 3, axis=1)]

print(len(b[np.any(b == 2, axis=1)])) #output: 2
print(len(b[np.any(b == 1, axis=1)])) #output: 1

unique, counts = np.unique(b, return_counts=True)

print(dict(zip(unique, counts))) #output: {1: 1, 2: 2, 3: 3}

Short solution:

unique, counts = np.unique(a[np.any(a == 3, axis=1)], return_counts=True) #replace 3 with x

print(dict(zip(unique, counts)))

output:

{1: 1, 2: 2, 3: 3}
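Note that np.any(a == 3, axis=1) selects rows with a 3 in either column, and np.unique over the filtered rows also counts the first-column values, which is where the 3: 3 entry comes from. A minimal variation, assuming only the second column should be counted for rows whose first column equals x (the variable names are illustrative):

```python
import numpy as np

a = np.array(
    [[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

x = 3  # first-column value to look up

# Keep only rows whose FIRST column equals x, then take the second column
second_col = a[a[:, 0] == x, 1]

values, counts = np.unique(second_col, return_counts=True)

# Convert NumPy scalars to plain ints for a clean printout
result = {int(v): int(c) for v, c in zip(values, counts)}
print(result)  # {1: 1, 2: 2}
```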

edited Oct 12, 2020 at 11:22

answered Oct 12, 2020 at 11:14

Chetan Ameta

