I have a 2D numpy array as follows:

import numpy as np

a=np.array([[1,2],[1,1], [2,1],[2,2],[3,2],[3,2], [3,1], [4,2],[4,1]])
print(a)

I need to count how many times each of the values 1 and 2 occurs in column 2 for each value in column 1. For example, when x=3 in column 1, column 2 contains two instances of the value 2 and one instance of the value 1.

Any direction on how to complete this would be appreciated! I think I could do some sort of for loop with np.unique but I am not sure...

3 Answers

As requested in your comment, if you want a list-of-lists format, try this:

out = [[k, *np.unique(a[a[:, 0] == k, 1], return_counts=True)[1]]
       for k in np.unique(a[:, 0])]

Out[838]: [[1, 1, 1], [2, 1, 1], [3, 1, 2], [4, 1, 1]]

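For a 2D array (this assumes every value in the first column occurs with every value of the second column at least once; otherwise the rows have different lengths):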

out = np.array([[k, *np.unique(a[a[:, 0] == k, 1], return_counts=True)[1]]
                for k in np.unique(a[:, 0])])

Out[850]:
array([[1, 1, 1],
       [2, 1, 1],
       [3, 1, 2],
       [4, 1, 1]], dtype=int64)

A simple way is to use a dict comprehension with collections.Counter and np.unique:

from collections import Counter

out = {k: Counter(a[a[:,0] == k,1]) for k in np.unique(a[:,0])}

Out[821]:
{1: Counter({2: 1, 1: 1}),
 2: Counter({1: 1, 2: 1}),
 3: Counter({2: 2, 1: 1}),
 4: Counter({2: 1, 1: 1})}
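
If the array format (x value, number of 1s, number of 2s) is what you are after, the Counter dict can probably be turned into one as below. This is only a sketch: the names values and table are my own, and it relies on a Counter returning 0 for a missing key, so a group that lacks one of the values still gets a full-length row.

```python
from collections import Counter

import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

# Count second-column values per first-column value, as in the dict above.
# .tolist() turns NumPy scalars into plain Python ints for the Counter keys.
out = {int(k): Counter(a[a[:, 0] == k, 1].tolist()) for k in np.unique(a[:, 0])}

# Fixed column order: every distinct value seen in the second column.
values = np.unique(a[:, 1]).tolist()

# One row per key: [x, count of values[0], count of values[1], ...].
# A Counter returns 0 for a missing key, so absent values become 0.
table = np.array([[k] + [cnt[v] for v in values] for k, cnt in out.items()])
print(table)
```

For the sample data this gives the rows [1 1 1], [2 1 1], [3 1 2], [4 1 1].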

5 Comments

I am going to apply this to a bigger set of data, so do you know how to get this into an array format: x value, number of 1s, number of 2s?
Could you add your desired output to your question to make the array format you mention clearer?
Yes, sorry! [1 1 1] [2 1 1] [3 1 2] [4 1 1] I'm not sure if it makes a difference, but in my "real" data set the steps in the first column aren't incremental.
@okvoyce: check my edited answer. This answer doesn't depend on the order of the first column, so you don't have to worry about the steps in the first column not being incremental.
Thank you for this! Rather than having a list of lists, could it be a 2D array?

Assuming your values in the first column go from 1 to N and in the second column from 1 to M, this is one very simple and fast way to do that:

import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
c = np.zeros(a.max(0), np.int32)
np.add.at(c, tuple(a.T - 1), 1)
# c[i, j] contains the number of times
# the second column value is j + 1 when
# the first column value is i + 1

# Print result
for i in range(c.shape[0]):
    print(f'Count result for {i + 1}')
    for j in range(c.shape[1]):
        print(f'    Number of {j + 1}s: {c[i, j]}')

Output:

Count result for 1
    Number of 1s: 1
    Number of 2s: 1
Count result for 2
    Number of 1s: 1
    Number of 2s: 1
Count result for 3
    Number of 1s: 1
    Number of 2s: 2
Count result for 4
    Number of 1s: 1
    Number of 2s: 1

This works by creating an array c of zeros and then adding one to the element of c whose row and column are given by each row of a. Conceptually, it is equivalent to c[a[:, 0] - 1, a[:, 1] - 1] += 1. However, that does not count correctly here: a contains repeated rows, and with fancy indexing the in-place addition is buffered, so each repeated position is incremented only once. To count correctly, use the at method of the np.add ufunc (this method is available on other ufuncs too; see Universal functions (ufuncs) in the NumPy documentation). It performs the addition unbuffered at each given position (tuple(a.T - 1) builds a tuple of the row indices and the column indices), so repeated positions are accumulated correctly.
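
To see the difference between the two forms on the sample data, here is a small side-by-side sketch. The names rows, cols, buffered and unbuffered are only for illustration; the row [3, 2] appears twice in a, so position (2, 1) is the one to watch.

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
rows, cols = a[:, 0] - 1, a[:, 1] - 1

# Buffered fancy-index addition: the repeated position (2, 1) is written once.
buffered = np.zeros(tuple(a.max(0)), dtype=np.int64)
buffered[rows, cols] += 1

# Unbuffered ufunc.at: every occurrence of (2, 1) is accumulated.
unbuffered = np.zeros(tuple(a.max(0)), dtype=np.int64)
np.add.at(unbuffered, (rows, cols), 1)

print(buffered[2, 1])    # 1
print(unbuffered[2, 1])  # 2
```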

3 Comments

@okvoyce You should just need to concatenate the initial column like: result = np.concatenate([np.arange(1, len(c) + 1)[:, np.newaxis], c], axis=1).
That's great, thank you! Could you please explain what each bit of the code is doing? I want to make sure I understand it :)
@okvoyce I added an explanation, hope that helps.
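
Since the real data reportedly has non-incremental steps in the first column, the 1-to-N assumption of this answer may not hold. One possible workaround, sketched here on made-up data (the values 10, 25, 40 are hypothetical), is to map the values to indices with np.unique(..., return_inverse=True) before calling np.add.at, and then prepend the first-column values with the concatenation from the comment above.

```python
import numpy as np

# Hypothetical data whose first column is neither contiguous nor starts at 1.
a = np.array([[10, 2], [10, 1], [25, 1], [25, 2], [40, 2], [40, 2], [40, 1]])

# return_inverse maps each entry to the index of its value in the sorted uniques.
xs, xi = np.unique(a[:, 0], return_inverse=True)
ys, yi = np.unique(a[:, 1], return_inverse=True)

# c[i, j] counts how often ys[j] occurs in the second column when the first is xs[i].
c = np.zeros((len(xs), len(ys)), dtype=np.int64)
np.add.at(c, (xi.ravel(), yi.ravel()), 1)

# Prepend the first-column values so each row reads [x, count per ys value].
result = np.concatenate([xs[:, np.newaxis], c], axis=1)
print(result)
```

Each row of result is then the x value followed by one count per distinct second-column value, in the sorted order held in ys.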

You can filter the NumPy array with a condition on the first column and then use np.unique to get the counts.

Try the solution below:

import numpy as np

a = np.array(
    [[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

# Keep only the rows whose first column equals 3
b = a[a[:, 0] == 3]

# Count matches in the second column only
print(np.sum(b[:, 1] == 2))  # output: 2
print(np.sum(b[:, 1] == 1))  # output: 1

unique, counts = np.unique(b[:, 1], return_counts=True)

print(dict(zip(unique.tolist(), counts.tolist())))  # output: {1: 1, 2: 2}

Short solution:

# replace 3 with x
unique, counts = np.unique(a[a[:, 0] == 3, 1], return_counts=True)

print(dict(zip(unique.tolist(), counts.tolist())))

output:

{1: 1, 2: 2}
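
If you need the counts for every x rather than a single one, a different technique from the one in this answer is to let np.unique count whole rows with axis=0. This is a sketch on the sample data; the names pairs and counts are my own.

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

# Treat each row as one item: pairs holds the distinct (x, y) rows in sorted order,
# counts holds how many times each of those rows occurs in a.
pairs, counts = np.unique(a, axis=0, return_counts=True)

for (x, y), n in zip(pairs.tolist(), counts.tolist()):
    print(f"x={x}, value={y}: {n}")
```

Pairs that never occur in a are simply absent from pairs, so a missing combination has to be read as a count of 0.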
