Counting occurrences in numpy arrays

Question

I have the following two arrays of the same length, one holding tags and the other holding tag categories. I want to group the tags by category and count how often each tag occurs.

As you can see, the same tag can occur more than once within a category ('world' in 10, 'hello' in 20).

I know this can easily be done with loops, but I'm sure numpy has some nifty ways of doing it more efficiently. Any help would be greatly appreciated.

# Tag category
A = [10, 10, 20, 10, 10, 10, 20, 10, 20, 20]
# Tags
B = ['hello', 'world', 'how', 'are', 'you', 'world', 'you', 'how', 'hello', 'hello']

Expected result:

[(10, (('hello', 1), ('are', 1), ('you', 1), ('world', 2))), (20, (('how', 1), ('you', 1), ('hello', 2)))]

Pandas may be more suitable for this.

– Ashwini Chaudhary, Commented Nov 7, 2014 at 15:16

Ashwini Chaudhary · Accepted Answer · 2014-11-07 14:45:48Z

2

You can use a nested collections.defaultdict for this.

Here we use the integers from A as the keys of the outer dict; each inner dict uses the words from B as its keys, with their counts as the values.

>>> from collections import defaultdict
>>> from pprint import pprint
>>> d = defaultdict(lambda: defaultdict(int))
>>> for k, v in zip(A, B):
...     d[k][v] += 1

Now d contains the following (I converted it to a normal dict because its output is less confusing):

>>> pprint({k: dict(v) for k, v in d.items()})
{10: {'are': 1, 'hello': 1, 'how': 1, 'world': 2, 'you': 1},
 20: {'hello': 2, 'how': 1, 'you': 1}}

Now we need to loop through the outer dict and call tuple(v.iteritems()) on each inner dict to get the desired output (this is Python 2; on Python 3 use v.items() instead):

>>> pprint([(k, tuple(v.iteritems())) for k, v in d.items()])
[(10, (('world', 2), ('you', 1), ('hello', 1), ('how', 1), ('are', 1))),
 (20, (('how', 1), ('you', 1), ('hello', 2)))]
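For readers on Python 3, the same approach can be sketched as below. The function name count_tags_by_category is made up for illustration and is not part of the original answer. Because Python 3.7+ dicts keep insertion order, the tags come out in order of first appearance rather than in the Python 2 order shown above.

```python
from collections import defaultdict


def count_tags_by_category(categories, tags):
    """Return {category: {tag: count}} for two parallel sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for category, tag in zip(categories, tags):
        counts[category][tag] += 1
    # Convert to plain dicts so the result prints cleanly.
    return {category: dict(inner) for category, inner in counts.items()}


A = [10, 10, 20, 10, 10, 10, 20, 10, 20, 20]
B = ['hello', 'world', 'how', 'are', 'you', 'world', 'you', 'how', 'hello', 'hello']

nested = count_tags_by_category(A, B)

# Same list-of-tuples shape as in the question; items() replaces iteritems().
as_tuples = [(category, tuple(inner.items())) for category, inner in nested.items()]
```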

edited Nov 7, 2014 at 14:45

answered Nov 7, 2014 at 14:39


5 Comments

marcin_koss Over a year ago

I'm not following this part: defaultdict(lambda: defaultdict(int)). Can you explain it in more detail? Thanks!

Ashwini Chaudhary Over a year ago

@marcin_koss This creates a nested dictionary structure: each outer key maps to an inner dictionary, and each inner dictionary maps its keys to integer values (default 0).

marcin_koss Over a year ago

Got it, thanks. Now, what would be the best way to also order tag tuples by count?

Ashwini Chaudhary Over a year ago

@marcin_koss You can replace v.iteritems() with sorted(v.iteritems(), key=itemgetter(1)), where itemgetter is operator.itemgetter.

marcin_koss Over a year ago

Perfect, thank you! I will look into Pandas as well.
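The sorting suggestion from the comments might look like this on Python 3. This is a sketch that starts from the already-counted dict; reverse=True (most frequent tag first) is an assumption about the desired order and was not specified in the thread.

```python
from operator import itemgetter

# Counts as produced by the nested-defaultdict approach.
counts = {10: {'hello': 1, 'world': 2, 'are': 1, 'you': 1, 'how': 1},
          20: {'how': 1, 'you': 1, 'hello': 2}}

# Sort each category's (tag, count) pairs by the count, highest first.
# sorted() is stable, so tags with equal counts keep their original order.
ordered = [
    (category, tuple(sorted(inner.items(), key=itemgetter(1), reverse=True)))
    for category, inner in counts.items()
]
```

Drop reverse=True to get ascending counts, which is what the bare sorted(..., key=itemgetter(1)) call from the comment produces.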

Alex Riley · Accepted Answer · 2014-11-07 15:44:07Z

Since it's been mentioned, here's a way to aggregate the values with Pandas.

Setting up a DataFrame...

>>> import pandas as pd
>>> df = pd.DataFrame({'A': A, 'B': B})
>>> df
    A      B
0  10  hello
1  10  world
2  20    how
3  10    are
4  10    you
5  10  world
6  20    you
7  10    how
8  20  hello
9  20  hello

Pivoting to aggregate values (current pandas names these arguments index and columns; older releases used rows and cols)...

>>> table = pd.pivot_table(df, index='B', columns='A', aggfunc='size')
>>> table
A      10  20
B            
are     1 NaN
hello   1   2
how     1   1
world   2 NaN
you     1   1

Converting back to a dictionary...

>>> table.to_dict()
{10: {'are': 1.0, 'hello': 1.0, 'how': 1.0, 'world': 2.0, 'you': 1.0},
 20: {'are': nan, 'hello': 2.0, 'how': 1.0, 'world': nan, 'you': 1.0}}

From here you could use Python to adjust the dictionary to a desired format (e.g. a list).
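One way to do that last step, as a sketch: the dictionary below is copied from the to_dict() output above so the example runs without pandas. NaN marks tags that never occur in a category, so those entries are dropped and the remaining float counts are cast back to int.

```python
import math

nan = float('nan')

# Literal copy of the table.to_dict() output shown above.
table_dict = {10: {'are': 1.0, 'hello': 1.0, 'how': 1.0, 'world': 2.0, 'you': 1.0},
              20: {'are': nan, 'hello': 2.0, 'how': 1.0, 'world': nan, 'you': 1.0}}

# Skip NaN entries (tag absent from the category) and restore integer counts.
result = [
    (category,
     tuple((tag, int(count)) for tag, count in tags.items() if not math.isnan(count)))
    for category, tags in table_dict.items()
]
```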

Irshad Bhat · Accepted Answer · 2014-11-07 15:05:14Z

0

Here is one way:

>>> import numpy as np
>>> from collections import Counter
>>> A = np.array([10, 10, 20, 10, 10, 10, 20, 10, 20, 20])
>>> B = np.array(['hello', 'world', 'how', 'are', 'you', 'world', 'you', 'how', 'hello','hello'])
>>> [(i,Counter(B[np.where(A==i)]).items()) for i in set(A)]
[(10, [('world', 2), ('you', 1), ('hello', 1), ('how', 1), ('are', 1)]), (20, [('how', 1), ('you', 1), ('hello', 2)])]

answered Nov 7, 2014 at 15:05


1 Comment

Ashwini Chaudhary Over a year ago

This won't scale well: A == i rescans the whole array once per distinct category, which is quadratic time in the worst case.
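For comparison, a single-pass variant can be sketched with only the standard library. It is not from the answer above; it counts (category, tag) pairs once with Counter and then regroups them, so the input is scanned one time regardless of how many categories there are.

```python
from collections import Counter, defaultdict

A = [10, 10, 20, 10, 10, 10, 20, 10, 20, 20]
B = ['hello', 'world', 'how', 'are', 'you', 'world', 'you', 'how', 'hello', 'hello']

# One pass over the data: count each (category, tag) pair.
pair_counts = Counter(zip(A, B))

# Regroup the pair counts by category.
grouped = defaultdict(list)
for (category, tag), count in pair_counts.items():
    grouped[category].append((tag, count))

result = [(category, tuple(tags)) for category, tags in grouped.items()]
```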

Alleo · Accepted Answer · 2015-12-04 23:02:09Z

"but I'm sure numpy has some nifty ways of doing it more efficiently"

And you're right! Here is the code:

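import numpy

# convert categories and tags to integer codes
category_lookup, categories = numpy.unique(A, return_inverse=True)
tag_lookup, tags = numpy.unique(B, return_inverse=True)

# one row per category, one column per tag
statistics = numpy.zeros([len(category_lookup), len(tag_lookup)])
# pass the indices as a tuple so they address (row, column) pairs
numpy.add.at(statistics, (categories, tags), 1)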

result = {}
for category, stat in zip(category_lookup, statistics):
    result[category] = dict(zip(tag_lookup[stat != 0], stat[stat != 0]))

For explanation see numpy tips and tricks. This gives expected answer:

{10: {'are': 1.0, 'hello': 1.0, 'how': 1.0, 'world': 2.0, 'you': 1.0}, 20: {'hello': 2.0, 'how': 1.0, 'you': 1.0}}

Admittedly, this is not the clearest way to do it (see the pandas solution), but it is really fast when you have a huge amount of data.
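To make the two key steps easier to follow, here is a small sketch of the same idea with the intermediate values spelled out. It uses an integer count matrix (dtype=int), which is a choice made for this illustration and not what the answer's code does. numpy.add.at is needed because a plain counts[categories, tags] += 1 applies the increment only once per repeated index pair.

```python
import numpy as np

A = [10, 10, 20, 10, 10, 10, 20, 10, 20, 20]
B = ['hello', 'world', 'how', 'are', 'you', 'world', 'you', 'how', 'hello', 'hello']

# return_inverse=True gives the sorted unique values plus, for every input
# element, the position of its value in that sorted array (an integer code).
category_lookup, categories = np.unique(A, return_inverse=True)
tag_lookup, tags = np.unique(B, return_inverse=True)

# Rows are categories, columns are tags.
counts = np.zeros((len(category_lookup), len(tag_lookup)), dtype=int)

# Unbuffered in-place add: repeated (row, column) pairs each contribute 1.
np.add.at(counts, (categories, tags), 1)

# Map the non-zero cells back to the original labels.
result = {
    int(category): {str(tag): int(n) for tag, n in zip(tag_lookup, row) if n}
    for category, row in zip(category_lookup, counts)
}
```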

ide3p · Accepted Answer · 2020-11-18 10:37:39Z

0

Python: NumPy Made Counting Occurrences Easy:

# import NumPy
import numpy as np

arr = np.array([0,1,2,2,3,3,7,3,4,0,4,4,0,4,5,0,5,9,5,9,5,8,5])
print(np.sum(arr == 4))  # count occurrences of the number 4

unique, counts = np.unique(arr, return_counts=True)
print(unique, counts)

4
[0 1 2 3 4 5 7 8 9] [4 1 2 3 4 5 1 1 2]

The above is the output: first the count of 4s, then the unique values with their counts.

answered Nov 18, 2020 at 10:37

