Item frequency count in Python

Question

Assume I have a list of words, and I want to find the number of times each word appears in that list.

An obvious way to do this is:

words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)

But I find this code not very good, because the program runs through the word list repeatedly: once to build the set, and then once more for every unique word to count its appearances.

Of course, I could write a function to run through the list and do the counting, but that wouldn't be so Pythonic. So, is there a more efficient and Pythonic way?

You may be interested in stackoverflow.com/a/20308657/2534876 for performance considerations.
– JDong, Commented Dec 31, 2014 at 5:31

sykloid · Accepted Answer · 2019-04-25 21:53:14Z

153

The Counter class in the collections module is purpose-built to solve this type of problem:

from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
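
Since Counter is a dict subclass, the result can be queried like an ordinary dictionary. A minimal sketch of looking up single words and listing the most frequent ones with most_common (the variable names counts, apple_count, cherry_count and top_two are only illustrative, not part of the original answer):

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

# Look up one word; a missing word yields 0 rather than raising KeyError.
apple_count = counts["apple"]
cherry_count = counts["cherry"]

# The two most frequent words, as (word, count) pairs.
top_two = counts.most_common(2)
print(apple_count, cherry_count, top_two)
```

most_common() with no argument returns every word, ordered from most to least frequent.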

edited Apr 25, 2019 at 21:53

user3064538

answered May 21, 2009 at 15:16

sykloid

102k12 gold badges67 silver badges71 bronze badges


3 Comments

JDong Over a year ago

According to stackoverflow.com/a/20308657/2534876, this is fastest on Python3 but slow on Python2.

Tommy Over a year ago

do you know if there is a flag to convert this to a percentage freq_dict? E.g., 'apple' : .3333 (2/6),

user3064538 Over a year ago

@Tommy total = sum(your_counter_object.values()) then freq_percentage = {k: v/total for k, v in your_counter_object.items()}
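
The one-liner in the comment above can be spelled out as a runnable sketch. It assumes Python 3, where / is true division, and reuses the sample sentence from the question; the name counts stands in for your_counter_object:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

# Total number of words counted (6 for this sample).
total = sum(counts.values())

# Relative frequency of each word, as a fraction between 0 and 1.
freq_percentage = {word: n / total for word, n in counts.items()}
print(freq_percentage)
```

Multiply each value by 100 if an actual percentage is wanted rather than a fraction.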

Kenan Banks · Accepted Answer · 2009-05-21 15:10:59Z

95

defaultdict to the rescue!

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

This runs in O(n).

answered May 21, 2009 at 15:10

Kenan Banks

213k36 gold badges160 silver badges176 bronze badges

1 Comment

user3064538 Over a year ago

This is a very old answer. Use Counter instead.

hopla · Accepted Answer · 2009-06-11 20:32:08Z

12

words = "apple banana apple strawberry banana lemon"
freqs = {}
for word in words.split():
    freqs[word] = freqs.get(word, 0) + 1  # fetch and increment, or initialize to 1

I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable in my opinion, and almost identical to Thomas Weigel's solution, but without using exceptions.

This could, however, be slower than using defaultdict from the collections module, since the value is fetched, incremented, and then assigned again rather than just incremented in place. That said, += might do just the same internally.
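
Whether dict.get is really slower than defaultdict is easy to measure rather than guess. The following is a rough sketch using timeit; the helper names count_with_get and count_with_defaultdict and the repeated sample list are assumptions made for illustration, and the timings will vary by machine and Python version:

```python
import timeit
from collections import defaultdict

# Repeat the sample so that each run does a measurable amount of work.
words = "apple banana apple strawberry banana lemon".split() * 1000

def count_with_get(items):
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0) + 1
    return freqs

def count_with_defaultdict(items):
    freqs = defaultdict(int)
    for item in items:
        freqs[item] += 1
    return freqs

# Both approaches must agree before timing them is meaningful.
assert count_with_get(words) == count_with_defaultdict(words)

get_seconds = timeit.timeit(lambda: count_with_get(words), number=100)
dd_seconds = timeit.timeit(lambda: count_with_defaultdict(words), number=100)
print(f"dict.get: {get_seconds:.3f}s  defaultdict: {dd_seconds:.3f}s")
```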

edited Jun 11, 2009 at 20:32

answered Jun 11, 2009 at 20:21

hopla

3,3924 gold badges30 silver badges26 bronze badges


Community · Accepted Answer · 2019-10-08 06:37:54Z

11

Standard approach:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)
for word in words:
    result[word] += 1

print(result)

Groupby oneliner:

from itertools import groupby

words = "apple banana apple strawberry banana lemon"
words = words.split()

result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print(result)

edited Oct 8, 2019 at 6:37

CommunityBot

11 silver badge

answered May 21, 2009 at 15:11

nosklo

224k58 gold badges300 silver badges299 bronze badges

2 Comments

Daniyar Over a year ago

Is there a difference in complexity? Does groupby use sorting? Then it seems to need O(nlogn) time?

Daniyar Over a year ago

Oops, it seems Nick Presta below has pointed out that the groupby approach uses O(nlogn).

Nick Presta · Accepted Answer · 2009-05-21 15:09:57Z

7

If you don't want to use the standard dictionary method (looping through the list and incrementing the proper dict key), you can try this:

>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.
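
The sort is what costs O(n log n), and it cannot be skipped: groupby only merges equal items that sit next to each other. A small sketch of the difference, using the sample sentence from the question (the names unsorted_counts and sorted_counts are illustrative):

```python
from itertools import groupby

words = "apple banana apple strawberry banana lemon".split()

# Without sorting, each run of equal neighbours becomes its own group,
# so repeated words that are not adjacent are reported more than once.
unsorted_counts = [(k, len(list(g))) for k, g in groupby(words)]

# After sorting, equal words are adjacent, so each word appears once.
sorted_counts = [(k, len(list(g))) for k, g in groupby(sorted(words))]

print(unsorted_counts)
print(sorted_counts)
```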

answered May 21, 2009 at 15:09

Nick Presta

28.8k6 gold badges60 silver badges76 bronze badges


tzot · Accepted Answer · 2009-05-21 22:36:56Z

3

Without defaultdict:

words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try:
        my_count[word] += 1
    except KeyError:
        my_count[word] = 1

edited May 21, 2009 at 22:36

tzot

96.6k30 gold badges151 silver badges210 bronze badges

answered May 21, 2009 at 15:59

Thomas Weigel

1591 silver badge2 bronze badges

3 Comments

nosklo Over a year ago

Seems slower than defaultdict in my tests

Kenan Banks Over a year ago

Splitting by a space is redundant. Also, you should use the dict.setdefault method instead of the try/except.
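
For reference, the built-in method is spelled dict.setdefault. One possible way to apply the suggestion to the loop in this answer, as a sketch rather than the commenter's exact intent:

```python
words = "apple banana apple strawberry banana lemon"

my_count = {}
for word in words.split():
    # setdefault returns the existing count, or stores and returns 0
    # the first time a word is seen.
    my_count[word] = my_count.setdefault(word, 0) + 1

print(my_count)
```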

hopla Over a year ago

It's a lot slower because you are using Exceptions. Exceptions are very costly in almost any language. Avoid using them for logic branches. Look at my solution for an almost identical method, but without using Exceptions: stackoverflow.com/questions/893417/…

dB_19 · Accepted Answer · 2021-08-07 07:10:09Z

2

user_input = list(input().split(' '))

for word in user_input:

    print('{} {}'.format(word, user_input.count(word)))

answered Aug 7, 2021 at 7:10

dB_19

212 bronze badges


user2922935 · Accepted Answer · 2019-11-16 14:04:54Z

1

words = "apple banana apple strawberry banana lemon"
w=words.split()
e=list(set(w))       
word_freqs = {}
for i in e:
    word_freqs[i]=w.count(i)
print(word_freqs)

Hope this helps!

edited Nov 16, 2019 at 14:04

user2922935

4494 silver badges12 bronze badges

answered Nov 12, 2017 at 16:17

Varun Shaandhesh

791 silver badge10 bronze badges


Antonio · Accepted Answer · 2011-04-07 05:36:08Z

0

Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog'
words.count('z')
#output: 1

answered Apr 7, 2011 at 5:36

Antonio

11

1 Comment

Daniyar Over a year ago

The question already uses "count", and asks for better alternatives.
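
There is a second catch: str.count counts substrings, not words, so calling it on the unsplit string can overcount. A short sketch with a made-up sample sentence (text, substring_hits and word_hits are illustrative names):

```python
text = "the other theme"

# str.count counts non-overlapping substring matches anywhere in the text,
# including the "the" inside "other" and "theme".
substring_hits = text.count("the")

# list.count on the split words counts whole-word matches only.
word_hits = text.split().count("the")

print(substring_hits, word_hits)
```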

Jaffer Wilson · Accepted Answer · 2015-06-26 06:56:40Z

0

I happened to work on some Spark exercise, here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

print({n: float(tokens.count(n))/float(len(tokens)) for n in tokens})

Output of the above:

{'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}

edited Jun 26, 2015 at 6:56

Jaffer Wilson

7,30313 gold badges77 silver badges161 bronze badges

answered Jun 26, 2015 at 6:02

javaidiot

1


PanamaPHat · Accepted Answer · 2021-10-11 01:11:02Z

0

line = input()  # Reading from user input lets the code pass multiple tests
text = line.split()  # renamed from "list" to avoid shadowing the built-in

for word in text:
    freq = text.count(word) 
    print(word, freq)

answered Oct 11, 2021 at 1:11

PanamaPHat

511 silver badge3 bronze badges


theherk · Accepted Answer · 2021-10-15 10:03:40Z

0

Use reduce() to convert the list to a single dict.

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})

returns

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}

edited Oct 15, 2021 at 10:03

theherk

7,6173 gold badges32 silver badges62 bronze badges

answered Feb 23, 2016 at 18:03

Gadi

1,1629 silver badges6 bronze badges


B1029 · Accepted Answer · 2022-04-17 01:35:43Z

0

I had a similar assignment on Zybook; this is the solution that worked for me.

def build_dictionary(words):
    counts = dict()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1  # first occurrence of this word
    return counts
if __name__ == '__main__':
    words = input().split()
    your_dictionary = build_dictionary(words)
    sorted_keys = sorted(your_dictionary.keys())
    for key in sorted_keys:
        print(key + ':' + str(your_dictionary[key]))

answered Apr 17, 2022 at 1:35

B1029

11 bronze badge


Kovy Jacob · Accepted Answer · 2025-02-24 20:00:42Z

0

Here is my solution. No imports, just a simple nested loop.

words = input().split(" ")

for word in words:
    word_count = 0
    for word2 in words:
        if word2.lower() == word.lower():
            word_count += 1
    print(f'{word} {word_count}')

edited Feb 24 at 20:00

Kovy Jacob

1,17911 silver badges26 bronze badges

answered Feb 24 at 12:31

jesse

1

1 Comment

joanis Feb 24 at 16:59

Simple but much slower: the double loop means quadratic time. The linear time solutions will be much faster.
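
One way to keep the same case-insensitive output while avoiding the double loop is to count once and look the result up while printing. This is only a sketch: it swaps in collections.Counter (so it does need an import) and uses a hard-coded sample in place of input():

```python
from collections import Counter

# Sample input, assumed for illustration in place of input().
words = "Apple banana apple Banana lemon".split(" ")

# One pass to count the lowercased words.
counts = Counter(word.lower() for word in words)

# One more pass to report each word with its case-insensitive count.
lines = [f"{word} {counts[word.lower()]}" for word in words]
print("\n".join(lines))
```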

Jesse · Accepted Answer · 2013-02-27 02:38:37Z

-1

The approach below takes some extra cycles, but it is another method.

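def func(tup):
    return tup[-1]  # sort key: the count in a (word, count) pair


def print_words(filename):
    # Open the file passed in and make sure it is closed afterwards.
    with open(filename, 'r') as f:
        whole_content = f.read().lower()
    print(whole_content)
    list_content = whole_content.split()
    counts = {}  # renamed from "dict" to avoid shadowing the built-in
    for one_word in list_content:
        counts[one_word] = 0
    for one_word in list_content:
        counts[one_word] += 1
    print(counts.items())
    print(sorted(counts.items(), key=func))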

edited Feb 27, 2013 at 2:38

Jesse

8,7597 gold badges49 silver badges57 bronze badges

answered Feb 27, 2013 at 2:17

Prabhu S

345 bronze badges
