71

Assume I have a list of words, and I want to find the number of times each word appears in that list.

An obvious way to do this is:

words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)

But I find this code not very good, because the program runs through the word list twice, once to build the set, and a second time to count the number of appearances.

Of course, I could write a function to run through the list and do the counting, but that wouldn't be so Pythonic. So, is there a more efficient and Pythonic way?

2
  • 1
    Not twice, it looks like O(N*N) complexity Commented May 21, 2009 at 15:10
  • You may be interested in: stackoverflow.com/a/20308657/2534876 for issues of performance. Commented Dec 31, 2014 at 5:31

15 Answers 15

153

The Counter class in the collections module is purpose built to solve this type of problem:

from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
Sign up to request clarification or add additional context in comments.

3 Comments

According to stackoverflow.com/a/20308657/2534876, this is fastest on Python3 but slow on Python2.
do you know if there is a flag to convert this to a percentage freq_dict? E.g., 'apple' : .3333 (2/6),
@Tommy total = sum(your_counter_object.values()) then freq_percentage = {k: v/total for k, v in your_counter_object.items()}
95

defaultdict to the rescue!

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

This runs in O(n).

1 Comment

This is a very old answer. Use Counter instead.
12
freqs = {}
for word in words:
    freqs[word] = freqs.get(word, 0) + 1 # fetch and increment OR initialize

I think this results to the same as Triptych's solution, but without importing collections. Also a bit like Selinap's solution, but more readable imho. Almost identical to Thomas Weigel's solution, but without using Exceptions.

This could be slower than using defaultdict() from the collections library however. Since the value is fetched, incremented and then assigned again. Instead of just incremented. However using += might do just the same internally.

Comments

11

Standard approach:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)
for word in words:
    result[word] += 1

print result

Groupby oneliner:

from itertools import groupby

words = "apple banana apple strawberry banana lemon"
words = words.split()

result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print result

2 Comments

Is there a difference in complexity? Does groupby use sorting? Then it seems to need O(nlogn) time?
Oops, it seems Nick Presta below has pointed out that the groupby approach uses O(nlogn).
7

If you don't want to use the standard dictionary method (looping through the list incrementing the proper dict. key), you can try this:

>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.

Comments

3

Without defaultdict:

words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try: my_count[word] += 1
    except KeyError: my_count[word] = 1

3 Comments

Seems slower than defaultdict in my tests
splitting by a space is redundant. Also, you should use the dict.set_default method instead of the try/except.
It's a lot slower because you are using Exceptions. Exceptions are very costly in almost any language. Avoid using them for logic branches. Look at my solution for an almost identical method, but without using Exceptions: stackoverflow.com/questions/893417/…
2
user_input = list(input().split(' '))

for word in user_input:

    print('{} {}'.format(word, user_input.count(word)))

Comments

1
words = "apple banana apple strawberry banana lemon"
w=words.split()
e=list(set(w))       
word_freqs = {}
for i in e:
    word_freqs[i]=w.count(i)
print(word_freqs)   

Hope this helps!

Comments

0

Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog'
words.count('z')
#output: 1

1 Comment

The question already uses "count", and asks for better alternatives.
0

I happened to work on some Spark exercise, here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

print {n: float(tokens.count(n))/float(len(tokens)) for n in tokens}

**#output of the above **

{'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}

Comments

0
list = input()  # Providing user input passes multiple tests
text = list.split()

for word in text:
    freq = text.count(word) 
    print(word, freq)

Comments

0

Use reduce() to convert the list to a single dict.

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})

returns

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}

Comments

0

I had a similar assignment on Zybook, this is the solution that worked for me.

def build_dictionary(words):
    counts = dict()
    for word in words:
        if word in counts:
             counts[word] += 1
        else:
             counts = 1
    return counts
if __name__ == '__main__':
    words = input().split()
    your_dictionary = build_dictionary(words)
    sorted_keys = sorted(your_dictionary.keys())
    for key in sorted_keys:
        print(key + ':' + str(your_dictionary[key])) 

Comments

0

Here is my solution. No imports, just a simple nested loop.

words = input().split(" ")

for word in words:

    word_count = 0
    for word2 in words:

       if word2.lower() == word.lower():
           word_count += 1
    print(f'{word} {word_count}')

1 Comment

Simple but much slower: the double loop means quadratic time. The linear time solutions will be much faster.
-1

The answer below takes some extra cycles, but it is another method

def func(tup):
    return tup[-1]


def print_words(filename):
    f = open("small.txt",'r')
    whole_content = (f.read()).lower()
    print whole_content
    list_content = whole_content.split()
    dict = {}
    for one_word in list_content:
        dict[one_word] = 0
    for one_word in list_content:
        dict[one_word] += 1
    print dict.items()
    print sorted(dict.items(),key=func)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.