40

As an example, let's say I wanted to list the frequency of each letter of the alphabet in a string. What would be the easiest way to do it?

This is an example of what I'm thinking of. The question is how to make allTheLetters equal to said letters without something like allTheLetters = "abcdefg...xyz". In many other languages I could just do letter++ and increment my way through the alphabet, but thus far I haven't come across a way to do that in Python.

def alphCount(text):
  lowerText = text.lower()
  for letter in allTheLetters:
    print letter + ":", lowerText.count(letter)

10 Answers

72

The question you've asked (how to iterate through the alphabet) is not the same question as the problem you're trying to solve (how to count the frequency of letters in a string).

You can use string.lowercase, as other posters have suggested:

import string
allTheLetters = string.lowercase

To do things the way you're "used to", treating letters as numbers, you can use the "ord" and "chr" functions. There's absolutely no reason to ever do exactly this, but maybe it comes closer to what you're actually trying to figure out:

def getAllTheLetters(begin='a', end='z'):
    beginNum = ord(begin)
    endNum = ord(end)
    for number in xrange(beginNum, endNum+1):
        yield chr(number)

You can tell it does the right thing because this code prints True:

import string
print ''.join(getAllTheLetters()) == string.lowercase
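
For readers on Python 3, the same idea might be sketched as follows. This is only an illustrative port, not part of the original answer: the name iter_letters is hypothetical, range replaces xrange, and string.ascii_lowercase stands in for string.lowercase, which Python 3 no longer provides.

```python
import string

def iter_letters(begin="a", end="z"):
    """Yield each character from begin to end, inclusive, by code point."""
    # ord() maps a character to its integer code point; chr() maps it back.
    for code_point in range(ord(begin), ord(end) + 1):
        yield chr(code_point)

letters = "".join(iter_letters())
print(letters == string.ascii_lowercase)  # True
```

This only lines up with the alphabet because the ASCII lowercase letters occupy consecutive code points; the approach does not generalize to alphabets that are not laid out contiguously.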

But, to solve the problem you're actually trying to solve, you want to use a dictionary and collect the letters as you go:

from collections import defaultdict    
def letterOccurrances(string):
    frequencies = defaultdict(lambda: 0)
    for character in string:
        frequencies[character.lower()] += 1
    return frequencies

Use like so:

occs = letterOccurrances("Hello, world!")
print occs['l']
print occs['h']

This will print '3' and '1' respectively.

Note that this works for unicode as well:

# -*- coding: utf-8 -*-
occs = letterOccurrances(u"héĺĺó, ẃóŕĺd!")
print occs[u'l']
print occs[u'ĺ']

If you were to try the other approach on Unicode (incrementing through every character) you'd be waiting a long time; the Unicode code space has over a million code points.
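
To get a rough sense of the scale involved, here is a small Python 3 sketch (an assumption on my part: it needs Python 3.3 or later, where sys.maxunicode is always 0x10FFFF). It counts the code points in the code space and how many of them str.isalpha() accepts; the second number depends on the Unicode version bundled with the interpreter, so no figure is quoted for it.

```python
import sys

# Size of the Unicode code space as seen by this interpreter.
total_code_points = sys.maxunicode + 1
print(total_code_points)  # 1114112

# How many of those code points Python classifies as alphabetic.
# The result varies with the interpreter's Unicode database version.
alphabetic = sum(1 for cp in range(total_code_points) if chr(cp).isalpha())
print(alphabetic)
```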

To implement your original function (print the counts of each letter in alphabetical order) in terms of this:

def alphCount(text):
    for character, count in sorted(letterOccurrances(text).iteritems()):
        print "%s: %s" % (character, count)

alphCount("hello, world!")
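
A variant that counts only letters, and uses defaultdict(int) in place of defaultdict(lambda: 0), could look roughly like this in Python 3. The names letter_counts and print_letter_counts are hypothetical, and treating "letter" as "whatever str.isalpha() accepts" is an assumption; adjust the filter if you only care about a-z.

```python
from collections import defaultdict

def letter_counts(text):
    """Count alphabetic characters in text, ignoring case."""
    counts = defaultdict(int)  # missing keys start at int() == 0
    for character in text.lower():
        if character.isalpha():  # skips spaces, digits, punctuation
            counts[character] += 1
    return counts

def print_letter_counts(text):
    """Print each letter that occurs in text with its count, in sorted order."""
    for character, count in sorted(letter_counts(text).items()):
        print("%s: %d" % (character, count))

print_letter_counts("Hello, world!")
```

Unlike the loop over a fixed alphabet, this prints only the letters that actually occur in the input.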
6 Comments

you really should use string.ascii_lowercase instead of writing your own getAllTheLetters. also, that is a horribly unpythonic name for a function!
Your letterOccurrances() function will also count whitespace and punctuation, perhaps not intentionally.
Actually the number of Unicode characters is still under a million. Also a few of them are non-alphabetic, so you want to exclude those when printing out frequencies.
"string.ascii_lowercase" -- I hope there's a unicode_lowercase to handle Cyrillic, Greek, etc. I hope it knows how to downcase Turkish I's correctly depending on the current locale.
Rather than collections.defaultdict(lambda: 0), using collections.defaultdict(int) will do the same thing, and is clearer IMO.
14

the question is how to make allTheLetters equal to said letters without something like allTheLetters = "abcdefg...xyz"

That's actually provided by the string module, it's not like you have to manually type it yourself ;)

import string

allTheLetters = string.ascii_lowercase

def alphCount(text):
  lowerText = text.lower()
  for letter in allTheLetters:
    print letter + ":", lowerText.count(letter)

3 Comments

This solution is slow, since it has nested iterations (lowertext.count() iterates over the string in order to find the count)
However, the specific question was answered. Other problems are the original poster's problem.
or you can get all the lowercase letters by doing an iteration over the following list: allTheLetters=[chr(i+97) for i in range(26)]
9

If you just want to do a frequency count of a string, try this:

s = 'hi there'
f = {}

for c in s:
        f[c] = f.get(c, 0) + 1

print f

3 Comments

This is a very good solution as it only iterates once over the given string, and thus is O(n) as opposed to using nested iterations. Even better if you use f = defaultdict(int) and then simply f[c] += 1
Is the get member O(1)? If it's O(n), then the whole thing is O(n^2).
@Pax Diablo: Mappings are hashed. Dictionary gets are O(1).
4

For counting objects, the obvious solution is the Counter

from collections import Counter
import string

c = Counter()
for letter in text.lower():
    c[letter] += 1

for letter in string.lowercase:
    print("%s: %d" % (letter, c[letter]))

1 Comment

Even easier, you can replace the assignment loop with: c = Counter(text.lower())
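
Putting that suggestion together with the alphabet loop, a Python 3 sketch might look like this. The function name alphabet_frequencies is hypothetical, and string.ascii_lowercase is used because string.lowercase does not exist in Python 3.

```python
from collections import Counter
import string

def alphabet_frequencies(text):
    """Return (letter, count) pairs for a-z, including letters with zero count."""
    counts = Counter(text.lower())  # one pass over the text
    # A Counter returns 0 for missing keys, so absent letters need no special case.
    return [(letter, counts[letter]) for letter in string.ascii_lowercase]

for letter, count in alphabet_frequencies("Hello, world!"):
    print("%s: %d" % (letter, count))
```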
3

Do you mean using:

import string
string.ascii_lowercase

then,

counters = dict()
for letter in string.ascii_lowercase:
    counters[letter] = lowertext.count(letter)

All lowercase letters are accounted for, missing counters will have zero value.

Using a generator expression:

counters = dict((letter, lowertext.count(letter)) for letter in string.ascii_lowercase)

3

Something like this?

for letter in range(ord('a'), ord('z') + 1):
  print chr(letter) + ":", lowertext.count(chr(letter))

4 Comments

I think your "letter" inside the count() should be "chr(letter)"
Since you fixed it (and didn't have my off-by-one bug resulting in only checking up to 'y' :-), I've deleted my answer and upvoted yours.
@Adam: I temporarily voted it down to remove it from the top position and elevate Matthew's answer. It's also not very Pythonic code.
@John: oooh, market manipulation. Does the SEC monitor these forums? :-)
2

Main question is "iterate through the alphabet":

import string
for c in string.lowercase:
    print c

How to get letter frequencies with some efficiency and without counting non-letter characters:

import string

sample = "Hello there, this is a test!"
letter_freq = dict((c,0) for c in string.lowercase)

for c in [c for c in sample.lower() if c.isalpha()]:
    letter_freq[c] += 1

print letter_freq

0

How about this, to use letters, figures and punctuation (all usable to form a Django key):

import random
import string

chars = string.letters + string.digits + string.punctuation
chars_len = len(chars)
n = 40

# random.randint includes both endpoints, so the upper bound must be chars_len - 1
print(''.join([chars[random.randint(0, chars_len - 1)] for i in range(n)]))

Example result: coOL:V!D+P,&S*hzbO{a0_6]2!{4|OIbVuAbq0:
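
If the generated string is meant to be used as a secret, a Python 3 sketch along these lines may be a better fit. Note the swap: it uses the standard-library secrets module (Python 3.6+) rather than random, and string.ascii_letters rather than string.letters. The function name random_key and the default length of 40 are illustrative assumptions, not anything Django prescribes.

```python
import secrets
import string

def random_key(length=40):
    """Build a random string of letters, digits, and punctuation."""
    chars = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice picks one element using the OS's secure random source.
    return "".join(secrets.choice(chars) for _ in range(length))

key = random_key()
print(key)
```

Choosing an element directly also sidesteps the index arithmetic, so there is no upper bound to get wrong.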

0

Just use:

import string
string.lowercase  
string.uppercase

or

string.letters[:26]  
string.letters[26:]

-1

This is what I do:

import string
for x in list(string.lowercase):
    print x

Comments
