Mapping a function over all the letters of a token in Python

Question

The purpose of this program is to read in an array of tokens, remove the punctuation, turn all the letters lower case, and then print the resulting array. The readTokens and depunctuateTokens functions both work correctly. My problem is with the decapitalizeTokens function. When I run the program I receive this error:

the name of the program is words.py
['hello', 'hello1', 'hello2']
Traceback (most recent call last):
  File "words.py", line 41, in <module>
    main()    
  File "words.py", line 10, in main
    words = decapitalizeTokens(cleanTokens)
  File "words.py", line 35, in decapitalizeTokens
    if (ord(ch) <= ord('Z')):
TypeError: ord() expected string of length 1, but list found

My question is what formal parameters I should put into the decapitalizeTokens function in order to return the array resulting from the depunctuateTokens function, but with all the letters lowercase.

This is my program:

import sys
from scanner import *
arr=[]
def main():
    print("the name of the program is",sys.argv[0])
    for i in range(1,len(sys.argv),1):
        print("   argument",i,"is", sys.argv[i])
    tokens = readTokens("text.txt")
    cleanTokens = depunctuateTokens(arr)
    words = decapitalizeTokens(cleanTokens)

def readTokens(s):
    s=Scanner("text.txt")
    token=s.readtoken()
    while (token != ""):
        arr.append(token)
        token=s.readtoken()
    s.close()
    return arr

def depunctuateTokens(arr):
    result=[]
    for i in range(0,len(arr),1):
        string=arr[i]
        cleaned=""
        punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
        for i in range(0,len(string),1):
            if string[i] not in punctuation:
                cleaned += string[i]
        result.append(cleaned)
    print(result)
    return result

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result


main()

As a side note, using a global variable arr, and then also returning it from readTokens but storing that copy in tokens, is doubly confusing. Get rid of the global; move the arr = [] into the first line of readTokens, and just use tokens instead of arr inside main, and it will be a lot clearer.
– abarnert, Commented Feb 18, 2014 at 1:13
Are lower() and sub() so mean, they do not deserve your friendship?
– Cilyan, Commented Feb 18, 2014 at 1:16
Also, you almost never want to write a loop over range(len(s)) and then use s[i] within the loop. Just do for char in s:, and use char.
– abarnert, Commented Feb 18, 2014 at 1:20
Also, you don't need to write range(0, foo, 1); range(foo) does the same thing.
– abarnert, Commented Feb 18, 2014 at 1:28
Yeah, this is my first month so I'm still learning. Also, the teacher said we shouldn't use the lower or sub methods for this project.
– user3321218, Commented Feb 18, 2014 at 3:26
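
The loop suggestions in the comments above can be sketched roughly as follows. This is only an illustration of iterating directly over the tokens and characters instead of over range(len(...)); the sample input list is made up, and the backslash in the punctuation string is doubled so it is an explicit escape.

```python
# Same punctuation set as in the question, with the backslash escaped explicitly.
PUNCTUATION = """!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"""

def depunctuateTokens(tokens):
    result = []
    for token in tokens:          # iterate over the strings directly
        cleaned = ""
        for char in token:        # iterate over the characters directly
            if char not in PUNCTUATION:
                cleaned += char
        result.append(cleaned)
    return result

# Hypothetical sample input, not the contents of text.txt.
print(depunctuateTokens(["Hello,", "hello1!", "(hello2)"]))
# ['Hello', 'hello1', 'hello2']
```

Because the function takes its input as a parameter and returns a new list, it no longer needs the global arr.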

abarnert · Accepted Answer · 2014-02-18 01:22:54Z

2

Your decapitalizeTokens function works on a single character. You're passing it a list of strings. If you want to call it on every character of every string in that list, you need to loop over the list, and then loop over each string, somewhere.

You can do this with explicit loop statements, like this:

words = []
for token in cleanTokens:
    word = ''
    for char in token:
        word += decapitalizeTokens(char)
    words.append(word)

… or by using comprehensions:

words = [''.join(decapitalizeTokens(char) for char in token) 
         for token in cleanTokens]

However, I think it would make far more sense to move the loops into the decapitalizeTokens function—both based on its plural name, and on the fact that you have exactly the same loops in the similarly-named depunctuateTokens function. If you build decapitalizeTokens the same way you built depunctuateTokens, then your existing call works fine:

words = decapitalizeTokens(cleanTokens)
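
One way that might look, as a minimal sketch rather than the only possible layout: the loops live inside decapitalizeTokens, and the per-character work is moved into a helper. The helper name decapitalizeChar is hypothetical, the sample list is made up, and the range check assumes plain ASCII letters.

```python
def decapitalizeChar(ch):
    # Shift only characters in 'A'..'Z'; digits, spaces, and
    # lowercase letters are returned unchanged.
    if 'A' <= ch <= 'Z':
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch

def decapitalizeTokens(tokens):
    result = []
    for token in tokens:
        lowered = ""
        for ch in token:
            lowered += decapitalizeChar(ch)
        result.append(lowered)
    return result

# Hypothetical stand-in for the output of depunctuateTokens.
cleanTokens = ["Hello", "HELLO1", "hello2"]
words = decapitalizeTokens(cleanTokens)
print(words)  # ['hello', 'hello1', 'hello2']
```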

As a side note, the built-in lower method on strings already does what you want, so you could replace this whole mess with:

words = [token.lower() for token in cleanTokens]

… which would also fix a nasty bug in your attempt. Consider what, say, decapitalizeTokens would do with a digit or a space.
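
To make that bug concrete, here is a small sketch that reuses the same upper-bound-only test as the question's code. The function name shift_if_below_Z is made up for the demonstration.

```python
def shift_if_below_Z(ch):
    # Same condition as the question's code: only the upper bound is checked,
    # so anything that sorts before 'Z' gets shifted, not just letters.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch

for ch in ['A', 'z', '1', ' ']:
    print(repr(ch), '->', repr(shift_if_below_Z(ch)))
# 'A' -> 'a'
# 'z' -> 'z'
# '1' -> 'Q'
# ' ' -> '@'
```

Digits and spaces have code points below ord('Z'), so they get shifted by 32 along with the capital letters. Checking the lower bound as well ('A' <= ch <= 'Z') avoids this.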

And, likewise, depunctuateTokens can be similarly replaced by a call to the translate method. For example (slightly different for Python 2.x, but you can read the docs and figure it out):

punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
punctmap = {ord(char): None for char in punctuation}
cleanTokens = [token.translate(punctmap) for token in cleanTokens]

edited Feb 18, 2014 at 1:22

answered Feb 18, 2014 at 1:16

abarnert


2 Comments

user3321218 Over a year ago

Thanks for all the help! My teacher also said not to use the lower method which is why I didn't use it.

abarnert Over a year ago

@user3321218: That's a good reason. But as a general hint, whenever a teacher says "don't use the lower method", the first thing you should do is look at the lower method and figure out how to write a function with the exact same interface. First, the builtin methods are generally designed to be easy to use, so if you build a function with the same interface, it'll also be easy to use. Second, it's a lot easier to test your function when there's an already-working function that does the exact same thing.
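
A rough sketch of that hint: a hand-written function with the same interface as the lower method (string in, string out), checked against the built-in. The name my_lower and the sample strings are made up, and the comparison is only expected to hold for ASCII input, since str.lower also handles non-ASCII letters.

```python
def my_lower(s):
    # Same interface as str.lower: takes a string, returns a new string.
    result = ""
    for ch in s:
        if 'A' <= ch <= 'Z':
            result += chr(ord(ch) + ord('a') - ord('A'))
        else:
            result += ch
    return result

# Compare against the already-working built-in on a few ASCII samples.
samples = ["Hello", "HELLO1", "hello2", "", "A b-C 9"]
for s in samples:
    assert my_lower(s) == s.lower(), s
print("all samples match")
```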

Luis Masuelli · Accepted Answer · 2014-02-18 01:15:14Z

-1

cleanTokens = depunctuateTokens(...) #returns an array into cleantokens.
words = decapitalizeTokens(cleanTokens) #takes an array and returns... whatever.

The fact is that in

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result

result is a list (cleanTokens), and ord(result) fails because it expects a single-character string, not a list.

Perhaps doing words = map(decapitalizeTokens, cleanTokens) can help you.

answered Feb 18, 2014 at 1:15

Luis Masuelli


1 Comment

abarnert Over a year ago

That won't solve the problem, because decapitalizeTokens only works on a single character. To use it as written, you have to loop twice—over each token in cleanTokens, and also over each character in each token.

Hugh Bothwell · Accepted Answer · 2014-02-18 01:37:28Z

-1

import scanner
import string
import sys

def read_tokens(fname):
    res = []
    with scanner.Scanner(fname) as sc:
        tok = sc.readtoken()
        while tok:
            res.append(tok)
            tok = sc.readtoken()
    return res

def depunctuate(s):
    # Python 2.6-2.7 form of str.translate; Python 3 has no deletechars argument
    return s.translate(None, string.punctuation)

def decapitalize(s):
    return s.lower()

def main():
    print("The name of the program is {}.".format(sys.argv[0]))
    for i, arg in enumerate(sys.argv[1:], 1):
        print("  Argument {} is {}".format(i, arg))

    tokens = read_tokens("text.txt")
    clean_tokens = [depunctuate(decapitalize(tok)) for tok in tokens]

if __name__=="__main__":
    main()

edited Feb 18, 2014 at 1:37

answered Feb 18, 2014 at 1:16

Hugh Bothwell


5 Comments

abarnert Over a year ago

translate doesn't work like that; you'll just get a TypeError.

abarnert Over a year ago

More importantly, you've told him how to fix the function that already works as written, not how to fix the one that's broken.

Hugh Bothwell Over a year ago

@abarnert: yes, I was looking at the wrong function, and yes, str.translate chokes if you pass a set instead of a string. I have repaired both problems and made his code considerably more Pythonic; please take another look.

abarnert Over a year ago

OK, that version works in Python 2.6-2.7, but does not work in 3.x (which the OP is most likely using—note the print as function), or in 2.5 and earlier either. In 3.x, there is no deletechars argument; you handle it by mapping to None. In 2.5, the table argument cannot be None; you handle it by mapping every ord to itself.

abarnert Over a year ago

More importantly, instead of fixing the OP's code, you've now written completely different code, with no explanation. There's no way he's going to figure out how all of these pieces correspond to what he'd written (especially since most of them don't directly correspond to anything), so he's likely not going to learn anything at all from this, and he's definitely not going to figure out which part of this fixes the problem he was asking about, or how it does so.
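
For reference, a Python 3 version of the depunctuate helper discussed in these comments might look like the sketch below. It assumes Python 3, where the deletion is expressed by mapping characters to None; str.maketrans with three arguments builds such a table. The sample string is made up.

```python
import string

def depunctuate(s):
    # With three arguments, str.maketrans maps every character in the
    # third argument to None, which translate() then deletes.
    table = str.maketrans('', '', string.punctuation)
    return s.translate(table)

print(depunctuate("Hello, world!"))  # Hello world
```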
