1

The purpose of this program is to read in an array of tokens, remove the punctuation, convert all the letters to lowercase, and then print the resulting array. The readTokens and depunctuateTokens functions both work correctly. My problem is with the decapitalizeTokens function. When I run the program I receive this error:

the name of the program is words.py
['hello', 'hello1', 'hello2']
Traceback (most recent call last):
  File "words.py", line 41, in <module>
    main()    
  File "words.py", line 10, in main
    words = decapitalizeTokens(cleanTokens)
  File "words.py", line 35, in decapitalizeTokens
    if (ord(ch) <= ord('Z')):
TypeError: ord() expected string of length 1, but list found

My question is what formal parameters I should put into the decapitalizeTokens function in order to return the array resulting from the depunctuateTokens function, but with all the letters lowercase.

This is my program:

import sys
from scanner import *
arr=[]
def main():
    print("the name of the program is",sys.argv[0])
    for i in range(1,len(sys.argv),1):
        print("   argument",i,"is", sys.argv[i])
    tokens = readTokens("text.txt")
    cleanTokens = depunctuateTokens(arr)
    words = decapitalizeTokens(cleanTokens)

def readTokens(s):
    s=Scanner("text.txt")
    token=s.readtoken()
    while (token != ""):
        arr.append(token)
        token=s.readtoken()
    s.close()
    return arr

def depunctuateTokens(arr):
    result=[]
    for i in range(0,len(arr),1):
        string=arr[i]
        cleaned=""
        punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
        for i in range(0,len(string),1):
            if string[i] not in punctuation:
                cleaned += string[i]
        result.append(cleaned)
    print(result)
    return result

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result


main()
5
  • As a side note, using a global variable arr, and then also returning it from readTokens but storing that copy in tokens, is doubly confusing. Get rid of the global; move the arr = [] into the first line of readTokens, and just use tokens instead of arr inside main, and it will be a lot clearer. Commented Feb 18, 2014 at 1:13
  • Are lower() and sub() so mean, they do not deserve your friendship? Commented Feb 18, 2014 at 1:16
  • Also, you almost never want to write a loop over range(len(s)) and then use s[i] within the loop. Just do for char in s:, and use char. Commented Feb 18, 2014 at 1:20
  • Also, you don't need to write range(0, foo, 1); range(foo) does the same thing. Commented Feb 18, 2014 at 1:28
  • Yeah this is my first month so I'm still learning. Also, the teacher said we shouldn't use the lower or sub methods for this project Commented Feb 18, 2014 at 3:26

3 Answers

2

Your decapitalizeTokens function works on a single character. You're passing it a list of strings. If you want to call it on every character of every string in that list, you need to loop over the list, and then loop over each string, somewhere.

You can do this with explicit loop statements, like this:

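words = []
for token in cleanTokens:
    word = ''
    for char in token:
        # decapitalizeTokens, as written, handles one character at a time
        word += decapitalizeTokens(char)
    # append the finished string as one element of the list
    words.append(word)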

… or by using comprehensions:

words = [''.join(decapitalizeTokens(char) for char in token) 
         for token in cleanTokens]

However, I think it would make far more sense to move the loops into the decapitalizeTokens function—both based on its plural name, and on the fact that you have exactly the same loops in the similarly-named depunctuateTokens function. If you build decapitalizeTokens the same way you built depunctuateTokens, then your existing call works fine:

words = decapitalizeTokens(cleanTokens)
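
A minimal sketch of that restructuring, assuming the same nested-loop shape as depunctuateTokens and avoiding the lower method. The helper name decapitalizeChar is hypothetical (it is not in the original program), and only the ASCII letters 'A' through 'Z' are converted:

```python
def decapitalizeChar(ch):
    # Hypothetical helper: convert one ASCII uppercase letter to lowercase.
    # Anything outside 'A'..'Z' (digits, spaces, lowercase) is returned unchanged.
    if 'A' <= ch <= 'Z':
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch

def decapitalizeTokens(tokens):
    # Same shape as depunctuateTokens: outer loop over the tokens,
    # inner loop over the characters of each token.
    result = []
    for token in tokens:
        lowered = ""
        for ch in token:
            lowered += decapitalizeChar(ch)
        result.append(lowered)
    return result

words = decapitalizeTokens(['Hello', 'HELLO1', 'hello2'])
print(words)  # ['hello', 'hello1', 'hello2']
```

With this shape, the function takes the whole list as its one formal parameter, so the call in main needs no change.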

As a side note, the built-in lower method on strings already does what you want, so you could replace this whole mess with:

words = [token.lower() for token in cleanTokens]

… which would also fix a nasty bug in your attempt. Consider what decapitalizeTokens would do with, say, a digit or a space.
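
To make that bug concrete, here is a small sketch that copies the question's single-character logic (minus the print call) under the hypothetical name decapitalizeOriginal. Because the test only checks ord(ch) <= ord('Z'), every character that sorts before 'Z' gets shifted, not just uppercase letters:

```python
def decapitalizeOriginal(ch):
    # The question's single-character logic, copied without the print call.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    else:
        return ch

print(repr(decapitalizeOriginal('H')))  # 'h'
print(repr(decapitalizeOriginal('1')))  # 'Q'
print(repr(decapitalizeOriginal(' ')))  # '@'
```

Adding a lower bound to the test (ord('A') <= ord(ch)) would be one way to leave digits and spaces alone.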

And, likewise, depunctuateTokens can be similarly replaced by a call to the translate method. For example (slightly different for Python 2.x, but you can read the docs and figure it out):

punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
punctmap = {ord(char): None for char in punctuation}
cleanTokens = [token.translate(punctmap) for token in cleanTokens]

2 Comments

Thanks for all the help! My teacher also said not to use the lower method which is why I didn't use it.
@user3321218: That's a good reason. But as a general hint, whenever a teacher says "don't use the lower method", the first thing you should do is look at the lower method and figure out how to write a function with the exact same interface. First, the builtin methods are generally designed to be easy to use, so if you build a function with the same interface, it'll also be easy to use. Second, it's a lot easier to test your function when there's an already-working function that does the exact same thing.
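
As a rough illustration of that hint, the sketch below writes a function with the same interface as the lower method (one string in, one string out) and checks it against the built-in. The name my_lower is hypothetical, and the sketch assumes ASCII input only:

```python
def my_lower(s):
    # Hypothetical stand-in with the same interface as str.lower for ASCII text:
    # takes one string and returns a new string.
    out = ""
    for ch in s:
        if 'A' <= ch <= 'Z':
            out += chr(ord(ch) + ord('a') - ord('A'))
        else:
            out += ch
    return out

# Compare against the built-in on a few ASCII samples.
for sample in ["Hello", "HELLO1", "hello 2!", ""]:
    assert my_lower(sample) == sample.lower()
```

The built-in is used here only as a test oracle, not inside the function itself.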
-1

cleanTokens = depunctuateTokens(...) #returns an array into cleantokens.
words = decapitalizeTokens(cleanTokens) #takes an array and returns... whatever.

The fact is that in

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result

result is an array (cleanTokens), so ord(result) fails: it expects a single-character string, not an array.

Perhaps doing words = map(decapitalizeTokens, cleanTokens) can help you.

1 Comment

That won't solve the problem, because decapitalizeTokens only works on a single character. To use it as written, you have to loop twice—over each token in cleanTokens, and also over each character in each token.
-1
import scanner
import string
import sys

def read_tokens(fname):
    res = []
    with scanner.Scanner(fname) as sc:
        tok = sc.readtoken()
        while tok:
            res.append(tok)
            tok = sc.readtoken()
    return res

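def depunctuate(s):
    # translate(None, deletechars) is the Python 2.6-2.7 str form;
    # Python 3 has no deletechars argument.
    return s.translate(None, string.punctuation)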

def decapitalize(s):
    return s.lower()

def main():
    print("The name of the program is {}.".format(sys.argv[0]))
    for i, arg in enumerate(sys.argv[1:], 1):
        print("  Argument {} is {}".format(i, arg))

    tokens = read_tokens("text.txt")
    clean_tokens = [depunctuate(decapitalize(tok)) for tok in tokens]

if __name__=="__main__":
    main()

5 Comments

translate doesn't work like that; you'll just get a TypeError.
More importantly, you've told him how to fix the function that already works as written, not how to fix the one that's broken.
@abarnert: yes, I was looking at the wrong function, and yes, str.translate chokes if you pass a set instead of a string. I have repaired both problems and made his code considerably more Pythonic; please take another look.
OK, that version works in Python 2.6-2.7, but does not work in 3.x (which the OP is most likely using—note the print as function), or in 2.5 and earlier either. In 3.x, there is no deletechars argument; you handle it by mapping to None. In 2.5, the table argument cannot be None; you handle it by mapping every ord to itself.
More importantly, instead of fixing the OP's code, you've now written completely different code, with no explanation. There's no way he's going to figure out how all of these pieces correspond to what he'd written (especially since most of them don't directly correspond to anything), so he's likely not going to learn anything at all from this, and he's definitely not going to figure out which part of this fixes the problem he was asking about, or how it does so.
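
For the Python 3 case described in the comments above, one possible sketch of depunctuate maps each punctuation character to None. It assumes Python 3 and uses the three-argument form of str.maketrans; the module-level name punctmap is an illustrative choice:

```python
import string

# str.maketrans with three arguments maps every character in the
# third argument to None, which translate() treats as "delete".
punctmap = str.maketrans('', '', string.punctuation)

def depunctuate(s):
    # Return s with all ASCII punctuation characters removed.
    return s.translate(punctmap)

print(depunctuate("hello, world!"))  # hello world
```

This is equivalent to building the {ord(char): None} dictionary by hand.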
