A simple script is not behaving as I expected.

notAllowed = {"â":"a", "à":"a", "é":"e", "è":"e", "ê":"e",
              "î":"i", "ô":"o", "ç":"c", "û":"u"}

word = "dôzerté"
print word

for char in word:
    if char in notAllowed.keys():
        print "hooray"
        word = word.replace(char, notAllowed[char])


print word
print "finished"

The output shows the word unchanged, even though the loop should have replaced "ô" and "é" with "o" and "e", giving dozerte...

Any ideas?

2 Answers

How about:

# -*- coding: utf-8 -*-
# Python 2: the u prefix makes every literal a unicode (character) string.
notAllowed = {u"â":u"a", u"à":u"a", u"é":u"e", u"è":u"e", u"ê":u"e",
              u"î":u"i", u"ô":u"o", u"ç":u"c", u"û":u"u"}

word = u"dôzerté"
print word

for char in word:
    # Iterating a unicode string yields whole characters, so the lookup matches.
    if char in notAllowed:
        print "hooray"
        word = word.replace(char, notAllowed[char])

print word
print "finished"

Basically, if you want to assign a unicode string to a variable in Python 2, you need to use:

u"..." 
#instead of just
"..."

to denote that the literal is a unicode string.
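Once both the word and the mapping are unicode, the loop could also be collapsed into a single translate call. This is only a sketch, not part of the original fix: the names table and clean are made up for illustration, the characters are written as escapes so the file's encoding does not matter, and the __future__ import is there so the same file runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Same mapping as in the answer above, written with escapes.
notAllowed = {
    "\u00e2": "a", "\u00e0": "a",                 # â à
    "\u00e9": "e", "\u00e8": "e", "\u00ea": "e",  # é è ê
    "\u00ee": "i", "\u00f4": "o",                 # î ô
    "\u00e7": "c", "\u00fb": "u",                 # ç û
}

# translate() expects a dict keyed by code point (an integer), hence ord().
# "table" is a hypothetical name chosen for this sketch.
table = {ord(src): dst for src, dst in notAllowed.items()}

word = "d\u00f4zert\u00e9"  # dôzerté
clean = word.translate(table)
print(clean)
```

Characters that are not in the table pass through unchanged, so the result for this word should be dozerte.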


3 Comments

It might have (not very familiar with Py3), but I tried that in 2.7 and after adding unicode marks it worked for me :)
Thanks kgr. Your fix worked great! :) edit: sorry, it's Python 2.7
@Joey: Python 3 still has byte strings and character strings same as Python 2. There is nothing wrong with byte strings per se; you still need them in many scenarios where you are dealing with binary data and non-Unicode interfaces. All Python 3 changed is that (a) it made the unprefixed-string-literal syntax refer to char strings instead of byte strings, and (b) used char strings for several interfaces that were previously bytes but work equally well as bytes or chars.
2

In Python 2, iterating a plain (byte) string iterates its bytes, not necessarily its characters. If the encoding of your Python source file is UTF-8, len(word) will be 9 instead of 7 (both special characters have a two-byte encoding), so no single byte ever matches one of the two-byte keys in notAllowed. Iterating a unicode string (u"dôzerté") iterates characters, so that should work.
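The 9 versus 7 difference can be checked directly by encoding the word to UTF-8 and comparing lengths. A small sketch, assuming UTF-8 as the encoding; the variable names are illustrative, and the __future__ import only makes the file behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

word = "d\u00f4zert\u00e9"   # dôzerté as a character (unicode) string
raw = word.encode("utf-8")   # the same text as a UTF-8 byte string

char_count = len(word)       # counts characters
byte_count = len(raw)        # counts bytes; ô and é take two bytes each

print(char_count, byte_count)
```

On either Python version this should report 7 characters and 9 bytes, which is why a byte-by-byte loop never sees "ô" or "é" as a single item.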

May I also suggest you use unidecode for the task you're trying to achieve?
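If installing a third-party package is not an option, a rough standard-library approximation is to decompose each character and drop the combining accents. Note that this is a different technique from unidecode, not its implementation, and it only handles characters that decompose into a base letter plus accents. The helper name strip_accents is made up for this sketch.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import unicodedata


def strip_accents(text):
    """Return text with combining accents removed (hypothetical helper).

    text must be a unicode string (u"..." in Python 2, str in Python 3).
    """
    # NFKD splits "ô" into "o" followed by a combining circumflex.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("d\u00f4zert\u00e9"))  # input is dôzerté
```

Unlike the explicit notAllowed mapping, this needs no table of accented letters, but letters with no decomposition (for example "ø") are left as they are.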
