
I'm trying to send a POST request to a web app using the mechanize module (itself a wrapper around urllib2). When I send the POST request, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128). I tried unicode(string), unicode(string, encoding="utf-8"), unicode(string).encode(), etc., but nothing worked: each attempt either raised the error above or TypeError: decoding Unicode is not supported.

I looked at the other SO answers to similar questions, but none helped.

Thanks in advance!

EDIT: Example that produces an error:

>>> prda = "šđćč"  # valid UTF-8 characters
>>> prda
'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'
>>> print prda
šđćč
>>> prda.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>> unicode(prda)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
  • It would help if you showed a small, self-contained example that produces the error. Commented Jan 7, 2012 at 23:46
  • @ekhumoro added example, hope it clears it up Commented Jan 8, 2012 at 0:37

3 Answers

9

I assume you're using Python 2.x.

Given a unicode object:

myUnicode = u'\u4f60\u597d'

encode it using utf-8:

mystr = myUnicode.encode('utf-8')

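Note that you need to specify the encoding explicitly. By default Python 2 (usually) uses ASCII, the codec returned by sys.getdefaultencoding(), and ASCII cannot represent any character outside the range 0-127.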
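A minimal sketch of the point above, using b''/u'' literals so it runs unchanged on Python 2.6+ and Python 3: the explicit UTF-8 encode succeeds, while the ASCII codec refuses the same characters.

```python
myUnicode = u'\u4f60\u597d'  # the unicode object from the example above

# Explicit UTF-8 encoding: each of these two characters becomes three bytes.
mystr = myUnicode.encode('utf-8')

# ASCII only covers code points 0-127, so encoding with it raises an error.
try:
    myUnicode.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
else:
    ascii_error = None
```

Here mystr holds the six bytes b'\xe4\xbd\xa0\xe5\xa5\xbd', and ascii_error holds the UnicodeEncodeError.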


1 Comment

Thanks for the reply. How would I convert it to a unicode object if I have a string variable instead of a string literal? It's buried too deep in the code for me to simply add a u'' prefix where the variable is assigned.
1

In your example, you use a non-unicode string literal containing non-ascii characters, which results in prda becoming a bytes string.

Those bytes come straight from your terminal, whose encoding Python reports as sys.stdin.encoding. In your case that encoding is UTF-8, so prda holds the UTF-8 encoding of the four characters.

To convert prda to a unicode object, you need to decode it using the appropriate encoding:

>>> print prda.decode('utf-8')
šđćč
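This also explains the errors in the question: in Python 2, unicode(prda) decodes with the default ASCII codec, and calling prda.encode("utf-8") on a byte string first performs that same implicit ASCII decode. The sketch below reproduces the error explicitly and then decodes with the right codec; it uses a b'' literal so it runs on Python 2.6+ and Python 3 alike.

```python
raw = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'  # the bytes held by prda

# Decoding with the wrong codec reproduces the question's error message.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
else:
    ascii_error = None

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')
```

After this runs, ascii_error is "'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)" and text is the four-character u'\u0161\u0111\u0107\u010d'.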

Note that in a script or module you cannot rely on Python to guess the encoding. You need to declare the source encoding explicitly at the top of the file (PEP 263); without it, Python 2 raises a SyntaxError as soon as it meets a non-ASCII byte in the source:

# -*- coding: utf-8 -*-

When you hit Unicode errors in Python 2, the cause is very often code that mixes byte strings with unicode strings. So always check which kind of string is causing the error, using type(string).

If the string object is <type 'str'>, but you need unicode, decode it using the appropriate encoding. If the string object is <type 'unicode'>, but you need bytes, encode it using the appropriate encoding.
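The rule above can be wrapped in two small helpers. The names to_text and to_bytes are hypothetical, and the sketch assumes UTF-8 is the right encoding for your data. Because bytes is an alias of str on Python 2.6+, the same isinstance check works on Python 2 and 3.

```python
def to_text(value, encoding='utf-8'):
    """Return a unicode (text) object, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):  # on Python 2, bytes is the same type as str
        return value.decode(encoding)
    return value  # already text


def to_bytes(value, encoding='utf-8'):
    """Return a byte string, encoding unicode (text) objects with `encoding`."""
    if isinstance(value, bytes):
        return value  # already bytes
    return value.encode(encoding)
```

For instance, calling to_text(prda) wherever the variable enters your code gives you a unicode object without touching the place where it is assigned.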


0

You don't need to wrap your characters in unicode() calls, because they're already encoded :) If anything, you need to decode them to get a unicode object:

>>> s = '\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'   # your string
>>> s.decode('utf-8')
u'\u0161\u0111\u0107\u010d'
>>> type(s.decode('utf-8'))
<type 'unicode'>

I don't know mechanize, so I'm afraid I can't say whether it handles this correctly.

With a regular urllib2 POST call, I would use urlencode:

>>> import urllib2
>>> from urllib import urlencode
>>> postData = urlencode({'test': s})   # note: s is NOT decoded; it stays a UTF-8 byte string
>>> postData
'test=%C5%A1%C4%91%C4%87%C4%8D'
>>> urllib2.urlopen(url, postData)   # url is the form's target address; passing data makes this a POST
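Passing the byte string matters: with the default doseq=0, Python 2's urlencode calls str() on each key and value, so a unicode value containing non-ASCII characters raises UnicodeEncodeError. If your form fields may arrive as unicode, a small helper can encode them first. This is a sketch: encode_form is a hypothetical name (not part of urllib or mechanize), it assumes every key and value is a string and that the server expects UTF-8 form data, and the import fallback lets it run on Python 3, where urlencode lives in urllib.parse.

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def encode_form(fields, encoding='utf-8'):
    """URL-encode a dict of string form fields, encoding text keys/values first."""
    encoded = {}
    for key, value in fields.items():
        # Turn unicode (text) keys and values into byte strings so urlencode
        # never has to guess an encoding; bytes are passed through unchanged.
        if not isinstance(key, bytes):
            key = key.encode(encoding)
        if not isinstance(value, bytes):
            value = value.encode(encoding)
        encoded[key] = value
    return urlencode(encoded)
```

With this helper, encode_form({u'test': u'\u0161\u0111\u0107\u010d'}) produces the same 'test=%C5%A1%C4%91%C4%87%C4%8D' as the byte-string call above.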
