I'm trying this code:

s = "سلام"
'{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))

but this error occurs:

'{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd3 in position 0: ordinal not in range(128)

I tried '{:b}'.format(int(s.encode('utf-8').encode('hex'), 16)) but nothing changed.

What should I do?

  • Please copy and paste the text of a traceback, not a screenshot. Commented Oct 8, 2013 at 21:25
  • I copied and pasted it... Commented Oct 8, 2013 at 21:26
  • You have a bytestring, not unicode. s is already encoded in whatever codec your terminal uses. Commented Oct 8, 2013 at 21:26
  • Yes, if I change it to s = u'سلام' the problem goes away, but s is a variable that I receive from the user through a simple input; it's not a static string. How can I put different strings in u''? Commented Oct 8, 2013 at 21:31
  • Input in the terminal is encoded with the sys.stdin.encoding codec. You can use that to decode to Unicode. Commented Oct 8, 2013 at 21:37

1 Answer

Since you're using Python 2, s = "سلام" is a byte string (in whatever encoding your terminal uses, presumably UTF-8):

>>> s = "سلام"
>>> s
'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'

You cannot encode byte strings (as they are already "encoded"). You're looking for unicode ("real") strings, which in Python 2 must be prefixed with u:

>>> s = u"سلام"
>>> s
u'\u0633\u0644\u0627\u0645'
>>> '{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))
'1101100010110011110110011000010011011000101001111101100110000101'

If you're getting a byte string from a function such as raw_input, then your string is already encoded, so just skip the encode part:

'{:b}'.format(int(s.encode('hex'), 16))
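The two cases above (unicode string versus byte string) can be folded into one helper. This is only a sketch: the name to_bits and the UTF-8 default are assumptions, not part of the answer, and binascii.hexlify is swapped in for .encode('hex') so the same code also runs on Python 3.

```python
import binascii


def to_bits(value, encoding='utf-8'):
    """Return the bits of value as a string of '0' and '1' characters.

    value may be a unicode string (encoded first) or a byte string
    (used as it is). The encoding default is an assumption.
    """
    # Only text needs encoding; byte strings are already encoded.
    if not isinstance(value, bytes):
        value = value.encode(encoding)
    # hexlify gives the hex digits of the bytes; int(..., 16) parses them.
    # Note: int() raises ValueError if value is empty.
    return '{:b}'.format(int(binascii.hexlify(value), 16))
```

As with the original one-liner, leading zero bits of the first byte are dropped, because the bytes are formatted as a single integer.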

or (if you're going to do anything else with it) convert it to unicode:

s = s.decode('utf8')

This assumes that your input is UTF-8 encoded. If this might not be the case, check sys.stdin.encoding first.
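One way to avoid hard-coding the codec is to ask sys.stdin for it, roughly as sketched here. The function name decode_terminal_input is hypothetical, and falling back to UTF-8 when sys.stdin reports no encoding (for example, when input is piped) is an assumption you may want to change.

```python
import sys


def decode_terminal_input(raw, encoding=None):
    """Return raw as a unicode string, decoding it if it is a byte string."""
    # Already unicode: nothing to decode.
    if not isinstance(raw, bytes):
        return raw
    if encoding is None:
        # sys.stdin.encoding can be None; the UTF-8 fallback is an assumption.
        encoding = getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return raw.decode(encoding)
```

In Python 2 this would be used as s = decode_terminal_input(raw_input()), after which s.encode('utf-8') works as in the answer.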

i18n stuff is complicated; here are two articles that will help you further:

2 Comments

It's a variable which I receive from the user. It's not a static string. How can I put different strings in u''?
Yes, so what should I do? How can I convert it to a unicode string?
