Good day! I'm having trouble decoding text to Unicode. I have a str that is equal to

    '\u4038' # or something like that       

in ASCII, and I need to convert this string to ONE Unicode symbol. Can you please explain how to do that? The expression

    len(unicode('\u4038'))

evaluates to 6, so this is not a solution :(

In case it matters, the resulting symbol is Cyrillic in most cases.

  • Do you mean you have a string '\\u4038' ? Commented Mar 2, 2014 at 12:55
  • Why do you have this string? Where does it come from? What do you see if you print it? If this is coming from JSON, you want the json module. Commented Mar 2, 2014 at 13:04
  • If you need unicode-escape then something is broken in your data pipeline. Find the source of '\u4038' and fix it instead of using unicode-escape encoding. Commented Mar 2, 2014 at 18:06
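To illustrate the JSON suggestion from the comments: if the escaped text really does come from a JSON document, the json module decodes the escape for you. This is only a sketch; the `{"text": ...}` payload is an assumed example, not something from the question.

```python
import json

# Assumption: the escaped text arrived inside a JSON document.
# The Python literal below holds the six characters \u4038 inside the JSON string.
raw = '{"text": "\\u4038"}'

# json.loads interprets \uXXXX escapes as part of parsing the JSON string.
decoded = json.loads(raw)["text"]
print(len(decoded))
```

If the data is not JSON, this does not apply, and finding where the escaped text is produced is still the better fix.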

2 Answers

If you mean you have a string '\\u4038', you can decode it with the unicode-escape codec:

>>> s = b'\\u4038' # == br'\u4038'

>>> print(s)
\u4038
>>> len(s)
6

>>> print(s.decode('unicode-escape'))
䀸
>>> len(s.decode('unicode-escape'))
1
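On Python 3, where the escaped text is more likely to be a str than bytes, the same idea needs an encode step first. A minimal sketch, assuming the input contains only ASCII characters (unicode-escape treats non-ASCII bytes as Latin-1, so encoding with 'ascii' makes unexpected input fail loudly instead of being silently mangled):

```python
# Python 3: s is a str holding the six characters \u4038
s = r'\u4038'
assert len(s) == 6

# Encode to bytes first, then let the unicode-escape codec interpret the escape.
decoded = s.encode('ascii').decode('unicode-escape')
print(len(decoded))

# The same works for a Cyrillic code point, e.g. U+0416 (Ж).
cyrillic = r'\u0416'.encode('ascii').decode('unicode-escape')
print(cyrillic)
```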

There's probably a better way, but here is one:

In [26]: import ast

In [27]: s = r'\u4038'

In [28]: len(ast.literal_eval('u"' + s + '"'))
Out[28]: 1
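One caveat with this approach: the string is pasted into a quoted literal, so a double quote inside it breaks the literal. A rough sketch of that behaviour, using a hypothetical helper name `unescape_via_literal` that is not part of the original answer:

```python
import ast

def unescape_via_literal(s):
    # Hypothetical helper: wraps s in a u"..." literal and lets
    # ast.literal_eval interpret the escapes inside it.
    return ast.literal_eval('u"' + s + '"')

result = unescape_via_literal(r'\u4038')
print(len(result))

# A double quote inside s ends the literal early, so the input is rejected.
try:
    unescape_via_literal('say "hi"')
    quote_ok = True
except (SyntaxError, ValueError):
    quote_ok = False
print(quote_ok)
```

For input you don't control, the unicode-escape codec is probably the safer choice.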
