Convert str to unicode in Python

Question

Good day! I'm having trouble decoding text to unicode. I have a str that is equal to

    '\u4038' # or something like that

in ASCII, and I need to convert this string to ONE unicode symbol. Can you please explain how to do that? The call

    len(unicode('\u4038'))

returns 6, so this is not a solution :(

In case it matters, the resulting symbol is Cyrillic in most cases.

Why do you have this string? Where does it come from? What do you see if you print it? If this is coming from JSON, you want the json module.
– user2357112, Commented Mar 2, 2014 at 13:04
If you need unicode-escape then something is broken in your data pipeline. Find the source of '\u4038' and fix it instead of using unicode-escape encoding.
– jfs, Commented Mar 2, 2014 at 18:06
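
To illustrate the JSON suggestion in the comments: if the escaped text really does come from a JSON document, the json module decodes the escape by itself. This is only a sketch written for Python 3; the helper name decode_json_string and the sample document are made up for illustration, and the helper assumes the text contains no unescaped double quotes.

```python
import json

def decode_json_string(escaped):
    # Hypothetical helper: treat the text as the body of a JSON string literal.
    # Assumes `escaped` contains no unescaped double quotes or control characters.
    return json.loads('"' + escaped + '"')

symbol = decode_json_string(r'\u4038')
print(len(symbol))

# When the whole JSON document is available, parse it directly instead.
# The document below is invented sample data.
document = r'{"name": "\u0416"}'
name = json.loads(document)["name"]
print(name)
```

Parsing the whole document is usually the safer route, since the parser then handles every escape in one place.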

falsetru · Accepted Answer · 2014-03-02 12:58:54Z

3

If you mean you have a string '\\u4038', you can use unicode-escape encoding:

    >>> s = b'\\u4038'  # == br'\u4038'
    >>> print(s)
    \u4038
    >>> len(s)
    6
    >>> print(s.decode('unicode-escape'))
    䀸
    >>> len(s.decode('unicode-escape'))
    1
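
The session above is Python 2. A rough Python 3 equivalent might look like the sketch below, where a str has to be encoded to bytes before unicode_escape can decode it. The helper name decode_escapes is hypothetical, and the sketch assumes the input is pure ASCII, as in the question.

```python
def decode_escapes(text):
    # Hypothetical helper. In Python 3, unicode_escape decodes from bytes,
    # so encode first. Assumes `text` is ASCII-only; anything else raises
    # UnicodeEncodeError here.
    return text.encode('ascii').decode('unicode_escape')

raw = r'\u4038'              # six characters: backslash, 'u', four hex digits
symbol = decode_escapes(raw)

print(len(raw), len(symbol))
print(hex(ord(symbol)))

# The question mentions Cyrillic results; U+0416 is one such letter.
cyrillic = decode_escapes(r'\u0416')
print(cyrillic)
```

The same call also turns sequences such as \n or \x41 into the characters they stand for, so it is only a good fit when the input is known to hold Python-style escapes.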


NPE · Accepted Answer · 2014-03-02 12:56:29Z

2

There's probably a better way, but here is one:

    In [26]: import ast

    In [27]: s = r'\u4038'

    In [28]: len(ast.literal_eval('u"' + s + '"'))
    Out[28]: 1
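
The same idea can be wrapped in a small function, sketched here for Python 3. The name eval_escapes is hypothetical, and the approach assumes the text contains no double quote or newline, because either one would break the string literal that gets built around it.

```python
import ast

def eval_escapes(text):
    # Hypothetical helper: wrap the text in double quotes and let
    # ast.literal_eval parse it as a Python string literal.
    # Assumes `text` has no unescaped double quote or newline.
    return ast.literal_eval('"' + text + '"')

result = eval_escapes(r'\u4038')
print(len(result))
```

ast.literal_eval only accepts literals, so unlike eval it will not run arbitrary code if the input turns out to be something other than an escaped string.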
