Python3: Decode UTF-8 bytes converted as string

Question

Suppose I have something like:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)

which returns a string of the form:

b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'

Now it's sent as a plain string (I get it as an assertion from the eval function). How the heck can I now get the normal UTF-8 form of the starting word? If there is some better compression than str(bytes(x)), I would be glad to hear it.

I don't know what you mean by "I get it as assertion from eval function", but that sounds like you're doing something that's a very bad idea in code right outside the code that you showed us…
– abarnert, Commented Jul 5, 2018 at 23:07

abarnert · Accepted Answer · 2018-07-05 23:06:45Z

If you want to encode and decode text, that's what the encode and decode methods are for:

>>> a = "Gżegżółka"
>>> b = a.encode('utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = b.decode('utf-8')
>>> c
'Gżegżółka'

Also, notice that UTF-8 is already the default, so you can just do this:

>>> b = a.encode()
>>> c = b.decode()

The only reasons you need to specify arguments are:

You need to use some other encoding instead of UTF-8,
You need to specify a specific error handler, like 'surrogateescape' instead of 'strict', or
Your code has to run in Python 3.0-3.1 (which almost nobody used).
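As a rough illustration of the first two cases, here is a minimal sketch. The choice of ISO-8859-2 as the "other encoding", the deliberately broken byte string, and all variable names are assumptions made up for the example, not something from the question.

```python
text = "Gżegżółka"

# Case 1: an encoding other than UTF-8 has to be named explicitly.
# ISO-8859-2 (Latin-2) is used here only because it covers Polish letters.
latin2_bytes = text.encode("iso-8859-2")
roundtrip = latin2_bytes.decode("iso-8859-2")

# Case 2: a non-default error handler. The trailing 0xff byte is not
# valid UTF-8, so the default 'strict' handler raises, while 'replace'
# substitutes U+FFFD for the undecodable byte.
broken = b"G\xc5\xbceg\xff"
lenient = broken.decode("utf-8", errors="replace")

try:
    broken.decode("utf-8")  # same as errors="strict"
except UnicodeDecodeError as exc:
    strict_error = exc
else:
    strict_error = None

print(roundtrip)
print(lenient)
print(type(strict_error).__name__)
```

The same `errors` argument is accepted by `str.encode`, so the choice of handler works in both directions.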

However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:

>>> a = "Gżegżółka"
>>> b = bytes(a, 'utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = str(b, 'utf-8')
>>> c
'Gżegżółka'

Calling str on a bytes object without an encoding, as you were doing, doesn't decode it, and doesn't raise an exception like calling bytes on a str without an encoding, because the main job of str is to give you a string representation of the object—and the best string representation of a bytes object is that b'…'.
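The asymmetry described above can be checked directly. This is only a small sketch with made-up variable names, showing the three calls side by side.

```python
a = "Gżegżółka"
b = a.encode("utf-8")

# str() with no encoding returns the printable representation of the
# bytes object, including the b'...' wrapper and the \x escapes.
as_repr = str(b)

# str() with an encoding actually decodes the bytes.
decoded = str(b, "utf-8")

# bytes() on a str with no encoding has no sensible default and raises.
try:
    bytes(a)
except TypeError as exc:
    bytes_error = exc
else:
    bytes_error = None

print(as_repr == repr(b))
print(decoded == a)
print(type(bytes_error).__name__)
```

So the string in the question is not encoded text at all; it is the source-code spelling of a bytes literal, which is why it cannot simply be decoded.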

Ch3shire · Answer · 2018-07-06 06:41:55Z

I found it. The simplest way to convert the string representation of bytes back to bytes is with the eval function:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a) #this is the input we deal with

a = eval(a) #that's how we transform a into bytes
a = str(a, 'utf-8') #...and now we convert it into string

print(a)

3 Comments

Stop harming Monica Over a year ago

As @abarnert commented, it looks like you are trying to fix the wrong code, but if you are going to do this, at least use ast.literal_eval instead of eval.
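A minimal sketch of that suggestion, assuming the input really is the text of a bytes literal. The helper name `decode_bytes_repr` and its type check are hypothetical additions for illustration; only `ast.literal_eval` itself comes from the comment.

```python
import ast


def decode_bytes_repr(text, encoding="utf-8"):
    """Turn the text of a bytes literal (e.g. "b'G\\xc5\\xbc'") back into a str."""
    # literal_eval only accepts Python literals (strings, bytes, numbers,
    # tuples, lists, dicts, sets, booleans, None); it raises ValueError
    # or SyntaxError on anything else instead of executing it.
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError("expected the text of a bytes literal")
    return value.decode(encoding)


# Reproduce the input from the question, then undo it.
received = str("Gżegżółka".encode("utf-8"))
restored = decode_bytes_repr(received)
print(restored)
```

Unlike eval, this refuses input such as a function call, so a malformed or hostile string fails with an exception rather than running code.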

Ch3shire Over a year ago

Welp, now I found that I was just loading the script without UTF-8 encoding. But still, I think that trying to decipher str(bytes(x)) is an interesting problem :)

Stop harming Monica Over a year ago

I don't know what you are talking about; there is nothing like that in your question. As you found out, you can reconstruct a bytes object from its string representation, but you cannot do that with arbitrary objects, so I would say a more interesting problem is how you managed to paint yourself into that corner and how to avoid it.
