
Suppose I have something like:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)

which gives a string of the form:

b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'

Now it's sent as a simple string (I get it as assertion from eval function). How the heck can I now get the normal UTF-8 form of the starting word? If there is some better compression than str(bytes(x)), then I would be glad to hear it.

    I don't know what you mean by "I get it as assertion from eval function", but that sounds like you're doing something that's a very bad idea in code right outside the code that you showed us… Commented Jul 5, 2018 at 23:07

2 Answers

7

If you want to encode and decode text, that's what the encode and decode methods are for:

>>> a = "Gżegżółka"
>>> b = a.encode('utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = b.decode('utf-8')
>>> c
'Gżegżółka'

Also, notice that UTF-8 is already the default, so you can just do this:

>>> b = a.encode()
>>> c = b.decode()

The only reasons you need to specify arguments are:

  • You need to use some other encoding instead of UTF-8,
  • You need to specify a specific error handler, like 'surrogateescape' instead of 'strict', or
  • Your code has to run in Python 3.0-3.1 (which almost nobody used).
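As a rough illustration of the first two cases, here is a small sketch. The sample bytes `b"G\xff"` and the variable names are made up for the example; the encodings and error handlers are standard ones.

```python
text = "Gżegżółka"

# Case 1: an encoding other than UTF-8 has to be named explicitly.
utf16_bytes = text.encode("utf-16")
utf16_text = utf16_bytes.decode("utf-16")

# Case 2: a non-default error handler.
# 0xff is never valid in UTF-8, so the default 'strict' handler would
# raise UnicodeDecodeError on this input.
bad = b"G\xff"

# 'replace' substitutes U+FFFD (the replacement character) for the bad byte.
replaced = bad.decode("utf-8", errors="replace")

# 'surrogateescape' maps the bad byte to a lone surrogate (U+DCFF here),
# which lets the original bytes be recovered when encoding again.
escaped = bad.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")
```

With 'surrogateescape' the undecodable byte survives a decode/encode round trip, which is why it is sometimes preferred over 'replace' when the data must be passed on unchanged.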

However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:

>>> a = "Gżegżółka"
>>> b = bytes(a, 'utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = str(b, 'utf-8')
>>> c
'Gżegżółka'

Calling str on a bytes object without an encoding, as you were doing, doesn't decode it. It also doesn't raise an exception, the way calling bytes on a str without an encoding does, because the main job of str is to give you a string representation of the object, and the best string representation of a bytes object is that b'…' form.
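A short sketch of that asymmetry, assuming a default interpreter run (without the -b/-bb flags, which turn str() on bytes into a warning or error); the variable names are only for illustration:

```python
b = "Gżegżółka".encode("utf-8")

# Without an encoding, str() does not decode: it returns the repr,
# i.e. text that starts with the characters b ' G \ x c 5 ...
as_repr = str(b)
same_as_repr = (as_repr == repr(b))

# With an encoding, str() decodes, just like b.decode("utf-8").
decoded = str(b, "utf-8")

# bytes() on a str has no sensible fallback, so it raises instead.
error = None
try:
    bytes("Gżegżółka")
except TypeError as exc:
    error = exc
```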


0

I found it. The simplest way to convert the string representation of bytes back to bytes is through the eval function:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a) #this is the input we deal with

a = eval(a) #that's how we transform a into bytes
a = str(a, 'utf-8') #...and now we convert it into string

print(a)

3 Comments

As @abarnert commented, it looks like you are trying to fix the wrong code, but if you are going to do this, at least use ast.literal_eval instead of eval.
Welp, now I found that I was just loading the script without UTF-8 encoding. But still, I think that trying to decipher str(bytes(x)) is an interesting problem :)
I don't know what you are talking about; there is nothing like that in your question. As you found out, you can reconstruct a bytes object from its string representation, but you cannot do that with arbitrary objects, so I would say a more interesting problem is how you managed to paint yourself into that corner and how to avoid it.
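For reference, the ast.literal_eval suggestion from the comments could look roughly like this. It is only a sketch: the variable names are invented, and it assumes the received text really is the repr of a bytes object.

```python
import ast

# The text that arrives: the repr of a bytes object, as produced by str(bytes).
received = str("Gżegżółka".encode("utf-8"))

# literal_eval only accepts Python literals (strings, bytes, numbers,
# tuples, lists, dicts, sets, booleans, None), so it cannot run code.
raw = ast.literal_eval(received)
if not isinstance(raw, bytes):
    raise TypeError("expected the repr of a bytes object")

text = raw.decode("utf-8")

# Unlike eval, an arbitrary expression is rejected rather than executed.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
```

This still only papers over the problem; sending the bytes themselves, or the decoded text, avoids the round trip through repr altogether.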
