Retrieving bytes from string representation of bytes in Python 3

Question

The following snippet correctly outputs the UTF-8 character representation:

a = b"Tenemos la Soluci\xc3\xb3n"
a.decode('utf8')
'Tenemos la Solución' # correct output

But in my use case the bytes are stored as a string in the database. In that case, how do I retrieve the output with the correct UTF-8 representation?

a = "Tenemos la Soluci\xc3\xb3n" # retrieved from Database
b = bytes(a, 'utf8')
b.decode('utf8')
'Tenemos la SoluciÃ³n' # incorrect output

Please suggest how to resolve this.

Mark Tolonen · Accepted Answer · 2018-11-22 07:41:42Z

What you have is mojibake. It occurs when, for example, UTF-8-encoded text is stored in a database configured for ISO-8859-1 or a similar encoding, so each byte of the UTF-8 data comes back as a separate character. The latin1 codec maps the Unicode code points U+0000 to U+00FF one-to-one onto the bytes 0x00 to 0xFF, so as long as the string contains only those code points, encoding it as latin1 recovers the original bytes, which can then be decoded as UTF-8:

>>> a = "Tenemos la Soluci\xc3\xb3n" # retrieved from Database
>>> a.encode('latin1').decode('utf8')
'Tenemos la Solución'
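
To see why the attempt in the question fails while the latin1 round trip works, it can help to compare the bytes each encoding produces. The sketch below also wraps the fix in a small helper; `fix_mojibake` is a hypothetical name used only for illustration, and the helper assumes the text was mis-decoded as latin1 specifically (not, say, Windows-1252).

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were mis-decoded as latin1."""
    try:
        return text.encode("latin1").decode("utf8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains code points above U+00FF (so it is not
        # latin1 mojibake) or the recovered bytes are not valid UTF-8.
        # In both cases, return the text unchanged.
        return text


broken = "Tenemos la Soluci\xc3\xb3n"  # as retrieved from the database

# The question's approach: each of the two stray characters is encoded
# as two UTF-8 bytes, so the damage is preserved rather than undone.
print(broken.encode("utf8"))    # b'Tenemos la Soluci\xc3\x83\xc2\xb3n'

# latin1 emits exactly one byte per character: the original UTF-8 bytes.
print(broken.encode("latin1"))  # b'Tenemos la Soluci\xc3\xb3n'

print(fix_mojibake(broken))     # Tenemos la Solución
```

Strings that cannot be round-tripped are returned as-is, but a correctly stored string that happens to look like mojibake would still be altered, so this is probably best applied only to data known to be affected.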
