The following snippet decodes the bytes correctly and prints the expected UTF-8 text:

a = b"Tenemos la Soluci\xc3\xb3n"
a.decode('utf8')
'Tenemos la Solución' # correct output

But in my use case the bytes are stored as a string in a database. In that case, how do I get the correct UTF-8 representation?

a = "Tenemos la Soluci\xc3\xb3n" # retrieved from Database
b = bytes(a, 'utf8')
b.decode('utf8')
'Tenemos la SoluciÃ³n' # incorrect output

Please suggest how to resolve this.

1 Answer

What you have is mojibake. It occurs when, for example, UTF-8-encoded text is stored in a database configured for ISO-8859-1 or a similar encoding. latin1 maps the Unicode code points U+0000 to U+00FF 1:1 to the bytes 0x00 to 0xFF, so as long as the string contains only those code points, encoding it as latin1 recovers the original bytes, which can then be decoded as UTF-8:

>>> a = "Tenemos la Soluci\xc3\xb3n" # retrieved from Database
>>> a.encode('latin1').decode('utf8')
'Tenemos la Solución'
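
If some rows in the database are already correct and others are not, applying the round trip blindly will raise an exception on the good rows. A minimal sketch of a defensive wrapper follows; the name `fix_mojibake` is hypothetical, and it assumes that any string which cannot make the latin1 to UTF-8 round trip was not mojibake in the first place and should be returned unchanged:

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as latin1.

    Hypothetical helper: returns the input unchanged when the
    round trip fails, on the assumption that it was already correct.
    """
    try:
        # latin1 turns code points U+0000..U+00FF back into the raw bytes,
        # which are then decoded as the UTF-8 they originally were.
        return text.encode("latin1").decode("utf8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: the text has code points above U+00FF.
        # UnicodeDecodeError: the recovered bytes are not valid UTF-8.
        return text


print(fix_mojibake("Tenemos la Soluci\xc3\xb3n"))
```

Pure ASCII strings pass through unchanged either way. The longer-term fix is usually to correct the encoding configured on the database column or connection, so that the text is stored properly to begin with.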