Python encode/decode from JSON exceptions.UnicodeDecodeError

Question

I'm pulling in some JSON data that has something like this:

{
 "string":"â€¢ Christmas 2014 â€¢",
 "layer_id":490,
 "other": "attributes",
 "that_dont": "matter"
}

This JSON is generated elsewhere, and I'm pulling it in via an HTTP request (using json.loads(request.text)).

When I print the string in my console, I get:

â˘ Christmas 2014

(and an exceptions.UnicodeDecodeError if I try to call str() on it)

I'm printing the string on a PDF and need the string to literally be:

"\u00B7 Christmas 2014 \u00B7"

My instinct is a bit hacky: I just want to replace the series of strange characters with the proper Unicode code point, but I don't even know what it is that I'm looking to replace.

Why U+00B7 and not U+2022? That's the original content, in any case: • Christmas 2014 •.
– Martijn Pieters, commented Dec 5, 2014 at 15:53

Martijn Pieters · Accepted Answer · 2014-12-05 16:09:31Z

Don't use response.text; it is causing Mojibake here. response.text may end up using the wrong codec if no character set was specified on the response.

Use response.json() instead, and let that handle the correct codec for your JSON.
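To see why decoding the body as text first can go wrong, here is a minimal Python 3 sketch using only the standard library. The body bytes are simulated rather than fetched, and cp1252 stands in for whatever wrong codec might be guessed; both are assumptions for illustration, not what requests necessarily does in your case.

```python
import json

# Simulated raw response body: UTF-8 encoded JSON, as a server would send it.
# (Assumption: a real body would come from the HTTP response's raw bytes.)
body = json.dumps({"string": "• Christmas 2014 •"}, ensure_ascii=False).encode("utf-8")

# Decoding the bytes with the wrong codec before parsing bakes the
# Mojibake into the resulting string.
wrong = json.loads(body.decode("cp1252"))

# Handing the bytes straight to json.loads (Python 3.6+) lets the parser
# detect the UTF-8 encoding itself.
right = json.loads(body)

print(repr(wrong["string"]))
print(repr(right["string"]))
```

With requests, the rough equivalent of the second call would be json.loads(response.content), which works on the undecoded bytes.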

If you still see the same result, then the source used cp1252 to decode UTF-8 data and you need to reverse that process:

corrected = broken.encode('cp1252').decode('utf8')

which fixes your specific issue:

>>> print(u"â€¢ Christmas 2014 â€¢".encode('cp1252').decode('utf8'))
• Christmas 2014 •

Those are U+2022 BULLET characters.
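The round trip can be sketched in Python 3 as follows. The helper name fix_mojibake is hypothetical, and falling back to the unchanged input when the text does not look like cp1252-decoded UTF-8 is a design assumption of this sketch, not part of the one-liner above.

```python
def fix_mojibake(text):
    """Undo UTF-8-read-as-cp1252 damage; return text unchanged if that does not apply."""
    try:
        # Re-encode to recover the original bytes, then decode them correctly.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside cp1252, or the recovered
        # bytes are not valid UTF-8: assume the text was not broken this way.
        return text

original = "\u2022 Christmas 2014 \u2022"           # U+2022 BULLET
broken = original.encode("utf-8").decode("cp1252")  # reproduce the damage
fixed = fix_mojibake(broken)

# If the PDF really needs U+00B7 MIDDLE DOT, as the question asks,
# swap the bullets after repairing the text.
middle_dot = fixed.replace("\u2022", "\u00b7")

print(fixed)
print(middle_dot)
```

Catching both exception types keeps the helper safe to call on strings that were never damaged, such as already-correct text containing accented characters.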

You could also use the ftfy library, which can handle Mojibake untangling automatically for you:

>>> import ftfy
>>> print(ftfy.fix_text(u"â€¢ Christmas 2014 â€¢"))
• Christmas 2014 •

edited Dec 5, 2014 at 16:09

answered Dec 5, 2014 at 15:54
