My code fetches some content from a UserVoice site. As you may know, UserVoice handles text poorly: to reduce the amount of text on the search page, it truncates results at, say, 300 characters and appends "..." to the end. The problem is that it doesn't mind cutting in the middle of a multi-byte character, leaving a partial UTF-8 sequence behind: e.g. for the è character, I get \xc3 instead of \xc3\xa8.
Of course, when I feed this broken data to json.loads, it fails with a UnicodeDecodeError. So my question is simple: how can I get json.loads to ignore these bad bytes, the way I could with .decode('utf-8', 'ignore') if I had access to the function's internals?
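For reference, here is a minimal reproduction (the payload and field name are made up, but the dangling \xc3 byte matches what I actually receive) together with the workaround I'm considering: decoding the bytes myself before handing the resulting string to json.loads.

```python
import json

# Hypothetical payload: the trailing \xc3 is the first byte of a two-byte
# UTF-8 sequence (an "è") that UserVoice's truncation cut in half.
raw = b'{"text": "tr\xc3"}'

# json.loads(raw) raises UnicodeDecodeError on the partial sequence,
# so decode the bytes first and drop the undecodable ones.
data = json.loads(raw.decode('utf-8', 'ignore'))
print(data['text'])  # -> "tr"
```

Using 'replace' instead of 'ignore' would keep a U+FFFD replacement character where the byte was dropped, which might be preferable if I want the truncation to stay visible.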
Thanks.