How to ensure matched quotations in JSON using Python?

Question

I'm parsing through a text and sometimes I get the following:

{"name":"John","last" : Doe", "Food":"Fries","Coffee" : "Need}

I'm dealing with someone else's data here so I just have to deal with it.

Is there a way to use regular expressions (or anything else, for that matter) to read through the file and, whenever I find unmatched quotation marks, modify the file so that they match?

So I can end up with

{"name":"John","last" : "Doe", "Food":"Fries","Coffee" : "Need"}

Is the unmatched quote always the last thing before a closing bracket?
– MoxieBall, Commented Jun 14, 2018 at 21:14
Do your real-life strings contain only letters, or at least not contain any characters with special meaning to JSON like []{}"\:,?
– abarnert, Commented Jun 14, 2018 at 21:26
What you're asking for is basically impossible in general, because it's ambiguous, but it may be possible, or even dead easy, for your particular data set. For example, if none of those special characters ever appear in your JSON strings, you know that an unclosed quote was supposed to end at the next one of ,:]}, and an unopened quote is only a little more complicated. But if you have to handle strings like "spam:\"eggs\"}" that may be missing quotes, that's a different story.
– abarnert, Commented Jun 14, 2018 at 21:28
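
One possible reading of the heuristic in the last comment, as a minimal sketch rather than anything the commenter posted: split the text on the structural characters and re-quote whatever sits between them. The function name requote_flat_object is hypothetical, and the sketch assumes a flat object whose keys and values are all strings that never contain any of {}[]:, or escaped quotes. Numbers, booleans and null would come out as strings.

```python
import json
import re

# Structural JSON characters that are assumed never to occur inside a string.
DELIMS = set('{}[]:,')

def requote_flat_object(text):
    """Re-quote every token between structural characters (hypothetical helper)."""
    out = []
    # The capturing group makes re.split keep the delimiters in the result.
    for tok in re.split(r'([{}\[\]:,])', text):
        if tok in DELIMS:
            out.append(tok)
            continue
        stripped = tok.strip()
        if not stripped:
            continue  # nothing but whitespace between two delimiters
        # Drop whatever quotes survived, then let json.dumps add a matched pair.
        out.append(json.dumps(stripped.strip('"')))
    return ''.join(out)

broken = '{"name":"John","last" : Doe", "Food":"Fries","Coffee" : "Need}'
fixed = requote_flat_object(broken)
print(fixed)
print(json.loads(fixed))
```

Because every token is rebuilt from scratch, this handles a quote missing on either side, but it also discards the original whitespace and will mangle any value that breaks the assumptions above.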

DYZ · Accepted Answer · 2018-06-14 21:57:07Z


If missing quotation marks are the only problem with the text and there are no escaped quotation marks within the fields, then you can repair the text by looking for four types of irregularities.

import re

s = '{name":"John","last" : Doe", "Food:"Fries","Coffee" : "Need}'

A missing quotation mark after a colon:

s = re.sub(r'"\s*:\s*(?=[^\s"])', '":"', s)

A missing quotation mark before a colon:

s = re.sub(r'(?<=[^\s"])\s*:\s*"', '":"', s)

A missing quotation mark before the closing brace:

s = re.sub(r'(?<=[^\s"])\s*\}', '"}', s)

A missing quotation mark after the opening brace:

s = re.sub(r'\{\s*(?=[^\s"])', '{"', s)

Apply all four transformations one after another, and hopefully the problem is gone:

print(s)
#{"name":"John","last":"Doe", "Food":"Fries","Coffee" : "Need"}
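
To make "apply all four one after another" concrete, here is a minimal sketch that wraps the same four substitutions in a function and only falls back to them when json.loads rejects the text. The names repair_quotes and load_lenient are made up for illustration. The same assumptions apply as above: missing quotes are the only defect and no field contains an escaped quote.

```python
import json
import re

def repair_quotes(text):
    """Apply the four substitutions from this answer, in order (hypothetical helper)."""
    text = re.sub(r'"\s*:\s*(?=[^\s"])', '":"', text)   # quote missing after a colon
    text = re.sub(r'(?<=[^\s"])\s*:\s*"', '":"', text)  # quote missing before a colon
    text = re.sub(r'(?<=[^\s"])\s*\}', '"}', text)      # quote missing before the closing brace
    text = re.sub(r'\{\s*(?=[^\s"])', '{"', text)       # quote missing after the opening brace
    return text

def load_lenient(text):
    """Parse text as JSON, attempting the repair only if strict parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # If the repaired text is still invalid, this second call raises again.
        return json.loads(repair_quotes(text))

record = load_lenient('{name":"John","last" : Doe", "Food:"Fries","Coffee" : "Need}')
print(record)
```

Trying the strict parser first leaves well-formed records untouched, and letting the second json.loads raise keeps records the regexes cannot fix from slipping through silently.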

