Keep getting errors reading a Json file with Python

Question

I have a JSON file like this:

{"id": "53f43a7bdabfaeb22f497fb8", "name": "Nayara Fernanda Monte", "h_index": 0, "n_pubs": 1, "tags": [], "pubs": [{"i": "53e9bc79b7602d97048f8888", "r": 2}, {"i": "56d8971cdabfae2eee185494", "r": 2}], "n_citation": 0, "orgs": [""]}
{"id": "53f43f5adabfaedf435b9bdf", "name": "J\u00f6rg B\u00e4ssmann", "h_index": 0, "n_pubs": 1, "tags": [{"w": 1, "t": "Vehicle Theft .Immobilisation .Crime Prevention.Crimereduction . Displacement .Motorcycle Theft .Opportunistic Offenders .Professional Offenders . Evaluation.Mixed-Methods Design"}], "pubs": [{"i": "53e9b4a1b7602d9703fad4e7", "r": 0}], "n_citation": 0, "orgs": ["Bingen am Rhein, Germany"]}

I tried reading it using the following code:

import json

with open('path/xyz.json') as f:
    data = json.load(f)

However, it returns an error:

'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

How do I fix this error? Thanks.

Change the encoding to utf-16, or open the file in 'rb' mode, and retry.
– Aditya, Commented Jun 21, 2020 at 4:29
If those two lines are really your file, it's not a valid JSON object. It's two JSON objects separated by a newline.
– Mark, Commented Jun 21, 2020 at 4:30
@MarkMeyer - I think you just saved OP from the next stackoverflow question.
– tdelaney, Commented Jun 21, 2020 at 4:32
This is a file in which each line holds a dictionary, and OP is trying to load all of them at once, which causes the error. Better to load them one by one and then build a JSON object.
– sahasrara62, Commented Jun 21, 2020 at 4:45
@sahasrara62 - The most immediate problem is that it's a utf-16 encoded file. The line-by-line issue is next.
– tdelaney, Commented Jun 21, 2020 at 4:52

Hikash · Accepted Answer · 2020-06-21 05:05:42Z

1

If you're stuck with the multiple json "documents" in a single file, then you could always do this:


import json

json_documents = []
with open('path/to/file', 'r') as fh:
    for line in fh:
        # Parse each line as its own JSON document.
        json_documents.append(json.loads(line))

This parses the text of each line into a Python object. Note: this only works if each line is a whole JSON document. If multiple documents are on a single line, or if a single document spans multiple lines, then you'll need to do something fancier.
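
One possible version of "something fancier" is a minimal sketch built on the standard library's json.JSONDecoder.raw_decode, which parses one document from a given position and reports where it stopped. The helper name iter_json_documents is made up for this illustration, and the sketch assumes the whole file fits in memory as one already-decoded string.

```python
import json


def iter_json_documents(text):
    """Yield every JSON document found in text, however the lines are split."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed object and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two documents on one line, and one document spanning two lines.
sample = '{"a": 1} {"b":\n 2}\n'
documents = list(iter_json_documents(sample))
```

A malformed document still raises json.JSONDecodeError, so invalid input is not silently skipped.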

answered Jun 21, 2020 at 5:05

Hikash

tdelaney · Accepted Answer · 2020-06-21 04:50:05Z

0

UTF-16 encoded files written by Microsoft tools typically start with a Byte Order Mark (BOM): the bytes FF FE when the data is little-endian, or FE FF when it is big-endian. The 0xff at position 0 in your error message is the first byte of such a BOM. UTF-16 stores most Unicode characters in a single 2-byte code unit, but characters outside the Basic Multilingual Plane take 4 bytes (a surrogate pair).
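
To check whether a file really starts with a UTF-16 BOM, you can look at its first two bytes. This is only a rough sketch: the function name sniff_utf16_bom is hypothetical, and it deliberately ignores UTF-32, whose little-endian BOM also begins with FF FE.

```python
import codecs


def sniff_utf16_bom(raw):
    """Return a codec name if raw starts with a UTF-16 BOM, else None."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None


# In practice the bytes would come from open(path, 'rb').read(2).
little = sniff_utf16_bom(b"\xff\xfe{\x00")
big = sniff_utf16_bom(b"\xfe\xff\x00{")
```

When opening the file for real, encoding="utf-16" is usually the more convenient choice, because that codec reads the BOM and strips it for you.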

As mentioned, encoding="UTF-16" should read it. See the Unicode HOWTO.

As a side note, UTF-16 encoded JSON files may not be recognized by all programs. If you plan on sending them over HTTP, for instance, re-encoding to UTF-8 is likely a good choice.

import json

with open('path/xyz.json', encoding="UTF-16") as f:
    for line in f:
        data = json.loads(line)
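
The re-encoding to UTF-8 mentioned in the side note could look roughly like this. It is a sketch under assumptions: the function name reencode_jsonl_to_utf8 and both path arguments are placeholders, and the input is assumed to hold one JSON document per line.

```python
import json


def reencode_jsonl_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Copy a one-document-per-line JSON file to UTF-8; return the record count."""
    count = 0
    with open(src_path, encoding=src_encoding) as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)  # also validates the line
            # ensure_ascii=False writes non-ASCII characters as real UTF-8.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Parsing and re-serializing each line is slower than copying text straight across, but it catches broken records during the conversion.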

edited Jun 21, 2020 at 4:50

answered Jun 21, 2020 at 4:38

tdelaney

2 Comments

ch4r1t Over a year ago

I tried this. It said "the JSON object must be str, bytes or bytearray, not TextIOWrapper".

tdelaney Over a year ago

My mistake, I should have decoded the line. There are multiple JSON objects in the file, one per line. You'll have to figure out how you want to handle that.

gxhamster · Accepted Answer · 2020-06-21 05:03:30Z

0

I think the problem is with the JSON file. There can only be one top-level set of braces, with all the data inside it, but you have separated the data into several objects.

You can do something like this:

{
"1": {
    "id": "53f43a7bdabfaeb22f497fb8",
    "name": "Nayara Fernanda Monte",
    "h_index": 0,
    "n_pubs": 1,
    "tags": [],
    "pubs": [{
        "i": "53e9bc79b7602d97048f8888",
        "r": 2
    }, {
        "i": "56d8971cdabfae2eee185494",
        "r": 2
    }],
    "n_citation": 0,
    "orgs": [""]
},
"2": {
    "id": "53f43f5adabfaedf435b9bdf",
    "name": "J\u00f6rg B\u00e4ssmann",
    "h_index": 0,
    "n_pubs": 1,
    "tags": [{
        "w": 1,
        "t": "Vehicle Theft .Immobilisation .Crime Prevention.Crimereduction . Displacement .Motorcycle Theft .Opportunistic Offenders .Professional Offenders . Evaluation.Mixed-Methods Design"
    }],
    "pubs": [{
        "i": "53e9b4a1b7602d9703fad4e7",
        "r": 0
    }],
    "n_citation": 0,
    "orgs": ["Bingen am Rhein, Germany"]
}}
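
If you cannot edit the file by hand, a structure like the one above can be built in code. This is a sketch that assumes one JSON document per line; the helper name jsonl_to_keyed_object is invented here, and the "1", "2", ... keys simply follow the numbering used in the example.

```python
import json


def jsonl_to_keyed_object(lines):
    """Collect one-document-per-line JSON into a dict keyed "1", "2", ..."""
    records = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        records[str(len(records) + 1)] = json.loads(line)
    return records


sample_lines = ['{"id": "53f43a7bdabfaeb22f497fb8"}\n',
                '{"id": "53f43f5adabfaedf435b9bdf"}\n']
combined = jsonl_to_keyed_object(sample_lines)
single_document = json.dumps(combined, indent=4)  # one valid JSON object
```

A plain list would also work as the container if the numeric keys carry no meaning.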

answered Jun 21, 2020 at 5:03

gxhamster

1 Comment

lenz Over a year ago

The format is called JSON lines.

Pranav Choudhary · Accepted Answer · 2020-06-21 05:30:27Z

0

The JSON you provided is not valid JSON.

You have put multiple JSON objects in the file without a separator or an enclosing array.

As for the encoding issue, it seems the file's binary content is being converted to str.

Try this:

import json

with open('./xyz.json', 'rb') as f:
    data = json.load(f)

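The added 'rb' argument opens the file in binary mode, so Python reads raw bytes and does not try to decode them to str as UTF-8. json.load then detects the encoding itself. It still expects the file to hold a single JSON document.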
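
To illustrate the binary-mode behaviour: since Python 3.6, json.load and json.loads accept bytes and work out for themselves whether the input is UTF-8, UTF-16 or UTF-32. The small sketch below uses in-memory bytes instead of a real file, and the variable names are only for illustration. It also shows that binary mode does not help with several documents in one input.

```python
import json

# A single document encoded as UTF-16 (the codec prepends a BOM).
raw_one = '{"name": "J\u00f6rg"}'.encode("utf-16")
data = json.loads(raw_one)  # encoding is detected from the bytes

# Two documents, one per line: decoding succeeds, parsing does not.
raw_two = '{"id": 1}\n{"id": 2}\n'.encode("utf-16")
try:
    json.loads(raw_two)
    parsed_two = True
except json.JSONDecodeError as exc:
    parsed_two = False
    error_message = exc.msg
```

So 'rb' addresses the decode error, while a file with one object per line still needs to be parsed document by document.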

Check this repl: https://repl.it/@SourabhLalwani/FickleBriefOperatingenvironment#xyz.json

edited Jun 21, 2020 at 5:30

Pranav Choudhary

answered Jun 21, 2020 at 4:52

Sourabh Lalwani
