3

I want to read a JSON file using Python pandas. Each line of the file is meant to be a complete JSON object.

I'm using the following versions:

python : 2.7.6

pandas: 1.19.1

JSON file:

{"id":"111","p_id":"55","name":"aaa","notes":"","childs":[]}
{"id":"222","p_id":"56","name":"bbb","notes":"","childs":[]}
{"id":"333","p_id":"75","name":"ccc","notes":"","childs":[]}
{"id":"444","p_id":"76","name":"ddd","notes":"","childs":["abc","efg","pqr"
,"rtu"]}

I'm using the code below to read the JSON file:

import pandas as pd

df = pd.read_json("temp.txt", lines=True)
print(df)

The problem is that the "childs" key holds an array of unknown length, and a "\n" can occur in the middle of it. When I run the code above I get ValueError: Expected object or value, but if I remove the "\n" after "pqr" the code works.

I don't want to remove the "\n" from my data; I want to handle it within my code. I would also prefer to use pandas alone rather than Python's json library to handle the data.

How can I handle this type of file using only pandas?

3
  • Read the whole file as a string and split it by newline. Then you have 4 JSON strings which you can simply parse. Commented May 5, 2017 at 11:01
  • @Erik Šťastný- OK, but how can I get that data into a pandas DataFrame after splitting it by newline? Commented May 5, 2017 at 11:15
  • Making every line of the JSON file valid JSON is a better way. Commented May 5, 2017 at 11:33

2 Answers

8

First check whether the file is valid JSON, for example with a JSON validator site.

Once the file is in valid JSON format, you can use the code below to read it into a DataFrame:

import json
import pandas as pd

with open("training.json") as datafile:
    data = json.load(datafile)  # parse the whole file as one JSON document
dataframe = pd.DataFrame(data)
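If a validator site is not convenient, the same check can be done locally. The following is only a minimal sketch: the helper name find_invalid_lines is hypothetical, and it assumes the file is supposed to hold one JSON object per line, as in the question. It reports the lines that do not parse on their own, which is where a record has been split by a stray newline.

```python
import json


def find_invalid_lines(text):
    """Return (line_number, line) pairs for non-empty lines that are not complete JSON."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad.append((number, line))
    return bad


# Sample data copied from the question: the last record is split over two lines.
sample = (
    '{"id":"333","p_id":"75","name":"ccc","notes":"","childs":[]}\n'
    '{"id":"444","p_id":"76","name":"ddd","notes":"","childs":["abc","efg","pqr"\n'
    ',"rtu"]}\n'
)
bad_lines = find_invalid_lines(sample)
```

For this sample, both halves of the split record are reported, since neither half is valid JSON by itself.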

hope this helps.


0

read_json() with lines=True can't work here because of the newline after "pqr": the record is split over two lines, and neither half is a complete JSON object. You can either fix that line or turn the whole file into valid JSON. I'm doing the latter here by replacing the newline after each closing brace with a comma and surrounding the whole thing with brackets to form a proper JSON array:

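import pandas as pd

with open('temp.txt') as f:
    # strip() drops a trailing newline so no dangling comma ends up before ']'
    content = f.read().strip()

df = pd.read_json('[' + content.replace('}\n', '},') + ']')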
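The string replacement above relies on every record ending with "}" directly followed by a newline. As an alternative, here is a hedged sketch that does not depend on where the line breaks fall. It uses the standard library's json.JSONDecoder.raw_decode rather than pandas alone, so it steps outside the question's stated preference; the function name parse_concatenated_json is hypothetical and the sample data is copied from the question.

```python
import json


def parse_concatenated_json(text):
    """Parse a string holding several JSON values separated only by whitespace."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


sample = (
    '{"id":"111","p_id":"55","name":"aaa","notes":"","childs":[]}\n'
    '{"id":"222","p_id":"56","name":"bbb","notes":"","childs":[]}\n'
    '{"id":"333","p_id":"75","name":"ccc","notes":"","childs":[]}\n'
    '{"id":"444","p_id":"76","name":"ddd","notes":"","childs":["abc","efg","pqr"\n'
    ',"rtu"]}\n'
)
records = parse_concatenated_json(sample)
```

The resulting list of dicts can then be passed to pd.DataFrame(records), which should give one row per object. A newline inside a record is harmless here because JSON treats it as ordinary whitespace between tokens.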
