I am trying to read a large .json file into a pandas DataFrame on Google Colab. I have read similar questions here, as well as the API documentation for the read_json method, to no avail. I believe the orient='records' argument should work for my JSON. Any help would be appreciated.

My code:

import pandas as pd

df = pd.read_json('/content/data/events_World_Cup.json', orient='records')

The error:

/usr/local/lib/python3.7/dist-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1157         else:
   1158             self.obj = DataFrame(
-> 1159                 loads(json, precise_float=self.precise_float), dtype=None
   1160             )
   1161 

ValueError: Expected object or value

An element of my json file:

[
    {"eventId": 8, 
     "subEventName": "Simple pass", 
     "tags": [{"id": 1801}], 
     "playerId": 122671, 
     "positions": [{"y": 50, "x": 50}, {"y": 53, "x": 35}], 
     "matchId": 2057954, 
     "eventName": "Pass", 
     "teamId": 16521, 
     "matchPeriod": "1H", 
     "eventSec": 1.6562140000000003, 
     "subEventId": 85, 
     "id": 258612104
     }
.
.
.
]

The entire json file can be found here: https://figshare.com/articles/dataset/Events/7770599?backTo=/collections/Soccer_match_event_dataset/4415000

I am starting with events_World_Cup.json given its size.

Thank you

2 Answers

1

This worked for me:

pd.read_json('events_World_Cup.json', orient='records')

Pandas version: 1.3.4

Python version: 3.10.0

Can you please check your pandas version?
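One way to collect the versions asked about here is sketched below. The helper name environment_report is hypothetical (it is not part of pandas), and the sketch assumes only that pandas exposes its version as pd.__version__.

```python
import platform


def environment_report():
    """Collect the Python and pandas versions in a small dict.

    environment_report is a hypothetical helper name used for illustration.
    """
    report = {"python": platform.python_version()}
    try:
        import pandas as pd
        report["pandas"] = pd.__version__
    except ImportError:
        # pandas is not installed in this environment
        report["pandas"] = None
    return report


print(environment_report())
```

Running this in both the Colab notebook and the local environment makes it easy to compare the two setups side by side.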


3 Comments

My pandas version on Colab is 1.3.5 and my Python version is 3.7.12. I am working on upgrading the Python version in Colab. Would the Python version really make that big of a difference?
I don't think it's because of the Python version. I tried it in a local Jupyter notebook, so maybe it's because of Colab's library dependencies.
For some reason it worked when I placed the JSON file as locally as possible. It seems the problem was with the path, even though I thought I had the correct path to it.
0

It turns out that this error is also raised when the file is not found, which makes the error message extremely misleading. What I had to do was mount Google Drive first:

from google.colab import drive
drive.mount('/content/drive')

then,

WC_events = pd.read_json('/content/drive/MyDrive/colab_notebooks/491_research/events_World_Cup.json', orient='records')

and this worked.
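Since a wrong path can surface as "Expected object or value", as described above, it may help to verify the path before handing it to pandas. The following is a minimal sketch, not the only way to do it; checked_path and load_events are hypothetical helper names introduced for illustration.

```python
from pathlib import Path


def checked_path(path_str):
    """Return the path as a string if it is an existing file.

    Raises FileNotFoundError with an explicit message otherwise, so a bad
    path is reported before any JSON parsing is attempted.
    """
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(
            f"No such file: {path} (current directory: {Path.cwd()})"
        )
    return str(path)


def load_events(path_str):
    """Read a list-of-records JSON file into a DataFrame after checking the path."""
    import pandas as pd  # imported here so checked_path works without pandas
    return pd.read_json(checked_path(path_str), orient="records")
```

With this in place, a call such as load_events('/content/data/events_World_Cup.json') would fail with a FileNotFoundError naming the missing path instead of a JSON parsing error.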
