I am trying to read a large .json file into a pandas DataFrame on Google Colab. I have read similar questions on here, as well as the API docs for the read_json method, to no avail. I believe the orient='records' argument should work for my JSON. Any help would be appreciated.
My code:
import pandas as pd
df = pd.read_json('/content/data/events_World_Cup.json', orient='records')
The error:
/usr/local/lib/python3.7/dist-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1157 else:
1158 self.obj = DataFrame(
-> 1159 loads(json, precise_float=self.precise_float), dtype=None
1160 )
1161
ValueError: Expected object or value
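For what it's worth, this error seems to mean the parser received no parseable text at all, which makes me suspect the path or file encoding rather than the orient argument. The stdlib json module raises the analogous error on empty input (a sketch of the failure class, not pandas' internal parser):

```python
import json

# An empty string (what you effectively get from a wrong path or an
# unreadable file) cannot be parsed as JSON and raises the same class
# of "expecting a value" error.
try:
    json.loads('')
except json.JSONDecodeError as e:
    print(type(e).__name__, '-', e.msg)   # JSONDecodeError - Expecting value
```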
An element of my JSON file:
[
{"eventId": 8,
"subEventName": "Simple pass",
"tags": [{"id": 1801}],
"playerId": 122671,
"positions": [{"y": 50, "x": 50}, {"y": 53, "x": 35}],
"matchId": 2057954,
"eventName": "Pass",
"teamId": 16521,
"matchPeriod": "1H",
"eventSec": 1.6562140000000003,
"subEventId": 85,
"id": 258612104
}
.
.
.
]
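Since the file is a top-level array of objects like the element above, a fallback I could try is parsing it with the stdlib json module and building the frame myself. A minimal sketch on an inline sample shaped like the element above (with the real file I would use json.load on the opened file instead):

```python
import json
import pandas as pd

# Inline sample shaped like one element of events_World_Cup.json;
# for the real file: records = json.load(open(path, encoding='utf-8-sig'))
sample = '''[
  {"eventId": 8, "subEventName": "Simple pass", "tags": [{"id": 1801}],
   "playerId": 122671, "positions": [{"y": 50, "x": 50}, {"y": 53, "x": 35}],
   "matchId": 2057954, "eventName": "Pass", "teamId": 16521,
   "matchPeriod": "1H", "eventSec": 1.656214, "subEventId": 85, "id": 258612104}
]'''

records = json.loads(sample)             # list of dicts
df = pd.DataFrame.from_records(records)  # nested lists become object columns
print(df.shape)                          # (1, 12) for this one-element sample
```

The encoding='utf-8-sig' guess is because a byte-order mark at the start of the file is one thing that makes pandas' parser fail with exactly this ValueError.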
The entire json file can be found here: https://figshare.com/articles/dataset/Events/7770599?backTo=/collections/Soccer_match_event_dataset/4415000
I am starting with events_World_Cup.json given its size.
Thank you