I am stumped trying to load this JSON file into a dataframe. I can see the JSON content printed on the screen, but when I try to load it into a dataframe the result is empty. Any advice is gladly appreciated. The output I am looking for is shown in the picture (the first few rows of the dataframe).

        import json
        from urllib.request import urlopen
        import pandas as pd

        with urlopen('https://statdata.pgatour.com/r/021/2020/player_stats.json') as response:
            source = response.read()

        data = json.loads(source)
        tid = data['tournament']['tournamentNumber']

        for item in data['tournament']['players']:
            try:
                pid = item['pid']
                stats = item['stats']
                for stat in stats:
                    statId = stat['statId']
                    name = stat['name']
                    tValue = stat['tValue']
                    print(tid, pid, statId, name, tValue)
            except Exception as e:
                print(e)
                print(item)
                break 

        df = pd.DataFrame(data, columns = ['tid', 'pid', 'statId', 'name', 'tValue'])
        print(df)

3 Answers

If I understand correctly, you can use json_normalize.

The code below picks up after your source variable.

If you look in the rounds column, you'll see different values for each statId.

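json_obj = json.loads(source)
df = pd.json_normalize(json_obj, record_path=['tournament', 'players'])
# explode() repeats each player's index, so reset it to keep the join below aligned row by row
df1 = df.explode('stats').reset_index(drop=True)
# drop(columns=...) replaces the positional axis argument, which newer pandas versions reject
df1 = df1.join(pd.json_normalize(df1['stats'])).drop(columns='stats')

print(df1.drop(columns='rounds'))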

        pid             pn statId             name tValue rank rankAll cValue
0     40026  Daniel Berger    106           Eagles      0  T15     T18       
1     40026  Daniel Berger    106           Eagles      0  T15     T18       
2     40026  Daniel Berger    106           Eagles      0  T15     T18       
3     40026  Daniel Berger    106           Eagles      0  T15     T18       
4     40026  Daniel Berger    106           Eagles      0  T15     T18       
...     ...            ...    ...              ...    ...  ...     ...    ...
3695  01378    David Frost  02567  SG: Off-the-Tee  4.960    2       2       
3696  01378    David Frost  02567  SG: Off-the-Tee  4.960    2       2       
3697  01378    David Frost  02567  SG: Off-the-Tee  4.960    2       2       
3698  01378    David Frost  02567  SG: Off-the-Tee  4.960    2       2       
3699  01378    David Frost  02567  SG: Off-the-Tee  4.960    2       2       

[3700 rows x 8 columns]
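
The explode-then-normalize pattern can be checked on a tiny example. This is only a sketch on made-up records (the `players` list and its values are assumptions, not data from the feed). It resets the index after `explode()`, because `explode()` repeats the parent row's index label for every list element. If those repeated labels are left in place, `join()` can pair rows with the wrong stat, which is one way to end up with repeated values like those in the listing above.

```python
import pandas as pd

# Made-up records shaped like the feed's player entries (hypothetical values).
players = [
    {"pid": "p1", "pn": "Player One",
     "stats": [{"statId": "106", "name": "Eagles"},
               {"statId": "107", "name": "Birdies"}]},
    {"pid": "p2", "pn": "Player Two",
     "stats": [{"statId": "106", "name": "Eagles"}]},
]
df = pd.DataFrame(players)

# explode() repeats the parent index (0, 0, 1), so reset it to 0..n-1.
exploded = df.explode("stats").reset_index(drop=True)

# Normalizing a plain list also yields a 0..n-1 index,
# so join() pairs the rows one to one.
stats = pd.json_normalize(exploded["stats"].tolist())
flat = exploded.drop(columns="stats").join(stats)
print(flat)
```

Passing `.tolist()` to `json_normalize` is a deliberate choice here: it keeps the result independent of how a given pandas version treats the index of a Series input.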

You can do the same thing using json_normalize:

with urlopen('https://statdata.pgatour.com/r/021/2020/player_stats.json') as response:
    source = response.read()

data = json.loads(source)
df = pd.json_normalize(data,
                       record_path=['tournament', 'players', 'stats'],
                       meta=[['tournament', 'tournamentNumber'],
                       ['tournament', 'players', 'pid']])
print(df[['statId', 'name', 'tournament.players.pid', 'tournament.tournamentNumber', 'tValue']])


     statId                    name tournament.players.pid tournament.tournamentNumber   tValue
0       106                  Eagles                  40026                         021        0
1       107                 Birdies                  40026                         021       22
2       523                    Pars                  40026                         021       44
3       184                  Bogeys                  40026                         021        5
4       520                 Doubles                  40026                         021        1
...     ...                     ...                    ...                         ...      ...
3695  02569    SG: Around-the-Green                  01378                         021   -1.131
3696  02568  SG: Approach-the-Green                  01378                         021   -8.661
3697  02567         SG: Off-the-Tee                  01378                         021   -6.391
3698  02674        SG: Tee-to-Green                  01378                         021  -16.183
3699  02675               SG: Total                  01378                         021  -15.432
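
To get the short column names from the question (tid, pid), the long dotted names can be renamed after normalizing. The sketch below uses a made-up miniature of the feed (the `data` dict and its values are assumptions) with the same `record_path` and `meta` arguments as above.

```python
import pandas as pd

# Hypothetical miniature of the feed's structure (values are made up).
data = {
    "tournament": {
        "tournamentNumber": "021",
        "players": [
            {"pid": "p1", "stats": [
                {"statId": "106", "name": "Eagles", "tValue": "0"},
                {"statId": "107", "name": "Birdies", "tValue": "22"}]},
            {"pid": "p2", "stats": [
                {"statId": "106", "name": "Eagles", "tValue": "1"}]},
        ],
    }
}

# One row per stat; meta carries the tournament number and player id along.
df = pd.json_normalize(
    data,
    record_path=["tournament", "players", "stats"],
    meta=[["tournament", "tournamentNumber"], ["tournament", "players", "pid"]],
)

# Shorten the dotted meta column names and pick the column order.
df = df.rename(columns={"tournament.tournamentNumber": "tid",
                        "tournament.players.pid": "pid"})
df = df[["tid", "pid", "statId", "name", "tValue"]]
print(df)
```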


Here you go. Although you are reading all the data, you are not storing it anywhere: you still pass the raw parsed JSON (data) to the dataframe, and that won't work. I have created some lists so that you can store the values individually and then insert them as columns. Please check whether I understood the problem correctly; the code is below.

import json
from urllib.request import urlopen
import pandas as pd

# One list per output column; every stat appends exactly one value to each.
tid_list = []
pid_list = []
stats_id_list = []
name_list = []
tValue_list = []

for n in range(20, 22):  # weeks 20 and 21
    week = f'{n:03d}'  # zero-pad the week number, e.g. 21 -> '021'

    with urlopen('https://statdata.pgatour.com/r/'+week+'/2020/player_stats.json') as response:
        source = response.read()

    data = json.loads(source)
    tid = data['tournament']['tournamentNumber']

    for item in data['tournament']['players']:
        try:
            pid = item['pid']
            for stat in item['stats']:
                statId = stat['statId']
                name = stat['name']
                tValue = stat['tValue']
                # Append to all lists together so they stay the same length.
                tid_list.append(tid)
                pid_list.append(pid)
                stats_id_list.append(statId)
                name_list.append(name)
                tValue_list.append(tValue)
                print(tid, pid, statId, name, tValue)
        except Exception as e:
            # Report the malformed record instead of failing silently.
            print(e)
            print(item)
            break

df = pd.DataFrame(data={'tid': tid_list, 'pid': pid_list, 'statsId': stats_id_list, 'name': name_list, 'tValue': tValue_list})
print(df)

output:

      tid    pid statsId                    name   tValue
0     021  40026     106                  Eagles        0
1     021  40026     107                 Birdies       22
2     021  40026     523                    Pars       44
3     021  40026     184                  Bogeys        5
4     021  40026     520                 Doubles        1
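
To see why the original call came back empty, it may help to try it on a small stand-in. This is a sketch on a made-up dict (the `data` values are assumptions): the parsed JSON has a single top-level key, so none of the requested columns can be found in it. Collecting one record per (player, stat) pair and building the frame from that list gives the flat table instead.

```python
import pandas as pd

# Hypothetical miniature of the feed's structure (values are made up).
data = {
    "tournament": {
        "tournamentNumber": "021",
        "players": [
            {"pid": "p1", "stats": [
                {"statId": "106", "name": "Eagles", "tValue": "0"},
                {"statId": "107", "name": "Birdies", "tValue": "22"}]},
            {"pid": "p2", "stats": [
                {"statId": "106", "name": "Eagles", "tValue": "1"}]},
        ],
    }
}
columns = ["tid", "pid", "statId", "name", "tValue"]

# The only top-level key is "tournament", so none of the requested
# columns exist in the dict and the frame has no rows.
empty_df = pd.DataFrame(data, columns=columns)

# Collect one dict per (player, stat) pair, then build the frame from the list.
tid = data["tournament"]["tournamentNumber"]
rows = [
    {"tid": tid, "pid": player["pid"], "statId": stat["statId"],
     "name": stat["name"], "tValue": stat["tValue"]}
    for player in data["tournament"]["players"]
    for stat in player["stats"]
]
flat_df = pd.DataFrame(rows, columns=columns)
print(empty_df)
print(flat_df)
```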

2 Comments

Thanks T. Novais, much appreciated. The URL changes by week (021) as shown; could I use your approach with a variable in the URL and a range? 'statdata.pgatour.com/r{}/2020/player_stats.json'
Yes @rug, I have edited the code so that you can set a range (I have set it from week 20 to week 21); it will loop through the weeks and fill the dataframe.
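
The week placeholder from the comment can be filled with str.format. This is a minimal sketch: `week_url` is a hypothetical helper, and it assumes every week follows the same URL pattern as the 021 address in the question, with the week number zero-padded to three digits.

```python
def week_url(week, year=2020):
    """Build the stats URL for one tournament week (hypothetical helper)."""
    # {:03d} zero-pads the week number to three digits, e.g. 21 -> 021.
    template = "https://statdata.pgatour.com/r/{:03d}/{}/player_stats.json"
    return template.format(week, year)

# range(20, 22) covers weeks 20 and 21; the end value is exclusive.
urls = [week_url(n) for n in range(20, 22)]
for url in urls:
    print(url)
```

Each URL in the list can then be passed to urlopen in place of the hard-coded address.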
