
I have a DataFrame with 10,000 rows that looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([['facebook', '15', 'women tennis'],
              ['facebook', '20', 'men basketball'],
              ['facebook', '30', 'club'],
              ['apple', '10', 'vice president'],
              ['apple', '100', 'swimming contest']]),
    columns=['firm', 'id', 'text'])


I'd like to save each firm's rows to a separate JSON file named "firm.json" (e.g. facebook.json), containing that firm's id and text records. The same goes for the other firms, such as apple.

Sorry, I am still a beginner with pandas; is there a way to do this efficiently?

  • If you have a large dataset like 10,000 rows, you should try pandas manipulation to avoid for-loops and lambda functions. Commented Aug 16, 2022 at 21:43

3 Answers


You can do:

json_cols = df.columns.drop('firm').tolist()
json_records = df.groupby('firm')[json_cols].apply(
    lambda x: x.to_json(orient='records'))

Then for 'facebook':

facebook_json = json_records['facebook']

'[{"id":"15","text":"women tennis"},
  {"id":"20","text":"men basketball"},
  {"id":"30","text":"club"}]'

for 'apple':

apple_json = json_records['apple']

'[{"id":"10","text":"vice president"},{"id":"100","text":"swimming contest"}]'

Save all at once:

# iteritems() was removed in pandas 2.0; use items()
for firm, records in json_records.items():
    with open(f"{firm}.json", "w") as file:
        file.write(records)
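If building the intermediate Series isn't needed, `DataFrame.to_json` also accepts a file path, so the grouping and the file writes can be combined in one loop. A minimal self-contained sketch (file names come straight from the firm values):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([['facebook', '15', 'women tennis'],
              ['facebook', '20', 'men basketball'],
              ['facebook', '30', 'club'],
              ['apple', '10', 'vice president'],
              ['apple', '100', 'swimming contest']]),
    columns=['firm', 'id', 'text'])

# groupby yields (firm, sub-DataFrame) pairs; drop the grouping column
# and write each group's records directly to "<firm>.json".
for firm, group in df.groupby('firm'):
    group.drop(columns='firm').to_json(f"{firm}.json", orient='records')
```

This writes facebook.json and apple.json with the same record lists shown above.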


import pandas as pd
import numpy as np
import json

df = pd.DataFrame(
    np.array([
        ['facebook', '15',"women tennis"],
        ['facebook', '20',"men basketball"],
        ['facebook', '30','club'],
        ['apple', "10","vice president"],
        ['apple', "100",'swimming contest']]
    ), columns=['firm','id','text']
)

firms = set(df.firm)
for firm in firms:
    df_firm = df[df.firm == firm]
    d = []
    for _, r in df_firm.iterrows():
        d.append({'id': r['id'], 'text': str(r['text'])})
    with open(f'{firm}.json', 'w') as f:
        json.dump(d, f)

I'm sure there's a simpler way, but that's one way.

2 Comments

Do not use iterrows() unless required; iterating over DataFrame rows is slow.
As long as runtime is not crucial, I don't see any issue with using it. The OP did not specify any such constraints.

Here's one way to do it:

import json
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([['facebook', '15', 'women tennis'],
              ['facebook', '20', 'men basketball'],
              ['facebook', '30', 'club'],
              ['apple', '10', 'vice president'],
              ['apple', '100', 'swimming contest']]),
    columns=['firm', 'id', 'text'])

for firm in set(df['firm']):
    with open(firm + '.json', 'w') as f:
        f.write(json.dumps(list(df[df['firm'] == firm][['id', 'text']].T.to_dict().values())))

Output:

apple.json
[{"id": "10", "text": "vice president"}, {"id": "100", "text": "swimming contest"}]

facebook.json
[{"id": "15", "text": "women tennis"}, {"id": "20", "text": "men basketball"}, {"id": "30", "text": "club"}]
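The transpose trick `.T.to_dict().values()` can also be written with `to_dict(orient='records')`, which returns the list of row dicts directly. A self-contained sketch of that variant, grouping with `groupby` instead of repeated boolean filtering:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([['facebook', '15', 'women tennis'],
              ['facebook', '20', 'men basketball'],
              ['facebook', '30', 'club'],
              ['apple', '10', 'vice president'],
              ['apple', '100', 'swimming contest']]),
    columns=['firm', 'id', 'text'])

# to_dict(orient='records') yields one dict per row, in row order.
for firm, group in df.groupby('firm'):
    with open(f'{firm}.json', 'w') as f:
        json.dump(group[['id', 'text']].to_dict(orient='records'), f)
```

The output files are identical to those shown above.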
