1

I have a dataframe with some NaN values.

Here is a sample dataframe:

sample_df = pd.DataFrame([[1,np.nan,1],[2,2,np.nan], [np.nan, 3, 3], [4,4,4],[np.nan,np.nan,5], [6,np.nan,np.nan]])

It looks like:

enter image description here

What I did after to get a json:

sample_df.to_json(orient = 'records')

Which gives:

'[{"0":1.0,"1":null,"2":1.0},{"0":2.0,"1":2.0,"2":null},{"0":null,"1":3.0,"2":3.0},{"0":4.0,"1":4.0,"2":4.0},{"0":null,"1":null,"2":5.0},{"0":6.0,"1":null,"2":null}]'

I want to save this dataframe to a json with 2 rows in each json, but with none of the Nan values. Here is how I tried to do it:

df_dict = dict((n, sample_df.iloc[n:n+2, :]) for n in range(0, len(sample_df), 2))

for k, v in df_dict.items():
    print(k)
    print(v)
    for d in (v.to_dict('record')):
        for k,v in list(d.items()):
            if type(v)==float:
                if math.isnan(v):
                    del d[k]

json.dumps(df_dict)

Output I want:

'[{"0":1.0,"2":1.0},{"0":2.0,"1":2.0}]' -> in one .json file '[{"1":3.0,"2":3.0},{"0":4.0,"1":4.0,"2":4.0}]' -> in second .json file '[{"2":5.0},{"0":6.0}]' -> in third .json file

2
  • 1
    Dataframe + expected output will help a lot! Thanks. Commented Sep 12, 2017 at 0:39
  • @cᴏʟᴅsᴘᴇᴇᴅ Added! Sorry for not giving enough detail. Commented Sep 13, 2017 at 23:59

2 Answers 2

1

Use apply to drop NaNs, groupby to group and dfGroupBy.apply to JSONify.

s = sample_df.apply(lambda x: x.dropna().to_dict(), 1)\
        .groupby(sample_df.index // 2)\
        .apply(lambda x: x.to_json(orient='records'))
s    

0            [{"0":1.0,"2":1.0},{"0":2.0,"1":2.0}]
1    [{"1":3.0,"2":3.0},{"0":4.0,"1":4.0,"2":4.0}]
2                            [{"2":5.0},{"0":6.0}]
dtype: object

Finally, iterate over .values and save to separate JSON files.

import json
for i, j_data in enumerate(s.values):
    json.dump(j_data, open('File{}.json'.format(i + 1), 'w'))
Sign up to request clarification or add additional context in comments.

4 Comments

What if I changed the original dataframe's index to be a column with strings in the data and I wanted the same output? I get the error TypeError: cannot perform floordiv with this index type: <class 'pandas.core.indexes.base.Index'>.
@pr338 Use np.arange(df.shape[0]) // 2
Sorry, I wasn't clear. I meant the output with the index being a string like "indexhere" [{"fund.numeric.returnY3CategoryRank":0,"fund.... Going to edit original question with another example if this is still not clear.
@pr338 Ah, sorry... things are getting jumbled up. Can you ask a new question?
0

I suggest:

with open("data.json","w") as fpout:
    fpout.write("{\n")
    for row_id in range(sample_df.shape[0]):
        fpout.write("\t" + str(sample_df.index[row_id]) + ":" + sample_df.iloc[row_id].dropna().to_json(orient="index") + "\n")
    fpout.write("}\n")

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.