1

From an API, I get information about who has filled a particular form and when they have done it as a json object. I get the below data from 2 forms formid = ["61438732", "48247759dc"]. The json object is stored in the r_json_object variable.

r_json_online = {
'results': [
{'submittedAt': 1669963478503,
   'values': [{'email': '[email protected]'}]},
{'submittedAt': 1669963259737,
   'values': [{'email': '[email protected]'}]},
{'submittedAt': 1669963165956,
   'values': [{'email': '[email protected]'}]}
]
}

I have used the json_normalize function to de-nest the json object and insert the values into a DataFrame called form_submissions. This is the code I have used

import pandas as pd
from pandas import json_normalize
    
submissions = []

formid = ["61438732", "48247759dc"]

for i in range(0,len(formid)):
    
    submissions.extend(r_json_online["results"])

    form_submissions = pd.DataFrame()

    for j in submissions:
        form_submissions = form_submissions.append(json_normalize(j["values"]))
        form_submissions = form_submissions.append({'createdOn': j["submittedAt"]}, ignore_index=True)
        form_submissions = form_submissions.append({'formid': formid[i]}, ignore_index=True)
                 
form_submissions['createdOn'] = form_submissions['createdOn'].fillna(method = 'bfill')
form_submissions['formid'] = form_submissions['formid'].fillna(method = 'bfill')
form_submissions = form_submissions.dropna(subset= 'email')

Code explanation:

  1. I have created an empty list called submissions

  2. For each value in the formid list, I'm running the for loop.

  3. In the for loop:

    a. I have added data to the submissions list
    b. Created an empty DataFrame, normalized the json object and appended the values to the DataFrame from each element in the submissions list

Expected Output:

  1. I wanted the first 3 rows to have formid = '61438732'
  2. The next 3 rows should have the formid = '48247759dc'

enter image description here

Actual Output:

  1. The formid is the same for all the rows

enter image description here

2

2 Answers 2

0

The problem is you are using this line " form_submissions = pd.DataFrame()" in the loop which reset your dataframe each time

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you! It worked when I moved it outside the loop
No probs, please mark the answer as accepted so people can see you don't need more answers
0

This can easily attained by converting into two dataframes and doing cartesian product/cross merge on between both.

formids = ["61438732", "48247759dc"]
form_submissions_df=json_normalize(r_json_online['results'], record_path=['values'], meta=['submittedAt'])

# converting form_ids list to dataframe
form_ids_df = pd.DataFrame (formids, columns = ['form_id'])

# cross merge for cartesian product result 
form_submissions_df.merge(form_ids_df, how="cross")

Results

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.