
Previously I followed the solution from this question, but then I realized it doesn't quite match my case: I'd like to collect the values that share the same as_of_date and ID into a display_rows section of the JSON file. I have a DataFrame like this:

     as_of_date create_date   ID  value_1   count   value_3
0    02/03/2021 02/03/2021  12345   5         2      55
1    02/03/2021 01/03/2021  12345   8         2      55
2    02/03/2021 01/03/2021  34567   9         1      66
3    02/03/2021 02/03/2021  78945   9         1      77
4    03/03/2021 02/03/2021  78945   9         1      22
5    03/03/2021 02/03/2021  12345   5         1      33

where the count column is the number of rows that share the same ID and as_of_date. For example, for as_of_date=02/03/2021 and ID=12345 there are two rows (each row has a different create_date, but I don't care about create_date), so count is 2 for both of those rows. A minimal sketch of how that count could be derived is shown below.
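For context, assuming the count column is not already present in the source data, it could be derived with a groupby transform over the sample rows above:

import pandas as pd

# sample data from the table above
df = pd.DataFrame({
    'as_of_date': ['02/03/2021', '02/03/2021', '02/03/2021', '02/03/2021', '03/03/2021', '03/03/2021'],
    'create_date': ['02/03/2021', '01/03/2021', '01/03/2021', '02/03/2021', '02/03/2021', '02/03/2021'],
    'ID': [12345, 12345, 34567, 78945, 78945, 12345],
    'value_1': [5, 8, 9, 9, 9, 5],
    'value_3': [55, 55, 66, 77, 22, 33],
})

# count = number of rows sharing the same as_of_date and ID
df['count'] = df.groupby(['as_of_date', 'ID'])['ID'].transform('size')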

The expected JSON is:

{
    "examples": [
        {
            "Id": 12345,
            "as_of_date": "2021-03-02 00:00:00", # this field is datetime format
            "value_3": 55, 
            "count": 2,    # for the same 'ID=12345'&'as_of_date=02/03/2021'
            "display_rows": [
                {
                    "value_1": 5,
                    "type": "int" # 'type' field will always be 'int'
                },
                {
                    "value_1": 8,
                    "type": "int"
                }
            ]
        },
        {
            "Id": 34567,
            "as_of_date": "2021-03-02 00:00:00",
            "value_3": 66,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int"
                }
            ]
        },
        {
            "Id": 78945,
            "as_of_date": "2021-03-02 00:00:00",
            "value_3": 77,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int" 
                }
            ]
        },
        {
            "Id": 78945,
            "as_of_date": "2021-03-03 00:00:00",
            "value_3": 22,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int" 
                }
            ]
        },
        {
            "Id": 12345,
            "as_of_date": "2021-03-03 00:00:00",
            "value_3": 33,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 5,
                    "type": "int" 
                }
            ]
        }
    ]
}

It took me almost a whole day to figure this out, but it still doesn't seem to work... Can someone help, please? Thanks.

1 Answer


Use GroupBy.apply with a lambda function to build the list of dicts from the value_1 column:

import json
import pandas as pd

# parse the day-first dates (e.g. 02/03/2021 -> 2021-03-02) into datetimes
df['as_of_date'] = pd.to_datetime(df['as_of_date'], dayfirst=True, errors='coerce')


# build one dict per value_1, then collect them per (as_of_date, ID, value_3, count) group
f = lambda x: [{"value_1": y, "type": "int"} for y in x]
df = (df.groupby(['as_of_date','ID','value_3','count'])['value_1']
        .apply(f)
        .reset_index(name='display_rows'))
print (df)
  as_of_date     ID  value_3  count  \
0 2021-03-02  12345       55      2   
1 2021-03-02  34567       66      1   
2 2021-03-02  78945       77      1   
3 2021-03-03  12345       33      1   
4 2021-03-03  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'type': 'int'}, {'value_1': 8,...  
1                    [{'value_1': 9, 'type': 'int'}]  
2                    [{'value_1': 9, 'type': 'int'}]  
3                    [{'value_1': 5, 'type': 'int'}]  
4                    [{'value_1': 9, 'type': 'int'}]  

# default=str makes the Timestamp values JSON-serializable
j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)

print (j)
{"examples": [{"as_of_date": "2021-03-02 00:00:00", "ID": 12345, "value_3": 55, "count": 2, "display_rows": [{"value_1": 5, "type": "int"}, {"value_1": 8, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 34567, "value_3": 66, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 78945, "value_3": 77, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 12345, "value_3": 33, "count": 1, "display_rows": [{"value_1": 5, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 78945, "value_3": 22, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}]}

EDIT:

# add another example column
df['value_7'] = 52
print (df)
   as_of_date create_date     ID  value_1  count  value_3  value_7
0  02/03/2021  02/03/2021  12345        5      2       55       52
1  02/03/2021  01/03/2021  12345        8      2       55       52
2  02/03/2021  01/03/2021  34567        9      1       66       52
3  02/03/2021  02/03/2021  78945        9      1       77       52
4  03/03/2021  02/03/2021  78945        9      1       22       52
5  03/03/2021  02/03/2021  12345        5      1       33       52

# add a constant 'type' column so it appears as the last key of each dict
df = (df.assign(type='int')
        .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','type']]
        .apply(lambda x:  x.to_dict('records'))
        .reset_index(name='display_rows'))
print (df)
   as_of_date     ID  value_3  count  \
0  02/03/2021  12345       55      2   
1  02/03/2021  34567       66      1   
2  02/03/2021  78945       77      1   
3  03/03/2021  12345       33      1   
4  03/03/2021  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'value_7': 52, 'type': 'int'},...  
1     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
2     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
3     [{'value_1': 5, 'value_7': 52, 'type': 'int'}]  
4     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  

j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)

EDIT:

df = (df.assign(example_placeholder='xyz')
        .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','example_placeholder']]
        .apply(lambda x:  x.to_dict('records'))
        .reset_index(name='display_rows'))
print (df)
   as_of_date     ID  value_3  count  \
0  02/03/2021  12345       55      2   
1  02/03/2021  34567       66      1   
2  02/03/2021  78945       77      1   
3  03/03/2021  12345       33      1   
4  03/03/2021  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'value_7': 52, 'example_placeh...  
1  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
2  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
3  [{'value_1': 5, 'value_7': 52, 'example_placeh...  
4  [{'value_1': 9, 'value_7': 52, 'example_placeh...  

df = (df.assign(aa='xyz', type='int')
        .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','aa', 'type']]
        .apply(lambda x:  x.to_dict('records'))
        .reset_index(name='display_rows'))
print (df)

   as_of_date     ID  value_3  count  \
0  02/03/2021  12345       55      2   
1  02/03/2021  34567       66      1   
2  02/03/2021  78945       77      1   
3  03/03/2021  12345       33      1   
4  03/03/2021  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...  
1  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
2  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
3  [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...  
4  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
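The nested result can then be serialized the same way as in the earlier step:

j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)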

Comments

Hi, thanks! Just a quick follow-up question: what if I have multiple columns like value_1 that need to be added to the display_rows section? I tried df = (df.groupby(['as_of_date','ID','value_3','count'])['value_1', 'value_7', 'value_8'].apply(lambda x: [{"value_1": a, 'value_7': b, 'value_8': c, "type": "int"} for a, b, c in x]).reset_index(name='display_rows')), but it gave me ValueError: too many values to unpack (expected 3).
Thanks, but it doesn't seem to work if I change type='int' to some other string, e.g. .assign(example_placeholder='xyz').
@Cecilia - Is example_placeholder added after the groupby, like [['value_1', 'value_7', 'example_placeholder']]?
I found the issue: I should use 'records' instead of 'record', a silly mistake.
@Cecilia - I've never done that, so no idea. Maybe look for an existing solution or post a new question for this problem.
