
Apologies, as I'm quite inexperienced with Python and am looking for assistance here 🙏

I am trying to flatten a list of dicts into new columns in an existing dataframe, using the keys as column names and the values as cell content. json_normalize unwraps the data successfully, but fails to place dict_value_1 as the column header and dict_value_2 as the cell content. In the examples I saw in other posts it worked, but somehow not for me.

The column originally looks like this:

View of the column (1D array) before unwrapping

What I obtain is something like this:

View after unwrapping with json_normalize

Obviously what I'd like to achieve is something like this:

Desired Output

I have tried pd.DataFrame.from_records(df["fields"]), and also pd.json_normalize(df["fields"]), which both produce the same output. The returned dataframe has a dict in each cell, which is not what I am trying to get. What I'd like is "id" as the column name and "value" as the cell content for every new column created.
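
To illustrate, here is a minimal sketch of that attempt (the sample frame below is made up, but mirrors the list-of-dicts structure of the real "fields" column):

import pandas as pd

# illustrative data: each "fields" cell holds a list of {"id": ..., "value": ...} dicts
df = pd.DataFrame({"fields": [[{"id": 2753537689, "value": "abc"}, {"id": 27537849, "value": 123}],
                              [{"id": 2753537689, "value": "bcd"}, {"id": 27537849, "value": 456}]]})

fields_values = pd.DataFrame.from_records(df["fields"])
# the columns are just 0, 1, ... and every cell still holds a whole dict,
# e.g. {'id': 2753537689, 'value': 'abc'} -- not the id/value split I'm after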

Eventually, after json_normalize, I tried extracting the "id"s to rename my columns, like so:

# extracting the first row
my_list = fields_values.iloc[:1].values.flatten().tolist()

# getting all values under the "id" key
ids = [d['id'] for d in my_list]

# extracting all current column names
col_names = fields_values.columns.tolist()

# zipping the extracted "ids" with the current col names
new_col_names = dict(zip(col_names, ids))

# renaming the columns with the ids
fields_values.rename(columns=new_col_names, inplace=True)

This eventually gave me the desired col names, but the cells still contain the dicts instead of only the value under "value"... so I'm not sure this is the best practice:

Correct col names but incorrect cell content

Does anyone have any advice on this? Is there something I'm missing in the json_normalize function arguments, or shall I continue extracting the dict values per row/column and replacing them in each cell (I find this to be such a pain!)? Isn't there a more efficient way to do this?
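
For reference, the per-cell extraction I mean would look roughly like this (just a sketch, assuming every cell left after the rename is an {"id": ..., "value": ...} dict):

# sketch only: pull "value" out of every dict cell, leave anything else untouched
fields_values = fields_values.applymap(
    lambda cell: cell["value"] if isinstance(cell, dict) else cell
)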

1 Answer

You can use explode() to convert the list elements into new rows, then pd.Series to convert the dict values into new columns, and finally transpose the index and columns:

df = df.explode("fields")["fields"].apply(pd.Series)
df = df.groupby("id")["value"].apply(list).to_frame().T
df = df.explode(df.columns.tolist(), ignore_index=True)

Example:

df = pd.DataFrame({"fields":[[{"id":2753537689,"value":"abc"},{"id":27537849,"value":123}],
                                  [{"id":2753537689,"value":"bcd"},{"id":27537849,"value":456}]]})
'''
    fields
0   [{'id': 2753537689, 'value': 'abc'}, {'id': 27537849, 'value': 123}]
1   [{'id': 2753537689, 'value': 'bcd'}, {'id': 27537849, 'value': 456}]

'''
df = df.explode("fields")["fields"].apply(pd.Series)
df = df.groupby("id")["value"].apply(list).to_frame().T
df = df.explode(df.columns.tolist(), ignore_index=True)

Out:

id    27537849   2753537689
0        123        abc
1        456        bcd

Note: explode() may cause performance or memory problems in large datasets.


2 Comments

Hello, thank you. I tried the technique explained, but it did not give the expected result. It returns a dataframe with 1 row and 173K columns, as it fails to aggregate the identical fields (for instance id 27537689) into one single column with all the values within it, and instead creates additional columns with the same name, suffixed. I start with a table of 1000 rows with data related to 1000 different tickets (1 row = 1 ticket); my aim is to keep it at 1000 rows, but to have that "fields" column unwrapped as new columns in the same table, to see whether specific fields were used for a given ticket.
Ok, I edited my answer, can you check it?
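
A minimal sketch of one row-preserving alternative matching that description (assuming every "fields" cell is a list of {"id": ..., "value": ...} dicts; the sample frame is the same illustrative one as in the answer):

import pandas as pd

df = pd.DataFrame({"fields": [[{"id": 2753537689, "value": "abc"}, {"id": 27537849, "value": 123}],
                              [{"id": 2753537689, "value": "bcd"}, {"id": 27537849, "value": 456}]]})

# build one {id: value} dict per row, expand it into columns, and join back,
# so the result keeps exactly one row per original ticket
unwrapped = pd.DataFrame(
    [{d["id"]: d["value"] for d in row} for row in df["fields"]],
    index=df.index,
)
result = df.drop(columns="fields").join(unwrapped)
#   2753537689  27537849
# 0        abc       123
# 1        bcd       456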
