Pandas: Create new column based on text values of other columns

Question

My dataframe looks like this:

    id          text                    labels
0   447         glutamine synthetase    [protein]
1   447         GS                      [protein]
2   447         hepatoma                [indication]
3   447         NaN                      NaN
4   442         Metachromatic           [indication]

I want to transform the dataframe into one row per id, with two new columns named protein and indication. Each column should contain the text values whose labels entry is protein or indication for that id, joined with commas.

Wanted output

    id          protein                     indication
0   447         glutamine synthetase, GS    hepatoma
1   442         NaN                         Metachromatic

How can I do this?

Mayank Porwal · Accepted Answer · 2022-05-13 14:01:06Z


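Use df.explode with Groupby.agg and df.pivot. Pass index and columns to pivot as keywords, since pandas 2.0 and later no longer accept them positionally:

In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot(index='id', columns='labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)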

In [423]: out.columns = ['id', 'indication', 'protein']

In [424]: out
Out[424]: 
    id     indication                  protein
0  442  Metachromatic                      NaN
1  447       hepatoma  glutamine synthetase,GS
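
For readers who want something self-contained to run, here is a minimal sketch of the same idea using groupby and unstack instead of pivot. It is an alternative, not the code from the answer. It assumes the labels column holds Python lists (or NaN), rebuilds the sample frame from the question, and joins with ", " to match the wanted output. The column order passed to reindex is my own choice.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; assumes `labels` holds lists or NaN.
df = pd.DataFrame({
    "id": [447, 447, 447, 447, 442],
    "text": ["glutamine synthetase", "GS", "hepatoma", np.nan, "Metachromatic"],
    "labels": [["protein"], ["protein"], ["indication"], np.nan, ["indication"]],
})

out = (
    df.explode("labels")                        # one row per (text, label) pair
      .dropna(subset=["text", "labels"])        # drop rows with no text or label
      .groupby(["id", "labels"])["text"]
      .agg(", ".join)                           # join the texts within each group
      .unstack("labels")                        # labels become columns
      .reindex(columns=["protein", "indication"])  # fix the column order
      .rename_axis(columns=None)                # drop the leftover "labels" name
      .reset_index()
)

print(out)
```

Because reindex names the columns explicitly, there is no need to assign out.columns afterwards, and a label that never occurs in the data would simply show up as an all-NaN column.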

edited May 13, 2022 at 14:01

answered May 13, 2022 at 13:55

