0

My dataframe looks like this:

    id          text                    labels
0   447         glutamine synthetase    [protein]
1   447         GS                      [protein]
2   447         hepatoma                [indication]
3   447         NaN                      NaN
4   442         Metachromatic           [indication]

I want to transform the dataframe and create two new columns named proteins and indications that contain the text when labels is protein or indication for the same id.

Wanted output

    id          protein                     indication
0   447         glutamine synthetase, GS    hepatoma
0   442         NaN                         Metachromatic

Can someone help how to do this?

1 Answer 1

1

Use df.explode with Groupby.agg and df.pivot:

In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot('id', 'labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)

In [423]: out.columns = ['id', 'indication', 'protein']

In [424]: out
Out[424]: 
    id     indication                  protein
0  442  Metachromatic                      NaN
1  447       hepatoma  glutamine synthetase,GS
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.