
I am new to Python. I have a pandas DataFrame with 4 columns, as follows (Python version 3.7.4):

import pandas as pd

df = pd.DataFrame({'Patient_Key': [2333836, 2319735],
                   'DX1': ['N184', 'Z6827'],
                   'DX2': ['D649', 'N184'],
                   'DX3': ['E785', 'I10']})

   Patient_Key    DX1   DX2   DX3
0      2333836   N184  D649  E785
1      2319735  Z6827  N184   I10

How do we convert this to a new DataFrame with only 2 columns?

-- Expected Conversion
2333836, ["N184", "D649", "E785"]
2319735, ["Z6827", "N184", "I10"]
    You want to convert to a column of arrays or to a column of lists? Commented Jul 23, 2021 at 18:28

1 Answer


Filter DX columns and convert each row to a list with apply:

df[['Patient_Key']].join(
    df.filter(regex='DX').apply(pd.Series.tolist, axis=1).rename('DX')
)

   Patient_Key                  DX
0      2333836  [N184, D649, E785]
1      2319735  [Z6827, N184, I10]
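For reference, here is a self-contained sketch of the same approach, assuming pandas is imported as pd and using the sample data from the question. It passes axis=1 by keyword and uses the built-in list as the row function; the names dx_lists and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Patient_Key': [2333836, 2319735],
                   'DX1': ['N184', 'Z6827'],
                   'DX2': ['D649', 'N184'],
                   'DX3': ['E785', 'I10']})

# Select the DX columns, turn each row into a plain Python list,
# and name the resulting Series 'DX' so join() uses it as the column name.
dx_lists = df.filter(regex='DX').apply(list, axis=1).rename('DX')

# Join the list column back onto the key column by index.
result = df[['Patient_Key']].join(dx_lists)
print(result)
```

Because join aligns on the index, this relies on df and dx_lists sharing the same index, which holds here since both come from the same frame.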

Or convert the DX columns to a list of per-row lists and assign it to a new column:

df['DX'] = df.filter(regex='DX').values.tolist()
df[['Patient_Key', 'DX']]

   Patient_Key                  DX
0      2333836  [N184, D649, E785]
1      2319735  [Z6827, N184, I10]
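If modifying df in place is not wanted, a variant along these lines might work. This is only a sketch: it assumes a pandas version that provides to_numpy(), and the stricter pattern ^DX\d+$ is my own choice so that a column named exactly 'DX' would not be picked up on a re-run.

```python
import pandas as pd

df = pd.DataFrame({'Patient_Key': [2333836, 2319735],
                   'DX1': ['N184', 'Z6827'],
                   'DX2': ['D649', 'N184'],
                   'DX3': ['E785', 'I10']})

# Match DX1, DX2, DX3, ... but not a column named exactly 'DX'.
dx_cols = df.filter(regex=r'^DX\d+$')

# to_numpy().tolist() yields one Python list per row;
# assign() returns a new frame and leaves df untouched.
result = df[['Patient_Key']].assign(DX=dx_cols.to_numpy().tolist())
print(result)
```

The list-of-lists conversion avoids a Python-level function call per row, so it should scale better than apply on larger frames.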

7 Comments

You can directly use list instead of pd.Series.tolist.
@Ch3steR Yep. list works too and is more concise here.
It could be a display issue. Do you need the column as a string type, or do you need a particular format when writing to CSV?
If you need it as a string, then you can json.dumps the list, I believe. Something like: import json; df.filter(regex='DX').apply(lambda s: json.dumps(s.to_list()), 1)
@LCJ Then you can try df['DX'] = df['DX'].astype('str') to make it a string.
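
The two string conversions suggested in the comments do not produce the same text. A small sketch of the difference, assuming the DX column already holds Python lists (the names as_json and as_repr are illustrative):

```python
import json
import pandas as pd

df = pd.DataFrame({'Patient_Key': [2333836, 2319735],
                   'DX': [['N184', 'D649', 'E785'], ['Z6827', 'N184', 'I10']]})

# json.dumps writes JSON, so the codes are wrapped in double quotes.
as_json = df['DX'].apply(json.dumps)

# astype(str) uses Python's str() of each list, which uses single quotes.
as_repr = df['DX'].astype(str)

print(as_json[0])  # ["N184", "D649", "E785"]
print(as_repr[0])  # ['N184', 'D649', 'E785']
```

The JSON form matches the double-quoted layout shown in the question and can be parsed back with json.loads.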