3

I have the df:

lst_1 = ['November-2022', 'October-2022', 'September-2022', 'November-2022']
lst_2 = ['', '', '', '', '']
lst_3 = ['', '', '', '', '']
lst_4 = ['', '', '', '', '']

df1 = pd.DataFrame(list(zip(lst_1 , lst_2, lst_3, lst_4)),
              columns =['Date_updated', 'November-2022', 'October-2022', 'September-2022'])

I want to write a function to have df2 like this:

lst_1 = ['November-2022', 'October-2022', 'September-2022', 'November-2022']
lst_2 = ['x', '', '', 'x', '']
lst_3 = ['', 'x', '', '', '']
lst_4 = ['', '', 'x', '', '']

df2 = pd.DataFrame(list(zip(lst_1 , lst_2, lst_3, lst_4)),
              columns =['Date_updated', 'November-2022', 'October-2022', 'September-2022'])

When row['Date_updated'] == column names of df1, it will tick 'x' to that columns with the same name. I tried this, but not work:

df1['Date_Updated'] = df1['Date_Updated'].astype(str)
lst_date = list(df1['Date_Updated'].tolist())
def add_tick(r):
    for i in lst_date:
        if r['Date_Updated'] == r[i]:
            return 'X'
        continue
for i in lst_date:
    df1[i] = df1.apply(add_tick, axis = 1)

2 Answers 2

1

Use get_dummies with DataFrame.replace and DataFrame.update:

df1.update(pd.get_dummies(df1['Date_updated']).replace({1:'X', 0:''}))
print (df1)
     Date_updated November-2022 October-2022 September-2022
0   November-2022             X                            
1    October-2022                          X               
2  September-2022                                         X
3   November-2022             X       

Details:

print (pd.get_dummies(df1['Date_updated']))
   November-2022  October-2022  September-2022
0              1             0               0
1              0             1               0
2              0             0               1
3              1             0               0

Another idea without update datetimes columns:

df2 = df1[['Date_updated']].join(pd.get_dummies(df1['Date_updated']).replace({1:'X', 0:''}))
print (df2)
     Date_updated November-2022 October-2022 September-2022
0   November-2022             X                            
1    October-2022                          X               
2  September-2022                                         X
3   November-2022             X                      
Sign up to request clarification or add additional context in comments.

Comments

1

Use pd.crosstab:

new_df = df.set_index('Date_updated')
output = pd.crosstab(index=new_df.index, columns=new_df.columns, values='x', 
                     aggfunc=lambda x: x).fillna('')

Output:

col_0   November-2022   October-2022    September-2022
row_0           
November-2022   x                            
October-2022                 x               
September-2022                               x

1 Comment

thanks for your reply, but how about bigger data. I have fixed the above df1, df2. I tried your way, but not work

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.