I am trying to speed up the code below. I am working with a DataFrame that has 40k columns, and I need to apply the following function to every column:
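import numpy as np

def differencing(col, per=1):
    # Difference the column; rows with no predecessor come out as NaN, so fill them with 0
    df[f'{col}_d{per}'] = df[col].diff(periods=per).fillna(0)
    # Sign of the difference: 1 (increase), -1 (decrease), 0 (no change) -> 3 classes
    df[f'{col}_d{per}_ind'] = np.where(df[f'{col}_d{per}'] > 0, 1, np.where(df[f'{col}_d{per}'] < 0, -1, 0))

# Snapshot the column names first, since the function adds new columns to df
for col in list(df.columns):
    differencing(col, per=1)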
I only know how to apply this function column by column with a for loop. How can I speed this up? The problem with apply is that the function adds two new columns to the existing DataFrame for each input column. This is where I am stuck.
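For reference, here is a minimal sketch of the kind of loop-free version I am after. It assumes every column is numeric. The function name `differencing_all` is hypothetical, and the new columns end up grouped at the end rather than interleaved as in the loop. I have not benchmarked it on the full 40k columns.

```python
import numpy as np
import pandas as pd

def differencing_all(df, per=1):
    """Return df plus '<col>_d<per>' and '<col>_d<per>_ind' for every column.

    Hypothetical helper; assumes all columns of df are numeric.
    """
    # Difference all columns in one call; NaN results are filled with 0
    diffs = df.diff(periods=per).fillna(0)
    # np.sign maps positive -> 1, negative -> -1, zero -> 0 (the same 3 classes)
    inds = np.sign(diffs).astype(int)
    # Name the new columns the same way the per-column function does
    diffs = diffs.add_suffix(f"_d{per}")
    inds.columns = [f"{c}_ind" for c in diffs.columns]
    # One concat instead of 80k single-column insertions into df
    return pd.concat([df, diffs, inds], axis=1)
```

It would be called as `df = differencing_all(df, per=1)`. Building the new columns as whole frames and concatenating once avoids repeatedly inserting single columns into a wide DataFrame, which is my guess for where most of the time goes.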