I have a Pandas data frame and wish to demean each of the numeric columns, leaving the categorical variable column entries unchanged. By "demean" I simply wish to subtract from each column entry the mean of all entries in the corresponding column.
The data frame comes from 569 patients in the Wisconsin Breast Cancer directory, listing for each patient 10 numeric measurements, along with a diagnosis of M (malignant) or B (benign).
import pandas as pd
df = pd.read_csv('data/UWbcd.csv')
%load_ext google.colab.data_table  # just for purposes of browsing the data
df - df.mean()
Using this method, the entries in each numeric column are demeaned fine, but the categorical variables,
df['Diagnosis']
all become NaN.
Is there an efficient way to leave categorical variables alone when demeaning?
df.apply(lambda s: s - s.mean() if pd.api.types.is_numeric_dtype(s) else s)
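An alternative that avoids the per-column lambda is to pick out the numeric columns with select_dtypes and subtract their means in one step. The following is only a sketch: the small frame and its Radius and Texture columns are made-up stand-ins for the real CSV, with only Diagnosis taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for data/UWbcd.csv: one categorical column
# and two numeric columns (column names other than Diagnosis are assumed).
df = pd.DataFrame({
    "Diagnosis": ["M", "B", "B", "M"],
    "Radius": [18.0, 12.0, 11.0, 19.0],
    "Texture": [10, 20, 30, 40],
})

# Names of the numeric columns only; Diagnosis is excluded.
num_cols = df.select_dtypes(include="number").columns

# Work on a copy so the original frame is left intact.
demeaned = df.copy()

# Subtract each numeric column's mean; non-numeric columns are never touched.
demeaned[num_cols] = df[num_cols] - df[num_cols].mean()

print(demeaned)
```

Because the subtraction is restricted to num_cols, the Diagnosis column is carried over unchanged instead of turning into NaN.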