I need to convert a large dataframe to a numpy array, preserving only the numerical values and their types. I know there are several well-documented ways to do so.
So, which one is preferable?
df.values
df._as_matrix()
pd.to_numeric(df)
... others ...
Decision factors:
efficiency
safe handling of nan, np.nan, and other possibly unexpected values
numerical stability
object. It seems that pandas readily switches to object to accommodate strings and nan (floats). numpy, on the other hand, uses object to handle sublists of varying size.
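To make the object concern concrete, here is a minimal sketch of the behaviour I mean. The frame, its column names, and its values are made up for illustration, and it assumes a pandas version that provides DataFrame.to_numpy() (0.24 or later); df.values should behave the same way here.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: column names and values are made up.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.5, np.nan, 2.5],
    "c": ["x", "y", "z"],
})

# With a string column present, the whole array falls back to object.
mixed = df.to_numpy()
print(mixed.dtype)

# Keeping only the numeric columns yields one common numeric dtype
# (the int and float columns are combined); the NaN stays a NaN.
numeric = df.select_dtypes(include="number").to_numpy()
print(numeric.dtype, numeric.shape)

# numpy itself needs dtype=object to hold sublists of varying size.
ragged = np.array([[1, 2], [3]], dtype=object)
print(ragged.dtype, ragged.shape)
```

Filtering with select_dtypes before converting is only one possible way to drop the non-numeric columns; whether it is the most efficient option for a large frame is part of what I am asking.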