I am using Python with Dask to create a logistic regression model, in order to speed things up when training.
I have x, which is the feature array (a numpy array), and y, which is the label vector.
edit: The numpy arrays are x_train, an (n*m) array of floats, and y_train, an (n*1) vector of integers that are the labels for training. Both fit sklearn's LogisticRegression.fit and work fine there.
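For reference, this is roughly the sklearn version that already works for me (default parameters; I am not certain the ravel() was actually needed, but the shapes above suggest it):

from sklearn.linear_model import LogisticRegression

# x_train: (n, m) floats, y_train: (n, 1) integer labels
sk_lr = LogisticRegression()
sk_lr.fit(x_train, y_train.ravel())  # ravel() to pass a 1-D label vector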
I tried to use the following code to create a pandas DataFrame, convert it to a dask DataFrame, and train on it, as shown here:
import pandas as pd
from dask import dataframe as dd
from dask_ml.linear_model import LogisticRegression

df = pd.DataFrame(x_train)  # pandas DataFrame built from the feature array
df["label"] = y_train
sd = dd.from_pandas(df, npartitions=3)

lr = LogisticRegression(fit_intercept=False)
lr.fit(sd, sd["label"])
But I get an error:
Could not find signature for add_intercept:
I found this issue on GitHub, which suggests using this code instead:
import pandas as pd
from dask import dataframe as dd
from dask_ml.linear_model import LogisticRegression

df = pd.DataFrame(x_train)  # pandas DataFrame built from the feature array
df["label"] = y_train
sd = dd.from_pandas(df, npartitions=3)

lr = LogisticRegression(fit_intercept=False)
lr.fit(sd.values, sd["label"])
But then I get this error:
ValueError: Multiple constant columns detected!
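For context, this is roughly what I was hoping to be able to write by wrapping the numpy arrays as dask arrays directly; the chunk sizes and the ravel() here are my own guesses, and I don't know whether this is the intended approach:

import dask.array as da
from dask_ml.linear_model import LogisticRegression

# wrap the existing numpy arrays as dask arrays (chunk sizes are a guess)
X = da.from_array(x_train, chunks=(1000, x_train.shape[1]))
y = da.from_array(y_train.ravel(), chunks=1000)  # 1-D label vector

lr = LogisticRegression(fit_intercept=False)
lr.fit(X, y)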
How can I use Dask to train a logistic regression on data that originates from a numpy array?
Thanks.