sklearn MeanShift different prediction output

Question

Why do the two values printed by this line differ?

print(ms.fit_predict(val), p_all[idx])

The outputs from ms.fit_predict(val) are all 0.

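import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = 100*np.random.random_sample((500,15))-100
X = np.array(X, dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
bandwidth = estimate_bandwidth(X, quantile=0.01)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
p_all = ms.fit_predict(X)  # cluster labels from fitting on all of X
for idx, val in enumerate(X):
    print(val)
    print(X[idx])
    val = val.reshape(1, -1)  # recent scikit-learn versions require a 2D array (n_samples, n_features)
    print(ms.fit_predict(val), p_all[idx])  # fit_predict re-fits the model on this single sample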

eqzx · Accepted Answer · 2015-10-08 14:18:31Z


You're re-fitting the model inside the loop, each time to a single data point. p_all holds the cluster assignments from fitting to all of the data in X. After any iteration of the loop, ms.cluster_centers_ will contain only val, because the model was fit only to val. There is therefore exactly one cluster, and its index is 0.
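To make this concrete, here is a minimal sketch of what happens when a MeanShift model is fit to one sample. The seed, the data shape, and the bandwidth of 20.0 are arbitrary values chosen for illustration; they are not taken from the question.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)           # fixed seed, only for reproducibility
X = 100 * rng.random((50, 3)) - 100      # smaller than the question's data (assumption)

ms = MeanShift(bandwidth=20.0)           # arbitrary illustrative bandwidth
val = X[0].reshape(1, -1)                # a single sample as a 2D array

labels = ms.fit_predict(val)             # the model is fit to val alone
single_center = ms.cluster_centers_      # the only center found

print(labels)                            # one label, for the one cluster
print(np.allclose(single_center, val))   # the center coincides with val
```

With only one sample there is nothing for the mean to shift toward, so the lone seed converges immediately and becomes the single cluster center.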

I'm guessing you're confused about the interface to MeanShift. It doesn't update its fit incrementally. Every time you call fit() or fit_predict(), it fits only to the data you pass it and discards the previous solution.
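One way to label points without re-fitting is MeanShift's predict() method, which assigns each sample to the closest center already learned. The sketch below is an illustration under assumed values (seed, data shape, and quantile are not from the question); the predicted labels should match those returned by the original fit.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)             # fixed seed, only for reproducibility
X = 100 * rng.random((200, 3)) - 100       # illustrative data, smaller than the question's

bandwidth = estimate_bandwidth(X, quantile=0.1)  # quantile chosen for illustration
ms = MeanShift(bandwidth=bandwidth)
p_all = ms.fit_predict(X)                  # fit once, on the full data set

# predict() reuses ms.cluster_centers_ and leaves the fitted model unchanged
p_again = ms.predict(X)
one = ms.predict(X[0].reshape(1, -1))      # a single sample must still be 2D

print(np.array_equal(p_again, p_all))
print(one[0] == p_all[0])
```

Calling predict() inside a loop over single samples is possible, but passing the whole array at once gives the same labels with one call.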

I'd suggest having a look at the sklearn MeanShift documentation.


1 Comment

user200340 Over a year ago

Thanks, I really need to use scikit-learn.org/stable/modules/generated/… in order to get the same output.
