Why do I get different outputs from this line?

print(ms.fit_predict(val.reshape(1, -1)), p_all[idx])

The outputs from the fit_predict call on val are all 0.

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = 100*np.random.random_sample((500, 15)) - 100
X = np.array(X, dtype=float)  # np.float was removed from NumPy; the builtin float is equivalent
bandwidth = estimate_bandwidth(X, quantile=0.01)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
p_all = ms.fit_predict(X)
for idx, val in enumerate(X):
    print(val)
    print(X[idx])
    # current scikit-learn expects a 2D array, so the single sample is reshaped to (1, n_features)
    print(ms.fit_predict(val.reshape(1, -1)), p_all[idx])

1 Answer

You're re-fitting inside the loop, each time to a single data point. p_all holds the cluster assignments from fitting to all of the data in X. After the loop runs, if you print ms.cluster_centers_, it will be the same as the last val, because the model was most recently fit only to that val. With a single point there is only a single cluster, and its index is 0.

I'm guessing you're confused about the interface to MeanShift. It doesn't refit online. Every time you call fit() or fit_predict(), it fits only to the data you pass it, and ignores the old solution.
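To make the difference concrete, here is a minimal sketch. The toy data, the fixed bandwidth of 2.0, and the variable names are my own assumptions, not taken from your code. It fits once on the whole array and then calls predict(), which assigns each sample to the nearest cluster center that was already learned, without refitting:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical toy data: two well-separated blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(50, 2)),
])

# Fit once on all of the data. The bandwidth is an assumed value for this toy set.
ms = MeanShift(bandwidth=2.0)
labels_all = ms.fit_predict(X)

# predict() reuses the fitted cluster centers, so per-sample labels
# agree with the labels from the full fit.
labels_each = np.array([ms.predict(x.reshape(1, -1))[0] for x in X])
print(np.array_equal(labels_each, labels_all))

# By contrast, fitting a fresh estimator to one sample can only find
# one cluster, so the label is always 0.
refit_label = MeanShift(bandwidth=2.0).fit_predict(X[:1])[0]
print(refit_label)
```

In your loop, replacing the fit_predict call on val with a predict call on the already fitted ms should give labels that line up with p_all[idx].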

I'd suggest having a look at the sklearn MeanShift documentation.

1 Comment

Thanks, I really need to use scikit-learn.org/stable/modules/generated/… in order to get the same output.
