I would like to use GridSearchCV to determine the parameters of a classifier, and using pipelines seems like a good option.
The application is image classification using Bag-of-Words features, but the issue is that the logical pipeline differs depending on whether training or test examples are being processed.
For each training set, KMeans must be run to produce a vocabulary, and that same vocabulary is then used to encode the test data; no KMeans step is run on the test data itself.
I cannot see how to specify this difference in behavior within a pipeline.
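
To make the two cases concrete, here is a rough sketch of the behavior I am after, written without a pipeline. The function names `build_vocabulary` and `encode`, the vocabulary size of 3, and the random arrays standing in for local image descriptors are all placeholders for illustration, not my real code or data.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_vocabulary(train_descriptors, n_words=3, seed=0):
    """Training only: cluster all local descriptors into visual words."""
    stacked = np.vstack(train_descriptors)
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(stacked)


def encode(descriptor_sets, vocabulary):
    """Training and test: normalized histogram of nearest visual words per image."""
    n_words = vocabulary.n_clusters
    histograms = []
    for descriptors in descriptor_sets:
        # Assign each descriptor to its nearest cluster center (no refitting here).
        words = vocabulary.predict(descriptors)
        counts = np.bincount(words, minlength=n_words).astype(float)
        histograms.append(counts / counts.sum())
    return np.array(histograms)


# Placeholder data: each image is an array of shape (n_keypoints, n_dims).
rng = np.random.default_rng(0)
train_images = [rng.normal(size=(20, 8)) for _ in range(5)]
test_images = [rng.normal(size=(15, 8)) for _ in range(2)]

# Training path: KMeans runs, then the training images are encoded.
vocabulary = build_vocabulary(train_images)
X_train = encode(train_images, vocabulary)

# Test path: no KMeans, only encoding with the vocabulary learned above.
X_test = encode(test_images, vocabulary)
```

With GridSearchCV, the training path would have to be repeated on the training portion of every split, and the test path applied to the held-out portion.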