I have a question about classification and regression in machine learning. First question: consider the following dataset: http://it.tinypic.com/view.php?pic=oh3gj7&s=8#.VIjhRDGG_lF

Can we say that the dataset is linearly separable? In order to apply a linear model for classification, is a transformation of the input space not needed for this dataset, or is it not possible for this dataset? My answer to the first is no, but I am not sure about the second: I do not know whether a transformation is possible for this dataset.

Second question, about a regression problem. Given the following dataset for f : R -> R: http://it.tinypic.com/view.php?pic=madsmr&s=8#.VIjhVjGG_lE

Can we say that a linear model for regression can be used to learn the function associated with this dataset? Or, given this dataset, is it not possible to determine an optimal configuration of the linear model?

I am reading Tom Mitchell's Machine Learning and Bishop's Pattern Recognition and Machine Learning, but I still have trouble giving the right answer. Thanks in advance.

1 Answer

Neither of these datasets can be modeled using linear classification/regression applied directly to the raw inputs.

As for the input data transformation: as long as the dataset is consistent (there are no two identical points with two different labels), there always exists a transformation after which the data is linearly separable. In particular, one can construct it with:

phi(x) = 1 iff the label of x is "1" (and phi(x) = 0 otherwise)

In other words, you map all positive samples to "1" and all negative samples to "0", so your data is now trivially linearly separable. Alternatively, simply map your N points to N unit vectors in R^N in such a way that the i-th point is mapped to [0 0 0 ... 1 ... 0 0 0]^T, where the "1" appears in the i-th position. Such a dataset is trivially linearly separable for any labeling.
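The unit-vector construction can be checked numerically. The following is only a minimal sketch under stated assumptions: the points `X`, the labels `y`, and the size `N` are made-up toy data (the dataset from the question is not reproduced here), and the weight vector `w = y` is just one convenient separating choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N points with arbitrary +1/-1 labels.
N = 8
X = rng.normal(size=(N, 2))
y = rng.choice([-1, 1], size=N)

# phi maps the i-th training point to the i-th unit vector in R^N,
# so the transformed design matrix is the N x N identity.
Phi = np.eye(N)

# With unit vectors as features, w = y separates the data,
# because w . e_i = y_i for every i.
w = y.astype(float)
pred = np.sign(Phi @ w)

separable = bool(np.all(pred == y))
print(separable)
```

Note that such a map only memorizes the training points: it is defined through their indices (or labels), so it says nothing about how to classify an unseen input.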

4 Comments

Thank you very much. I now understand the first problem. As for the second question, is it possible to determine an optimal configuration of the linear model?
What is an "optimal configuration of the linear model"? There is no such generic term. Once you give an optimization criterion, one can answer whether it is attainable and at what complexity.
Okay, so you say it can't be modeled using regression. Can you explain why, please? Is it maybe because there are two or more points over the same X, let's say over 10000 in my example? Sorry for my English.
It's because the relation between the variables is clearly non-linear. A linear relation in R^2 is simply a straight line, while you can clearly see that this relation is much more complex; it looks like a 3rd-degree polynomial.
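To make the straight-line versus 3rd-degree comparison concrete, here is a rough sketch. It assumes least squares as the optimization criterion and uses synthetic data: the noisy cubic below is a hypothetical stand-in, not the dataset from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy cubic, standing in for the plotted dataset.
x = np.linspace(-3.0, 3.0, 60)
y = x**3 - 4.0 * x + rng.normal(scale=0.5, size=x.size)


def sse(degree):
    """Sum of squared errors of the least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals**2))


sse_line = sse(1)   # best straight line in x
sse_cubic = sse(3)  # best 3rd-degree polynomial in x

print(sse_line, sse_cubic)
```

Under the least-squares criterion the best straight line always exists and is cheap to compute, but on data like this its error stays far above that of the cubic fit, which is the sense in which a straight line cannot model the relation.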
