I am currently working on a school project and have a somewhat unusual task. My job is to scrape data from a certain Facebook page and feed it into a learning model that takes one input (a List) and produces one output (an Int32).
First, let me briefly explain the steps I have already implemented:
- Scraped the data
- Stemmed it
- Removed capitalization, punctuation, emojis and spaces
- Merged words with the same root
- Counted the occurrences of each word and assigned that count to the word
- Performed a tf-idf calculation to get the weight of every word in every post
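
To make the last step concrete, here is a minimal sketch of a tf-idf calculation. It is not my actual code: the class and method names and the sample posts are made up for illustration, and it assumes the common variant tf = count / post length and idf = log(N / document frequency).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class TfIdfSketch
{
    // posts: each post is an array of already stemmed, normalized tokens.
    public static List<Dictionary<string, double>> Compute(List<string[]> posts)
    {
        // Document frequency: in how many posts does each word appear?
        var df = new Dictionary<string, int>();
        foreach (var post in posts)
            foreach (var word in post.Distinct())
                df[word] = df.TryGetValue(word, out var n) ? n + 1 : 1;

        var result = new List<Dictionary<string, double>>();
        foreach (var post in posts)
        {
            var weights = new Dictionary<string, double>();
            foreach (var group in post.GroupBy(w => w))
            {
                // Term frequency: share of the post taken up by this word.
                double tf = (double)group.Count() / post.Length;
                // Inverse document frequency: rarer words get a higher weight.
                double idf = Math.Log((double)posts.Count / df[group.Key]);
                weights[group.Key] = tf * idf;
            }
            result.Add(weights);
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample data.
        var posts = new List<string[]>
        {
            new[] { "great", "day", "great" },
            new[] { "bad", "day" }
        };

        var weights = Compute(posts);
        foreach (var kv in weights[0])
            Console.WriteLine($"{kv.Key}: {kv.Value:F4}");
    }
}
```

Note that with this variant a word that appears in every post gets a weight of zero, and posts with different words end up with weight lists of different lengths.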
Now I have a Dictionary<String,List<double[],int>>, which represents postId:[wordWeights],amountOfLikes, for example:

    23425234_35242352:[0.027,0.031,0.009,0.01233],89

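
The type above is shorthand rather than valid C#. One way it could be written so that it compiles is sketched here, assuming C# 7 value tuples are available; the tuple field names WordWeights and AmountOfLikes are chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the data layout, for illustration only.
static class PostDataSketch
{
    static void Main()
    {
        // Key: post id. Value: tf-idf weights of the post's words and its like count.
        var posts = new Dictionary<string, (double[] WordWeights, int AmountOfLikes)>
        {
            ["23425234_35242352"] = (new[] { 0.027, 0.031, 0.009, 0.01233 }, 89)
        };

        foreach (var entry in posts)
        {
            Console.WriteLine(
                $"{entry.Key}: [{string.Join(", ", entry.Value.WordWeights)}], {entry.Value.AmountOfLikes}");
        }
    }
}
```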
I have to train my model on different posts and their like counts. For this purpose I have chosen the Accord.NET library for C#, and so far I have looked at its Simple Linear Regression class.
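First, I saw that I can use OrdinaryLeastSquares and feed it the inputs and outputs like this:

    using Accord.Statistics.Models.Regression.Linear;

    // Word weights of a single post
    double[] inputs = { 0.123, 0.23, 0.09 };
    // Like count of that post, padded with zeros to match the input length
    double[] outputs = { 98, 0, 0 };
    OrdinaryLeastSquares ols = new OrdinaryLeastSquares();
    SimpleLinearRegression regression = ols.Learn(inputs, outputs);
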
As you can see, the number of inputs in the array has to match the number of outputs, so I padded the output array with zeros. As a result, I got obviously wrong output. I cannot come up with a proper way of feeding my data to the linear regression class. I know that padding the array with zeros is wrong, but so far it is the only solution I have come up with. I would appreciate it if anyone could tell me how I should use regression in this case and help me choose a proper algorithm. Cheers!