I have a dataset with a 'Country' column, and the number of rows per country varies a lot:
for example, there are around 2000 rows for Japan and only 100 for Thailand.
I am one-hot encoding this feature, and I am trying to figure out the best way to filter the data so that I keep only the countries that help improve the RMSE the most.
I have tried using feature importance, and I have also tried manually looping through the data and removing one country at a time, but I was wondering whether there is a better way to do this.
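
For concreteness, the drop-one-country loop could be sketched roughly as below. This is only a toy, standard-library illustration and not the actual pipeline: the data is made up, the names `fit_predict` and `drop_one_country` are hypothetical, and a per-country mean stands in for the real model (a dropped country simply falls back to the global training mean).

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def fit_predict(train, valid, kept):
    """Stand-in model (an assumption, not the real estimator).

    Countries in `kept` are predicted with their own training mean;
    every other country is predicted with the global training mean,
    as if its one-hot column had been removed.
    """
    global_mean = mean(y for _, y in train)
    per_country = {}
    for country, y in train:
        if country in kept:
            per_country.setdefault(country, []).append(y)
    means = {c: mean(ys) for c, ys in per_country.items()}
    return [means.get(c, global_mean) for c, _ in valid]


def drop_one_country(train, valid):
    """Return the baseline RMSE and the RMSE change from dropping each country."""
    countries = sorted({c for c, _ in train})
    y_valid = [y for _, y in valid]
    baseline = rmse(y_valid, fit_predict(train, valid, set(countries)))
    deltas = {}
    for c in countries:
        kept = set(countries) - {c}
        # Positive delta: RMSE got worse without this country's indicator.
        deltas[c] = rmse(y_valid, fit_predict(train, valid, kept)) - baseline
    return baseline, deltas


# Made-up (country, target) rows, split by hand into train and validation.
train = [
    ("Japan", 10.0), ("Japan", 12.0), ("Japan", 11.0), ("Japan", 13.0),
    ("Thailand", 30.0), ("Thailand", 34.0),
    ("Brazil", 15.0), ("Brazil", 17.0),
]
valid = [("Japan", 12.0), ("Thailand", 31.0), ("Brazil", 17.0)]

baseline, deltas = drop_one_country(train, valid)
# Countries whose removal hurts the validation RMSE the most come first.
for country, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{country}: RMSE change {delta:+.3f}")
```

A country with a positive change is one whose indicator helps on the held-out rows; a change near zero or below suggests the column could be dropped. The loop needs one refit per country, which is the cost I would like to avoid.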
Thanks