I have an array of strings like this one (from Twitter):
String str = "The Green New Deal is viable. It is the same vision that FDR had for his New Deal programs: nationwide mobilization http://94739 #thegreendeal #nationwide";
What I want is to 1) turn this string into an array of words, 2) remove stop words and apply stemming, and 3) remove all non-letter characters except for '#', which indicates that a term is a hashtag.
I have tried the library https://github.com/uttesh/exude, which does stemming, removes stop words, lowercases the text, and strips special characters. The problem is that it also removes the hashtags. The code for this:
String tweetString = ExudeData.getInstance().filterStoppingsKeepDuplicates(str);
I have also tried this:
String[] wordArray = str.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
But this also strips the '#' from the hashtags. Is there a workaround using either method that keeps the hashtags? (I'd prefer to keep using the exude library for this.)
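
For the regex variant, one direction that might work is to add '#' to the negated character class so it is no longer stripped. The following is only a minimal sketch under that assumption: the class name `HashtagTokenizer` and the method `tokenize` are hypothetical, the URL-removal step is an extra assumption (without it, "http" would survive as a token), and it does not cover stop-word removal or stemming.

```java
import java.util.Arrays;

public class HashtagTokenizer {

    // Hypothetical helper: lowercases the tweet and keeps only letters, spaces and '#'.
    static String[] tokenize(String tweet) {
        return tweet
                .replaceAll("https?://\\S+", " ") // assumption: URLs should be dropped entirely
                .replaceAll("[^a-zA-Z# ]", "")    // '#' is in the class, so hashtags keep their marker
                .toLowerCase()
                .trim()
                .split("\\s+");
    }

    public static void main(String[] args) {
        String str = "The Green New Deal is viable. It is the same vision that FDR had "
                + "for his New Deal programs: nationwide mobilization http://94739 "
                + "#thegreendeal #nationwide";
        System.out.println(Arrays.toString(tokenize(str)));
    }
}
```

With the sample tweet, the last two tokens come out as "#thegreendeal" and "#nationwide". The result still contains stop words and unstemmed forms, so this only addresses the hashtag part of the problem.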