Using JavaScript. (Note: there is a similar post, but its OP asked for Java; this question is about JavaScript.)

I'm trying to remove a list of words from an entire string without looping (preferably using Regular Expressions).

This is what I have so far, and it removes some of the words but not all of them. Can someone help identify what I'm doing wrong with my RegEx function?

   //Remove all instances of the words in the array
  var removeUselessWords = function(txt) {

	var uselessWordsArray = 
        [
          "a", "at", "be", "can", "cant", "could", "couldnt", 
          "do", "does", "how", "i", "in", "is", "many", "much", "of", 
          "on", "or", "should", "shouldnt", "so", "such", "the", 
          "them", "they", "to", "us",  "we", "what", "who", "why", 
          "with", "wont", "would", "wouldnt", "you"
        ];
			
	var expStr = uselessWordsArray.join(" | ");
	return txt.replace(new RegExp(expStr, 'gi'), ' ');
  }

  var str = "The person is going on a walk in the park. The person told us to do what we need to do in the park";
  
  console.log(removeUselessWords(str));

//The result should be: "person going walk park. person told need park."

  • Get rid of the whitespace around | for starters. Commented Apr 4, 2018 at 15:41
  • If I do that, then the function removes matching characters inside words instead of whole words (e.g., "walk" becomes "wlk"). Commented Apr 4, 2018 at 15:43
  • @Jared Smith, I retract my statement above, as RomanPerekhrest made use of your comment. Commented Apr 4, 2018 at 15:47

2 Answers

Three points:

  • join the array items with | and no surrounding spaces
  • enclose the alternation in a group with parentheses: (...|...)
  • add a word boundary \b on each side of the group so that only whole words match
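
As a minimal sketch of why each point matters, the snippet below applies three variants of the pattern to a short sentence. The `words` list and `sample` string are made up for illustration and are not the question's data.

```javascript
// Hypothetical short word list and sentence, for illustration only.
var words = ["a", "in", "on", "the"];
var sample = "The cat sat on a mat in the hat";

// Variant 1: join with " | ". Inner words need a literal space on both sides,
// and each match consumes those spaces, so a neighbouring word can be skipped.
// The leading "The" (no space before it) and the "the" after "in" both survive.
var spaced = sample.replace(new RegExp(words.join(" | "), "gi"), " ");

// Variant 2: join with "|" only. The letters now match anywhere,
// including inside other words ("cat" loses its "a").
var bare = sample.replace(new RegExp(words.join("|"), "gi"), " ");

// Variant 3: group the alternation and wrap it in \b so only whole words match.
var bounded = sample
    .replace(new RegExp("\\b(" + words.join("|") + ")\\b", "gi"), " ")
    .replace(/\s{2,}/g, " ");

console.log(spaced);
console.log(bare);
console.log(bounded);
```

The third variant is the one used in the function below.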

var removeUselessWords = function(txt) {
    var uselessWordsArray = 
        [
          "a", "at", "be", "can", "cant", "could", "couldnt", 
          "do", "does", "how", "i", "in", "is", "many", "much", "of", 
          "on", "or", "should", "shouldnt", "so", "such", "the", 
          "them", "they", "to", "us",  "we", "what", "who", "why", 
          "with", "wont", "would", "wouldnt", "you"
        ];
			
	  var expStr = uselessWordsArray.join("|");
	  return txt.replace(new RegExp('\\b(' + expStr + ')\\b', 'gi'), ' ')
                    .replace(/\s{2,}/g, ' ');
  }

var str = "The person is going on a walk in the park. The person told us to do what we need to do in the park";
  
console.log(removeUselessWords(str));

3 Comments

Wow. This works. Thank you. What does the \\b imply?
This is even better!
You'll also want to add a .replace(/\s+\.(\s|$)/g, '.$1') at the end of all of that to clean up potential spaces before periods.
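
On the `\\b` question: `\b` is the regex word-boundary assertion, and the backslash is doubled because the pattern is built from a string literal, where a single `"\b"` means a backspace character. The sketch below shows the difference, followed by the period cleanup suggested in the last comment. The sample strings are assumptions chosen for illustration.

```javascript
// "\\b" in a string literal is the two characters \ and b,
// which RegExp interprets as a word boundary.
var boundary = new RegExp("\\bin\\b", "gi");

// "\b" in a string literal is a single backspace character (U+0008),
// so this pattern looks for backspace + "in" + backspace.
var backspace = new RegExp("\bin\b", "gi");

var text = "in the inn, going in";
var withBoundary = text.replace(boundary, "_");   // "_ the inn, going _"
var withBackspace = text.replace(backspace, "_"); // unchanged: no backspaces in text

// Cleanup from the comment above: drop whitespace left in front of a period.
// Hypothetical input with stray spaces before the periods.
var stray = "person going walk park . person told need park .";
var cleaned = stray.replace(/\s+\.(\s|$)/g, ".$1");

console.log(withBoundary);
console.log(withBackspace);
console.log(cleaned);
```

A regex literal such as `/\bin\b/gi` needs no doubling; the extra backslash is only required when the pattern passes through a string first.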

Maybe this is what you want:

   //Remove all instances of the words in the array
  var removeUselessWords = function(txt) {

	var uselessWordsArray = 
        [
          "a", "at", "be", "can", "cant", "could", "couldnt", 
          "do", "does", "how", "i", "in", "is", "many", "much", "of", 
          "on", "or", "should", "shouldnt", "so", "such", "the", 
          "them", "they", "to", "us",  "we", "what", "who", "why", 
          "with", "wont", "would", "wouldnt", "you"
        ];
			
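	// Wrap the alternation in one group so \b applies to every word;
	// joining with "\\b|\\b" leaves the first word without a leading \b
	// and the last word without a trailing \b.
	var expStr = '\\b(?:' + uselessWordsArray.join("|") + ')\\b';
	return txt.replace(new RegExp(expStr, 'gi'), '').trim().replace(/ +/g, ' ');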
  }

  var str = "The person is going on a walk in the park. The person told us to do what we need to do in the park";
  
  console.log(removeUselessWords(str));

//The result should be: "person going walk park. person told need park."
