I use this in Notepad++ or PowerGREP to delete consecutive duplicated words: search for (\b\w+\b)\W+\1. and replace with ${1}. How can this be changed to find nonconsecutive duplicated words on one line and delete the second occurrence?

Example
word1, word2, word1, word3,
Result
word1, word2, word3,

I tried this, but it selects both duplicated words and everything between them:

(\b\w+\b)(.*?)\W+\1.
  • This can't be easily done with regex. You could run ((\b\w+\b).*)\b\2\b repeatedly on the whole file until it finds no more duplicates, but that doesn't address any surrounding formatting. The other way is to split on whitespace, then walk the array deleting duplicates, then rewrite the file. Commented Aug 31, 2015 at 19:15
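The split-and-rewrite route from the comment above could look roughly like this in Python. This is only a sketch under assumptions: the function name `remove_later_duplicates` is made up for illustration, items are split on commas (as in the question's example) rather than on whitespace, and the output is normalized to a single ", " between items with a trailing comma.

```python
def remove_later_duplicates(line: str) -> str:
    """Keep the first occurrence of each comma-separated item; drop later repeats."""
    seen = set()
    kept = []
    for item in line.split(","):
        word = item.strip()
        if not word:
            # Skip the empty piece produced by a trailing comma.
            continue
        if word not in seen:
            seen.add(word)
            kept.append(word)
    # Assumption: rewrite with ", " separators and a trailing comma,
    # matching the layout of the example in the question.
    return "".join(word + ", " for word in kept).rstrip()


print(remove_later_duplicates("word1, word2, word1, word3,"))
# prints: word1, word2, word3,
```

Unlike a single regex pass, this keeps the first copy of each word, which is the result the question asks for, at the cost of rewriting the separators.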

1 Answer

Checking for a later occurrence with a lookahead is easier than looking behind for an earlier one.

\b(\w+)\b\s*,\s*(?=.*\1)

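You can use this and replace with an empty string. Each match is a word, with its trailing comma, that occurs again later on the line, so the earlier copy is removed and the last one is kept. See the demo.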

https://regex101.com/r/sS2dM8/24
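For anyone who wants to try the pattern outside Notepad++, here is a small Python sketch. `ANSWER_PATTERN` is the regex from this answer unchanged; `STRICT_PATTERN` and the helper `drop_earlier_duplicates` are my own additions for illustration, not part of the answer. The strict variant puts `\b` around the backreference, on the assumption that a word should not count as a duplicate just because it appears inside a longer word later on.

```python
import re

# The pattern from the answer, unchanged.
ANSWER_PATTERN = re.compile(r"\b(\w+)\b\s*,\s*(?=.*\1)")

# Hypothetical variant (not from the answer): word boundaries around the
# backreference, so "cat" is not treated as repeated inside "concat".
STRICT_PATTERN = re.compile(r"\b(\w+)\b\s*,\s*(?=.*\b\1\b)")


def drop_earlier_duplicates(line: str, pattern: re.Pattern = STRICT_PATTERN) -> str:
    """Delete every word (plus its comma) that appears again later on the line."""
    return pattern.sub("", line)


print(drop_earlier_duplicates("word1, word2, word1, word3,"))
# prints: word2, word1, word3,

print(ANSWER_PATTERN.sub("", "cat, concat,"))
# prints: concat,

print(STRICT_PATTERN.sub("", "cat, concat,"))
# prints: cat, concat,
```

Note that both patterns keep the last copy of a repeated word, so the order differs from the "Result" line in the question, which keeps the first copy.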

3 Comments

Thanks, it works. Another problem: is there a way to limit the words by length? This regex selects parts of a word; for example, in word.1 it selects ".1" as a word. I tried \b(\w+){4,10}\b\s*,\s*(?=.*\1) to limit the words by length, but then it selects the whole text.
@Jim8645 Use \b(\w{4,10})\b\s*,\s*(?=.*\1) to limit the length.
It works, thanks. I figured out how to find only numeric words or only alphabetic words: \b(\w[0-9]{4,10})\b\s*,\s*(?=.*\1) and \b(\w[a-z]{4,10})\b\s*,\s*(?=.*\1). But how can I limit it to find only links, for example http://www.google.com/abc?
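A short Python sketch of the length and character-class variants discussed in these comments. The helper `dedupe_pattern` and the variable names are hypothetical, and the `\b` around the backreference is my own addition rather than part of the answer. One detail worth noting: `\w[0-9]{4,10}` matches one word character followed by 4 to 10 digits, so for purely numeric words the class alone, `[0-9]{4,10}`, is probably what is wanted (likewise `[a-z]{4,10}` for lowercase letters). The quantifier goes on the character class inside the group, not on the group itself as in `(\w+){4,10}`.

```python
import re


def dedupe_pattern(token: str) -> re.Pattern:
    """Wrap a token sub-pattern in the answer's duplicate-finding regex.

    Assumption: \\b is added around the backreference so only whole words count.
    """
    return re.compile(r"\b(" + token + r")\b\s*,\s*(?=.*\b\1\b)")


length_limited = dedupe_pattern(r"\w{4,10}")   # any word of 4 to 10 characters
digits_only = dedupe_pattern(r"[0-9]{4,10}")   # 4 to 10 digits
letters_only = dedupe_pattern(r"[a-z]{4,10}")  # 4 to 10 lowercase letters

line = "alpha, 12345, ab, alpha, 12345, ab,"

print(length_limited.sub("", line))
# prints: ab, alpha, 12345, ab,

print(digits_only.sub("", line))
# prints: alpha, ab, alpha, 12345, ab,

print(letters_only.sub("", line))
# prints: 12345, ab, alpha, 12345, ab,
```

The two-letter item "ab" is left alone in every case because it is shorter than the 4-character minimum.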
