I am trying to figure out the pattern that will get words from a string. Say for instance my string is:

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";

I tried to eliminate anything that is only a single letter or digit, but it doesn't work:

Regex.Split(text, @"\b\w{1,1}\b");

I also tried this:

Regex.Split(text, @"\W+");

But it outputs:

"HI how are a d you today"

I just want to get all the words so that my final string is:

"HI how are you today"

  • By solving this you aren't getting the list of all words, just all 2+ letter words. You will exclude single-letter words like 'a', 'I', etc. Is that intended? Commented Oct 17, 2011 at 4:48
  • Agree with Gibron. Single-letter words are valid words. Commented Oct 17, 2011 at 5:09

1 Answer

To get all words that are at least 2 letters long you can use this pattern: \b[a-zA-Z]{2,}\b.

using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";
var matches = Regex.Matches(text, @"\b[a-zA-Z]{2,}\b");
string result = String.Join(" ", matches.Cast<Match>().Select(m => m.Value));
Console.WriteLine(result); // HI how are you today

As others have pointed out in the comments, "A" and "I" are valid words. In case you decide to match those you can use this pattern instead:

var matches = Regex.Matches(text, @"\b(?:[a-z]{2,}|[ai])\b",
                            RegexOptions.IgnoreCase);

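In both patterns I've used \b to match word boundaries. If you have input such as "1abc2" then "abc" wouldn't be matched, because digits count as word characters and so there is no boundary between "1" and "a". If you want it to be matched, remove the \b metacharacters. Doing so from the first pattern is straightforward. The second pattern would change to [a-z]{2,}|[ai]; note that without \b the [ai] alternative also matches a lone "a" or "i" sitting between digits, as in "2a3".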
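To make the effect of \b concrete, here is a minimal sketch that runs the patterns above against the question's input and against "1abc2". The class name WordBoundaryDemo and the helper JoinMatches are hypothetical names introduced only for this illustration, and RegexOptions.CultureInvariant is my own addition so that the "I" in "HI" is matched regardless of the machine's culture settings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: returns every match of the pattern, joined by single spaces.
    public static string JoinMatches(string input, string pattern)
    {
        // IgnoreCase lets [a-z] cover upper case too; CultureInvariant is an added
        // assumption that keeps the case folding independent of the current culture.
        var matches = Regex.Matches(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
        return string.Join(" ", matches.Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        string text = "HI/how.are.3.a.d.you.&/{}today 2z3";

        // Words of two or more letters only.
        Console.WriteLine(JoinMatches(text, @"\b[a-z]{2,}\b"));          // HI how are you today

        // Same, but the single-letter words "a" and "i" are allowed as well.
        Console.WriteLine(JoinMatches(text, @"\b(?:[a-z]{2,}|[ai])\b")); // HI how are a you today

        // With \b, letters glued to digits are skipped, so this prints an empty line.
        Console.WriteLine(JoinMatches("1abc2", @"\b[a-z]{2,}\b"));

        // Without \b, the letter run inside the token is picked up.
        Console.WriteLine(JoinMatches("1abc2", @"[a-z]{2,}"));           // abc
    }
}
```

Which variant is right depends on whether a token such as "1abc2" should contribute a word at all; going by the output the OP asked for, where "2z3" is dropped entirely, keeping \b looks like the closer fit.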

4 Comments

Do you actually need \b's in there?
@liho1eye I don't need them for the given input, so perhaps they can be omitted. If the OP has some input such as "1abc2" the current pattern would ignore the "abc" word. If that's not desired, then the \b metacharacters can be removed in order to match that.
I guess there is ongoing confusion about what a word is. To me "1abc2" is a valid word, though maybe not for the OP.
@liho1eye On the one hand, the OP was using \w, which would include digits. On the other hand, the OP's final output excludes "2z3", which would have counted as a word under \w. Going off of that, I don't think "1abc2" would qualify.
