0

Why do I have to repeat the regex pattern three times in this code to find three separate numbers? I would like to use only ".*(\\d{10}+).*" to find all the numbers in the string word, but I have to repeat it three times. Why is this, and what am I doing wrong?

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile(".*(\\d{10}+).*" + ".*(\\d{10}+).*" + ".*(\\d{10}+).*");
        Matcher mat = pat.matcher(word);

        while (mat.find()) {
            for (int i = 1; i <= mat.groupCount(); i++) {
                System.out.println(mat.group(i));
            }
        }
    }
3
  • What exactly are you trying to match? Commented Mar 6, 2017 at 1:37
  • The three 10-digit strings in the String "word". Commented Mar 6, 2017 at 1:39
  • @SrikanthA The code works, but I want to know why I have to write the regex .*(\\d{10}+).* three times. When I iterate through the groups, it should just print them all out, right? Commented Mar 6, 2017 at 1:44

3 Answers

1

This is because .* is a greedy quantifier (see Regex Quantifiers): it consumes as much of the string as it can while still allowing the overall pattern to match. In your case, with a single ".*(\\d{10}+).*", the leading .* swallows every number except the last one, so the one capturing group only ever sees that last number.
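A minimal sketch of that effect, assuming the same input string as in the question. The class name GreedyDemo and the helper captureAll are made up for illustration; the helper simply collects group 1 from every successful find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {

    // Hypothetical helper: collects group 1 of every match that find() returns.
    static List<String> captureAll(String regex, String input) {
        List<String> captured = new ArrayList<>();
        Matcher mat = Pattern.compile(regex).matcher(input);
        while (mat.find()) {
            captured.add(mat.group(1));
        }
        return captured;
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541"
                .replaceAll("\\s+", "");

        // The greedy leading .* swallows the first two numbers, and the single
        // match consumes the whole string, so find() succeeds only once.
        System.out.println(captureAll(".*(\\d{10}+).*", word));
        // prints [5467892541]

        // Without .*, find() steps through the string one number at a time.
        System.out.println(captureAll("(\\d{10})", word));
        // prints [0546105610, 4515189675, 5467892541]
    }
}
```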

To fix this, drop the match-all .* parts: find() already scans forward through the string and returns each match in turn, skipping whatever lies in between.

So using just (\\d{10}) should work.

public static void main (String [] args){
    String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
    word = word.replaceAll("\\s+","");

    Pattern pat = Pattern.compile("(\\d{10})");
    Matcher mat = pat.matcher(word);

    while (mat.find()) {
        for (int i = 1; i <= mat.groupCount(); i++) {
            System.out.println(mat.group(i));
        }
    }
}

4 Comments

Thank you very much, this was very helpful.
Greediness has nothing to do with the problem
OP's pattern matches all 3 numbers, greedy or not. Greedy will backtrack until the entire pattern matches. If the groups were optional, then it would make a difference, but they're not optional: You will never fail to find a match using greedy - it might not be the match you want. In this case, there is no choice as to what to match given the input has 3 numbers and the pattern also has 3.
Agreed that the 3 numbers pattern should work fine, greedy or not. But the OP's question is why they can't use a single pattern like ".*(\\d{10}+).*" instead of the 3 numbers one. I believe the answer fixes their issue, and it's already accepted.
1

@Hesham Attia's answer is enough to resolve your problem; this just adds a little more explanation of how it behaves differently from your original pattern.

Let's print the group index i alongside each matched group:

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile("(\\d{10})");
        Matcher mat = pat.matcher(word);

        // Each find() yields one match; the pattern has a single capturing group.
        while (mat.find()) {
            for (int i = 1; i <= mat.groupCount(); i++) {
                System.out.println("Group-" + i + ": " + mat.group(i));
            }
        }
    }

and you'll get the result:

Group-1: 0546105610

Group-1: 4515189675

Group-1: 5467892541

And the result of your pattern is:

Group-1: 0546105610

Group-2: 4515189675

Group-3: 5467892541

In fact, the above code with the new pattern "(\\d{10})" is equivalent to the following:

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile("\\d{10}");
        Matcher mat = pat.matcher(word);

        // No capturing group: group() returns the entire match (group 0).
        while (mat.find()) {
            System.out.println(mat.group());
        }
    }

If you refer to the Javadoc of Matcher.find(), Matcher.group(), and Matcher.groupCount(), you'll find that Matcher.find() tries to find the next subsequence that matches the pattern, Matcher.group() returns the subsequence matched by the previous match, and Matcher.groupCount() counts only the capturing groups specified in your pattern, not the entire match (which is group 0).
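A short sketch of how these three methods relate, using a shortened input made up for this example. The class GroupCountDemo and the helper describeFirstMatch are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {

    // Hypothetical helper: describes the first match of regex in input.
    static String describeFirstMatch(String regex, String input) {
        Matcher mat = Pattern.compile(regex).matcher(input);
        if (!mat.find()) {
            return "no match";
        }
        // group() is shorthand for group(0), the entire match;
        // groupCount() counts only the capturing groups in the pattern.
        return "groupCount=" + mat.groupCount()
                + ", group()=" + mat.group()
                + ", group(0)=" + mat.group(0);
    }

    public static void main(String[] args) {
        String word = "a0546105610,4515189675"; // shortened sample input

        System.out.println(describeFirstMatch("(\\d{10})", word));
        // prints groupCount=1, group()=0546105610, group(0)=0546105610

        System.out.println(describeFirstMatch("\\d{10}", word));
        // prints groupCount=0, group()=0546105610, group(0)=0546105610
    }
}
```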

Simply put, the regex engine walks through your pattern against the input and, in greedy mode, tries to match as much as possible. Now let's look at the differences between those patterns:

  1. Your original pattern: ".*(\\d{10}+).*"+".*(\\d{10}+).*"+".*(\\d{10}+).*" and why you need to repeat it three times

    If only ".*(\\d{10}+).*" is given, the pattern matches the whole string in a single attempt. Because the leading .* is greedy, the matching parts are:

    • "Somerandommobilenumbers0546105610,4515189675," matches the leading .*
    • "5467892541" matches \\d{10}+ and goes to group 1
    • the empty string matches the trailing .*

    The entire string is consumed by that first match, so there is nothing left for find() to match again and no way to extract the other two numbers. That is why you had to repeat the pattern to put them into separate groups.

  2. Pattern "(\\d{10})":

    It matches one number each time you call mat.find() and puts it into group 1, from which you can extract the result. That's why the group index is always 1.

  3. Pattern "\\d{10}":

    The same as pattern 2, but with no capturing group, so nothing is stored in group 1. You get the result from mat.group() directly, which is group 0.
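The cases in this list can be checked with a small sketch, assuming the whitespace-stripped input from the question. The class PatternComparison and its helpers parts and allNumbers are hypothetical names introduced here. parts uses start(1) and end(1) to show which slice of the input each piece of the single ".*(\\d{10}+).*" claims, and allNumbers collects every match of "\\d{10}" through group().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternComparison {

    // Single copy of the original pattern:
    // returns {leading .*, group 1, trailing .*} for its one match.
    static String[] parts(String input) {
        Matcher mat = Pattern.compile(".*(\\d{10}+).*").matcher(input);
        if (!mat.find()) {
            return new String[0];
        }
        return new String[] {
            input.substring(mat.start(), mat.start(1)),
            mat.group(1),
            input.substring(mat.end(1), mat.end())
        };
    }

    // Pattern without a capturing group: each match is read from group().
    static List<String> allNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher mat = Pattern.compile("\\d{10}").matcher(input);
        while (mat.find()) {
            numbers.add(mat.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String word = "Somerandommobilenumbers0546105610,4515189675,5467892541";

        for (String part : parts(word)) {
            System.out.println("\"" + part + "\"");
        }
        // prints:
        // "Somerandommobilenumbers0546105610,4515189675,"
        // "5467892541"
        // ""

        System.out.println(allNumbers(word));
        // prints [0546105610, 4515189675, 5467892541]
    }
}
```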

-1

Your real problem is that you are using Pattern, which is error-prone because it requires lots of code; here's how you do it in one simple line:

String[] numbers = word.replaceAll("[^\\d,]", "").split(",");
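A quick check of this one-liner against the input from the question. The wrapper class SplitDemo and the method extract are made up for illustration. Note that this approach relies on the numbers being separated by commas and does not itself check that each piece has 10 digits.

```java
import java.util.Arrays;

public class SplitDemo {

    // Hypothetical wrapper around the one-liner: keep only digits and commas,
    // then split on the commas.
    static String[] extract(String word) {
        return word.replaceAll("[^\\d,]", "").split(",");
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        System.out.println(Arrays.toString(extract(word)));
        // prints [0546105610, 4515189675, 5467892541]
    }
}
```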
