0

As input I get a string like "[email protected]". All parts are variable. The only rules are that the string always starts with exactly 10 digits, followed by "@". I need a regex that lets me extract "12345" (i.e. the digits at positions 2 to 6) and the "site.com" substring. For example, in the case above the result could be either "12345site.com" or "12345:site.com". Can this be done with a single regular expression? How can we skip the first digit, the digits at positions 7 to 10, and the '@'? Examples in Java would be appreciated.

6
  • What language do you use? Commented Feb 21, 2015 at 12:44
  • I use Java. But I thought it was not so important in the context of a regex-related question. Commented Feb 21, 2015 at 12:54
  • Absolutely not! Different languages use different regex flavors and implementations. Commented Feb 21, 2015 at 12:59
  • Ok. I added remark about Java. Commented Feb 21, 2015 at 13:07
  • Since the index and positions are known, why do you still need a regex for this? Can't substring be used? Commented Feb 21, 2015 at 13:13
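To illustrate the substring suggestion in the last comment, here is a minimal sketch without any regex. The class name `SubstringExtract`, the method `extract`, and the sample input "0123456789@site.com" are hypothetical stand-ins, not taken from the question. Note that it only checks where the '@' sits, not that the first ten characters are really digits.

```java
final class SubstringExtract {
    // Hypothetical helper: assumes the '@' sits right after 10 leading characters.
    static String extract(String input) {
        int at = input.indexOf('@');
        if (at != 10) {
            throw new IllegalArgumentException("expected '@' at index 10: " + input);
        }
        String digits = input.substring(1, 6);    // positions 2..6 (1-based)
        String domain = input.substring(at + 1);  // everything after '@'
        return digits + ":" + domain;
    }

    public static void main(String[] args) {
        // Assumed sample input, chosen to match the rules stated in the question.
        System.out.println(extract("0123456789@site.com")); // prints 12345:site.com
    }
}
```

This is fine when the layout is truly fixed. A regex becomes more attractive once the input also has to be validated.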

2 Answers

1

If I understood you correctly, this regex will do:

\d(\d{5})\d{4}@(.+)

and then use

matcher.group(1) + matcher.group(2)

to concatenate the groups.

Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "[email protected]";
        // skip 1 digit, capture 5, skip 4 more digits and '@', capture the rest
        String patternString = "\\d(\\d{5})\\d{4}@(.+)";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(s);
        if (matcher.matches()) {
            System.out.println(matcher.group(1) + matcher.group(2));
            // shows "12345site.com"
        }
    }
}
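If the "12345:site.com" form from the question is wanted, the same pattern can also be applied in a single call with `String.replaceFirst` and group references in the replacement. This is only a sketch of an alternative, not part of the answer above. The class name `ReplaceExtract` and the sample input "0123456789@site.com" are assumptions made for illustration.

```java
final class ReplaceExtract {
    // Hypothetical helper: rewrites the whole input to "<group 1>:<group 2>".
    // If the input does not match, replaceFirst returns it unchanged.
    static String extract(String input) {
        return input.replaceFirst("^\\d(\\d{5})\\d{4}@(.+)$", "$1:$2");
    }

    public static void main(String[] args) {
        // Assumed sample input: 10 digits, '@', then the domain.
        System.out.println(extract("0123456789@site.com")); // prints 12345:site.com
    }
}
```

The trade-off is that a non-matching input passes through silently, so the explicit `matcher.matches()` check is the better choice when invalid input has to be detected.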

8 Comments

Will that translate to "\\d(\\d{5})\\d{4}@(.+)" in Java? If so, then it results in one matching group, "[email protected]".
Ok. Actually it works. "[email protected]" was group(0). There are also group(1) and group(2) with the right values.
How would you change the code if you don't know beforehand the number of matching groups?
You should always know the number of matching groups as soon as your patternString is final. If you don't, I would use a "for" loop to go through them. Or did you mean something else?
Ok. I can use groupCount() and iterate over all groups to concatenate all results.
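The `groupCount()` idea from the last comment could look roughly like the sketch below. The class name `GroupJoin`, the method `joinGroups`, and the sample input are hypothetical. The loop starts at 1 because group 0 is the whole match and is not included in `groupCount()`.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupJoin {
    // Hypothetical helper: joins all capturing groups of a full match.
    // Returns null when the input does not match the pattern.
    static String joinGroups(String regex, String input, String separator) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        StringJoiner joined = new StringJoiner(separator);
        for (int i = 1; i <= m.groupCount(); i++) {
            String g = m.group(i);
            if (g != null) {   // a group that did not participate in the match is null
                joined.add(g);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Assumed sample input: 10 digits, '@', then the domain.
        System.out.println(joinGroups("\\d(\\d{5})\\d{4}@(.+)", "0123456789@site.com", ":"));
        // prints 12345:site.com
    }
}
```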
0

Specifically for your input pattern:

\d{1}(\d{5})\d*@(.*)

2 capturing groups: 
   group 1: (\d{5})
   group 2: (.*)

Input: [email protected]
Output: 12345
        site.com

3 Comments

If I try pattern = "\\d(\\d{5})\\d*@(.*)" in Java, the output is the complete input string.
Yes. It works. I just should not use group(0). But rather group(1) and group(2).
@Max Okay. Initially you did not mention it's for use with Java. Glad you figured it out :)
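For reference, the group(0) confusion from the comments can be reproduced with a small sketch like the one below. The class name `GroupZeroDemo`, the method `groups`, and the sample inputs are assumptions for illustration. It also shows that `\d*` in this pattern is looser than the stated rule of exactly 10 digits: it accepts any number of digits after the captured five.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupZeroDemo {
    // Hypothetical helper: returns {group(0), group(1), group(2)},
    // or an empty array when the input does not match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("\\d(\\d{5})\\d*@(.*)").matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Assumed sample input: 10 digits, '@', then the domain.
        String[] g = groups("0123456789@site.com");
        System.out.println(g[0]); // group(0): the whole match, 0123456789@site.com
        System.out.println(g[1]); // group(1): 12345
        System.out.println(g[2]); // group(2): site.com

        // Only 6 leading digits, yet \d* still lets this match.
        System.out.println(groups("012345@site.com").length); // prints 3
    }
}
```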
