I have the following string:

"(1)name1:content1(2)name2:content2(3)name3:content3...(n)namen:contentn"

What I want to do is capture each name_i and content_i. How can I do this? I should mention that name_i is unknown: for example, name1 could be "abc" and name2 could be "xyz".

What I have tried:

String regex = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
  System.out.println(matcher.group(0));
  System.out.println(matcher.group(1));
  System.out.println(matcher.group(2));
}

But the results are not what I expect. I also tried matcher.matches(), but nothing is returned.

  • You can use \(\d+\)(\w+):(\w+) Commented Aug 29, 2017 at 9:20
  • I suggest a less readable, but a more lenient and efficient regex: "\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)" Commented Aug 29, 2017 at 10:25

3 Answers

You may use

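import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "(1)name1:content1(2)name2:content2(3)name3:content3...(4)namen:content4";
// Group 1: the name (no colons); Group 2: the content up to the next "(digits)" marker
Pattern pattern = Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");
Matcher matcher = pattern.matcher(s);
// find() is called in a loop so that every entry is visited, not just the first
while (matcher.find()) {
    System.out.println(matcher.group(1)); // name
    System.out.println(matcher.group(2)); // content
}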

See the Java demo

Details

  • \\(\\d+\\) - matches (x) substring where x is 1 or more digits
  • ([^:]+) - Group 1: one or more chars other than :
  • : - a colon
  • ([^(]*(?:\\((?!\\d+\\))[^(]*)*) - Group 2:
    • [^(]* - zero or more chars other than (
    • (?:\\((?!\\d+\\))[^(]*)* - zero or more sequences of:
      • \\((?!\\d+\\)) - a ( that is not followed by 1+ digits and )
      • [^(]* - 0+ chars other than (
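
As a rough illustration of why Group 2 is built this way, the sketch below runs the same pattern on an input whose content itself contains a parenthesized word. The class name EntryParser, the parse helper, and the sample input are made up for this example and are not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EntryParser {
    // Same pattern as in the answer above.
    private static final Pattern ENTRY = Pattern.compile(
            "\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");

    // Collects name -> content pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input: "(bar)" is not a "(digits)" marker,
        // so it stays inside the first content.
        System.out.println(parse("(1)abc:foo (bar) baz(2)xyz:content2"));
        // prints {abc=foo (bar) baz, xyz=content2}
    }
}
```

The negative lookahead lets a "(" through only when it does not start a new "(digits)" marker, which is why "(bar)" remains part of the content while "(2)" ends it.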

See the regex demo.

This will work if your names and contents consist only of "word" characters (letters, digits and underscores):

public static void test(String input){
    String regexpp = "\\(\\d+\\)(\\w+):(\\w+)";
    Pattern p = Pattern.compile(regexpp);
    Matcher m = p.matcher(input);
    while(m.find()){
        System.out.println("Name: " + m.group(1));
        System.out.println("Content: " + m.group(2));
    }
}

Output:

Name: name1
Content: content1
Name: name2
Content: content2
Name: name3
Content: content3
Name: name99
Content: content99

3 Comments

Thanks for your reply. But name_i is unknown. For example name1 could be "abc", name2 could be "xyz".
You state the exact string in your question. It is probably better to provide a more representative example so as not to confuse others.
: doesn't need to be escaped in a regex

Your expression matches greedily: the .* in your first group runs on past the first colon to a later one, so the groups capture far more than a single name and content. You can use non-greedy matching (a question mark after the quantifier, as in .*?) so that each group stops at the first position that lets the rest of the pattern match.

String regex = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";

1 Comment

Thanks for your suggestion. It works better than mine. But still only the first pair, i.e., name1 and content1, is returned by matcher.group().
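
Getting only one pair is consistent with the question's code calling find() a single time inside an if. Here is a minimal sketch, assuming the lazy pattern is kept, \\d is widened to \\d+, and the lookahead is extended with |$ so that the final entry (which has no following marker) also matches. The class LazyEntryParser, its parse method, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class LazyEntryParser {
    // Lazy groups; the lookahead accepts either the next "(digits)" marker or end of input.
    private static final Pattern ENTRY = Pattern.compile(
            "\\(\\d+\\)(.*?):(.*?)(?=\\(\\d+\\)|$)");

    // Returns each entry as "name=content", in order of appearance.
    static List<String> parse(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        // Each call to find() advances to the next entry.
        while (m.find()) {
            pairs.add(m.group(1) + "=" + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1)abc:content1(2)xyz:content2(3)name3:content3"));
        // prints [abc=content1, xyz=content2, name3=content3]
    }
}
```

Note that . does not match line terminators by default, so this sketch assumes each entry sits on a single line.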
