
I have this Java code:

String msg = "*1*20*11*30*IGNORE*53*40##";
String regex = "\\*1\\*(.*?)\\*11\\*(.*?)\\*(.*?)\\*53\\*(.*?)##";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(msg);
if (matcher.find()) {
    for (int i = 0; i < matcher.groupCount(); i++) {
        System.out.println(matcher.group((i+1)));
    }
}

The output is:

20
30
IGNORE
40

How do I have to change the regex so that the string IGNORE is ignored? I want whatever is written in that position not to be captured by the matcher. The positions of 20, 30 and 40 hold the values I need to extract; IGNORE in my case is a protocol-specific counter that I have no use for.

3 Comments

  • You mean this: \\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*?\\*53\\*(.*?)##? Commented Sep 24, 2015 at 11:00
  • Or this: \\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##? Commented Sep 24, 2015 at 11:01
  • Why do you place () around a part you don't need? Just remove those. Commented Sep 24, 2015 at 11:02

3 Answers


Always ignore the 3rd parameter:

Simply don't create a capture (don't use parentheses).

\\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##
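As a rough sketch of how this pattern behaves, the snippet below wraps it in a small helper. The class name NonCapturingExample and the method extract are made up for illustration; only the regex comes from the answer. Because the third field is matched by .*? without parentheses, it never shows up as a group, whatever it contains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class NonCapturingExample {
    // The third field is matched by .*? but not captured (no parentheses).
    private static final Pattern PATTERN =
            Pattern.compile("\\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##");

    // Returns the captured values, or an empty list if the message does not match.
    static List<String> extract(String msg) {
        List<String> values = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(msg);
        if (matcher.find()) {
            // Group 0 is the whole match, so capture groups run from 1 to groupCount().
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("*1*20*11*30*IGNORE*53*40##")); // [20, 30, 40]
        System.out.println(extract("*1*20*11*30*7*53*40##"));      // [20, 30, 40]
    }
}
```

Looping from 1 to groupCount() inclusive is equivalent to the i+1 indexing used in the question; it is just the more common way to write it.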

Ignore independently of position:

You need to capture the IGNORE part just like you're doing, and check in your loop if it needs to be ignored:

String msg = "*1*20*11*30*IGNORE*53*40##";
String regex = "\\*1\\*(.*?)\\*11\\*(.*?)\\*(.*?)\\*53\\*(.*?)##";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(msg);
if (matcher.find()) {
    // Group 0 is the whole match; capture groups run from 1 to groupCount().
    for (int i = 1; i <= matcher.groupCount(); i++) {
        // Skip any captured value that is exactly "IGNORE", whatever its position.
        if (!"IGNORE".equals(matcher.group(i))) {
            System.out.println(matcher.group(i));
        }
    }
}



2 Comments

Thank you! The regex in your comment below my question works fine as well!
That regex always ignores the 3rd parameter (by not creating a capture). It won't work if IGNORE is in another position.

You can use a tempered greedy token to make sure you do not get a match when IGNORE is in-between the 2nd and 3rd capture groups:

\\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*\\*53\\*(.*?)##

In this case, the third field (the text between the 2nd and 3rd capture groups) cannot contain IGNORE.

The token is useful when you need to match the closest window between two subpatterns that does not contain some substring.
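A minimal sketch of the tempered greedy token in action, assuming a hypothetical helper class TemperedTokenExample with a made-up extract method (only the regex is taken from this answer). It shows that the whole match fails when the third field contains IGNORE, rather than the field being skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class TemperedTokenExample {
    // (?:(?!IGNORE).)* consumes one character at a time, but only while
    // the text at the current position does not start with "IGNORE".
    private static final Pattern PATTERN = Pattern.compile(
            "\\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*\\*53\\*(.*?)##");

    // Returns the captured values, or an empty list if there is no match.
    static List<String> extract(String msg) {
        List<String> values = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(msg);
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Third field is "7": the pattern matches and yields three groups.
        System.out.println(extract("*1*20*11*30*7*53*40##"));      // [20, 30, 40]
        // Third field is "IGNORE": the pattern does not match at all.
        System.out.println(extract("*1*20*11*30*IGNORE*53*40##")); // []
    }
}
```

So this variant rejects messages that carry IGNORE in that position; it does not extract 20, 30 and 40 from them.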

In case you just do not want the 3rd group to be equal to IGNORE, use a negative look-ahead:

\\*1\\*(.*?)\\*11\\*(.*?)\\*(?!IGNORE\\*)(.*?)\\*53\\*(.*?)##
                             ^^^^^^^^^^^^
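The following sketch illustrates how the look-ahead variant differs: it only rejects a third field that is exactly IGNORE, so a value such as IGNORED still matches and is captured. The class LookaheadExample, the method extract and the IGNORED test message are assumptions added for illustration; the regex is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class LookaheadExample {
    // (?!IGNORE\*) fails the match only if the third field is exactly
    // "IGNORE", i.e. "IGNORE" immediately followed by the next '*'.
    private static final Pattern PATTERN = Pattern.compile(
            "\\*1\\*(.*?)\\*11\\*(.*?)\\*(?!IGNORE\\*)(.*?)\\*53\\*(.*?)##");

    // Returns all four captured values, or an empty list if there is no match.
    static List<String> extract(String msg) {
        List<String> values = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(msg);
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("*1*20*11*30*7*53*40##"));       // [20, 30, 7, 40]
        System.out.println(extract("*1*20*11*30*IGNORE*53*40##"));  // []
        System.out.println(extract("*1*20*11*30*IGNORED*53*40##")); // [20, 30, IGNORED, 40]
    }
}
```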


3 Comments

Your demo does not give me the right result. I don't want 'some' to be a match.
And what is the result then? Some is not a match, it is placed into a group. You can remove the brackets and that's all.
Good, I tried to show how you can achieve what you need with just changing the regex. Surely you can also do that multiple ways in code.

Split the input on * and treat IGNORE as an optional part of the delimiter, having first trimmed off the prefix and suffix:

String[] parts = msg.replaceAll("^\\*\\d\\*|##$","").split("(\\*IGNORE)?\\*\\d+\\*");

Some test code:

String msg = "*1*20*11*30*IGNORE*53*40##";
String[] parts = msg.replaceAll("^\\*\\d\\*|##$","").split("(\\*IGNORE)?\\*\\d+\\*");
System.out.println(Arrays.toString(parts));

Output:

[20, 30, 40]
