1

Given input as:

#start
  random string 1
#end

#start
  random string 2
#end

I can write a regex as

(#start[\s\S]*?#end)

Now things get a bit more complex with this input:

  #start
    random string 1
    #start
      random string 2
    #end
  #end

  #start
    random string 3
  #end

and I want to get 3 matches, which are:

#start
  random string 1
#end

#start
  random string 2
#end

#start
  random string 3
#end

Is this even possible with regex? I tried most of the regex constructs I know, but I must have missed something, because it doesn't work the way I want.

Can someone show me which constructs I can use to achieve this?

Thank you.

5
  • No way to do it with a single regex. #start random string 1 #end does not occur in the string as one continuous run of text. Commented Jun 27, 2017 at 19:54
  • Perhaps yes, perhaps no; it depends on whether you gave us the correct indentation. Either way, the first result will contain the second. Edit your question to make that clearer. If the real string isn't indented, it's not possible. Commented Jun 27, 2017 at 19:56
  • @WiktorStribiżew Considering that the nesting depth is unknown, this seems to me like a problem that can't be solved with regex alone. Commented Jun 27, 2017 at 19:59
  • @CasimiretHippolyte The indentation is not guaranteed. Commented Jun 27, 2017 at 19:59
  • @Xitrum: In this case, it isn't possible. Use a more classic approach with loops, stacks, flags... Commented Jun 27, 2017 at 20:02

3 Answers

3

You cannot do it with a single regex. However, you can achieve it by extracting one group at a time and removing it from the input string in a loop until no more matches can be found.

So the regex might look like the following in Java:

Pattern p = Pattern.compile("^.*(#start[^#]+#end).*$");

Now you can remove the matched portion from the input string and repeat this in a loop.

Here is a small test program which does it:

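import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedBlocks {
    public static void main(String[] args) {
        String re = "#start hello there #start my world #end #end #start bye dear #end ";
        // The greedy group 1 pushes the match to the last innermost #start...#end block.
        Pattern p = Pattern.compile("^(.*)(#start[^#]+#end)(.*)$");
        Matcher m;
        // Print the innermost block, cut it out of the string, and repeat until nothing matches.
        while ((m = p.matcher(re)).matches()) {
            System.out.println(m.group(2));
            re = m.group(1) + m.group(3);
        }
    }
}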

and the result is:

#start bye dear #end
#start my world #end
#start hello there  #end

1 Comment

BTW, a disclaimer: in the question, #start and #end are on separate lines. In Java I would avoid using a regex as in the example, because it has performance implications. I would rather process the input line by line and build a stack of the interpreted chunks of data.
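
The line-by-line approach with a stack could be sketched roughly as follows. This is only an illustration, not code from the answer: the class name BlockStackParser is made up, and it assumes that #start and #end always sit on their own lines and that each block should be returned without the blocks nested inside it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: collects every #start...#end block without its nested blocks.
final class BlockStackParser {

    /** Returns one string per block, in the order the blocks were opened. */
    static List<String> parse(String input) {
        List<StringBuilder> blocks = new ArrayList<>(); // all blocks, in opening order
        Deque<StringBuilder> open = new ArrayDeque<>(); // blocks that are still open

        for (String raw : input.split("\\R")) {
            String line = raw.trim(); // indentation is not guaranteed, so ignore it
            if (line.equals("#start")) {
                StringBuilder block = new StringBuilder("#start\n");
                blocks.add(block);
                open.push(block);
            } else if (line.equals("#end")) {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("#end without a matching #start");
                }
                open.pop().append("#end");
            } else if (!open.isEmpty() && !line.isEmpty()) {
                // A content line belongs to the innermost block that is still open.
                open.peek().append("  ").append(line).append('\n');
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("#start without a matching #end");
        }

        List<String> result = new ArrayList<>();
        for (StringBuilder block : blocks) {
            result.add(block.toString());
        }
        return result;
    }
}
```

Calling BlockStackParser.parse on the nested input from the question should give three entries, one per random string, in the order 1, 2, 3. Each line is visited once, so nothing has to be rescanned the way the regex loop rescans the whole string.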
2

This cannot be done with regex alone. The answer to "Can regular expressions be used to match nested patterns?" explains in detail why this is the case. To make it work, you must encode the maximum possible depth within your regex.
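
As a rough illustration of what "encoding the maximum depth" could mean, a pattern for a fixed depth can be built by nesting the block expression inside itself once per level. This is a sketch under assumptions, not something the linked answer provides: the names BoundedDepthRegex and forDepth are made up, and the pattern matches whole blocks including whatever is nested in them, so it does not by itself produce the three separate matches asked for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a regex that accepts nesting only up to a fixed depth.
final class BoundedDepthRegex {

    /** Pattern for a #start...#end block containing at most maxDepth levels of nested blocks. */
    static Pattern forDepth(int maxDepth) {
        // Any single character that does not begin one of the two markers.
        String plain = "(?:(?!#start|#end)[\\s\\S])";
        // Depth 0: a block with no blocks inside it.
        String block = "#start" + plain + "*#end";
        // Each extra level allows the previous expression to appear inside the block.
        for (int i = 0; i < maxDepth; i++) {
            block = "#start(?:" + plain + "|" + block + ")*#end";
        }
        return Pattern.compile(block);
    }

    public static void main(String[] args) {
        String s = "#start a #start b #end c #end";
        Matcher shallow = forDepth(0).matcher(s);
        Matcher deep = forDepth(1).matcher(s);
        if (shallow.find()) System.out.println(shallow.group()); // inner block only
        if (deep.find()) System.out.println(deep.group());       // whole outer block
    }
}
```

With forDepth(0) only the inner block of the sample string can match, while forDepth(1) matches the outer block as a whole. Input nested deeper than the chosen depth is not matched at its outer levels, which is the limitation this answer describes.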

5 Comments

Note that the question you linked is very general and doesn't take into account what modern regex engines are able to do. There are many languages with a regex engine able to do that (but not Java): Ruby/Perl/.NET languages/PHP/R/Python with the regex module...
Well this is a Java question.
Not the one you linked.
But I was answering this question, and the answer I linked to applies.
If you know about regex, then answer this question. Don't make a blanket statement such as "This cannot be done with regex alone", because it can, and it is done every day! Also, it's more informative to show a sample of what you mean when you say "encode the maximum possible depth".
0

I got the solution from the idea in Serge's answer. The answer is good, but it didn't fit my case because the nesting depth is unknown. So my solution finds the deepest matched groups, removes them from the string, and then continues on the remaining string.

So something like (#start((?!#start)[\s\S])*?#end)
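
A minimal sketch of that loop in Java, using the regex above unchanged. The class and method names (InnermostFirst, extract) are made up for illustration, and the details of the loop are an assumption about how the removal step could be written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: repeatedly extracts the innermost blocks and removes them.
final class InnermostFirst {

    // The lookahead stops a match from running across another #start,
    // so only blocks without nested blocks can match.
    private static final Pattern INNERMOST =
            Pattern.compile("(#start((?!#start)[\\s\\S])*?#end)");

    /** Returns all blocks, innermost first, each without the blocks nested in it. */
    static List<String> extract(String input) {
        List<String> blocks = new ArrayList<>();
        String rest = input;
        while (true) {
            Matcher m = INNERMOST.matcher(rest);
            if (!m.find()) {
                break; // no complete block is left
            }
            // Collect every innermost block of this pass and rebuild the string without them.
            StringBuilder remaining = new StringBuilder();
            int last = 0;
            do {
                blocks.add(m.group(1));
                remaining.append(rest, last, m.start());
                last = m.end();
            } while (m.find());
            remaining.append(rest, last, rest.length());
            rest = remaining.toString();
        }
        return blocks;
    }
}
```

On the nested input from the question this should return three blocks in the order 2, 3, 1, because the innermost blocks are found first. The outer block keeps the whitespace that surrounded the removed inner block, so the results may need trimming.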
