11

I would like to extract the substring between two given words using Java.

For example:

This is an important example about regex for my work.

I would like to extract everything between "an" and "for".

What I did so far is:

String sentence = "This is an important example about regex for my work and for me";
Pattern pattern = Pattern.compile("(?<=an).*.(?=for)");
Matcher matcher = pattern.matcher(sentence);

boolean found = false;
while (matcher.find()) {
    System.out.println("I found the text: " + matcher.group().toString());
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}

It works well.

But I want to do two additional things:

  1. If the sentence is "This is an important example about regex for my work and for me", I want to extract only up to the first "for", i.e. "important example about regex".

  2. Sometimes I want to limit the number of words between the two delimiters to 3, i.e. "important example about".

Any ideas please?

3
  • Do you want the pattern to match only if there are exactly 3 words between 'an' and 'for', or do you want only the first 3 words regardless of the number of words in the match? Commented Aug 15, 2011 at 9:10
  • @Dragon8: I want only the first 3 words, regardless of the number of words in the match. Commented Aug 15, 2011 at 9:49
  • OK, then you can split the match afterwards with someString.split(" "). It returns an array of Strings where each element is a word from your match. Commented Aug 15, 2011 at 10:03
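
The suggestion in the last comment could be sketched roughly as follows. The class and method names (FirstWords, firstWords) are made up for illustration. The sketch also assumes that the match should be trimmed and split on runs of whitespace ("\\s+") rather than on a single space, so that leading spaces do not produce empty words.

```java
import java.util.Arrays;

public class FirstWords {

    // Returns at most maxWords whitespace-separated words from text,
    // joined by single spaces. (Hypothetical helper, not from the thread.)
    public static String firstWords(String text, int maxWords) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        int count = Math.max(0, Math.min(maxWords, words.length));
        return String.join(" ", Arrays.copyOfRange(words, 0, count));
    }

    public static void main(String[] args) {
        // Text as it might come back from matcher.group()
        String match = " important example about regex ";
        System.out.println(firstWords(match, 3)); // important example about
    }
}
```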

3 Answers 3

8

For your first question, make the quantifier lazy. If you put a question mark after the quantifier, it will match as little as possible.

(?<=an).*?(?=for)

I have no idea what the additional . at the end of .*. is good for. It only forces at least one character to match and is unnecessary here.
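
To make the difference concrete, here is a small sketch that runs the greedy and the lazy variant against the sentence from the question. The class name LazyVsGreedy and the helper firstMatch are made up for illustration. The brackets in the output only make the surrounding spaces visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {

    // Returns the first match of regex in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";

        // Greedy: .* runs as far as it can, so the match ends before the LAST "for".
        System.out.println("[" + firstMatch("(?<=an).*(?=for)", sentence) + "]");
        // [ important example about regex for my work and ]

        // Lazy: .*? stops as early as it can, so the match ends before the FIRST "for".
        System.out.println("[" + firstMatch("(?<=an).*?(?=for)", sentence) + "]");
        // [ important example about regex ]
    }
}
```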

For your second question, you have to define what a "word" is. Here I would say it is probably just a sequence of non-whitespace characters followed by a whitespace character. Something like this:

\S+\s

and repeat this 3 times like this

(?<=an)\s(\S+\s){3}(?=for)

To ensure that the pattern matches only whole words, use word boundaries:

(?<=\ban\b)\s(\S+\s){1,5}(?=\bfor\b)

See it online here on Regexr

{3} matches exactly 3 repetitions; for a minimum of 1 and a maximum of 3, use {1,3}.
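
In Java source, every backslash in these patterns has to be doubled inside the string literal. The following sketch tries the word-counting patterns on the sentence from the question. The class name WordLimit and the helper firstMatch are made up for illustration. Note that these patterns only match when the number of words between the delimiters fits the quantifier; they do not cut a longer match down to 3 words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordLimit {

    // Returns the first match of regex in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";

        // Four words sit between "an" and the first "for", so {1,5} matches.
        String upToFive = "(?<=\\ban\\b)\\s(\\S+\\s){1,5}(?=\\bfor\\b)";
        System.out.println("[" + firstMatch(upToFive, sentence) + "]");
        // [ important example about regex ]

        // With {1,3} the fourth word is not allowed, so there is no match at all.
        String upToThree = "(?<=\\ban\\b)\\s(\\S+\\s){1,3}(?=\\bfor\\b)";
        System.out.println(firstMatch(upToThree, sentence)); // null
    }
}
```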

Alternative:

As dma_k correctly stated, in your case it's not necessary to use lookbehind and lookahead. See the Matcher documentation about groups.

You can use capturing groups instead. Just put the part you want to extract in parentheses and it will be stored in a capturing group.

\ban\b(.*?)\bfor\b

See it online here on Regexr

You can then access this group like this:

System.out.println("I found the text: " + matcher.group(1).toString());
                                                        ^

You have only one pair of parentheses, so it's simple: just pass 1 to matcher.group(1) to access the first capturing group.
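
Put together, the capturing-group approach might look like the sketch below. The class name CaptureBetween and the method between are made up for illustration, and trimming the captured text is an added assumption, since the group itself includes the spaces next to "an" and "for".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureBetween {

    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern BETWEEN = Pattern.compile("\\ban\\b(.*?)\\bfor\\b");

    // Returns the trimmed text between the whole words "an" and "for",
    // or null if the input does not contain both in that order.
    public static String between(String input) {
        Matcher matcher = BETWEEN.matcher(input);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";
        System.out.println(between(sentence)); // important example about regex
    }
}
```

Because of the \b word boundaries, the "an" inside "and" or "important" is not treated as a delimiter.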


10 Comments

I would also advise against using lookahead/lookbehind syntax. Why complicate things? an\b(.*?)\bfor will do the job perfectly.
As @dmk_k (thanks) correctly stated, it's not necessary to use advanced constructs in your case. I updated my answer with a simpler solution.
@stema: Thank you for your help. The second alternative does not work for me; I don't know why it couldn't find the text. Another thing: I want "an" to be the separate word "an", not part of a word as in "and" or "important".
@Daisy I updated my solution, adding \b word boundaries. I also added links to "Regexr", a nice online tool where you can test your regexes.
But I wonder why this is easier than lookahead/lookbehind? And when should I use lookahead/lookbehind?
3

The regex you need is "an\\s+(.*?)\\s+for". It captures all characters between an and for, excluding the surrounding whitespace (\s+). The question mark makes the quantifier non-greedy (lazy). It is needed to prevent the pattern .* from consuming everything up to the last "for".

2 Comments

A question mark makes a quantifier ungreedy. You are missing the quantifier in your answer, so you are matching a single arbitrary character and making it optional with your question mark.
@AlexR: Thank you, it works, but only after adding * like this: "an\\s+(.*?)\\s+for"
2

public class SubStringBetween {

    // Returns the text between the first occurrence of `before` and the next
    // occurrence of `after`, or an empty string if either delimiter is missing.
    // Delimiters are matched as plain substrings, so pass " an " and " for "
    // to avoid hits inside words such as "and" or "important".
    public static String subStringBetween(String sentence, String before, String after) {
        int startSub = subStringStartIndex(sentence, before);
        if (startSub < 0) {
            return "";
        }
        int stopSub = subStringEndIndex(sentence, after, startSub);
        if (stopSub < 0) {
            return "";
        }
        return sentence.substring(startSub, stopSub);
    }

    // Index just past the first occurrence of the delimiter, or -1 if it is absent.
    public static int subStringStartIndex(String sentence, String delimiterBeforeWord) {
        int index = sentence.indexOf(delimiterBeforeWord);
        return index < 0 ? -1 : index + delimiterBeforeWord.length();
    }

    // Index where the first occurrence of the delimiter at or after fromIndex
    // begins, or -1 if it is absent.
    public static int subStringEndIndex(String sentence, String delimiterAfterWord, int fromIndex) {
        return sentence.indexOf(delimiterAfterWord, fromIndex);
    }
}
