I have the following input text:

    Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
    *[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...

This is the Java code I am using:

    String pattern = "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(data);
    while (m.find()) {
        System.out.println("Found value: " + m.group(1));
    }

I am reading the file line by line with BufferedReader.readLine(), printing each line with System.out.println as I parse it, and my regex produces the following output:

    Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
    Found value: [[University of Pleasantville]]
    *[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
    Found value: [[1973]]
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
    Found value: [[negative hero]]
    Found value: [[Alternate Superman]]
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...
    Found value: [[Daily Planet]]
    Found value: [[Louis Lane]]

As you can see, the problem is that I am not able to extract everything inside the double brackets, [[I_want_to_extract_these_except_Source_or_Arch_or_Zeus]]. For example, from the first line I should have extracted [[Superman (the Hero)|Superman]] and [[SuperHero Genre II]], but nothing was retrieved. How can I modify my regex to extract every link except those that start with Source:, Arch:, or Zeus: (such as [[Source:something]])? Thank you.

  • Append the whole text into one string and then match. Commented Jul 6, 2014 at 14:23
  • Is that the problem, @nikolap? What is wrong with reading line by line? Commented Jul 6, 2014 at 14:51
  • I'm not sure about your full text, but it may contain something like [[Lois Lane with the closing ]] on the next line. Commented Jul 6, 2014 at 14:56
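The first comment's suggestion can be sketched in Java as follows, assuming the input fits in memory: read all lines, join them back with newlines, and run the matcher once over the whole text. Pattern.DOTALL lets .*? cross line breaks, so a link that opens on one line and closes on the next (the case raised in the last comment) is still found. The class name WholeTextMatch and the helper extractAll are hypothetical, the negative-lookahead pattern is an assumption (the comment does not name one), and a StringReader stands in for the asker's file so the sketch runs on its own.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WholeTextMatch {
    // Hypothetical pattern: skip links whose text starts with an excluded prefix.
    // DOTALL lets ".*?" span newlines, so a [[ ... ]] link split across
    // two lines can still be matched once the text is joined.
    static final Pattern LINK = Pattern.compile(
            "\\[\\[(?!Arch:|Zeus:|Source:).*?\\]\\]", Pattern.DOTALL);

    // Joins every line from the reader and matches once over the whole text,
    // instead of matching each line separately. The caller owns (and closes) the reader.
    static List<String> extractAll(BufferedReader reader) {
        String text = reader.lines().collect(Collectors.joining("\n"));
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample text with a link broken across two lines.
        String input = "Clark met ''[[Daily Planet]]'' reporter [[Lois\nLane]]...";
        BufferedReader reader = new BufferedReader(new StringReader(input));
        for (String link : extractAll(reader)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

For a real file, the reader could come from Files.newBufferedReader(path) inside a try-with-resources block. One trade-off of DOTALL: an unclosed [[ will make the lazy match run on to the next ]] anywhere later in the file, not just on the same line.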

1 Answer

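In your pattern, [^(Arch:|Zeus:|Source:)] is a character class, not a list of words: it matches one single character that is none of ( A r c h : | Z e u s S o ). That is why links starting with S, such as [[Superman (the Hero)|Superman]] and [[SuperHero Genre II]], were never matched. Use a negative lookahead, (?!...), to reject the whole prefix instead: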

    \[\[(?!Arch:|Zeus:|Source:).*?\]\]

See it in action: http://regex101.com/r/lJ6sH3/1
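In Java the pattern has to be written as a string literal, so each backslash is doubled. A minimal sketch, run on the first line of the question's input; the class name WikiLinkDemo and the helper extract are hypothetical, and the pattern spells out Source: with its colon so that only [[Source:...]] links are excluded:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkDemo {
    // Negative lookahead: right after "[[", refuse the match if the text
    // continues with an excluded prefix; otherwise take everything up to
    // the first "]]" (lazy .*?).
    static final Pattern LINK =
            Pattern.compile("\\[\\[(?!Arch:|Zeus:|Source:).*?\\]\\]");

    // Returns every [[...]] link in the text except the excluded ones.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            found.add(m.group()); // whole match; no capturing group needed
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "Clark is set to work in ''[[Superman (the Hero)|Superman]]'', "
                + "a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...";
        for (String link : extract(line)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

Run as-is, it prints [[Superman (the Hero)|Superman]] and [[SuperHero Genre II]] and skips [[Source:NYTimes]]. The leading and trailing (?:\p{Punct}|\b|\B) groups from the question are dropped here: \b|\B always matches somewhere, so those groups never constrained the match.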
