I have the following input text:
<code>Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
*[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...</code>
This is the regex pattern and matching code that I am using in Java:
<code>String pattern = "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(data);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
}</code>
I am reading the file line by line with BufferedReader.readLine(), printing each line as I parse it, and I get the following output with my regex:
Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
Found value: [[University of Pleasantville]]
*[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
Found value: [[1973]]
After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
Found value: [[negative hero]]
Found value: [[Alternate Superman]]
Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...
Found value: [[Daily Planet]]
Found value: [[Louis Lane]]
As you can see, the problem is that I am not able to extract every link enclosed in double square brackets, i.e. [[I_want_to_extract_these_except_Source_or_Arch_or_Zeus]]. For example, from the first line I should have extracted [[Superman (the Hero)|Superman]] and [[SuperHero Genre II]], but nothing was retrieved. How can I modify my regex to extract everything except the links prefixed with Source:, Arch:, or Zeus:, such as [[Source:something]]? Thank you.
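
One direction that may be worth trying, offered as a sketch rather than a definitive fix: in Java regex, <code>[^(Arch:|Zeus:|Source:)]</code> is a negated character class, so it rejects any single character from that set (for example <code>S</code>, <code>A</code>, or <code>e</code>) instead of rejecting the whole prefixes. That would explain why links starting with <code>S</code>, such as [[Superman (the Hero)|Superman]], are skipped. A negative lookahead can express "not followed by one of these prefixes" directly. The class name <code>WikiLinkExtractor</code> and the method <code>extractLinks</code> below are hypothetical names chosen for illustration, and the leading and trailing boundary groups from the original pattern are dropped on the assumption that they are not needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkExtractor {
    // "[[" not immediately followed by Arch:, Zeus:, or Source:,
    // then the shortest run of characters up to the next "]]".
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?!(?:Arch|Zeus|Source):).*?\\]\\]");

    // Hypothetical helper: returns every matching link in one line of text.
    static List<String> extractLinks(String line) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "Clark is set to work in ''[[Superman (the Hero)|Superman]]'', "
                + "a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...";
        for (String link : extractLinks(line)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

On the first input line this should print the two links [[Superman (the Hero)|Superman]] and [[SuperHero Genre II]] and skip [[Source:NYTimes]]. If only the link text is wanted, wrapping <code>.*?</code> in a capturing group and reading <code>m.group(1)</code> is one option.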