Java Regex replace all

Question

My text will look like this

| birth_date          = {{birth date|1925|09|2|df=y}}
| birth_place         = [[Bristol]], [[England]], UK
| death_date          = {{death date and age|2000|11|16|1925|09|02|df=y}}
| death_place         = [[Eastbourne]], [[Sussex]], England, UK
| origin              = 
| instrument          = [[Piano]]
| genre               = 
| occupation          = [[Musician]]

I would like to extract every [[...]] link from this text. I tried using replaceAll to remove everything that is not inside [[ ]], intending to then split the result on newlines to get a list of the [[ ]] entries.

input = input.replaceAll("^[\\[\\[(.+)\\]\\]]", "");

Required output:

[[Bristol]]
[[England]]
[[Eastbourne]]
[[Sussex]]
[[Piano]]
[[Musician]]

But this does not give the desired output. What am I missing? There are thousands of documents; is this the fastest way to do it? If not, what is the best way to get the desired output?

In addition to the other problems, note that the + in (.+) is a "greedy" quantifier that will grab as many characters as it can between [[ and ]], meaning that for birth_place you'll get "Bristol]], [[England" as one of the matches. Adding ? after .+, as in falsetru's answer, prevents this.
– ajb, commented Oct 4, 2013 at 16:56
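
To make the greedy versus reluctant difference concrete, here is a minimal sketch that runs both patterns over the birth_place line from the question. The class name GreedyVsReluctant and the helper findAll are hypothetical, introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GreedyVsReluctant {
    // Returns every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "| birth_place         = [[Bristol]], [[England]], UK";

        // Greedy: .+ runs on to the last "]]" on the line.
        for (String match : findAll("\\[\\[.+\\]\\]", line)) {
            System.out.println("greedy:    " + match);
        }

        // Reluctant: .+? stops at the first "]]" after each "[[".
        for (String match : findAll("\\[\\[.+?\\]\\]", line)) {
            System.out.println("reluctant: " + match);
        }
    }
}
```

The greedy pattern yields a single match covering both links and the text between them, while the reluctant pattern yields [[Bristol]] and [[England]] as separate matches.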

Anirudha · Accepted Answer · 2013-10-04 16:22:10Z

6

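You need to match the [[...]] links rather than replace everything around them. In your pattern, the outer [ ] forms a character class anchored at the start of the input by ^, so it can match at most one character.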

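import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Note: \\w+ matches only letters, digits, and underscores, so a link such as [[New York]] is skipped.
Matcher m = Pattern.compile("\\[\\[\\w+\\]\\]").matcher(input);
while (m.find()) {
    String link = m.group(); // the matched text, e.g. [[Bristol]]
    System.out.println(link);
}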

falsetru · Accepted Answer · 2013-10-04 16:22:23Z

2

Use Matcher.find. For example:

import java.util.regex.*;

...

String text =
    "| birth_date          = {{birth date|1925|09|2|df=y}}\n" +
    "| birth_place         = [[Bristol]], [[England]], UK\n" +
    "| death_date          = {{death date and age|2000|11|16|1925|09|02|df=y}}\n" +
    "| death_place         = [[Eastbourne]], [[Sussex]], England, UK\n" +
    "| origin              = \n" +
    "| instrument          = [[Piano]]\n" +
    "| genre               = \n" +
    "| occupation          = [[Musician]]\n";
Pattern pattern = Pattern.compile("\\[\\[.+?\\]\\]");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group());
}
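
Since the question mentions thousands of documents, one reasonable approach (a sketch under stated assumptions, not a benchmarked result) is to compile the pattern once and reuse it for every document, which avoids recompiling the regex each time. Pattern instances are immutable and safe to share across threads, whereas each Matcher should be used by one thread only. The class WikiLinkExtractor and its method names are hypothetical and do not come from any answer here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative only.
class WikiLinkExtractor {
    // Compiled once and shared by all calls.
    // Group 1 captures the text between the brackets.
    private static final Pattern LINK = Pattern.compile("\\[\\[(.+?)\\]\\]");

    // Returns each [[...]] link, brackets included, in document order.
    static List<String> extractLinks(String document) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(document);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    // Returns only the text between the brackets, e.g. "Bristol".
    static List<String> extractLinkTargets(String document) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(document);
        while (m.find()) {
            targets.add(m.group(1));
        }
        return targets;
    }

    public static void main(String[] args) {
        String text =
            "| birth_place         = [[Bristol]], [[England]], UK\n" +
            "| origin              = \n" +
            "| instrument          = [[Piano]]\n";
        for (String link : extractLinks(text)) {
            System.out.println(link);
        }
    }
}
```

Because . does not match line terminators by default, a match cannot run from a [[ on one line to a ]] on another, which suits the one-field-per-line layout in the question.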

femtoRgon · Accepted Answer · 2013-10-04 16:37:05Z

0

Just for fun, using replaceAll:

 String output = input.replaceAll("(?s)(\\]\\]|^).*?(\\[\\[|$)", "$1\n$2");
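
As a rough illustration of what this replaceAll call does: each match runs from a closing ]] (or the start of the input) up to the next opening [[ (or the end of the input), and the replacement keeps only the brackets with a newline between them. The result therefore begins with a newline, so the sketch below trims it before splitting. The class ReplaceAllLinks and the method linksViaReplaceAll are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ReplaceAllLinks {
    // Strips everything outside [[...]] and returns the links as a list.
    static List<String> linksViaReplaceAll(String input) {
        // (?s) lets . match newlines, so one match can span several lines.
        String output = input.replaceAll("(?s)(\\]\\]|^).*?(\\[\\[|$)", "$1\n$2");

        // Drop the leading and trailing newlines left by the replacement.
        String trimmed = output.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // the input contained no links
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\n")));
    }

    public static void main(String[] args) {
        String input =
            "| birth_place         = [[Bristol]], [[England]], UK\n" +
            "| origin              = \n" +
            "| instrument          = [[Piano]]\n";
        for (String link : linksViaReplaceAll(input)) {
            System.out.println(link);
        }
    }
}
```

This assumes every [[ has a matching ]] and that the links themselves contain no newlines; the Matcher.find approach is easier to reason about when the input is less regular.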
