
I have a text containing some important information that I want to extract. The important information is enclosed in curly brackets, and each bracketed chunk is followed by one of several different "markings" that divide the chunks into groups.

An Example:

Lorem ipsum dolor sit {this is important}\GROUP1 amet, consetetur sadipscing elitr, sed diam {also Important}\GROUP1 nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, {not so important}\GROUP2 sed diam voluptua. At vero eos et accusam et {slightly important}\GROUP3 justo duo dolores et ea rebum. Stet clita kasd gubergren.

To find these "important text" blocks I use a regex (take everything between "{" and "}\GROUP1"):

Pattern regexGroup1 = Pattern.compile("\\{(.*?)\\}\\\\GROUP1");
Matcher regexMatcher = regexGroup1.matcher(data);
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group(1)); // text between the braces
}

to find the GROUP1 text chunks.

Pattern regexGroup2 = Pattern.compile("\\{(.*?)\\}\\\\GROUP2");
Matcher regexMatcher = regexGroup2.matcher(data);
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group(1)); // text between the braces
}

to find the GROUP2 text chunks, and so on.

Is there a way to use a single regex that finds all of these groups at once, so that I can access them with regexMatcher.group(1) through regexMatcher.group(3)?

Something like this. regexMatcher.group(1) output:

this is important
also Important

regexMatcher.group(2) output:

not so important

regexMatcher.group(3) output:

slightly important

Thanks in advance.

1 Answer

You could use a slightly different Pattern, with two groups. Like,

// Group 1 captures the text inside the braces, group 2 the number after \GROUP.
Pattern regexGroup = Pattern.compile("\\{(.*?)\\}\\\\GROUP(\\d+)");
Matcher regexMatcher = regexGroup.matcher(data);

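Then call regexMatcher.find() in a loop. After each successful match, regexMatcher.group(1) holds the text between the braces and regexMatcher.group(2) holds the group number, which you can examine to decide the importance. Note that a capture group only holds the value from the current match, so collecting every GROUP1 chunk means accumulating the results yourself.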
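As a minimal sketch of that loop, the following collects every chunk into a map keyed by the group number. The class name GroupExtractor, the method extract, and the shortened sample string are assumptions made for illustration; they are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtractor {
    // Group 1: text inside the braces; group 2: the digits after \GROUP.
    private static final Pattern MARKED =
            Pattern.compile("\\{(.*?)\\}\\\\GROUP(\\d+)");

    // Hypothetical helper: returns the chunks of each group, keyed by group number.
    static Map<Integer, List<String>> extract(String data) {
        Map<Integer, List<String>> byGroup = new TreeMap<>();
        Matcher m = MARKED.matcher(data);
        while (m.find()) {
            int number = Integer.parseInt(m.group(2));
            byGroup.computeIfAbsent(number, k -> new ArrayList<>()).add(m.group(1));
        }
        return byGroup;
    }

    public static void main(String[] args) {
        String data = "Lorem {this is important}\\GROUP1 amet {also Important}\\GROUP1 "
                + "erat, {not so important}\\GROUP2 sed {slightly important}\\GROUP3 justo.";
        // prints {1=[this is important, also Important], 2=[not so important], 3=[slightly important]}
        System.out.println(extract(data));
    }
}
```

With this shape, extract(data).get(1) plays the role of the regexMatcher.group(1) output the question asks for.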


3 Comments

Ah, I see. But the chunks are not always marked as "GROUP1-?". I only used that as an example (my bad). It should work with {}\GROUP, {}\PERSON, {}\ANIMAL, etc. It's a text annotated by some kind of NER extractor.
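The same idea, just use an alternation in the regex to match the label: (GROUP|PERSON|ANIMAL). Square brackets would create a character class that matches a single character, so use plain parentheses.
\\{(.*?)\\}\\\\(GROUP|PERSON)(\\d+)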
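A possible sketch of the label variant, assuming the labels are exactly GROUP, PERSON, and ANIMAL and that the trailing number is optional (hence \\d* instead of \\d+). The class name LabelExtractor, the named groups text, label, and index, and the sample sentence are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelExtractor {
    // Named groups: the annotated text, the label, and an optional numeric suffix.
    private static final Pattern ANNOTATED = Pattern.compile(
            "\\{(?<text>.*?)\\}\\\\(?<label>GROUP|PERSON|ANIMAL)(?<index>\\d*)");

    // Hypothetical helper: returns the chunks keyed by label plus suffix, in order of first appearance.
    static Map<String, List<String>> extract(String data) {
        Map<String, List<String>> byLabel = new LinkedHashMap<>();
        Matcher m = ANNOTATED.matcher(data);
        while (m.find()) {
            String key = m.group("label") + m.group("index");
            byLabel.computeIfAbsent(key, k -> new ArrayList<>()).add(m.group("text"));
        }
        return byLabel;
    }

    public static void main(String[] args) {
        String data = "{Alice}\\PERSON met {a cat}\\ANIMAL and {Bob}\\PERSON near {the park}\\GROUP2.";
        // prints {PERSON=[Alice, Bob], ANIMAL=[a cat], GROUP2=[the park]}
        System.out.println(extract(data));
    }
}
```

If the set of labels is not known in advance, replacing the alternation with something like [A-Z]+ would be one option, at the cost of accepting any uppercase label.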
