I have the following RTF string: \af31507 \ltrch\fcs0 \insrsid6361256 Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}}{\rtlch\fcs1 \af31507 \ltrch\fcs0 \insrsid12283827 and I want to extract the Study Title and its content, i.e. (Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}). Below is my code:

    String[] arr = value.split("\\s+");
    //System.out.println(arr.length);
    for (int j = 0; j < arr.length; j++) {
        if (isNumeric(arr[j])) {
            arr[j] = "\\?" + arr[j];
        }
    }

In the code above, I split the string on whitespace and iterate over the array to check whether each token is a number. However, the isNumeric function fails on the 8000 that follows \u8805, because that token comes through as 8000}}{\rtlch\fcs1. I'm not sure how to find the Study Title and its content using a regex.

  • Suggest you use an RTF parser. See Java RTF Parser. Commented Jan 29, 2018 at 1:29
  • Hi @Andrea, I could use an RTF parser, but I'm not sure I would get the Unicode escapes back, and I need them because I want to update the contents of my Study Title string. That's why I'm not using an RTF parser: it would return the plain text without those Unicode escapes. Commented Jan 29, 2018 at 1:35

1 Answer

The pattern Study Title: {[^}]*} matches the text you want. In Java the braces must be escaped, as in the code below. Demo: https://regex101.com/r/FZl1WL/1

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    Pattern p = Pattern.compile("Study Title: \\{[^}]*\\}");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

Output:

Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}

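Update, as requested by the OP: to get only the content, without the "Study Title: " prefix and the surrounding braces, wrap the prefix and the closing brace in lookarounds: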

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    Pattern p = Pattern.compile("(?<=Study Title: \\{)[^}]*(?=\\})");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

Output:

Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000
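
To tie this back to the loop in the question, here is a minimal sketch that first extracts the title content with a capturing group (an alternative to the lookarounds above) and only then splits it on whitespace, so the trailing }}{\rtlch\fcs1 never ends up in a token. The class and method names are made up for illustration, and isNumeric is a hypothetical stand-in that assumes "numeric" means a token made only of digits; the question does not show the real implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudyTitleExtractor {
    // Group 1 captures everything between the braces that follow "Study Title: ".
    private static final Pattern TITLE = Pattern.compile("Study Title: \\{([^}]*)\\}");

    /** Returns the title content, or null if the fragment has no Study Title group. */
    public static String extractTitle(String rtf) {
        Matcher m = TITLE.matcher(rtf);
        return m.find() ? m.group(1) : null;
    }

    /** Hypothetical stand-in for the question's isNumeric: true for digit-only tokens. */
    public static boolean isNumeric(String token) {
        return token.matches("\\d+");
    }

    /** Prefixes every digit-only token with a backslash and '?', like the question's loop. */
    public static String markNumbers(String title) {
        String[] arr = title.split("\\s+");
        for (int j = 0; j < arr.length; j++) {
            if (isNumeric(arr[j])) {
                arr[j] = "\\?" + arr[j];
            }
        }
        return String.join(" ", arr);
    }

    public static void main(String[] args) {
        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        String title = extractTitle(s);
        if (title != null) {
            System.out.println(markNumbers(title));
        }
    }
}
```

Because the split happens on the extracted content only, 8000 arrives as a clean token. Note that [^}]* stops at the first closing brace, so this sketch assumes the title itself contains no nested RTF groups.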

3 Comments

Hi @yudong, thanks for the answer. How can I get only the content, i.e. 'Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000', without the Study Title prefix and without the opening '{' and closing '}'?
@shanky, just update the pattern to (?<=Study Title: \\{)[^}]*(?=\\})
Hi @yudong, it's giving me a java.util.regex.PatternSyntaxException: Illegal repetition error.
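
The thread does not say what caused the Illegal repetition error, but one likely explanation (an assumption, not confirmed by the commenter) is that the braces reached the regex engine unescaped. Java's Pattern treats { as the start of a quantifier such as {2,3}, so a { that is not followed by a digit is rejected. The sketch below, with made-up class and method names, compares the unescaped and escaped forms:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BraceEscaping {
    /** Returns true if the regex source compiles, false if Pattern rejects it. */
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    /** Returns the first match of regex in input, or null if there is none. */
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Unescaped '{': the engine expects a quantifier and rejects the pattern.
        System.out.println(compiles("(?<=Study Title: {)[^}]*(?=})"));

        // Escaped braces: in a Java string literal each regex backslash is doubled.
        String escaped = "(?<=Study Title: \\{)[^}]*(?=\\})";
        System.out.println(compiles(escaped));
        System.out.println(firstMatch(escaped, "Study Title: {Test 8000}}"));
    }
}
```

If the pattern is typed as a Java string literal, it needs \\{ and \\}; if it is read from a file or a text field, a single backslash before each brace is what the engine should receive.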
