I have the following String:

String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

I want to convert it to an array of String which will look like this.

String[] Title = {"Title1 Title2","Title3 Title4","Title5 Title6","Title7"};

I am trying the following code.

String[] Title = fullPDFContex.split("\r\n\r\n|\r\n \r\n|\r\n");

But I am not getting the desired output.

3 Answers

You need to split with a pattern that matches any amount of whitespace that contains a line break:

String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
String separator = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";
String results[] = fullPDFContex.split(separator);
System.out.println(Arrays.toString(results));
// => [Title1 Title2, Title3 Title4, Title5 Title6, Title7]

See the Java demo.

The pattern \\p{javaWhitespace}*\\R\\p{javaWhitespace}* matches:

  • \\p{javaWhitespace}* - zero or more whitespace characters
  • \\R - a line break (available since Java 8; you may replace it with [\r\n] for Java 7 and older)
  • \\p{javaWhitespace}* - zero or more whitespace characters.
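
As a minimal sketch of how this separator behaves on other inputs, the example below feeds it a string that mixes \n, \r, \r\n and a tab. The class name, the splitTitles helper and the sample string are illustrative assumptions, not part of the original question. The helper trims the input first, because a line break at the very start of the text would otherwise produce an empty first element.

```java
import java.util.Arrays;

public class SplitOnLineBreaks {

    // Same separator as above: optional whitespace, one line break, optional whitespace.
    private static final String SEPARATOR = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";

    // Hypothetical helper: trims the ends, then splits on the separator.
    public static String[] splitTitles(String text) {
        String trimmed = text.trim();
        // "".split(...) would return one empty string, so handle blank input explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split(SEPARATOR);
    }

    public static void main(String[] args) {
        // Assumed sample input with mixed line endings and a tab.
        String mixed = "Title1 Title2\nTitle3 Title4\r\t\r\n Title5 Title6 \n\n";
        System.out.println(Arrays.toString(splitTitles(mixed)));
        // prints [Title1 Title2, Title3 Title4, Title5 Title6]
    }
}
```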

Alternatively, you may use a slightly more efficient pattern:

String separator = "[\\s&&[^\r\n]]*\\R\\s*";

See another demo

Unfortunately, the \R construct cannot be used inside character classes, so CR and LF are excluded explicitly with [^\r\n]. The pattern will match:

  • [\\s&&[^\r\n]]* - zero or more whitespace chars other than CR and LF (character class subtraction is used here)
  • \\R - a line break
  • \\s* - any 0+ whitespace chars.
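
If the same separator is applied to many strings, it can be compiled once and reused. The following is only a sketch under that assumption: the class name and the split helper are hypothetical, while the pattern and the input string are taken from above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CompiledSeparator {

    // [\s&&[^\r\n]] is the intersection of \s with "anything except CR or LF".
    private static final Pattern SEPARATOR = Pattern.compile("[\\s&&[^\r\n]]*\\R\\s*");

    // Hypothetical helper: Pattern.split drops trailing empty strings, like String.split.
    public static String[] split(String text) {
        return SEPARATOR.split(text);
    }

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        System.out.println(Arrays.toString(split(fullPDFContex)));
        // prints [Title1 Title2, Title3 Title4, Title5 Title6, Title7]
    }
}
```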

1 Comment

Well, \\p{javaWhitespace} might be too long. A similar, but more efficient pattern is "[\\s&&[^\r\n]]*\\R\\s*" or "[\\s&&[^\r\n]]*[\r\n]\\s*".

Here is a solution using StringTokenizer. The tokens are collected in a List, so it works for any number of values split from the string.

package com.sujit;

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class UserInput {

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        StringTokenizer token = new StringTokenizer(fullPDFContex, "\r\n");
        List<String> list = new ArrayList<>();
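        while (token.hasMoreTokens()) {
            // Trim each token and skip whitespace-only ones (e.g. the " " between "\r\n \r\n")
            String title = token.nextToken().trim();
            if (!title.isEmpty()) {
                list.add(title);
            }
        }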
        for (String string : list) {
            System.out.println(string);
        }
    }
}
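
StringTokenizer treats each character of its delimiter string as a delimiter of its own and returns tokens untrimmed, so any approach built on it needs a trim-and-skip-blank step. The same idea can be sketched with streams, which is a different technique from the one in this answer. The class name and the titles helper are hypothetical, and \R assumes Java 8 or newer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TitleLines {

    // Hypothetical helper: split on every line break, then trim and drop blank lines.
    public static List<String> titles(String text) {
        return Arrays.stream(text.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        System.out.println(titles(fullPDFContex));
        // prints [Title1 Title2, Title3 Title4, Title5 Title6, Title7]
    }
}
```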

With this code you get the output you want:

String[] Title = fullPDFContex.split(" *(\r\n ?)+ *");

2 Comments

This will not work if the line break is \n or \r, or if there are tabs before or after a line break. See my solution, which handles all of these cases.
Your answer is more complete (and was accepted). My answer made (intentionally) no assumptions about what the OP wanted.
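
The difference discussed in these comments can be checked with a short sketch. The input string, the class name and the countParts helper are assumptions made for illustration. The two separators are the ones from this answer and from the accepted answer.

```java
public class SeparatorComparison {

    // Separator from this answer: spaces and CRLF sequences only.
    public static final String CRLF_ONLY = " *(\r\n ?)+ *";
    // Separator from the accepted answer: any whitespace around any line break.
    public static final String ANY_BREAK = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";

    // Hypothetical helper: number of parts the separator produces.
    public static int countParts(String text, String separator) {
        return text.split(separator).length;
    }

    public static void main(String[] args) {
        // Assumed input: a lone \n followed by a tab, with no \r\n anywhere.
        String unixStyle = "Title1 Title2\n\tTitle3 Title4";
        System.out.println(countParts(unixStyle, CRLF_ONLY)); // prints 1 (no split happens)
        System.out.println(countParts(unixStyle, ANY_BREAK)); // prints 2
    }
}
```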
