How to use regex with String.split()

Question

I have the following String:

String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

I want to convert it to a String array that looks like this:

String[] Title = {"Title1 Title2","Title3 Title4","Title5 Title6","Title7"};

I am trying the following code:

String[] Title = fullPDFContex.split("\r\n\r\n|\r\n \r\n|\r\n");

But I am not getting the desired output.
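For reference, here is a minimal sketch (the class name OriginalAttempt is hypothetical) that runs the pattern from the question on the same input. Every line break is matched by one of the alternatives, but the spaces before and after Title7 are not part of any alternative, so they remain in the last element:

```java
import java.util.Arrays;

public class OriginalAttempt {
    static final String INPUT =
        "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

    // The pattern from the question; the alternatives are tried left to right.
    static String[] split(String s) {
        return s.split("\r\n\r\n|\r\n \r\n|\r\n");
    }

    public static void main(String[] args) {
        // Prints [Title1 Title2, Title3 Title4, Title5 Title6,  Title7 ]
        System.out.println(Arrays.toString(split(INPUT)));
    }
}
```

Trailing empty strings are dropped by String.split, so the only difference from the expected array is the surrounding whitespace of the last element.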

Wiktor Stribiżew · Accepted Answer · 2017-07-13 13:32:02Z

2

You need to split with a pattern that matches any run of whitespace containing at least one line break:

import java.util.Arrays;

String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
String separator = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";
String[] results = fullPDFContex.split(separator);
System.out.println(Arrays.toString(results));
// => [Title1 Title2, Title3 Title4, Title5 Title6, Title7]

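The pattern \\p{javaWhitespace}*\\R\\p{javaWhitespace}* matches:

\\p{javaWhitespace}* - zero or more whitespace characters
\\R - a line break sequence such as \r\n, \n or \r (\R is available since Java 8; for Java 7 and older you may replace it with [\r\n])
\\p{javaWhitespace}* - zero or more whitespace characters.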
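A minimal sketch (the class name LineEndingDemo and the sample input are hypothetical) showing that the separator copes with LF-only, CR-only and CRLF breaks as well as tabs and trailing spaces, and that the Java 7 fallback with [\r\n] gives the same result on this input:

```java
import java.util.Arrays;

public class LineEndingDemo {
    // Java 8+: \R matches any Unicode line break sequence, including \r\n.
    static final String SEPARATOR = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";
    // Java 7 and older: [\r\n] matches a single CR or LF; the surrounding
    // whitespace quantifiers absorb the other half of a CRLF pair.
    static final String SEPARATOR_JAVA7 = "\\p{javaWhitespace}*[\\r\\n]\\p{javaWhitespace}*";

    // Hypothetical input: mixed line endings, a tab and trailing spaces.
    static final String INPUT = "A\nB\rC\r\n\tD  \r\n";

    public static void main(String[] args) {
        System.out.println(Arrays.toString(INPUT.split(SEPARATOR)));       // [A, B, C, D]
        System.out.println(Arrays.toString(INPUT.split(SEPARATOR_JAVA7))); // [A, B, C, D]
    }
}
```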

Alternatively, you may use a slightly more efficient pattern:

String separator = "[\\s&&[^\r\n]]*\\R\\s*";

Unfortunately, the \R construct cannot be used inside a character class, so CR and LF are excluded explicitly. The pattern matches:

[\\s&&[^\r\n]]* - zero or more whitespace characters other than CR and LF (character class subtraction, written as an intersection with a negated class)
\\R - a line break
\\s* - zero or more whitespace characters, including further line breaks.
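To check that the two separators agree on the question's input, both can simply be run side by side. A minimal sketch (the class name PatternComparison is hypothetical):

```java
import java.util.Arrays;

public class PatternComparison {
    static final String INPUT =
        "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

    // First separator: any whitespace, one line break, any whitespace.
    static final String JAVA_WHITESPACE = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";
    // Alternative: the leading part cannot consume CR or LF, so the engine
    // reaches \R without first swallowing the line break and backtracking.
    static final String INTERSECTION = "[\\s&&[^\r\n]]*\\R\\s*";

    public static void main(String[] args) {
        String[] a = INPUT.split(JAVA_WHITESPACE);
        String[] b = INPUT.split(INTERSECTION);
        System.out.println(Arrays.toString(b)); // [Title1 Title2, Title3 Title4, Title5 Title6, Title7]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Note that \s only covers ASCII whitespace by default, while \p{javaWhitespace} follows Character.isWhitespace, so the two can differ on more exotic input.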

edited Jul 13, 2017 at 13:32

answered Jul 13, 2017 at 13:23

Wiktor Stribiżew


1 Comment

Wiktor Stribiżew Over a year ago

Well, \\p{javaWhitespace} is rather verbose. A similar but more efficient pattern is "[\\s&&[^\r\n]]*\\R\\s*", or "[\\s&&[^\r\n]]*[\r\n]\\s*" where \R is not available (Java 7 and older).

sForSujit · Answer · 2017-07-13 13:22:53Z

Here is a solution using StringTokenizer. I collect the split values in a List, so it works for any number of values.

package com.sujit;

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class UserInput {

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
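        // Each character of "\r\n" is a separate delimiter, so a line holding only
        // a space (and the spaces around Title7) would survive: trim, skip blanks.
        StringTokenizer token = new StringTokenizer(fullPDFContex, "\r\n");
        List<String> list = new ArrayList<>();
        while (token.hasMoreTokens()) {
            String value = token.nextToken().trim();
            if (!value.isEmpty()) {
                list.add(value);
            }
        }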
        for (String string : list) {
            System.out.println(string);
        }
    }
}
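StringTokenizer treats each character of the delimiter string "\r\n" as a separate delimiter and does not strip spaces around tokens. A minimal sketch (the class name RawTokens is hypothetical) that collects the tokens exactly as StringTokenizer returns them, before any trimming:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class RawTokens {
    static final String INPUT =
        "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

    static List<String> tokens(String s) {
        // CR and LF are both delimiters; consecutive delimiters are skipped,
        // but a lone space between two line breaks still becomes a token.
        StringTokenizer tokenizer = new StringTokenizer(s, "\r\n");
        List<String> list = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            list.add(tokenizer.nextToken());
        }
        return list;
    }

    public static void main(String[] args) {
        // Prints [Title1 Title2], [Title3 Title4], [Title5 Title6], [ ], [ Title7 ] on separate lines
        for (String t : tokens(INPUT)) {
            System.out.println("[" + t + "]");
        }
    }
}
```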

fazen · Answer · 2017-07-13 13:23:21Z

0

With this code you get the output you want:

String[] Title = fullPDFContex.split(" *(\r\n ?)+ *");
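A minimal runnable check of this pattern on the question's input (the class name FazenSplit is hypothetical). The leading and trailing " *" absorb spaces next to a line break, and (\r\n ?)+ consumes a run of CRLF pairs, each optionally followed by one space:

```java
import java.util.Arrays;

public class FazenSplit {
    static final String INPUT =
        "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

    static String[] split(String s) {
        // " *" matches spaces only (not tabs); "\r\n" is a literal CRLF pair.
        return s.split(" *(\r\n ?)+ *");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(INPUT))); // [Title1 Title2, Title3 Title4, Title5 Title6, Title7]
    }
}
```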

answered Jul 13, 2017 at 13:23

fazen

2 Comments

Wiktor Stribiżew Over a year ago

This will not work if the line break is \n or \r, or if there are tabs before or after a line break. See my solution, which handles all of that.

fazen Over a year ago

Your answer is more complete (and was accepted). My answer made (intentionally) no assumptions on what the OP wanted.
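The difference raised in these comments can be checked directly. A minimal sketch (the class name SeparatorRobustness and the LF-only, tab-containing input are hypothetical) comparing the two answers' patterns:

```java
import java.util.Arrays;

public class SeparatorRobustness {
    static final String FAZEN = " *(\r\n ?)+ *";
    static final String WIKTOR = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";

    // Hypothetical input: LF-only line breaks and a tab before the second break.
    static final String INPUT = "Title1 Title2\nTitle3\t\nTitle4";

    public static void main(String[] args) {
        // No CRLF pair occurs, so FAZEN never matches and the string stays whole.
        System.out.println(INPUT.split(FAZEN).length);            // 1
        System.out.println(Arrays.toString(INPUT.split(WIKTOR))); // [Title1 Title2, Title3, Title4]
    }
}
```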
