
I have the following HTML segment:

        <br>
        Date: 2010-06-20,  1:37AM PDT<br>
        <br>
        Daddy: <a href="...">www.google.com</a>
        <br>

I want to extract

Date: 2010-06-20, 1:37AM PDT

and

Daddy: <a href="...">www.google.com</a>

using a Java regex.

Which regex should I use?

  • Regex is the wrong tool for the job. However, you need to provide a bit more context before we can give a more suitable answer. Where does this HTML come from? How are you loading it? What does the complete HTML look like? Do you have control over it? Commented Jun 20, 2010 at 13:02
  • @BalusC - A .split() with regex may be pretty straightforward here though, just whitespace and <br>, just a thought. Commented Jun 20, 2010 at 13:03
  • @Nick: not if there's more to the actual HTML than just this "segment". Commented Jun 20, 2010 at 13:05
  • @BalusC - Agreed, hopefully that's not the case and he has this exact string already. Commented Jun 20, 2010 at 13:12

1 Answer

This should give you a nice starting point:

    String text = 
    "        <br>\n" +
    "        Date: 2010-06-20,  1:37AM PDT<br>   \n" +
    "   <br>    \n" +
    "Daddy: <a href=\"...\">www.google.com</a>   \n" +
    "<br>";

    String[] parts = text.split("(?:\\s*<br>\\s*)+");
    for (String part : parts) {
        System.out.println("[" + part + "]");
    }

This prints (as seen on ideone.com):

[]
[Date: 2010-06-20,  1:37AM PDT]
[Daddy: <a href="...">www.google.com</a>]

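This uses String[] String.split(String regex). The regex pattern matches one or more consecutive <br> tags, each with optional leading and trailing whitespace. Because the input begins with a delimiter, the first element of the result is an empty string; split removes trailing empty strings on its own.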
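
If the real HTML also contains variants such as <br/>, <br /> or <BR>, the same idea can be loosened a little. The following is only a sketch under that assumption: the class name BrSplitter and the method splitOnBr are hypothetical names made up for illustration, and it drops empty parts so the leading [] does not appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class BrSplitter {
    // One or more <br>, <br/> or <br /> tags in any letter case,
    // each with optional surrounding whitespace.
    private static final Pattern BR_RUN =
            Pattern.compile("(?:\\s*<br\\s*/?>\\s*)+", Pattern.CASE_INSENSITIVE);

    // Splits on runs of <br> tags and discards empty parts.
    static List<String> splitOnBr(String html) {
        List<String> parts = new ArrayList<>();
        for (String part : BR_RUN.split(html)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text =
            "        <br>\n" +
            "        Date: 2010-06-20,  1:37AM PDT<br>   \n" +
            "   <br>    \n" +
            "Daddy: <a href=\"...\">www.google.com</a>   \n" +
            "<br>";
        for (String part : splitOnBr(text)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

As the comments on the question point out, this only holds up while the input really is a flat run of lines separated by <br>; for arbitrary HTML a proper parser is the safer choice.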


Guava alternative

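You can also use Splitter from Guava. It is arguably more readable: on("<br>") splits on the literal tag, trimResults() strips the surrounding whitespace, and omitEmptyStrings() drops the empty parts.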

    Splitter splitter = Splitter.on("<br>").trimResults().omitEmptyStrings();
    for (String part : splitter.split(text)) {
        System.out.println("[" + part + "]");
    }

This prints:

[Date: 2010-06-20,  1:37AM PDT]
[Daddy: <a href="...">www.google.com</a>]


3 Comments

Also maybe you want something like this? rubular.com/r/wy3b1ABsaC Leave a comment and I'll elaborate on any of these approaches.
Also check this one out: rubular.com/r/mftjWgKWzP Tell me which one you fancy.
I agree. For this kind of HTML you shouldn't use regex; instead, you already have a key to split on, which is the "br" tag.
