
I am trying to fetch values from a web page's HTML source. These are the Jsoup selectors I have:

e=d.select("li[id=result_48]");
e=d.select("div[id=result_48]");

These are the HTML tags:

<li id="result_48" data-asin="0781774047" class="s-result-item">
<div id="result_48" data-asin="0781774047" class="s-result-item">

Whatever tag name appears in place of "li" or "div", I want to get the value inside the id attribute, so I want to use a regex in place of "li" or "div".

In other words, the Jsoup selector should only check id=result_48, and whenever an element matches that I want its data. How can I do that?
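If the tag name really can be anything, one way to express that with plain java.util.regex (no Jsoup) is to match the tag name with \w+ and capture it, while fixing the id value. This is only a sketch, not Jsoup's API: it assumes double-quoted attribute values with no > inside them, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnyTagIdFinder {

    // Returns the tag names of all opening tags whose id attribute equals the given id.
    // Assumption: attribute values are double-quoted and never contain '>'.
    static List<String> findTagsWithId(String html, String id) {
        // (\w+) captures any tag name instead of a fixed li or div;
        // \s before id= keeps attributes such as data-id= from matching.
        Pattern p = Pattern.compile(
                "<(\\w+)\\b[^>]*\\sid=\"" + Pattern.quote(id) + "\"[^>]*>");
        Matcher m = p.matcher(html);
        List<String> tags = new ArrayList<>();
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<li id=\"result_48\" data-asin=\"0781774047\" class=\"s-result-item\">\n"
                + "<div id=\"result_48\" data-asin=\"0781774047\" class=\"s-result-item\">\n"
                + "<span id=\"result_49\">";
        System.out.println(findTagsWithId(html, "result_48")); // [li, div]
    }
}
```

Pattern.quote is used so that an id containing regex metacharacters is matched literally.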

Thanks in advance

  • Why can't you use getElementById("result_48"), since ids are unique in HTML? Commented Oct 16, 2014 at 7:12
  • These are the HTML tags: <li id="result_48" data-asin="0781774047" class="s-result-item"> and <div id="result_48" data-asin="0781774047" class="s-result-item"> Commented Oct 16, 2014 at 7:16
  • I can't see any regex in your question; you are only asking for the id result_48. Commented Oct 16, 2014 at 7:22

1 Answer

Tested with different attribute orders. I might have missed some cases, so test with your actual data. This assumes the id attribute value contains no spaces or quotes.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractIdExample {

    public static void main(String[] args) {
        String[] lines = {
                "<li id=\"result_48\" data-asin=\"0781774047\" class=\"s-result-item\">",
                "<div id=\"result_48\" data-asin=\"0781774047\" class=\"s-result-item\">",
                "<div data-asin=\"0781774047\" id=\"result_48\" class=\"s-result-item\">",
                "<div data-asin=\"0781774047\" class=\"s-result-item\" id=\"result_48\">" };
        for (String str : lines) {
            System.out.println(extractId(str)); // prints result_48 for each line
        }
    }

    private static String extractId(String line) {
        String regex = "";
        // match the opening tag up to id=" (requiring whitespace before id rules out data-id=)
        regex = regex + "<(?:li|div)\\b[^>]*\\sid=\"";
        // capture the id inside the quotes (exclude spaces and quotes)
        regex = regex + "([^\\s\"]+)";
        // match the closing quote and the rest of the tag up to >
        regex = regex + "\"[^>]*>";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(line);
        if (matcher.matches()) {
            return matcher.group(1);
        }
        return null;
    }
}
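The approach above assumes the id is double-quoted. If single-quoted values can also appear, a hedged variant uses a backreference (\1) so that the value must close with the same quote character that opened it. The class name is hypothetical, and the pattern still assumes the id contains no spaces or quotes and that no attribute value contains >.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedIdExtractor {

    // Group 1 captures the opening quote (' or "), group 2 the id value;
    // \1 then requires the same quote character to close the value.
    private static final Pattern ID_PATTERN = Pattern.compile(
            "<(?:li|div)\\b[^>]*\\sid=([\"'])([^\\s\"']+)\\1[^>]*>");

    // Returns the id of a single li/div opening tag, or null if there is none.
    static String extractId(String tag) {
        Matcher m = ID_PATTERN.matcher(tag);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("<div id='result_48' class=\"s-result-item\">")); // result_48
        System.out.println(extractId("<li data-asin=\"0781774047\" id=\"result_48\">")); // result_48
        System.out.println(extractId("<div data-id=\"x\">")); // null
    }
}
```

Compiling the pattern once into a static field avoids recompiling it for every line when scanning a large page.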