I have HTML pages as Strings in Java and I need to extract the JavaScript links from them. Is there a good, easy-to-use library for this? I looked at Cobra and Neko, but I don't think (maybe I'm wrong) that they offer what I need, such as getting tag-specific content.

1 Answer

Take a look at jsoup. It is an HTML parser with a CSS-like selector DSL (domain-specific language) for finding elements in the DOM.

For example, to find all a tags with an href, you would do this:

Document doc = Jsoup.connect("http://www.google.com/").get();
Elements hrefAnchors = doc.select("a[href]"); 

If you already have the html downloaded as a String, you can use the parse(String) method:

String html = "<p>Welcome to <a href='http://www.google.com/'>Google</a>.</p>";
Document doc = Jsoup.parse(html);
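
Since the question is about JavaScript links specifically, here is a minimal sketch of how that might look when the page is already in a String. It assumes "JavaScript links" means the src attribute of script tags; the class name ScriptLinkExtractor, the method scriptSources, and the sample HTML are made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical helper class, not part of jsoup.
public class ScriptLinkExtractor {

    // Returns the src value of every <script> element that has a src attribute.
    // Inline scripts (no src) are skipped by the "script[src]" selector.
    static List<String> scriptSources(String html) {
        Document doc = Jsoup.parse(html);              // no network connection involved
        Elements scripts = doc.select("script[src]");
        return scripts.eachAttr("src");
    }

    public static void main(String[] args) {
        // Sample page, assumed for illustration only.
        String html = "<html><head>"
                + "<script src='/js/app.js'></script>"
                + "<script>var inline = true;</script>"
                + "</head><body><a href='/about'>About</a></body></html>";

        for (String src : scriptSources(html)) {
            System.out.println(src);
        }
    }
}
```

If the src values are relative and you need absolute URLs, one option is to pass the page's base URI as a second argument to Jsoup.parse(html, baseUri) and read the attribute as "abs:src" instead of "src".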

1 Comment

Thank you for your reply. I'll definitely look into it. But as I said above, I already have the page as a String, so I don't need to make a new connection to get it. Will jsoup work in that case too?
