3

I have an HTML page like this:

<html>
<head>
<!-- necessary JavaScript -->
</head>
<body>
<div id="content"></div>
</body>
</html>

When the page renders, the script places the appropriate HTML content inside the div element with id "content". So after rendering, that div contains a large amount of HTML.

Now I need to extract the dynamically rendered content of that div element using Java. Can anyone suggest a way to do it?

0

3 Answers

1

The problem is that the scripts on the page have to be evaluated from Java, so you need an embedded web engine to do it. Have a look at "Embedding Gecko/Webkit in Java" and try using WebKit or Gecko to load the page. Once the page has rendered, you can use a Java library to parse the resulting HTML.


1 Comment

I implemented it using SWT and it worked fine! Thanks for the link :-)
0

You can parse HTML with javax.swing.text.html.HTMLEditorKit.Parser. Have a look at this link:

http://java.sun.com/products/jfc/tsc/articles/bookmarks/
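
As a rough illustration of this approach, here is a minimal sketch that uses the JDK's ParserDelegator (the stock HTMLEditorKit.Parser implementation) to collect the text inside the div with id "content". Note the assumption: this parser does not run scripts, so the input must be the markup after rendering, obtained some other way (for example from an embedded browser engine). The class name ContentDivExtractor, the method extractText, and the sample markup are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
public class ContentDivExtractor {

    /** Collects the text found inside the div whose id matches targetId. */
    static String extractText(String html, String targetId) throws IOException {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // greater than 0 while inside the target div

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (depth > 0) {
                    // Track nested divs so the matching end tag can be found.
                    if (tag == HTML.Tag.DIV) {
                        depth++;
                    }
                } else if (tag == HTML.Tag.DIV) {
                    Object id = attrs.getAttribute(HTML.Attribute.ID);
                    // Compared case-insensitively in case the parser
                    // normalises the case of attribute values.
                    if (id != null && targetId.equalsIgnoreCase(id.toString())) {
                        depth = 1;
                    }
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (depth > 0 && tag == HTML.Tag.DIV) {
                    depth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(data);
                }
            }
        };
        // The third argument tells the parser to ignore charset declarations.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for markup captured after the page's scripts have run.
        String rendered = "<html><head></head><body>"
                + "<div id=\"content\"><p>Hello</p><div>nested</div></div>"
                + "<div id=\"other\">ignored</div></body></html>";
        System.out.println(extractText(rendered, "content"));
    }
}
```

The callback only gathers text; if you need the markup itself, the same depth counter could be used to record the start and end positions passed to the callbacks instead.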


0

Have a look through these:

http://java-source.net/open-source/html-parsers

