I have an HTML page that contains JavaScript code. The JavaScript needs to run and the page needs to be rendered before it can be converted into an image.

I am aware of projects like wkhtmltoimage, PhantomJS, khtmltopng, webkit2png, PrinceXML and html2image. I have implemented a few of those, but I am looking for a pure Java solution that does not need Process to execute an external command. Any help would be great, thanks!

Edit: I looked into Cobra, but its JS support still seems to be in development and it does not parse my HTML file properly.

Or if there are any other ways of doing this, please let me know. I am just trying to find the best solution possible.

  • A pure Java solution or a pure JavaScript solution? They're (way) not the same thing. Commented Jun 19, 2012 at 18:51
  • A pure Java solution, but it needs to be able to take in HTML + JS. Commented Jun 19, 2012 at 18:53
  • Ah OK. Well, are you talking about something that can handle arbitrary HTML pages with JavaScript code in them? If so, I wouldn't get your hopes up. The FlyingSaucer project does an amazingly good job with XHTML and CSS, but it doesn't handle JavaScript. Commented Jun 19, 2012 at 18:55
  • Yeah, that's what I was looking for. Hmm, I remember coming across that project, but yes, unfortunately it doesn't handle JavaScript. Commented Jun 19, 2012 at 18:59
  • Well I don't personally have much experience doing that, but yes that sounds like the generally correct approach. Commented Jun 19, 2012 at 19:05

2 Answers

There is no pure Java solution - no one has written a browser in Java that supports HTML 5.

I'd try either of these approaches:

  1. Use env.js + Rhino to simulate a browser in which you can run the JavaScript. That should give you a DOM which you can then render with FlyingSaucer, for example.

  2. Add SWT to your classpath (plus the binary for your platform). It contains a Browser component that uses your system's browser to render URLs or an HTML string.

You probably need SWTBot to run the browser in headless mode.

If that doesn't work and you're on Linux, then you can start an in-memory X server (Xvfb) and open your browser in it. Or you can use vncserver to start a desktop on your server.

[EDIT] The PhantomJS project might do what you want:

PhantomJS (www.phantomjs.org) is a headless WebKit scriptable with JavaScript or CoffeeScript.
[...]
Use cases: Headless web testing, Site scraping, Page rendering
Multiplatform, available on major operating systems: Windows, Mac OS X, Linux, other Unices
Fast and native implementation of web standards: DOM, CSS, JavaScript, Canvas, SVG. No emulation!
Pure headless (X11) on Linux, ideal for continuous integration systems. Also runs on Amazon EC2.

The quickstart page explains how to load a web page and render it to an image.


5 Comments

Thanks for the suggestions Aaron! I'll try and implement these approaches.
I have implemented env.js + Rhino to run the JavaScript + HTML, but I am having trouble actually linking FlyingSaucer and the DOM from env.js. I understand that I can pass a DOM Document to FlyingSaucer, but I am a little lost on how to actually get the DOM from env.js + Rhino. If you have experience with env.js and Rhino, any suggestions would be great, thanks!
env.js just gives you the same environment that you'd have in a browser. See this answer for how to get the HTML of the final page: stackoverflow.com/questions/817218/… Just run the JavaScript from one of the solutions in Rhino and pass the result to FlyingSaucer to render.
Thanks again for the help. So when getting the HTML of the rendered page, I would only get the HTML and not the JS. What if the JavaScript doesn't actually alter the HTML code but is needed to render a chart or graph? What I mean is, I have 3-4 JavaScript functions that are needed to render a chart. It seems like this would not be possible in my case because I need the JS to render the chart. If it just returns the HTML, then it would return the inline script that renders the chart. I might be understanding this incorrectly though.
After env.js returns, all the scripts on the page should have been executed and the DOM should be the same as in your browser. If that doesn't happen, the scripts in your page probably use animation or try to load data (which env.js prohibits by default to protect you). I suggest writing small examples to see how it works and posting new questions as you hit problems.
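
For the hand-off discussed in these comments, here is a minimal sketch of turning a serialized page into an org.w3c.dom.Document, which is the kind of object the comments say FlyingSaucer accepts. It uses only the JDK's XML parser. The class and method names (XhtmlToDom, parse) are made up for illustration, and the input string is assumed to be well-formed XHTML without a DOCTYPE, already obtained from env.js. Real env.js output may need cleaning up before it parses as XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: the name and shape are illustrative, not part of any library.
class XhtmlToDom {

    /** Parses a well-formed XHTML string into a W3C DOM Document. */
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: this string stands in for the markup serialized after the scripts ran.
        String xhtml = "<html><body><div id=\"chart\"><p>rendered</p></div></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "html"
    }
}
```

The resulting Document would then be handed to the renderer. That step is left out here because it depends on which FlyingSaucer version and renderer class you use.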

I have found a solution using WebRenderer. WebRenderer is a paid product that comes in Swing, server, and desktop editions. The Swing edition is the only one that supports HTML5 as of 7/9/2012. However, the Swing edition can be used on a server to convert the page to an image by instantiating the browser component without creating a JFrame. See this question.
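
To illustrate the "no JFrame" idea in general terms, here is a rough sketch that paints a Swing component straight into a BufferedImage. It does not use WebRenderer's API, which is not shown here. The JDK's JEditorPane is only a stand-in: it handles basic HTML and runs no JavaScript. The class and method names (OffscreenRender, render) are made up for illustration.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import javax.swing.JEditorPane;

// Hypothetical helper: shows off-screen painting, not WebRenderer itself.
class OffscreenRender {

    /** Paints simple HTML into an image without ever creating a JFrame. */
    static BufferedImage render(String html, int width, int height) {
        // Stand-in component: basic HTML only, no JavaScript support.
        JEditorPane pane = new JEditorPane("text/html", html);
        // No window or layout manager exists off-screen, so size the component by hand.
        pane.setSize(width, height);

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            // print(Graphics) paints the component into the supplied graphics context.
            pane.print(g);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img = render("<html><body><h1>Hello</h1></body></html>", 400, 200);
        // Writes the result next to the working directory; the file name is arbitrary.
        ImageIO.write(img, "png", new File("out.png"));
    }
}
```

A real browser component would take the place of JEditorPane. In longer-running code the Swing calls should happen on the event dispatch thread.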
