
I'm trying to use urllib2 to fetch a webpage from a website. After logging in and retrieving the page, I found that it contains some <script>.....</script> blocks. How can I save the rendered output (the complete content of the webpage as displayed, not the script source)?

  • Are you saying you'd like to save the content of the page, after any included Javascript has been run? Commented Feb 4, 2012 at 17:42
  • Are you doing this for testing, screen-scraping for an application, or what? In general, with JavaScript it's the browser that creates the page content, so you need a real browser to duplicate that... Commented Feb 4, 2012 at 17:44
  • @MattLuongo Yes, I'm trying to pull some of my personal messages from a website that doesn't offer an API. Commented Feb 4, 2012 at 17:47

2 Answers


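urllib does not run JavaScript: it only downloads the HTML exactly as the server sent it, scripts included, so anything those scripts would generate is missing.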
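A minimal sketch of the problem, written for Python 3 (where urllib2's functionality lives in `urllib.request`). It serves a hypothetical page from a throwaway local server and fetches it with urllib; the page markup and the names `PageHandler` and `fetch_raw` are made up for illustration. The `<script>` comes back as plain text and the element it would fill stays empty.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: the visible message is produced only by JavaScript.
PAGE = b"""<html><body>
<div id="messages"></div>
<script>
document.getElementById("messages").textContent = "Hello from JS";
</script>
</body></html>"""


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_raw(url):
    """Return the HTML exactly as the server sent it; no script is executed."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    html = fetch_raw("http://127.0.0.1:%d/" % server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

# The script source is present, but the <div> it would fill is still empty.
print('<div id="messages"></div>' in html)  # True
```

Saving `html` at this point would save the script, not its result; that takes something that actually executes the page's JavaScript.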

What you need is a headless browser, for example one built on WebKit.

A simple example can be found here.

If you don't want to limit yourself to Python, try PhantomJS.


Comments


I'd also like to mention pywebkitgtk (which I've been using a lot lately as an embedded browser) and Selenium.


Selenium with a real browser driver is very useful: it can mimic most human interactions.
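A minimal sketch of the Selenium route, assuming Selenium 4 and a local Chrome install (recent Selenium 4 releases can locate or download a matching driver automatically). The function name `save_rendered_page` and its `ready_css` parameter are hypothetical. `driver.page_source` returns the browser's current DOM, so content added by scripts is included; for pages that load data asynchronously, passing a CSS selector for an element that only appears once that data is in place makes it wait first.

```python
def save_rendered_page(url, path, ready_css=None, timeout=10):
    """Open url in headless Chrome, let its JavaScript run, save the resulting HTML."""
    # Imported lazily so this module loads even where Selenium isn't installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # returns once the page has finished loading
        if ready_css:
            # Hypothetical hook: wait until an element the scripts insert exists.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, ready_css))
            )
        html = driver.page_source  # current DOM, including script-generated content
    finally:
        driver.quit()

    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

A call would look like `save_rendered_page("https://example.com/messages", "messages.html", ready_css="#messages")`, where the URL and selector are placeholders. Because the browser keeps its own cookies, the login would also have to happen in that same browser session (for example by filling in the login form with Selenium before loading the messages page) rather than through urllib2.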
