
I have an HTML page whose content is partly generated by JavaScript, and I need to parse that content from a Python script. I have saved a copy of the file on my computer. Is there any way to work with the 'already generated' HTML, i.e. what I see in the browser after opening the page file? As I understand it, I have to work with the DOM (maybe with the xml2dom lib).

2 Answers


Did you save "the file" (a web page, I imagine) before or after JavaScript altered it?

If "after", then it no longer matters that some of the HTML was generated by JavaScript: you can just use a popular parser such as lxml or BeautifulSoup to handle the HTML you have.
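For the "after" case, here is a minimal sketch of pulling text out of saved markup. It deliberately uses the standard library's `html.parser` as a dependency-free stand-in for lxml or BeautifulSoup, which offer far more convenient lookups. The sample markup and the `generated` id are assumptions made up for illustration, not taken from the question.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element that has the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while we are inside the target element
        self.done = False    # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Return the stripped text content of the element with id `target_id`."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical saved page; in practice, read the saved file from disk instead.
SAVED_PAGE = """<html><body>
<div id="generated"><p>Hello <b>world</b></p><br></div>
<div id="other">ignored</div>
</body></html>"""

print(text_by_id(SAVED_PAGE, "generated"))
```

With BeautifulSoup or lxml the same lookup is usually a single call, so this hand-rolled parser is only worth it when installing a third-party package is not an option.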

If "before", then you first need to let JavaScript do its work by automating a real browser; for that task I would recommend SeleniumRC, which brings you back to the "after" case ;-).
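For the "before" case, the browser-automation step might look roughly like the sketch below. It assumes the current `selenium` Python package (the WebDriver API) rather than the older Selenium RC named above, plus a locally installed Firefox. The helper names, the readiness check, and the `generated` marker are all hypothetical.

```python
import time


def fetch_rendered_html(driver, url, is_ready=None, timeout=10.0, poll=0.25):
    """Load `url` in the given browser and return the DOM serialized as HTML.

    `driver` is any object with `get(url)` and `page_source`, which is what a
    Selenium WebDriver provides. `is_ready` is an optional predicate that is
    polled until the JavaScript-generated content has appeared.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while is_ready is not None and not is_ready(driver):
        if time.monotonic() >= deadline:
            raise TimeoutError("generated content did not appear in time")
        time.sleep(poll)
    return driver.page_source


def fetch_with_firefox(url):
    """Sketch of real usage; needs the third-party selenium package and Firefox."""
    from selenium import webdriver  # imported lazily so the helper above stays stdlib-only

    driver = webdriver.Firefox()
    try:
        # Hypothetical readiness check: wait until some marker shows up in the page.
        return fetch_rendered_html(
            driver, url, is_ready=lambda d: "generated" in d.page_source
        )
    finally:
        driver.quit()
```

The string returned by `fetch_rendered_html` can then be handed to whichever parser is used for the "after" case.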


3 Comments

+1 I think you got the question better than I did. I'm leaving my answer in place anyway in case somebody needs it.
Yeah, 'before'. But my script should run automatically almost every minute. Can I implement this with SeleniumRC?
@Ockonal, if you have powerful-enough machines with lots of RAM, sure: with today's newest, fastest browsers, JavaScript runs pretty fast, and Selenium adds little overhead to that.

I think you may have a fundamental misunderstanding regarding what runs where: by the time JavaScript generates the content (on the client side), the server-side processing of the document has already taken place. There is no direct way for a server-side Python script to access HTML created by JavaScript. Basically, that HTML lives only "virtually" in the browser's DOM.

You would have to find a way to transmit that HTML to your Python script, most likely using Ajax. You would take the HTML and add it as a parameter to your Ajax call. (Remember to use POST as the request method so you don't run into size limits.)

An example using jQuery's AJAX functions:

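// Send the JavaScript-generated markup to the Python script.
$.ajax({
  url: "myscript.py",
  type: "POST", // POST avoids the URL length limits of GET
  data: { html: your_html_content_here }, // placeholder for the generated HTML string
  success: function () {
    alert("sent HTML to python script!");
  }
});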
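On the receiving end, the Python side could look roughly like the following sketch. It assumes the request body arrives form-encoded, which is how jQuery sends a plain `data` object by default, and it uses the standard library's `http.server` only for illustration. The handler name, the port, and the `last_html` attribute are hypothetical, and a real deployment would more likely sit behind a web framework or CGI setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def extract_html_field(body: bytes) -> str:
    """Pull the `html` form field out of a URL-encoded POST body."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return fields.get("html", [""])[0]


class ReceiveHtml(BaseHTTPRequestHandler):
    last_html = ""  # most recently received markup (hypothetical storage)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        ReceiveHtml.last_html = extract_html_field(self.rfile.read(length))
        # Parsing of the received markup would start here.
        self.send_response(204)  # success, no response body
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), ReceiveHtml).serve_forever()
```

Keeping the field extraction in its own function makes it easy to check without starting a server.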
