Simulating clicking on a javascript link in python

Question

I am trying to collate restaurant reviews. urllib2 works fine for the initial page of reviews, but the next batch of comments is loaded through a JavaScript link. An example page is here, and the markup for the "Next 25" link is:

<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$RestRatings$Next','')" class="red" id="ctl00_ContentPlaceHolder1_RestRatings_Next">NEXT 25&gt;&gt; </a>

I have looked at the previous answers to similar questions, and I have to say I'm none the wiser. The console in Firebug doesn't reveal a usable URL either. Could you suggest the best (easiest) way to achieve this?

Edit: With thanks to Seleniumnewbie, this code prints out all the comments from the reviews:

from selenium import webdriver
from BeautifulSoup import BeautifulSoup
import re

driver = webdriver.Firefox()

def getURLinfo(url):
    # Load the first page of reviews in a real browser.
    driver.get(url)
    html = driver.page_source
    next25 = "ctl00_ContentPlaceHolder1_RestRatings_Next"
    soup = BeautifulSoup(html)

    # Keep clicking "NEXT 25" while the link is present, accumulating each page's HTML.
    while soup.find(id=re.compile(next25)):
        driver.find_element_by_id(next25).click()
        html = html + driver.page_source
        soup = BeautifulSoup(driver.page_source)

    # Parse the concatenated HTML and pull out every comment block.
    soup = BeautifulSoup(html)
    comment = soup.findAll(id=re.compile("divComment"))

    for entry in comment:
        print entry.div.contents  # for comments

    driver.close()

Matth · Accepted Answer · 2012-11-18 01:13:26Z

2

When a user clicks that link, the JavaScript function __doPostBack is called on the client. The other question you linked to assumes this function makes an AJAX call and then places the result in the same page.

However, the review pages you linked to don't do that. The link does make an AJAX call, but then it reloads the same page. I couldn't capture the AJAX call because the page reloads immediately, but since the page simply reloads with the new comments, I'm fairly sure the call tells the server to move you to the next page.

So, to get the next page of comments you will have to call the same URL that __doPostBack calls and then reload the page you are on. To find this URL, I would de-obfuscate their JavaScript and find the function being called. I believe the actual URL will depend on the parameters passed to that function, so make sure you replicate what it does.
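As a rough illustration of "replicating what it does": on a typical ASP.NET Web Forms page, __doPostBack usually just writes its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and submits the page's form back to the same URL, along with the other hidden fields such as __VIEWSTATE. Whether this particular site behaves that way is an assumption and has not been verified. The helper names below (build_postback_body, _HiddenInputs) are hypothetical, and the sketch targets Python 3 rather than the Python 2 used in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _HiddenInputs(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def build_postback_body(html, event_target, event_argument=""):
    """Build the form body a standard __doPostBack(target, argument) would submit.

    Assumption: the page is a plain ASP.NET Web Forms page whose state lives
    in hidden inputs (for example __VIEWSTATE).
    """
    parser = _HiddenInputs()
    parser.feed(html)
    fields = parser.fields
    # These two fields tell the server which control triggered the postback.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")
```

The returned bytes would then be sent as a POST to the review page's own URL, for instance with urllib.request.urlopen(url, data=body), using 'ctl00$ContentPlaceHolder1$RestRatings$Next' as the event target taken from the link in the question. Each response carries fresh hidden-field values, so the body would need to be rebuilt from the latest HTML before requesting the following page.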


1 Comment

eamon1234 Over a year ago

Thanks, I couldn't find anything that looked like a URL in the JavaScript ("www", "http", "review"), so I went with the Selenium brute-force approach!

Amey · Accepted Answer · 2012-11-18 01:04:17Z

1

Find the element by id="ctl00_ContentPlaceHolder1_RestRatings_Next" and then click it.
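A minimal sketch of that click loop, written against the Selenium 4 style call driver.find_elements("id", ...) (the string "id" is the value of By.ID; older Selenium releases used find_element_by_id instead). The function name collect_pages, the max_pages cap, and the fixed pause are illustrative assumptions, not part of this answer; the fixed sleep is a crude stand-in for a proper wait.

```python
import time


def collect_pages(driver, next_id, max_pages=50, pause=1.0):
    """Return the HTML of each page reached by repeatedly clicking next_id.

    `driver` is expected to be a Selenium WebDriver that has already loaded
    the first page with driver.get(url).
    """
    pages = [driver.page_source]
    for _ in range(max_pages - 1):
        # find_elements returns an empty list (no exception) when nothing matches.
        links = driver.find_elements("id", next_id)
        if not links:
            break
        links[0].click()
        time.sleep(pause)  # crude wait for the postback to reload the page
        pages.append(driver.page_source)
    return pages


# Hypothetical usage (needs Selenium and a browser driver installed):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get(url)
#   pages = collect_pages(driver, "ctl00_ContentPlaceHolder1_RestRatings_Next")
#   driver.quit()
```

Keeping each page's HTML as a separate list entry, rather than concatenating everything into one string, lets each page be parsed as a well-formed document on its own.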

