I am trying to collate reviews of restaurants. urllib2 works fine for the initial page of reviews, but the link that loads the next batch of comments is a JavaScript link. An example page is here, and the code for the "NEXT 25" link is:

<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$RestRatings$Next','')" class="red" id="ctl00_ContentPlaceHolder1_RestRatings_Next">NEXT 25&gt;&gt; </a>

I have looked at all the previous answers (e.g.), and I have to say I'm none the wiser. Looking at the console in Firebug doesn't offer up a handy link. Could you suggest the best (easiest) way to achieve this?

Edit: With thanks to Seleniumnewbie, this code will print out all the comments from the reviews:

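from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import re

driver = webdriver.Firefox()

def getURLinfo(url):
    # Load the first page of reviews and keep its HTML.
    driver.get(url)
    html = driver.page_source
    next25 = "ctl00_ContentPlaceHolder1_RestRatings_Next"
    soup = BeautifulSoup(html, "html.parser")

    # Click "NEXT 25" for as long as the link is present,
    # appending the HTML of each newly loaded page.
    while soup.find(id=re.compile(next25)):
        driver.find_element(By.ID, next25).click()
        html = html + driver.page_source
        soup = BeautifulSoup(driver.page_source, "html.parser")

    # Parse the combined HTML of all pages and collect the comment blocks.
    soup = BeautifulSoup(html, "html.parser")
    comment = soup.find_all(id=re.compile("divComment"))

    for entry in comment:
        print(entry.div.contents)  # for comments

    driver.close()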

2 Answers

2

When a user clicks that link, the JavaScript function __doPostBack is called on the client. The other question you linked to assumes this function makes an AJAX call and then places the result in the same page.

However, the review page you linked to doesn't do that. It does make an AJAX call, but then it reloads the same page. I couldn't capture the AJAX call because the page reloads immediately, but since the page simply reloads with the new comments, I'm pretty sure the call tells the server to move you to the next page.

So, to get the next page of comments you will have to call the same URL that the __doPostBack function calls and then reload the page you are on. To find this URL, I would de-obfuscate their JavaScript and find the function being called. I believe the actual URL will depend on the parameters passed to that function, so make sure you replicate what it does.
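If the function turns out to be the stock ASP.NET WebForms helper, replicating it usually means posting the page's own form back to the same URL rather than hunting for a separate URL. The following is only a sketch under that assumption, which I have not verified against this site: the stock helper copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and submits the form together with the other hidden inputs (such as __VIEWSTATE). The names HiddenFieldParser and build_postback_fields are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_fields(html, event_target, event_argument=""):
    """Return the form fields a stock __doPostBack(target, argument) would submit.

    Assumption: the page is a standard WebForms page, so the server expects
    all hidden inputs plus __EVENTTARGET and __EVENTARGUMENT.
    """
    parser = HiddenFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return fields


# Usage sketch (not executed here; page_url stands for the review page URL):
#   from urllib.parse import urlencode
#   from urllib.request import urlopen
#   fields = build_postback_fields(html, "ctl00$ContentPlaceHolder1$RestRatings$Next")
#   next_html = urlopen(page_url, data=urlencode(fields).encode("ascii")).read()
```

Note that the target is the `$`-separated name taken from the href, not the `_`-separated id. The server may also require cookies or other headers from the original session, in which case this plain POST would not be enough.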

1 Comment

Thanks. I couldn't find anything that looked like a URL in the JavaScript ("www", "http", "review"), so I went with the Selenium brute-force approach!
1

Find the element by id="ctl00_ContentPlaceHolder1_RestRatings_Next" and then click it.
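A minimal sketch of that click loop, assuming the "NEXT 25" link is absent on the last page (which is what the question's own loop relies on). The function name collect_pages and the max_pages and pause parameters are illustrative. The driver is expected to be a Selenium WebDriver; the string "id" is the value of Selenium's By.ID constant, so the function itself needs no Selenium import.

```python
import time


def collect_pages(driver, next_id, max_pages=50, pause=1.0):
    """Return the page_source of each page, clicking the 'next' link until it is gone."""
    pages = [driver.page_source]
    for _ in range(max_pages):  # guard against clicking forever
        # find_elements returns an empty list when nothing matches,
        # so no exception handling is needed on the last page.
        links = driver.find_elements("id", next_id)
        if not links:
            break
        links[0].click()
        time.sleep(pause)  # crude wait for the page to reload
        pages.append(driver.page_source)
    return pages


# Usage sketch (not executed here; url stands for the review page URL):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get(url)
#   pages = collect_pages(driver, "ctl00_ContentPlaceHolder1_RestRatings_Next")
#   driver.quit()
```

The fixed pause is the simplest option but not the most robust; an explicit wait on the old link going stale would be a reasonable replacement if pages load slowly.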
