Lately, I have been trying to do web scraping and crawling with Python and the Selenium chromedriver. The target is a reddit page that lists threads, each with a title. Clicking a title opens that particular thread, which consists of a description and content.

What I am trying to do:

  • Step 1) Visit a reddit page
  • Step 2) Scan all the titles, store them in an array
  • Step 3) Loop through each of the items in the Title array
  • Step 4) Click on the title
  • Step 5) Get the description
  • Step 6) Go back
  • Step 7) If there are titles left, continue from Step 3; otherwise click next, go to the next page, and start again from Step 1.

I have been able to get the titles, and even to get to the point where it clicks a title. But once it has clicked a title and gone back to the page, I get an error in Step 3 at this line: data['title'].append(title.text). The error message is: "Message: stale element reference: element is not attached to the page document"

I have not been able to debug this issue, as I am fairly new to Python. Any help will be appreciated.

Here is the code:

for i in range(0,3):
    titles = []
    titles = browser.find_elements_by_css_selector(".title.may-blank")
    for title in titles:
        i = i+1
        try:
            data['title'].append(title.text)
        except KeyError:
            data['title'] = [title.text]
        title.click()
        description = browser.find_element_by_css_selector(".usertext-body.may-blank-within.md-container")
        print description.text
        browser.execute_script("window.history.go(-1)")
    button = browser.find_element_by_class_name("next-button")
    button.click()
print data['title']

1 Answer

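You need to find the element(s) again each time you navigate away to another page. Once the browser loads a new document, the references you collected earlier no longer point into the live page, which is what the "stale element reference" error is telling you.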
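If you want to keep clicking through, one way to apply this is to loop by index and run the query again after every return to the listing. This is only a sketch under assumptions: the function name collect_titles_by_index is made up for illustration, browser is assumed to be an already-started WebDriver, and the string "css selector" is the value behind Selenium's By.CSS_SELECTOR, passed directly so the call does not depend on the find_elements_by_* helpers. A real page may also need an explicit wait after going back, which is left out here.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def collect_titles_by_index(browser, title_selector=".title.may-blank"):
    """Click each title in turn, re-finding the elements after every navigation.

    Hypothetical helper: `browser` is assumed to be a running WebDriver.
    """
    titles = []
    index = 0
    while True:
        # Query again on every pass: elements found before a navigation
        # are stale once the page has been reloaded.
        elements = browser.find_elements(CSS, title_selector)
        if index >= len(elements):
            break
        element = elements[index]
        titles.append(element.text)  # read the text before leaving the page
        element.click()
        # ... read whatever you need from the thread page here ...
        browser.back()
        index += 1
    return titles
```

The position in the list is the only thing carried across navigations, so nothing stale is ever touched.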

I would change the flow of your code a little. Instead of clicking on each title, try collecting the href attributes first, and then navigate to those URLs directly.
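A rough sketch of that flow, assuming the elements matched by ".title.may-blank" are the links themselves and so carry an href. The function name scrape_listing is made up for illustration, the selectors are copied from the question, and browser is assumed to be an already-started WebDriver. The string "css selector" is the value behind Selenium's By.CSS_SELECTOR.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def scrape_listing(browser,
                   title_selector=".title.may-blank",
                   body_selector=".usertext-body.may-blank-within.md-container"):
    """Collect (title, url) pairs first, then visit each url in turn.

    Hypothetical helper: `browser` is assumed to be a running WebDriver
    that is currently showing the listing page.
    """
    # Copy text and href out as plain strings while the listing is loaded.
    # Strings survive navigation; element references do not.
    links = [(el.text, el.get_attribute("href"))
             for el in browser.find_elements(CSS, title_selector)]

    results = []
    for title, url in links:
        if not url:
            continue  # skip anything that turned out not to be a link
        browser.get(url)
        # find_elements returns an empty list instead of raising when
        # a thread has no description.
        bodies = browser.find_elements(CSS, body_selector)
        description = bodies[0].text if bodies else ""
        results.append({"title": title, "description": description})
    return results
```

Since every element is read before the first navigation, there is no need to go back to the listing at all until you want the next page.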

Also, I assume you're doing this via Selenium to practice your skills. If not, I recommend the Reddit API.

3 Comments

This did help me. And yes, you are right, for now I want to stick with Selenium. Now the problem is that the page has several elements with the same class and I want to select just one of them. My research so far suggests it can be done with XPath using indexes, but I have not been able to get it working. Could you help me with this?
Okay, I guess the problem has been resolved. I was on the right path using XPath with an index, but my syntax was incorrect. Thanks man. You resolved my problem!!!
Since I wanted the second element with the class name "md", I resolved it using browser.find_element_by_xpath("(//div[@class='md'])[2]"). Just in case someone else lands on the same problem. Thanks again @Michal K.
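For anyone wondering why the parentheses matter: in XPath, //div[@class='md'][2] applies the [2] per parent, so it matches any div that is the second such child of its own parent, while (//div[@class='md'])[2] takes the second match in the whole document. A small sketch of wrapping that up, where nth_match_xpath and find_nth are made-up helper names, browser is assumed to be a running WebDriver, and "xpath" is the string value behind Selenium's By.XPATH:

```python
XPATH = "xpath"  # the string value behind selenium's By.XPATH


def nth_match_xpath(expression, n):
    """Wrap an XPath so that [n] indexes the whole result set (1-based)."""
    if n < 1:
        raise ValueError("XPath positions start at 1")
    # The parentheses make the index apply to all matches in document
    # order rather than to siblings under each parent.
    return "({})[{}]".format(expression, n)


def find_nth(browser, expression, n):
    """Return the n-th element matching `expression` (hypothetical helper)."""
    return browser.find_element(XPATH, nth_match_xpath(expression, n))
```

For example, find_nth(browser, "//div[@class='md']", 2) issues the same XPath as the line in the comment above.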
