
I am scraping a webpage using Selenium in Python. I am able to locate the elements using this code:

from selenium import webdriver
import codecs

driver = webdriver.Chrome()
driver.get("url")
results_table=driver.find_elements_by_xpath('//*[@id="content"]/table[1]/tbody/tr')

Each element in results_table is in turn a set of sub-elements, with the number of sub-elements varying from element to element. My goal is to output each element, as a list or as a delimited string, into an output file. My code so far is this:

results_file=codecs.open(path+"results.txt","w","cp1252")

for i, element in enumerate(results_table):
    element_fields = element.find_elements_by_xpath(".//*[text()][count(*)=0]")
    element_list = [field.text for field in element_fields]
    stuff_to_write = '#'.join(element_list) + "\r\n"
    results_file.write(stuff_to_write)
    #print(i)
results_file.close()
driver.quit()

This second part of the code takes about 2.5 minutes for a list of ~400 elements, each with about 10 sub-elements. I get the desired output, but it is too slow. What could I do to improve the performance?

Using python 3.6

  • Download the whole page in one shot, then use something like BeautifulSoup to process it. I haven't used Splinter or Selenium in a while, but in Splinter, <browser_object>.html will give you the page. I'm not sure what the syntax is for that in Selenium, but there should be a way to grab the whole page. Commented Dec 6, 2017 at 7:23
  • I am using Selenium because I need to scrape multiple pages on a website where login is needed, and I would like to avoid logging in once for each page. BeautifulSoup is an option, but I do not know how to make it grab the active chromedriver page. And still, learning-wise, I must be doing something structurally wrong in my code. Commented Dec 6, 2017 at 7:55
  • @horace_vr Does it speed up if you write to the file only once at the end, after the for loop instead of inside each iteration? Commented Dec 6, 2017 at 8:59
  • Selenium (and Splinter, which is layered on top of Selenium) are notoriously slow for randomly accessing web page content. Looks like driver.page_source may give the entire contents of the page in Selenium, which I found at stackoverflow.com/questions/35486374/…. If reading all the chunks on the page one at a time is killing your performance (and it probably is), reading the whole page once and processing it offline will be oodles faster. Commented Dec 6, 2017 at 13:31
  • @Gary02127 BeautifulSoup is the way to go; I tried it, based on your suggestion, and replaced the webdriver-based processing code, and instead of 2 minutes, the code is executed in a handful of seconds. If you elaborate and post an answer, I will accept it. It certainly answered my OP, although not a solution I had in mind when posting :) Commented Dec 6, 2017 at 21:37

1 Answer


Download the whole page in one shot, then use something like BeautifulSoup to process it. I haven't used splinter or selenium in a while, but in Splinter, .html will give you the page. I'm not sure what the syntax is for that in Selenium, but there should be a way to grab the whole page.

Selenium (and Splinter, which is layered on top of Selenium) are notoriously slow for randomly accessing web page content. Looks like .page_source may give the entire contents of the page in Selenium, which I found at stackoverflow.com/questions/35486374/…. If reading all the chunks on the page one at a time is killing your performance (and it probably is), reading the whole page once and processing it offline will be oodles faster.
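A minimal sketch of this approach: grab the full page once with driver.page_source and do all the row/cell extraction offline with BeautifulSoup. The inline HTML string below is a hypothetical stand-in for the real page (in the actual script you would use `html = driver.page_source`), and the CSS selector is only a rough equivalent of the original XPath, not an exact translation.

```python
from bs4 import BeautifulSoup

# Stand-in for the real page; in the actual script use:
#   html = driver.page_source
html = """
<div id="content">
  <table><tbody>
    <tr><td>a</td><td>b</td></tr>
    <tr><td>c</td><td>d</td><td>e</td></tr>
  </tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Rough CSS equivalent of //*[@id="content"]/table[1]/tbody/tr
rows = soup.select("#content table tr")

# Build one '#'-delimited line per table row, as in the original loop
lines = ["#".join(cell.get_text(strip=True) for cell in row.find_all("td"))
         for row in rows]
print("\n".join(lines))
```

Because all the text extraction happens in memory on a single HTML snapshot, there is no per-element round trip to the browser, which is where the original loop spends most of its time.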
