I have over 19,000 links that I need to visit and scrape data from. Each page takes about 5 seconds to fully load, which means I will need slightly more than 26 hours to scrape everything with a single webdriver.
To me, it seems the simplest solution is to start another webdriver (or a few more) in a separate Python notebook, each working through its own portion of the links in parallel, e.g.:
In first iPython notebook:
from selenium import webdriver
driver1 = webdriver.Firefox()
... scraping code looping over links 0-9500 using driver1...
In second iPython notebook:
from selenium import webdriver
driver2 = webdriver.Firefox()
... scraping code looping over links 9501-19000 using driver2...
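The manual split above can be generalized so each worker gets a near-equal slice of the full list. This is a minimal sketch; the `links` list of example URLs is a placeholder standing in for the real 19,000 URLs:

```python
def chunk(seq, n_workers):
    """Split seq into n_workers contiguous chunks of near-equal size."""
    k, rem = divmod(len(seq), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + k + (1 if i < rem else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

# Placeholder URLs; in practice this is the real list of 19,000 links.
links = [f"https://example.com/page/{i}" for i in range(19000)]
parts = chunk(links, 2)
# parts[0] holds the first 9500 links, parts[1] the remaining 9500;
# each notebook/driver then loops over its own chunk.
```

With this, adding a third or fourth driver only means changing `n_workers` instead of recomputing index ranges by hand.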
I'm fairly new to scraping, so this question may be completely elementary/ridiculous(?). However, I've tried searching for this and haven't found anything on the topic, so I would appreciate any advice on the matter, or any recommendations for a better/more correct way to implement this.
I've heard of multi-threading using the thread module (http://www.tutorialspoint.com/python/python_multithreading.htm), but I wonder whether implementing it that way would have any advantage over simply creating multiple webdrivers as in the code above.
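For comparison, here is a rough sketch of the threading approach within a single notebook: a thread pool where each worker thread keeps one long-lived driver of its own, so drivers are reused across many URLs instead of one driver per notebook. `DriverStub` is a stand-in so the sketch runs without a browser; in real use it would be replaced by `webdriver.Firefox()` and `scrape` would parse `driver.page_source`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class DriverStub:
    """Placeholder for selenium's webdriver.Firefox(), used so this
    sketch runs without launching a browser."""
    def get(self, url):
        return f"page source of {url}"

thread_local = threading.local()

def get_driver():
    # Each worker thread lazily creates and then reuses its own driver,
    # so you pay the browser startup cost once per thread, not per URL.
    if not hasattr(thread_local, "driver"):
        thread_local.driver = DriverStub()  # real use: webdriver.Firefox()
    return thread_local.driver

def scrape(url):
    driver = get_driver()
    return driver.get(url)  # real use: driver.get(url); parse driver.page_source

# Placeholder URLs; max_workers controls how many browsers run at once.
urls = [f"https://example.com/page/{i}" for i in range(10)]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(scrape, urls))
```

The practical difference from the two-notebook version is mostly operational: one process to supervise, results collected in one place, and the worker count is a single parameter rather than a count of open notebooks. The page-load wait dominates either way, so throughput should be similar for the same number of drivers.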