I'm writing a web-scraping bot for AutoTrader, a popular car-trading site in the UK. I'm trying to do as much as I can on my own, but I'm stuck on how to get my script to do what I want.
Basically, I want the bot to download certain information from the first 100 pages of listings for every car make and model within a particular radius of my home. I also want it to stop requesting further pages for a particular make/model once there are no more new listings.
For instance, if there are only 4 pages of listings and I ask for page 5, the URL automatically redirects to page 1 and the bot downloads everything on page 1 again, then repeats that for every page up to 100. Obviously I don't want 96 copies of the page-1 cars in my data set, so I'd like to move on to the next model when this happens, but I haven't figured out a way to do that yet.
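The only idea I've come up with so far is to keep a record of the listing titles I've already saved and treat a page made up entirely of repeats as the signal to move on. Here's a rough, untested sketch of what I mean (the function name, `seen_titles`, and the usage comments are all just made up for illustration, and titles might not even be unique, so the listing URL could be a better key):

    def is_new_page(page_titles, seen_titles):
        """Return True if this page has at least one listing title we haven't
        saved before; False suggests the site has looped back to page 1."""
        new_titles = [t for t in page_titles if t not in seen_titles]
        if not new_titles:
            return False
        seen_titles.update(new_titles)
        return True

    # rough usage idea, per make:
    # seen = set()
    # for page in range(1, 101):
    #     ...download the page, scrape the titles into a list...
    #     if not is_new_page(titles, seen):
    #         break   # move on to the next make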
Here's what I have got so far:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

f = open("listings.csv", "w")  # placeholder file name

for x in range(1, 101):
    makes = ["ABARTH", "AC", "AIXAM", "ARIEL", "ASTON%20MARTIN", "AUDI"]
    for make in makes:
        my_url_page_x_make_i = ('https://www.autotrader.co.uk/car-search?'
                                + 'sort=distance' + '&postcode=BS247EY' + '&radius=300'
                                + '&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New'
                                + '&make=' + make + '&page=' + str(x))
        uClient = uReq(my_url_page_x_make_i)
        page_html = uClient.read()
        uClient.close()

        page_soup = soup(page_html, "html.parser")
        listings = page_soup.findAll("li", {"class": "search-page__result"})

        for listing in listings:
            information_container = listing.find("div", {"class": "information-container"})
            title_container = information_container.find(
                "a", {"class": "js-click-handler listing-fpa-link tracking-standard-link"})
            title = title_container.text
            price = listing.find("div", {"class": "vehicle-price"}).text

            print("title: " + title)
            print("price: " + price)
            f.write(title.replace(",", "") + "," + price.replace(",", "") + "\n")

        # my attempt at skipping a make once its listings run out; it doesn't work,
        # partly because the makes list is rebuilt on every pass of the page loop
        if len(listings) < 13:
            makes.remove(make)

f.close()
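Side question: am I building the URL on the `my_url_page_x_make_i` line the hard way? I think `urllib.parse.urlencode` could assemble the query string for me so I don't have to hand-encode things like `ASTON%20MARTIN`. A sketch of what I mean (not tested against the real site, and I'm not sure whether AutoTrader prefers `%20` or `+` for spaces):

    from urllib.parse import urlencode, quote

    params = [                      # a list of tuples so onesearchad can repeat
        ("sort", "distance"),
        ("postcode", "BS247EY"),
        ("radius", 300),
        ("onesearchad", "Used"),
        ("onesearchad", "Nearly New"),
        ("onesearchad", "New"),
        ("make", "ASTON MARTIN"),
        ("page", 5),
    ]

    # quote_via=quote encodes spaces as %20; the default would use +
    url = "https://www.autotrader.co.uk/car-search?" + urlencode(params, quote_via=quote)
    print(url)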
This is far from a finished script and I only have about 1 week of real Python coding experience.
Should I break out of the `for x in range(1, 101)` loop once the condition detects that no new pages were found? And is urllib enough for this, or do I need something else?
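For what it's worth, here is the overall shape I *think* I might need: the make loop on the outside, the page loop on the inside, and a `break` when a page brings nothing new. This is an untested sketch that reuses the selectors from my script above plus the duplicate-title idea from the first sketch, and the file name is again just a placeholder:

    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup

    makes = ["ABARTH", "AC", "AIXAM", "ARIEL", "ASTON%20MARTIN", "AUDI"]

    with open("listings.csv", "w") as f:              # placeholder file name
        for make in makes:                            # makes on the outside
            seen_titles = set()
            for x in range(1, 101):                   # pages on the inside
                url = ('https://www.autotrader.co.uk/car-search?sort=distance'
                       '&postcode=BS247EY&radius=300'
                       '&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New'
                       '&make=' + make + '&page=' + str(x))
                uClient = uReq(url)
                page_soup = soup(uClient.read(), "html.parser")
                uClient.close()

                listings = page_soup.findAll("li", {"class": "search-page__result"})
                titles = [listing.find("div", {"class": "information-container"})
                                 .find("a", {"class": "js-click-handler listing-fpa-link tracking-standard-link"})
                                 .text
                          for listing in listings]

                # if nothing on this page is new, assume AutoTrader has looped
                # back to page 1 and stop paging for this make
                new_titles = [t for t in titles if t not in seen_titles]
                if not new_titles:
                    break                             # next make
                seen_titles.update(new_titles)

                for listing, title in zip(listings, titles):
                    price = listing.find("div", {"class": "vehicle-price"}).text
                    f.write(title.replace(",", "") + "," + price.replace(",", "") + "\n")

Is that roughly the right structure, or is there a cleaner way to detect that the site has wrapped back to page 1?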