Your requests are blocking and synchronous, which is why they take a while. In simple terms, the second request doesn't start until the first one finishes.
Think of it as one conveyor belt carrying a bunch of boxes, with a single worker who processes each box.
The worker can only process one box at a time, and has to wait for that processing to finish before starting on the next box (in other words, he cannot take a box off the belt, drop it somewhere to be processed, come back, and pick up another one).
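To make the analogy concrete, here is a small simulation of the single blocking worker. It is only a sketch: `time.sleep` stands in for the network delay, and `process_box` / `one_worker` are hypothetical names, not part of any library. Because each box is finished before the next one is picked up, the total time grows as roughly the number of boxes times the per-box delay.

```python
import time


def process_box(box, delay=0.1):
    """Hypothetical stand-in for one blocking network request."""
    time.sleep(delay)  # the worker can do nothing else while it waits
    return box * 2


def one_worker(boxes):
    # Each box is fully processed before the next one is picked up,
    # so the total time is roughly len(boxes) * delay.
    return [process_box(b) for b in boxes]


if __name__ == "__main__":
    start = time.perf_counter()
    results = one_worker(list(range(5)))
    print(results, f"{time.perf_counter() - start:.2f}s")
```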
To reduce the time it takes to process the boxes, you can:
- Reduce the time it takes to process each box.
- Make it so that multiple boxes can be processed at the same time (in other words, the worker doesn't have to wait).
- Increase the number of belts and workers and then divide the boxes between belts.
We can't really do the first one, because the delay comes from the network (you could reduce the timeout period, but this is not recommended).
Instead, what we want is the second one: since each box is processed independently, we don't need to wait for one box to finish before starting on the next.
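The same simulation, rewritten so that the one worker doesn't wait, might look like this with `asyncio`. This is a swapped-in illustration: grequests itself gets the same effect with gevent rather than asyncio. `asyncio.sleep` again stands in for the network, and the function names are hypothetical. While one box is "waiting", the event loop lets the worker start on the others, so five boxes take about as long as one.

```python
import asyncio
import time


async def process_box(box, delay=0.1):
    # `await` hands control back to the event loop while "waiting on the
    # network", so the same worker can start on other boxes meanwhile.
    await asyncio.sleep(delay)
    return box * 2


async def one_non_blocking_worker(boxes):
    # Start every box, then wait for all of them; gather keeps input order.
    return await asyncio.gather(*(process_box(b) for b in boxes))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(one_non_blocking_worker(list(range(5))))
    print(results, f"{time.perf_counter() - start:.2f}s")
```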
So we want to do the following:
- Send the requests for all the URLs to the server at once, without waiting for any of them.
- Wait for each of them to finish (independently of each other).
- Collect the results.
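Those three steps can be sketched with `concurrent.futures` from the standard library. This assumes a hypothetical `fake_head` function in place of a real HEAD request (it sleeps to mimic latency and pretends URLs ending in `/missing` return 404); `check_all` is likewise an illustrative name. A thread pool is closer to "more workers" than to one non-waiting worker, but for I/O-bound requests the outcome is the same: the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_head(url, delay=0.1):
    """Hypothetical stand-in for a HEAD request that returns a status code."""
    time.sleep(delay)
    return 404 if url.endswith("/missing") else 200


def check_all(urls, max_workers=10):
    statuses = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # 1. Send: submit every request up front without waiting on any.
        futures = {pool.submit(fake_head, url): url for url in urls}
        # 2. Wait: as_completed yields each future as soon as it finishes,
        #    independently of the others.
        for future in as_completed(futures):
            # 3. Collect: store the result under the URL it belongs to.
            statuses[futures[future]] = future.result()
    return statuses


if __name__ == "__main__":
    urls = [f"http://example.com/{i}" for i in range(8)]
    urls.append("http://example.com/missing")
    start = time.perf_counter()
    statuses = check_all(urls)
    ok = [u for u in urls if statuses[u] == 200]
    print(len(ok), f"{time.perf_counter() - start:.2f}s")
```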
There are multiple ways to do this, which are listed in the requests documentation; here is an example using grequests:
import grequests
# Map each URL back to its item (`items` and `item_low_url` come from your code)
url_to_item = {item.item_low_url: item for item in items}
urls = list(url_to_item)
# Create the requests, but don't send them yet
rq = (grequests.head(url) for url in urls)
# Send the requests concurrently and wait for all of them;
# grequests.map returns the responses in the same order as the requests,
# with None in place of any request that failed (connection error, timeout)
responses = grequests.map(rq)
# Pair each response with the URL it was sent to (response.request.url
# may be normalized, e.g. a trailing slash added, so it may not match
# the dict key) and keep the items whose URL answered 200 OK
results = [url_to_item[url]
           for url, resp in zip(urls, responses)
           if resp is not None and resp.status_code == 200]
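If you'd rather not depend on grequests (and gevent), the same filtering can be sketched with only the standard library, using a thread pool and `urllib` HEAD requests. This is a swapped-in technique, not grequests' API; `head_status`, `valid_items`, and the local demo server are hypothetical, and `url_to_item` plays the same role as in the grequests example. Like `grequests.map`, `head_status` yields `None` for requests that fail outright.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.error import HTTPError, URLError
from urllib.request import ProxyHandler, Request, build_opener

# Opener with proxies disabled so the local demo server is reached directly;
# use build_opener() instead if you need environment proxy settings.
_OPENER = build_opener(ProxyHandler({}))


def head_status(url, timeout=5.0):
    """Send one HEAD request; return the status code, or None on failure."""
    try:
        with _OPENER.open(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # 4xx/5xx responses are raised as HTTPError
        return err.code
    except (URLError, OSError):  # connection refused, timeout, DNS error
        return None


def valid_items(url_to_item, max_workers=10):
    urls = list(url_to_item)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map sends the requests concurrently but yields results in input order
        statuses = pool.map(head_status, urls)
        return [url_to_item[u] for u, s in zip(urls, statuses) if s == 200]


# Demo server so the sketch runs without real URLs (hypothetical rule:
# /missing answers 404, every other path answers 200).
class Handler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(404 if self.path == "/missing" else 200)
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    items = {f"{base}/a": "item A", f"{base}/missing": "item B"}
    print(valid_items(items))  # ['item A']
    server.shutdown()
    server.server_close()
```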