
I'm writing a program that downloads images from the internet, and I would like to speed it up by making multiple requests at once.

So I wrote some code, which you can see here on GitHub.

I can request a single webpage like this:

from urllib.request import Request, urlopen
from urllib.error import HTTPError

def myrequest(url):
    worked = False
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    while not worked:
        try:
            webpage_read = urlopen(req).read()
            worked = True
        except HTTPError:  # the only failure seen in practice (e.g. 504)
            print("failed to connect to \n{}".format(url))
    return webpage_read

url = "http://www.mangahere.co/manga/mysterious_girlfriend_x"
webpage_read = myrequest(url).decode("utf-8")

The while loop is there because I definitely want to download every single picture, so I keep trying until it works (nothing can go wrong except urllib.error.HTTPError: HTTP Error 504: Gateway Time-out).
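
For what it's worth, a variant of myrequest that catches only the HTTPError named above and gives up after a fixed number of attempts could look like the sketch below; the retry count and the myrequest_bounded name are illustrative choices, not part of the original code:

from urllib.request import Request, urlopen
from urllib.error import HTTPError

def myrequest_bounded(url, retries=5):
    # hypothetical bounded-retry variant of myrequest above
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(1, retries + 1):
        try:
            return urlopen(req).read()
        except HTTPError as err:
            print("attempt {} failed for {}: {}".format(attempt, url, err))
    raise RuntimeError("giving up on {}".format(url))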

My question is: how do I run that multiple times at once?

My idea is to have "a commander" which would run 5 (or 85) Python scripts, give each a URL, and collect the webpage from each once it finishes, but this is definitely a silly solution :)
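
A minimal sketch of what that "commander" could look like using the standard library's concurrent.futures instead of separate scripts (myrequest is the function above; the worker count and URL list are made up for illustration):

from concurrent.futures import ThreadPoolExecutor

urls = ["http://www.mangahere.co/manga/mysterious_girlfriend_x/c001/{}.html".format(ep)
        for ep in range(1, 6)]

# five worker threads fetch pages concurrently; map returns results in order
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(myrequest, urls))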

EDIT: I used _thread, but it doesn't seem to speed up the program. That should have been the solution, so am I doing it wrong? That is my new question. You can use the link above to get to my code on GitHub.

def thrue_thread_download_pics(path, url, ep, name):
    # count this download toward the goal (the lock protects the shared counter)
    lock.acquire()
    global goal
    goal += 1
    lock.release()

    # fetch the episode page and pull the picture URL out of the HTML
    webpage_read = myrequest("{}/{}.html".format(url, ep))
    url_to_pic = webpage_read.decode("utf-8").split('" onerror="')[0].split('<img src="')[-1]

    pic = myrequest(url_to_pic)

    with open("{}/pics/{}.jpg".format(path, name), "wb") as myfile:
        myfile.write(pic)

    # this counter is shared across threads too, so it needs the same lock
    lock.acquire()
    global finished
    finished += 1
    lock.release()

and I'm using it here:

for url_ep in urls_eps:
    url, maxep = url_ep.split()
    maxep = int(maxep)
    chap = url.split("/")[-1][2:]
    if "." in chap:
        chap = chap.replace(".", "")
    else:
        chap = "{}0".format(chap)

    for ep in range(1, maxep + 1):
        ted = time.time()  # leftover timing variable, currently unused
        # zero-pad the episode number to two digits
        name = "{}{}".format(chap, "{}{}".format((2 - len(str(ep))) * "0", ep))
        if name in downloaded:
            continue

        _thread.start_new_thread(thrue_thread_download_pics, (path, url, ep, name))

# poll the shared counters until every spawned download reports in
checker = -1
while finished != goal:
    if finished != checker:
        checker = finished
        print("{} of {} downloaded".format(finished, goal))
    time.sleep(0.1)
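
A sketch of the same inner loop using the higher-level threading module, which replaces the hand-counted goal/finished polling with join(); it assumes thrue_thread_download_pics and the surrounding variables from above:

import threading

threads = []
for ep in range(1, maxep + 1):
    name = "{}{:02d}".format(chap, ep)  # same zero-padded name as above
    if name in downloaded:
        continue
    t = threading.Thread(target=thrue_thread_download_pics,
                         args=(path, url, ep, name))
    t.start()
    threads.append(t)

# block until every download thread has finished, no counters needed
for t in threads:
    t.join()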
  • for url in (...):? Commented Jan 12, 2016 at 22:51
  • will it send ten requests at the same time? Commented Jan 12, 2016 at 22:52
  • so no waiting until the first is finished? Commented Jan 12, 2016 at 22:52
  • Oh, you mean you want non-blocking calls? Look into e.g. aiohttp rather than urllib. Commented Jan 12, 2016 at 22:53
  • Use threads, or if you really want to build a scalable solution, take a look at the gevent library (gevent.org). It is based on co-routines, but it hides them behind a threading-like API, which makes it very simple to make web requests in a scalable way (see the sketch after this list). Commented Jan 12, 2016 at 22:59
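
For completeness, a tiny sketch of the gevent approach the last comment describes; it assumes gevent is installed and that myrequest is the function from the question:

from gevent import monkey
monkey.patch_all()  # patch the standard library as early as possible

import gevent

urls = ["http://httpbin.org/get?page={}".format(i) for i in range(5)]

# spawn a green thread per URL and wait for all of them
jobs = [gevent.spawn(myrequest, url) for url in urls]
gevent.joinall(jobs)
pages = [job.value for job in jobs]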

1 Answer


Requests Futures is built on top of the very popular requests library and runs requests concurrently on a background thread pool:

from requests_futures.sessions import FuturesSession

session = FuturesSession()

# These requests will run at the same time
future_one = session.get('http://httpbin.org/get')
future_two = session.get('http://httpbin.org/get?foo=bar')

# Get the first result
response_one = future_one.result()
print(response_one.status_code)
print(response_one.text)

# Get the second result
response_two = future_two.result()
print(response_two.status_code)
print(response_two.text)
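
Applied to the original problem, the same session can fan out over a whole list of URLs. In this sketch the max_workers value (the size of the underlying thread pool) and the URL list are arbitrary:

from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=10)  # assumed pool size

urls = ['http://httpbin.org/get?page={}'.format(i) for i in range(20)]

# fire off all requests first, then collect the results
futures = [session.get(url) for url in urls]
for future in futures:
    response = future.result()  # blocks until this request completes
    print(response.status_code)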

1 Comment

I have no time right now, so I will check this in two weeks or so (if I got into the problem, I would end up sitting at the computer even at 2 a.m. :)). BUT the overlapping is solved, more or less; now the question is why it is not faster. The problem seems to be that requesting multiple pics from one server already saturates our connection.
