
I'm writing a program that downloads images from the internet, and I would like to speed it up by making multiple requests at once.

So I wrote some code, which you can see here on GitHub.

I can request a single webpage like this:

from urllib.request import Request, urlopen
from urllib.error import HTTPError

def myrequest(url):
    worked = False
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    while not worked:
        try:
            webpage_read = urlopen(req).read()
            worked = True
        except HTTPError:  # the only failure seen in practice (e.g. 504)
            print("failed to connect to \n{}".format(url))
    return webpage_read

url = "http://www.mangahere.co/manga/mysterious_girlfriend_x"
webpage_read = myrequest(url).decode("utf-8")

The while loop is there because I definitely want to download every single picture, so I keep trying until it works (nothing can go wrong except urllib.error.HTTPError: HTTP Error 504: Gateway Time-out).
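
For what it's worth, a variant of myrequest that catches only the HTTPError named above and gives up after a fixed number of attempts could look like the sketch below; the retry count and the myrequest_bounded name are illustrative choices, not part of the original code:

from urllib.request import Request, urlopen
from urllib.error import HTTPError

def myrequest_bounded(url, retries=5):
    # hypothetical bounded-retry variant of myrequest above
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(1, retries + 1):
        try:
            return urlopen(req).read()
        except HTTPError as err:
            print("attempt {} failed for {}: {}".format(attempt, url, err))
    raise RuntimeError("giving up on {}".format(url))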

My question is: how do I run that multiple times at once?

My idea is to have "a commander" which would run 5 (or 85) Python scripts, give each a URL, and collect the webpage from each once it finishes, but this is definitely a silly solution :)
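
A minimal sketch of what that "commander" could look like using the standard library's concurrent.futures instead of separate scripts (myrequest is the function above; the worker count and URL list are made up for illustration):

from concurrent.futures import ThreadPoolExecutor

urls = ["http://www.mangahere.co/manga/mysterious_girlfriend_x/c001/{}.html".format(ep)
        for ep in range(1, 6)]

# five worker threads fetch pages concurrently; map returns results in order
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(myrequest, urls))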

EDIT: I used _thread, but it doesn't seem to speed up the program. That should have been the solution, so am I doing it wrong? That is my new question. You can use the link above to get to my code on GitHub.

def thrue_thread_download_pics(path, url, ep, name):
    # count this download toward the goal (the lock protects the shared counter)
    lock.acquire()
    global goal
    goal += 1
    lock.release()

    # fetch the episode page and pull the picture URL out of the HTML
    webpage_read = myrequest("{}/{}.html".format(url, ep))
    url_to_pic = webpage_read.decode("utf-8").split('" onerror="')[0].split('<img src="')[-1]

    pic = myrequest(url_to_pic)

    with open("{}/pics/{}.jpg".format(path, name), "wb") as myfile:
        myfile.write(pic)

    # this counter is shared across threads too, so it needs the same lock
    lock.acquire()
    global finished
    finished += 1
    lock.release()

and I'm using it here:

for url_ep in urls_eps:
    url, maxep = url_ep.split()
    maxep = int(maxep)
    chap = url.split("/")[-1][2:]
    if "." in chap:
        chap = chap.replace(".", "")
    else:
        chap = "{}0".format(chap)

    for ep in range(1, maxep + 1):
        ted = time.time()  # leftover timing variable, currently unused
        # zero-pad the episode number to two digits
        name = "{}{}".format(chap, "{}{}".format((2 - len(str(ep))) * "0", ep))
        if name in downloaded:
            continue

        _thread.start_new_thread(thrue_thread_download_pics, (path, url, ep, name))

# poll the shared counters until every spawned download reports in
checker = -1
while finished != goal:
    if finished != checker:
        checker = finished
        print("{} of {} downloaded".format(finished, goal))
    time.sleep(0.1)
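
A sketch of the same inner loop using the higher-level threading module, which replaces the hand-counted goal/finished polling with join(); it assumes thrue_thread_download_pics and the surrounding variables from above:

import threading

threads = []
for ep in range(1, maxep + 1):
    name = "{}{:02d}".format(chap, ep)  # same zero-padded name as above
    if name in downloaded:
        continue
    t = threading.Thread(target=thrue_thread_download_pics,
                         args=(path, url, ep, name))
    t.start()
    threads.append(t)

# block until every download thread has finished, no counters needed
for t in threads:
    t.join()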
  • for url in (...):? Commented Jan 12, 2016 at 22:51
  • will it send ten requests at the same time? Commented Jan 12, 2016 at 22:52
  • so no waiting until the first is finished? Commented Jan 12, 2016 at 22:52
  • Oh, you mean you want non-blocking calls? Look into e.g. aiohttp rather than urllib. Commented Jan 12, 2016 at 22:53
  • Use threads, or if you really want to build a scalable solution, take a look at the gevent library (gevent.org). It is based on co-routines, but it hides them behind a threading-like API, which makes it very simple to make web requests in a scalable way (see the sketch after this list). Commented Jan 12, 2016 at 22:59
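
For completeness, a tiny sketch of the gevent approach the last comment describes; it assumes gevent is installed and that myrequest is the function from the question:

from gevent import monkey
monkey.patch_all()  # patch the standard library as early as possible

import gevent

urls = ["http://httpbin.org/get?page={}".format(i) for i in range(5)]

# spawn a green thread per URL and wait for all of them
jobs = [gevent.spawn(myrequest, url) for url in urls]
gevent.joinall(jobs)
pages = [job.value for job in jobs]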

1 Answer


Requests Futures is built on top of the very popular requests library and runs requests concurrently on a background thread pool:

from requests_futures.sessions import FuturesSession

session = FuturesSession()

# These requests will run at the same time
future_one = session.get('http://httpbin.org/get')
future_two = session.get('http://httpbin.org/get?foo=bar')

# Get the first result
response_one = future_one.result()
print(response_one.status_code)
print(response_one.text)

# Get the second result
response_two = future_two.result()
print(response_two.status_code)
print(response_two.text)
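
Applied to the original problem, the same session can fan out over a whole list of URLs. In this sketch the max_workers value (the size of the underlying thread pool) and the URL list are arbitrary:

from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=10)  # assumed pool size

urls = ['http://httpbin.org/get?page={}'.format(i) for i in range(20)]

# fire off all requests first, then collect the results
futures = [session.get(url) for url in urls]
for future in futures:
    response = future.result()  # blocks until this request completes
    print(response.status_code)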

1 Comment

I have no time right now, so I will check this in two weeks or so (if I got into the problem, I would end up sitting at the computer even at 2 a.m. :)). BUT the overlapping is solved, more or less; now the question is why it is not faster. The problem seems to be that requesting multiple pics from one server already saturates our connection.
