
I want to hit a URL N times in Python. Currently I have been doing this using webbrowser.open(), but it is very slow and consumes a lot of memory. Is there a more efficient method?

  • @EnnoShioji: it is not a duplicate. Making multiple requests to the same URL in an efficient manner is a different problem. You want an ab-like tool rather than mere curl. Commented Mar 1, 2014 at 11:22
  • related: Problem with multi threaded Python app and socket connections Commented Mar 1, 2014 at 11:36
  • import requests; requests.get(url="some_url") --- run this in a loop (see the sketch below). Commented Jan 5, 2020 at 14:15
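
A minimal sketch of that comment's suggestion, assuming the third-party requests package (pip install requests) is installed; the URL and count are placeholders, and a Session is used so the underlying TCP connection is reused across requests:

import requests

url = "http://www.example.com"  # placeholder
with requests.Session() as session:  # reuses one connection across requests
    for _ in range(10):
        session.get(url)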

4 Answers


Take a look at urllib2.urlopen:

import urllib2

for _ in range(10):
    urllib2.urlopen("http://www.stackoverflow.com")
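
On Python 3, urllib2 was split into urllib.request and urllib.error, so the equivalent (same URL, same loop) is:

import urllib.request

for _ in range(10):
    urllib.request.urlopen("http://www.stackoverflow.com")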


F.X.'s answer is almost certainly what you want.

But you asked about efficiency, and if you really want to be as efficient as possible, you can do better. The sooner you close the socket, the less CPU, memory, and bandwidth you waste, both on your machine and on the web server.

Also, making multiple requests in parallel won't save any resources on your machine (it'll actually waste some) or on the server, but it will probably finish faster. Is that what you're after?

Of course that raises the question of what exactly you mean by "hit a URL". Is it acceptable to just send the request and immediately shut down? Or do you need to wait for at least the response line? For that matter, is it acceptable to make a HEAD request instead of a GET? Do you need realistic/useful headers?
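
For instance, if a HEAD request is acceptable, you can get just the status line and headers without any body; a sketch using Python 3's http.client (httplib on Python 2), with a placeholder host and path:

import http.client

conn = http.client.HTTPConnection('www.example.com', 80)
conn.request('HEAD', '/path/to/resource.html')
resp = conn.getresponse()  # reads the status line and headers; HEAD has no body
conn.close()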

Anyway, in order to do this, you'd want to drop down to a lower level. Most higher-level libraries don't give you any way to, e.g., close the socket before reading anything. But it's not that hard to craft HTTP requests.*

For example:

from contextlib import closing
from socket import create_connection
from concurrent.futures import ThreadPoolExecutor, wait

host, port = 'www.example.com', 80
path = '/path/to/resource.html'

def spam_it():
    # Open a connection, send a minimal request, and close without reading the response
    with closing(create_connection((host, port))) as sock:
        sock.sendall('GET {} HTTP/1.0\r\n\r\n'.format(path).encode('ascii'))

with ThreadPoolExecutor(max_workers=16) as executor:
    wait([executor.submit(spam_it) for _ in range(10000)])

* Well, manually crafting HTTP requests is actually quite involved… If you only need to craft a static, trivial one, do it yourself, but in general, you definitely want to use urllib, requests, or some other library.

1 Comment

+1. Note that on Python 3 the data must be sent as bytes, and on Python 2 concurrent.futures is a third-party backport (the futures package).

Use urllib2? As a standard rule of thumb, always look in the standard library first; there are tons of useful packages there.

1 Comment

You said urllib, but linked to urllib2. Otherwise, good answer.

import urllib2

url = "http://www.google.com"
n = 8

for i in range(n):
    urllib2.urlopen(url).read()  # fetch the page n times, discarding the body

You may wish to look into the requests module if you're eventually going to want to do anything less trivial with HTTP requests.
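
For example, here is a sketch of a slightly less trivial request with requests (third-party, pip install requests); the header and timeout values are illustrative:

import requests

resp = requests.get(
    "http://www.example.com",                  # placeholder URL
    headers={"User-Agent": "my-script/1.0"},   # illustrative custom header
    timeout=5,                                 # seconds; fail fast instead of hanging
)
print(resp.status_code, len(resp.content))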

