
I have written a Python script that uses cookies and POST/GET. I also included proxy support in the script. However, when one enters a dead proxy, the script crashes. Is there any way to check whether a proxy is dead or alive before running the rest of my script?

Furthermore, I noticed that some proxies don't handle cookies/POST headers properly. Is there any way to fix this?

  • Can't you just catch the exception? Commented Apr 19, 2009 at 12:08
  • I think catching the exception is not the best way to do it; check the comment I left on dbr's answer. Could you give me your opinion? I am planning to write a proxy checker myself (I'm just starting with Python, and this will be my second Python script). Commented Aug 1, 2010 at 0:49

6 Answers


The simplest way is to simply catch the IOError exception from urllib (this first snippet is Python 2):

try:
    urllib.urlopen(
        "http://example.com",
        proxies={'http':'http://example.com:8080'}
    )
except IOError:
    print "Connection error! (Check proxy)"
else:
    print "All was fine"

Also, from this blog post - "check status proxy address" (with some slight improvements):

For Python 2:

import urllib2
import socket

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib2.install_opener(opener)
        req = urllib2.Request('http://www.example.com')  # change the URL to test here
        sock = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print "Bad Proxy %s" % (currentProxy)
        else:
            print "%s is working" % (currentProxy)

if __name__ == '__main__':
    main()

For Python 3:

import urllib.request
import socket
import urllib.error

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib.request.ProxyHandler({'http': pip})
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        req = urllib.request.Request('http://www.example.com')  # change the URL to test here
        sock = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        print('Error code: ', e.code)
        return e.code
    except Exception as detail:
        print("ERROR:", detail)
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '25.176.126.9:80']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print("Bad Proxy %s" % (currentProxy))
        else:
            print("%s is working" % (currentProxy))

if __name__ == '__main__':
    main() 

Remember this could double the time the script takes if the proxy is down (as you will have to wait for two connection timeouts). Unless you specifically need to know that the proxy is at fault, handling the IOError is far cleaner, simpler, and quicker.
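For Python 3, the same catch-the-error approach looks like this. A minimal sketch, assuming http://example.com:8080 stands in for your proxy; URLError covers dead proxies, and a per-request timeout keeps the wait short:

import urllib.request
import urllib.error

# Assumed placeholder proxy; substitute your own.
proxies = {'http': 'http://example.com:8080'}
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

try:
    opener.open('http://example.com', timeout=10)
except urllib.error.URLError:
    print("Connection error! (Check proxy)")
else:
    print("All was fine")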


4 Comments

  • But some proxies can connect to the URL yet not return the actual HTML from it; they show a custom error page instead, so you can't catch an exception there. Wouldn't it be better to check for a string in req.read()? (See the sketch after this list.)
  • What's the difference between socket.setdefaulttimeout() and the urllib timeout parameter?
  • @macdonjo Pretty sure the urllib timeout parameter is new in Python 3. It's probably much better than socket.setdefaulttimeout, which applies globally.
  • Checking an invalid proxy with this code seems to take a very long time (over a minute).
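Regarding the custom-error-page and timeout comments above, here is a hedged sketch that passes a per-request timeout instead of a global socket.setdefaulttimeout() and also verifies that the response body contains an expected string ('Example Domain' is just an assumed marker for example.com):

import urllib.request
import urllib.error

def is_bad_proxy(pip, marker='Example Domain'):  # marker is an assumed check string
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({'http': pip})
    )
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    try:
        # Per-request timeout: a dead proxy fails fast, and other sockets
        # are unaffected, unlike with socket.setdefaulttimeout().
        body = opener.open('http://www.example.com', timeout=5).read()
    except (urllib.error.URLError, OSError):
        return True
    # A broken proxy may answer 200 with its own error page, so also
    # confirm the page contains the content we expect.
    return marker.encode() not in body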

You can use an IP-echo website to learn the IP from which your request is sent, then check whether that IP is your proxy's IP or something else. Here is a script for that:

import requests

proxy_ip = "<IP>"
proxy_port = "<PORT>"
proxy_user = "<USERNAME>"
proxy_pass = "<PASSWORD>"

proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/"
}

url = 'https://api.ipify.org'  # echoes back the IP the request came from

try:
    response = requests.get(url, proxies=proxies, timeout=10)
    assert response.text == proxy_ip
except (requests.RequestException, AssertionError):
    print("Proxy does not work")



You can use the proxy-checker library, which is as simple as this:

from proxy_checker import ProxyChecker

checker = ProxyChecker()
print(checker.check_proxy('<ip>:<port>'))

Output:

{
  "country": "United States",
  "country_code": "US",
  "protocols": [
    "socks4",
    "socks5"
  ],
  "anonymity": "Elite",
  "timeout": 1649
}

It also offers the possibility of generating your own proxies and checking them with two lines of code.
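If the package isn't installed yet, it is on PyPI (presumably as pip install proxy-checker).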



I think the better approach is, as dbr said, handling the exception.

Another solution that could be better in some cases is to use an external online proxy-checker tool to verify that a proxy server is alive, and then continue using your script without any modification.



There is a nice package, Grab. If it's OK for you, you can write something like this (a simple valid-proxy checker/generator):

from grab import Grab, GrabError

def get_valid_proxy(proxy_list):  # items formatted like '128.2.198.188:3124'
    g = Grab()
    for proxy in proxy_list:
        g.setup(proxy=proxy, proxy_type='http', connect_timeout=5, timeout=5)
        try:
            g.go('google.com')
        except GrabError:
            #logging.info("Test error")
            pass
        else:
            yield proxy
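A minimal usage sketch (the addresses are assumed sample values): take the first proxy that responds, or report that none work.

proxy_list = ['128.2.198.188:3124', '1.2.3.4:8080']  # assumed sample proxies
try:
    proxy = next(get_valid_proxy(proxy_list))
    print('Using proxy %s' % proxy)
except StopIteration:
    print('No working proxy found')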

1 Comment

  • The documentation is hardly English.

OK, so neither solution 1 nor 2 worked for me individually, but combining them works perfectly. Here is the code that worked for me:

import urllib.request
import urllib.error

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib.request.ProxyHandler(proxies=pip)
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        req = urllib.request.Request('https://www.example.com')  # change the URL to test here
        sock = urllib.request.urlopen(req, timeout=5)
        if 200 <= sock.getcode() < 300:
            return False
        else:
            return True
    except urllib.error.HTTPError as e:
        print('Error code: ', e.code)
        return e.code
    except urllib.error.URLError as e:
        print('Error: ', e.reason)
        return True
    except Exception as detail:
        print("ERROR:", detail)
        return True

# convert_kwargs_to_snake_case comes from the Ariadne GraphQL library
# (from ariadne.utils import convert_kwargs_to_snake_case)
@convert_kwargs_to_snake_case
async def my_proxy(_, info):
    proxy_ip = "<IP>"
    proxy_port = "<PORT>"
    proxy_user = "<USERNAME>"
    proxy_pass = "<PASSWORD>"

    proxies = {
        "http": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/",
        "https": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/"
    }
    if is_bad_proxy(proxies):
        return "not_working"
    else:
        return "working"
    

