
I am writing a small program to fetch all hyperlinks from a webpage given its URL, but it seems the network I am on uses a proxy and the script is not able to fetch anything. My code:

import sys
import urllib
import urlparse

from bs4 import BeautifulSoup


def process(url):
    # Download the page and parse it with BeautifulSoup
    page = urllib.urlopen(url)
    text = page.read()
    page.close()
    soup = BeautifulSoup(text)
    # Resolve every <a href> against the base URL and save it to s.txt
    with open('s.txt', 'w') as file:
        for tag in soup.findAll('a', href=True):
            tag['href'] = urlparse.urljoin(url, tag['href'])
            print tag['href']
            file.write(tag['href'])
            file.write('\n')


def main():
    # Usage: python script.py <url> [<url> ...]
    if len(sys.argv) == 1:
        print 'No url !!'
        sys.exit(1)
    for url in sys.argv[1:]:
        process(url)


if __name__ == '__main__':
    main()
  • Based on your question, your network may or may not have a proxy in use. Can you be a little more specific, or just ask your admins? Commented Sep 22, 2015 at 8:59
  • Yes, it has a proxy. I tried it at home and it worked fine, but when I took it to my department to show my teacher it didn't work. This is the error: IOError: [Errno socket error] [Errno -2] Name or service not known Commented Sep 22, 2015 at 11:05
  • This is the proxy I used to connect: "proxy4.nehu.ac.in:3128". How do I put it into my program? Please help, I am really stuck with it. Commented Sep 22, 2015 at 11:22
  • OK, I will check on this and come back to you if I run into a problem. I cannot test it right now because I have to try it at the university itself, since I don't have a proxied network to test on. Is that OK with you? Commented Sep 22, 2015 at 11:56
  • You can easily set up a proxy of your own; e.g. Squid is quite popular. Commented Sep 22, 2015 at 11:57

2 Answers


You could use the requests module instead.

import requests

proxies = {'http': 'http://host/'}
# or 'http://user:pass@host/' if the proxy requires authentication

r = requests.get(url, proxies=proxies)
text = r.text
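
As a usage sketch, here is roughly how the process() function from the question could be rewritten around requests; the proxy address is the one mentioned in the comments, and everything else mirrors the original code:

import urlparse

import requests
from bs4 import BeautifulSoup

# Proxy taken from the comments on the question; adjust if yours differs.
proxies = {'http': 'http://proxy4.nehu.ac.in:3128'}

def process(url):
    # Fetch the page through the proxy and collect absolute links into s.txt
    r = requests.get(url, proxies=proxies)
    soup = BeautifulSoup(r.text)
    with open('s.txt', 'w') as f:
        for tag in soup.findAll('a', href=True):
            link = urlparse.urljoin(url, tag['href'])
            print link
            f.write(link + '\n')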

7 Comments

Should I put it this way: proxies = { 'http': 'http://proxya4.nehu.ac.in }
You need the port and the closing quote, so it would be proxies = { 'http': 'http://proxya4.nehu.ac.in:3128' }
Can I come back to you later? I will try it first and let you know how it goes. I really want this to work; I'm like crying inside so bad.
Hi, I tried your suggestion and I get 'Response [200]' when I print r = requests.get("http://www.dota2.com", proxies=proxies). What does that mean?
200 is the HTTP status code of the response; it means the request was successful. To get the HTML of the page, print r.text. See w3.org/Protocols/rfc2616/rfc2616-sec10.html for the full list of status codes.
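
Putting those last two comments together, a tiny illustration (the URL and the proxy address are just the ones mentioned above):

import requests

proxies = {'http': 'http://proxy4.nehu.ac.in:3128'}  # proxy from the comments

r = requests.get('http://www.dota2.com', proxies=proxies)
print r.status_code  # 200 means the request was successful
print r.text         # the HTML of the page, ready to feed into BeautifulSoup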

The urllib library you are using for HTTP access does not support proxy authentication (it does support unauthenticated proxies). From the docs:

Proxies which require authentication for use are not currently supported; this is considered an implementation limitation.

I suggest you switch to urllib2 and use it as demonstrated in the answer to this post.
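
The linked answer is not reproduced here, but a rough sketch of that urllib2 approach, using a ProxyHandler with the proxy address from the comments on the question, looks like this (the user:password form is only needed if the proxy authenticates):

import urllib2

# Proxy from the comments; for an authenticating proxy use
# 'http://user:password@proxy4.nehu.ac.in:3128' instead.
proxy = urllib2.ProxyHandler({'http': 'http://proxy4.nehu.ac.in:3128'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)  # every urllib2.urlopen() call now goes through the proxy

page = urllib2.urlopen('http://www.example.com')
text = page.read()
page.close()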

2 Comments

I am new to Python, so it is hard for me to implement. Just to get me started, could you show me how I should put it into my program?
I have read in the Python documentation that urllib2 has a ProxyHandler that can handle proxies. How do I set it up so that my requests go through the proxy I use to connect to the internet? Please help.
