python 3 - HTTP proxy issue

Question

I'm using Python 3.3.0 on Windows 7.

I wrote this script to fetch a page through an HTTP proxy that requires no authentication. But when I run it, it fails with this error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 6242-6243: character maps to <undefined>

It seems that it fails to convert some Unicode characters into a string.

So, what should I use or change? Does anybody have a clue or a solution?

My .py file contains the following:

import sys, urllib
import urllib.request

url = "http://www.python.org"
proxies = {'http': 'http://199.91.174.6:3128/'}

opener = urllib.request.FancyURLopener(proxies)

try:
    f = urllib.request.urlopen(url)
except urllib.error.HTTPError as  e:
    print ("[!] The connection could not be established.")
    print ("[!] Error code: ",  e.code)
    sys.exit(1)
except urllib.error.URLError as  e:
    print ("[!] The connection could not be established.")
    print ("[!] Reason: ",  e.reason)
    sys.exit(1)

source = f.read()

if "iso-8859-1" in str(source):
    source = source.decode('iso-8859-1')
else:
    source = source.decode('utf-8')

print("\n SOURCE:\n",source)

You just published the IP of an open proxy. If this machine is yours I'd strongly suggest securing it properly.
– t-8ch, Commented Mar 3, 2013 at 18:06
Yeah, it's an open proxy. Please advise me more about this as well. Thanks.
– magneto, Commented Mar 7, 2013 at 4:00
If you are the owner of this proxy, or know the owner: use authentication. If you don't know who owns it, I would stop using it.
– t-8ch, Commented Mar 8, 2013 at 15:32

t-8ch · Accepted Answer · 2013-03-08 15:45:29Z

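1. This code doesn't even use your proxy: it creates opener with the proxy settings but then calls urllib.request.urlopen(url) directly, which ignores that opener.
2. This form of encoding detection is really weak. You should only look for the declared encoding in the well-defined locations: the HTTP 'Content-Type' header and, if the response is HTML, the charset meta tag.
3. As you didn't include a stack trace, I assume the error is related to the line if "iso-8859-1" in str(source):. In Python 3, calling str() on bytes does not decode them; it returns the repr (b'...'), so the check runs against an escaped representation of the data. If you really want to keep this check (see point 2) you should do if b"iso-8859-1" in source:, which works on bytes directly so nothing has to be converted beforehand. Note that a UnicodeEncodeError is raised when encoding, so the failure itself most likely happens in the final print(), when the decoded text is written to a console whose code page cannot represent some of the characters.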
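
Points 1 and 2 could be sketched with the standard library roughly as follows. This is a minimal sketch rather than a finished script: build_proxy_opener and fetch_text are hypothetical helper names, and the proxy address in the usage comment is a placeholder. It sends the request through a ProxyHandler-based opener and takes the charset from the Content-Type response header, falling back to UTF-8 when none is declared.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_text(opener, url, fallback="utf-8"):
    """Fetch url via opener and decode the body using the header charset."""
    with opener.open(url, timeout=10) as response:
        # response.headers is an email.message.Message subclass, so it can
        # parse "Content-Type: text/html; charset=..." for us.
        charset = response.headers.get_content_charset() or fallback
        return response.read().decode(charset, errors="replace")


# Usage (placeholder proxy address, needs network access):
# opener = build_proxy_opener("http://proxy.example.com:3128")
# print(fetch_text(opener, "http://www.python.org"))
```

Alternatively, calling urllib.request.install_opener(opener) makes later plain urlopen() calls go through the same opener.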

Note: This code works fine for me, presumably because my terminal uses UTF-8 for output while your Windows console uses a different encoding (sys.stdout.encoding shows which one).
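
If the traceback does point at the final print(), one possible workaround is to replace whatever the console encoding cannot represent before printing. The sketch below assumes the failure comes from the console code page (for example cp437 or cp1252 on Windows); to_console_safe is a hypothetical helper name.

```python
import sys


def to_console_safe(text, encoding=None):
    """Replace characters that the target encoding cannot represent with '?'."""
    # sys.stdout.encoding can be None in some embedded or redirected setups.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# Usage: print(to_console_safe(source)) instead of print(source)
```

Setting the PYTHONIOENCODING environment variable (for example to utf-8) before starting Python is another option, though the console font may still not display every character.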

Update: I recommend using python-requests when doing HTTP in Python.

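import requests

# Replace your_proxy_here with your proxy URL, e.g. a 'http://host:port' string
proxies = {'http': your_proxy_here}

with requests.Session() as sess:
    # Session() takes no arguments; configure the proxies on the session instead
    sess.proxies.update(proxies)
    r = sess.get('http://httpbin.org/ip')
    print(r.apparent_encoding)
    print(r.text)
    # more requests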

Note: this doesn't use the encoding specified in the HTML; you would need an HTML parser such as BeautifulSoup to extract that.
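
Reading the charset declared in the HTML could look roughly like this. To stay dependency-free, the sketch swaps in the standard library's html.parser instead of BeautifulSoup; MetaCharsetParser and declared_charset are hypothetical names, and only the first 4096 bytes are scanned on the assumption that the meta tag sits near the top of the document.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Remember the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 form: <meta charset="...">
            self.charset = attrs["charset"].strip().lower()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Older form: <meta http-equiv="Content-Type" content="...; charset=...">
            for part in (attrs.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip().strip("\"'").lower()


def declared_charset(raw, default="utf-8"):
    """Return the charset declared in the HTML bytes, or default if none is found."""
    parser = MetaCharsetParser()
    # latin-1 maps every byte to a character, so the ASCII markup can be scanned
    # before the real encoding is known.
    parser.feed(raw[:4096].decode("latin-1"))
    return parser.charset or default
```

The result can then be passed to bytes.decode(), ideally only when the HTTP header did not already declare a charset.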

edited Mar 8, 2013 at 15:45

answered Mar 3, 2013 at 18:50

3 Comments

magneto Over a year ago

Sorry for the late reply; I was out of town. Thanks for the detailed answer. Please help me sort out all the points you mentioned, and give me some code or an example so I can get a better idea.

magneto Over a year ago

My system also has utf-8 as its default encoding. Would you please tell me why this code does not use the proxy? I have seen this code in the Python documentation itself!

magneto Over a year ago

Hmm, I tried this: if b"iso-8859-1" in source: But it's not working either!
