
I was trying to fetch a web page but ran into this problem. I've looked up some references, and this is what I've done so far:

import sys
import urllib2
from bs4 import BeautifulSoup

user = 'myuserID'
password = "mypassword"

ip = sys.argv[1]
url = "http://www.websites.com/" + ip

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
handler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)

header = {
    'Connection' : 'keep-alive',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',
    'Accept-Language' : 'en-US,en;q=0.5',
    'Accept-Encoding' : 'gzip, deflate'
    }
html = urllib2.urlopen(urllib2.Request(url, None, header))
soup = BeautifulSoup(html, 'html.parser')
# some if else function afterwards #

When I run the script, I get this error:

python checker.py 8.8.8.8
Traceback (most recent call last):
  File "checker.py", line 34, in <module>
    html = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 469, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 656, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: authenticationrequired

However, if I first open that page (or any other web page) and enter my credentials manually, the script works fine afterwards. Am I missing something?

To add: my network uses a McAfee Web Gateway device, so we sometimes have to enter our credentials before we can continue browsing. Our usernames and passwords are integrated with Active Directory. Could that be causing the issue?
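One way to find out who is actually asking for credentials: the traceback passes through http_error_302 before the 401, so the request that finally failed was a redirected one, possibly to the gateway rather than to websites.com. Below is a minimal sketch, assuming Python 3 (where urllib2 was split into urllib.request and urllib.error), that reports the URL that failed, the status code, and any WWW-Authenticate challenge. HTTPBasicAuthHandler only retries with credentials when that challenge uses the Basic scheme. The helper names describe_auth_error and fetch are hypothetical.

```python
import urllib.error
import urllib.request


def describe_auth_error(err):
    """Summarise an HTTPError: which URL failed and what challenge it sent.

    describe_auth_error is a hypothetical helper, not part of urllib.
    """
    # After a redirect this is the URL of the request that failed,
    # which may be on a different host than the one originally requested.
    failed_url = err.geturl()
    challenge = err.headers.get("WWW-Authenticate") if err.headers else None
    # HTTPBasicAuthHandler only retries when the challenge scheme is Basic.
    is_basic = bool(challenge) and challenge.split(None, 1)[0].lower() == "basic"
    return {
        "url": failed_url,
        "code": err.code,
        "challenge": challenge,
        "basic_challenge": is_basic,
    }


def fetch(url):
    """Hypothetical wrapper: print the diagnosis, then re-raise the error."""
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as err:
        print(describe_auth_error(err))
        raise
```

If the reported URL is on a different host than websites.com, credentials registered with add_password for the original URL will not be matched for it.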

  • Does websites.com require authentication? Commented Mar 15, 2016 at 7:48
  • From the way you describe it, the security seems to be some form of .htaccess protection rather than a real Basic Auth system. I don't think those two are compatible; that's essentially what the error is indirectly saying. Commented Mar 15, 2016 at 8:00
  • @LutzHorn, as far as I know, that website does not require any authentication. Commented Mar 15, 2016 at 8:09
  • Well, you get an HTTP Error 401. This indicates that the URL does require authentication. Since you don't tell us the real URL you are trying, we cannot help you here. Commented Mar 15, 2016 at 8:12
  • @Allendar I just remembered that my network uses a McAfee Web Gateway device that authenticates against Active Directory. Could that be causing the issue? Commented Mar 15, 2016 at 8:31

1 Answer


This seems to work well (taken from another thread):

import sys
import base64
import urllib2

user = 'myuserID'
password = "mypassword"
ip = sys.argv[1]
url = "http://www.websites.com/" + ip

# Send the credentials with the first request instead of waiting for a 401 challenge
request = urllib2.Request(url)
# b64encode (unlike encodestring) does not insert newlines, so no replace() is needed
base64string = base64.b64encode('%s:%s' % (user, password))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)
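urllib2 does not exist in Python 3. Here is a minimal sketch of the same idea, assuming Python 3, where urllib.request takes over urllib2's role. The Authorization value is the word Basic followed by the base64 encoding of user:password (the format defined for HTTP Basic authentication in RFC 7617), and it is attached before the request is ever sent. basic_auth_value and build_basic_auth_request are hypothetical helper names.

```python
import base64
import urllib.request


def basic_auth_value(user, password):
    """Return the value for an HTTP Basic Authorization header."""
    token = "{}:{}".format(user, password).encode("utf-8")
    # b64encode never inserts newlines, so nothing needs to be stripped
    return "Basic " + base64.b64encode(token).decode("ascii")


def build_basic_auth_request(url, user, password):
    """Build a Request that carries the credentials on its first attempt.

    build_basic_auth_request is a hypothetical helper name.
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_value(user, password))
    return request
```

Usage with the answer's placeholders: urllib.request.urlopen(build_basic_auth_request("http://www.websites.com/" + ip, user, password)). Because the header is added by hand, the server (or gateway) never has to issue a challenge first.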

Alternatively, you can use the requests library:

import sys
import requests
from requests.auth import HTTPBasicAuth

user = 'myuserID'
password = "mypassword"
ip = sys.argv[1]
url = "http://www.websites.com/" + ip
# requests attaches the Basic Authorization header to the request itself
res = requests.get(url, auth=HTTPBasicAuth(user, password))
print(res.text)

2 Comments

The first snippet works like a charm. I replaced my code with yours and it works fine. Thanks, Alexey! :)
I had 3 different URLs I was calling with the same code as in the question. 2 gave me this error; one didn't. I switched to this code and all 3 work. I'd like to know why the other code didn't work, though.
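A likely explanation, hedged because the real URLs are unknown: HTTPBasicAuthHandler is reactive. It sends no credentials on the first request and retries only when a 401 response carries a WWW-Authenticate header with the Basic scheme, for a URL that matches one registered with add_password. In the question's traceback the 401 comes after a 302 redirect, so the URL that failed was probably not the one registered, and a gateway page may not send a Basic challenge at all; either way the handler gives up and the 401 surfaces. The answer's code skips the challenge by adding the Authorization header itself, and urllib2 copies headers added with add_header onto the redirected request. On Python 3.5 and later, urllib.request can also make the handler send credentials up front through HTTPPasswordMgrWithPriorAuth. A minimal sketch, with placeholder URL and credentials; make_prior_auth_manager and build_prior_auth_opener are hypothetical helper names:

```python
import urllib.request


def make_prior_auth_manager(url, user, password):
    """Password manager that marks url as already authenticated.

    make_prior_auth_manager is a hypothetical helper name.
    """
    passman = urllib.request.HTTPPasswordMgrWithPriorAuth()
    # realm=None acts as a default realm; is_authenticated=True tells
    # HTTPBasicAuthHandler to send credentials for this URL (and paths
    # below it) on the first request instead of waiting for a 401.
    passman.add_password(None, url, user, password, is_authenticated=True)
    return passman


def build_prior_auth_opener(passman):
    """Opener whose Basic auth handler uses the prior-auth password manager."""
    handler = urllib.request.HTTPBasicAuthHandler(passman)
    return urllib.request.build_opener(handler)
```

Usage: opener = build_prior_auth_opener(make_prior_auth_manager(url, user, password)), then opener.open(url).read(). As far as I can tell from the CPython source, the handler attaches this pre-emptive header with add_unredirected_header, so unlike a header added with add_header it is not carried over to a redirected request. If the redirect is what triggers the 401, the manual header from the answer may still be the more reliable option.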
