Running GET with SSL and authentication in Python

Question

I can download files from a server I control in exactly one way: by passing the document ID into a link like so:

https://website/deployLink/442/document/download/$NUMBER

If I navigate to this in my browser, it downloads the file with ID $NUMBER.

The problem is that I have 9,000 files on the server, which is served over SSL and normally requires signing in with a username and password through a dialog box that pops up in the browser.

I already posted a similar thread, where I downloaded the files via wget. Now I would like to use Python instead, supplying the username/password and getting through the SSL encryption.

Here is my attempt to grab one file, which results in a 401 error. Full stacktrace below.

import urllib2
import ctypes
from HTMLParser import HTMLParser

# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)

# Install the opener.
# Now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)

# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
html = response.read()

class MyHTMLParser(HTMLParser):
    pass  # placeholder body; the parser is not used below

url = 'https://website/deployLink/442/document/download/1'


# Save the file
webpage = urllib2.urlopen(url)
with open('Test.doc','wb') as localFile:
     localFile.write(webpage.read())

What have I done incorrectly here? Is what I am attempting possible?

C:\Python27\python.exe C:/Users/ADMIN/PycharmProjects/GetFile.py
Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/GetFile.py", line 22, in <module>
    response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Processed

Process finished with exit code 1

Here's my authentication page, with some info removed for privacy:

The authentication URL ends in :443.

benschumacher · Accepted Answer · 2015-04-09 05:20:02Z

Assuming the code above is accurate, I think your problem lies in the URI you pass to add_password. You have this when setting up the username/password:

# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

And then your subsequent request goes to this URI:

# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')

(I'm assuming the hostnames were "scrubbed" inconsistently, "website" vs. "website.com", and are really the same, so I'll just move on.)

The second URI is not a child of the first, judging by their path portions: /deployLink/442/document/download/1 does not sit beneath /home.html. From the library's perspective, it has no auth data for the second URI.
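You can check this without touching the network by asking the password manager directly which credentials it would offer for a given URL. The following is a minimal sketch, not a drop-in fix: it assumes the real hostname is identical in both URLs (written here as the placeholder website.com), reuses the placeholder credentials from the question, and introduces a hypothetical helper named lookup purely for illustration.

```python
try:
    from urllib.request import HTTPPasswordMgrWithDefaultRealm  # Python 3
except ImportError:
    from urllib2 import HTTPPasswordMgrWithDefaultRealm  # Python 2.7, as in the question

# Placeholder host and document ID, taken from the scrubbed URLs in the question.
DOWNLOAD_URL = "https://website.com/deployLink/442/document/download/1"


def lookup(registered_url, request_url=DOWNLOAD_URL):
    """Hypothetical helper: register credentials for registered_url, then
    return the (user, password) pair the manager offers for request_url."""
    mgr = HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, registered_url, "admin", "password")
    return mgr.find_user_password(None, request_url)


# Registered against /home.html: the download path is not beneath it.
page_scoped = lookup("https://website.com/home.html")

# Registered against the site root: every path on that host is beneath it.
root_scoped = lookup("https://website.com/")

# Registered against a different host name: the authority must match too.
other_host = lookup("https://website/")

print(page_scoped)   # (None, None)
print(root_scoped)   # ('admin', 'password')
print(other_host)    # (None, None)
```

If the lookup returns (None, None), HTTPBasicAuthHandler has nothing to send when the server answers 401, and the error propagates as in your traceback. Registering the site root (or any prefix of the download path) instead of /home.html is one way to make the lookup succeed. Whether that alone clears the 401 depends on the server actually using HTTP Basic authentication, which I can't confirm from the question.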
