I can download files from a server I control in only one way: by passing the document ID into a link like this:

https://website/deployLink/442/document/download/$NUMBER

If I navigate to this in my browser, it downloads the file with ID $NUMBER.

The problem is that I have 9,000 files on my server, which is served over SSL and normally requires signing in with a username and password in a dialog box that pops up on the web page.

I already posted a similar thread, in which I downloaded the files with wget. Now I would like to use Python instead, supplying the username/password and getting through the SSL encryption.

Here is my attempt to grab one file, which results in a 401 error. Full stacktrace below.

import urllib2
import ctypes
from HTMLParser import HTMLParser

# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)

# Install the opener.
# Now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)

# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
html = response.read()

class MyHTMLParser(HTMLParser):
    pass  # no handlers defined

url = 'https://website/deployLink/442/document/download/1'

# Save the file
webpage = urllib2.urlopen(url)
with open('Test.doc', 'wb') as localFile:
    localFile.write(webpage.read())

What have I done incorrectly here? Is what I am attempting possible?

C:\Python27\python.exe C:/Users/ADMIN/PycharmProjects/GetFile.py
Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/GetFile.py", line 22, in <module>
    response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Processed

Process finished with exit code 1

Here's my authentication page, with some info removed for privacy:

[Image]

The authentication URL ends in :443.

1 Answer

Assuming the code above is accurate, I think your problem lies in the URI you pass to add_password. You have this when setting up the username/password:

# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

And then your subsequent request goes to this URI:

# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')

(I'm assuming the hostnames were scrubbed inconsistently, "website" vs. "website.com", and are really the same host, so I'll move on.)

The second URI is not a child of the first URI based on their respective path portions. The URI path /deployLink/442/document/download/1 is not a child of /home.html. From the perspective of the library, you'd have no auth data for the second URI.
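To make the point concrete, here is a minimal sketch that asks the password manager directly which credentials it would offer for the download URI. It assumes the host really is website.com in both places and that the server uses HTTP Basic authentication; the lookup and make_opener helpers are names I made up for illustration, and the import shim is only there so the same lines run under both urllib2 (Python 2) and urllib.request (Python 3).

```python
# Works with urllib2 on Python 2 and urllib.request on Python 3.
try:
    import urllib2 as urlrequest
except ImportError:
    import urllib.request as urlrequest

# Assumed, scrubbed values: substitute the real host, user and password.
ROOT_URL = "https://website.com/"
PAGE_URL = "https://website.com/home.html"
DOWNLOAD_URL = "https://website.com/deployLink/442/document/download/1"


def lookup(registered_url, requested_url):
    """Return the (user, password) the manager would offer for requested_url
    when credentials were registered under registered_url."""
    mgr = urlrequest.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, registered_url, "admin", "password")
    return mgr.find_user_password(None, requested_url)


# Registered under /home.html: the download path is not below it,
# so the lookup comes back empty.
page_scoped = lookup(PAGE_URL, DOWNLOAD_URL)

# Registered under the site root: every path on the host is covered.
root_scoped = lookup(ROOT_URL, DOWNLOAD_URL)

print(page_scoped)
print(root_scoped)


def make_opener(root_url, user, password):
    """Build an opener that can answer Basic auth challenges for any URL
    under root_url (hypothetical helper, not part of the original post)."""
    mgr = urlrequest.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, root_url, user, password)
    return urlrequest.build_opener(urlrequest.HTTPBasicAuthHandler(mgr))


opener = make_opener(ROOT_URL, "admin", "password")
```

If the root-scoped lookup returns your credentials but opener.open(DOWNLOAD_URL) still fails with a 401, the server may not be using Basic authentication at all, and a different handler would be needed.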
