I need to log in to a website so that I can read the HTML of a login-protected page for a project I'm doing.

I'm using this person's answer with the values I need:

from twill.commands import *
go('https://example.com/login')

fv("3", "email", "[email protected]")
fv("3", "password", "mypassword")

submit()

Presumably this logs me in, so I then run:

import urllib

sock = urllib.urlopen("https://www.example.com/activities")
html_source = sock.read()
sock.close()
print html_source

I expected this to print the HTML of the (now) accessible page, but it prints the HTML of the login page instead. I've tried other approaches (e.g. mechanize) and get the identical result.

What am I missing? Do some sites block this type of login, or does it not work over HTTPS? (The site is Fitbit; I couldn't use the real URL in the question.)

  • Did you try with "example.com/activities" (without the "www") using urllib? I have encountered problems when I intermingle non-"www" and "www" URLs... Commented Oct 4, 2014 at 5:13

1 Answer

You're using one library to log in and a different one to retrieve the subsequent page. twill and urllib do not share any data about your session, so the urllib request arrives without the login cookie and you get the login page again. (Similar issue to this one.) If you mix libraries like this, you need to manage the session cookie / authentication yourself. Specifically, you'll need to copy the cookie data from the library that logged in and add it to the post-login request in the other library.

Otherwise, and more sensibly, use the same library for both the login and the post-login requests.
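
To illustrate the "same library for both requests" idea, here is a minimal sketch using only the Python 3 standard library rather than twill. The function names make_session and login_and_fetch are made up for this example, and the form field names are taken from the question. It assumes the site accepts a plain form POST; a real login form may also need hidden fields (a CSRF token, for instance), which this sketch does not handle.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener plus the cookie jar it stores session cookies in.

    Hypothetical helper name, used only for this illustration.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def login_and_fetch(opener, login_url, credentials, page_url):
    """POST the credentials, then fetch page_url through the same opener.

    Assumes the login endpoint accepts a URL-encoded form body.
    """
    body = urllib.parse.urlencode(credentials).encode("utf-8")
    # Cookies set by the login response are stored in the opener's jar.
    with opener.open(login_url, data=body):
        pass
    # Same opener, so the stored cookies are sent with this request.
    with opener.open(page_url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look something like calling make_session() once, then login_and_fetch(opener, "https://example.com/login", {"email": ..., "password": ...}, "https://example.com/activities"). The point is only that one object carries the cookies across both requests, which is what the twill + urllib combination was missing.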

1 Comment

Brilliant, thank you. I just added a go('https://example.com/activities') and save_html('textfile.txt') and it works a charm.
