Logging into a website and retrieving HTML with Python

Question

I need to log into a website to access the HTML of a login-protected page for a project I'm doing.

I'm using this person's answer with the values I need:

from twill.commands import *
go('https://example.com/login')

fv("3", "email", "[email protected]")
fv("3", "password", "mypassword")

submit()

Presumably this logs me in, so I then run:

import urllib
sock = urllib.urlopen("https://www.example.com/activities")
html_source = sock.read()
sock.close()
print html_source

I expected this to print the HTML of the (now accessible) page, but it just gives me the HTML of the login page instead. I've tried other methods (e.g. with mechanize) but get the identical result.

What am I missing? Do some sites restrict this type of login or does it not work with https or something? (The site is FitBit, since I couldn't use the url in the question)

Did you try with "example.com/activities" (without the "www") using urllib? I have encountered problems when I intermingle non-"www" and "www" URLs...
– Janaka Bandara, Commented Oct 4, 2014 at 5:13

Community · Accepted Answer · 2017-05-23 12:21:42Z


You're using one library to log in and another to retrieve the subsequent page. twill and urllib do not share any session data, so the urllib request arrives without the session cookie that twill received at login, and the site serves the login page again. (Similar issue to this one.) If you mix libraries like this, you need to manage the session cookie / authentication yourself: copy the cookie data from the library that logged in and add it to the post-login request made by the other library.
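As a rough illustration of carrying the cookie over by hand, here is a sketch using only the Python 3 standard library (the question's code is Python 2). The helper name cookie_header_from_set_cookie and the cookie names and values are made up for the example; a real site will use its own, and may check more than cookies.

```python
from http.cookies import SimpleCookie
from urllib.request import Request


def cookie_header_from_set_cookie(set_cookie_values):
    """Build a Cookie request header from Set-Cookie response header values."""
    parsed = SimpleCookie()
    for value in set_cookie_values:
        parsed.load(value)  # keeps name=value; attributes such as Path are parsed separately
    return "; ".join("%s=%s" % (name, morsel.value) for name, morsel in parsed.items())


# Hypothetical Set-Cookie values, as a login response might have sent them.
login_cookies = ["sessionid=abc123; Path=/; HttpOnly", "csrftoken=xyz; Path=/"]

# Attach them to the post-login request that the other library will send.
request = Request(
    "https://www.example.com/activities",
    headers={"Cookie": cookie_header_from_set_cookie(login_cookies)},
)
cookie_header = request.get_header("Cookie")
```

Nothing is sent over the network here; the request object is only built, so the header can be inspected before passing it to urlopen.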

Otherwise, and more logically, use the same library for both the login and the post-login requests.
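To see why a single client matters, here is a small self-contained sketch in Python 3 using only the standard library, not twill. The DemoSite handler is a hypothetical stand-in for a login-protected site, and the cookie name, form fields, and page bodies are invented for the demo. One opener with a cookie jar performs both requests, while a second, unrelated client reproduces the behaviour from the question.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a login-protected site."""

    def _send(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

    def do_POST(self):
        # Consume the form body, then "log in" by issuing a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._send("logged in", cookie="session=abc123; Path=/")

    def do_GET(self):
        # Serve the protected page only if the session cookie is sent back.
        if "session=abc123" in self.headers.get("Cookie", ""):
            self._send("activities page")
        else:
            self._send("login page")

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Ignore any system proxy settings, since the demo server is local.
no_proxy = urllib.request.ProxyHandler({})

# One opener with one cookie jar handles both the login and the follow-up.
session = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor(CookieJar())
)
form = urllib.parse.urlencode(
    {"email": "user@example.com", "password": "mypassword"}
).encode()
session.open(base + "/login", data=form).read()
with_session = session.open(base + "/activities").read().decode()

# A second, unrelated client never saw the cookie, so it is not logged in.
other_client = urllib.request.build_opener(no_proxy)
without_session = other_client.open(base + "/activities").read().decode()

server.shutdown()
server.server_close()
```

Here with_session holds the protected page and without_session holds the login page, which mirrors what happens when twill logs in and urllib fetches the next page.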

edited May 23, 2017 at 12:21

CommunityBot


answered Oct 4, 2014 at 5:25

aneroid



1 Comment

doxyl Over a year ago

Brilliant, thank you. I just added a go('https://example.com/activities') and save_html('textfile.txt') and it works a charm.
