
I use twill to navigate a website protected by a login form.

from twill.commands import *

go('http://www.example.com/login/index.php') 
fv("login_form", "identifiant", "login")
fv("login_form", "password", "pass")
formaction("login_form", "http://www.example.com/login/control.php")
submit()
go('http://www.example.com/accueil/index.php')

On this last page I want to download an Excel file which is accessible through a div with the following attribute:

onclick="OpenWindowFull('../util/exports/control.php?action=export','export',200,100);"

With twill I am able to access the URL of the PHP script and show the content of the file.

go('http://www.example.com/util/exports/control.php?action=export')
show()

However, this returns a string containing the raw content, which is unusable as such. Is there a way to retrieve the Excel file directly, similar to urllib.urlretrieve()?

4
  • Looks similar to stackoverflow.com/questions/16283799/… Commented Jun 19, 2016 at 19:03
  • Not exactly: in this case access to the website is protected by a password, so I need to post a login form, hence twill. (I would prefer to use requests, but the site seems to have an intricate control of login headers, and after many attempts I could only make it work with twill.) Commented Jun 19, 2016 at 19:14
  • EDIT: I edited my question: the file is in MS Excel format, not CSV, so binary data... Commented Jun 19, 2016 at 19:33
  • If you can show or read the content, you can store it on your end in whatever format you read it. You can use StringIO (docs.python.org/2/library/stringio.html) or similar as intermediary storage for whatever you read, and then convert it to CSV. Commented Jun 19, 2016 at 19:44

2 Answers

1

I managed to do it by passing twill's cookie jar to requests.

Note: I could not use requests alone because of an intricate control at login (I was not able to figure out the correct headers or other options).

import requests
from twill.commands import *

# showing login form with twill
go('http://www.example.com/login/index.php') 
showforms()

# posting login form with twill
fv("login_form", "identifiant", "login")
fv("login_form", "password", "pass")
formaction("login_form", "http://www.example.com/login/control.php")
submit()

# getting binary content with requests using twill cookie jar
cookies = requests.utils.dict_from_cookiejar(get_browser()._session.cookies)
url = 'http://www.example.com/util/exports/control.php?action=export'

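# request first, so that a failed request does not leave an empty out.xls behind
response = requests.get(url, stream=True, cookies=cookies)

if not response.ok:
    raise Exception('Could not get file from ' + url)

# write the binary content to disk in 1 KiB blocks
with open('out.xls', 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)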
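
Since the question asks for something similar to urllib.urlretrieve(), here is a minimal sketch of the same idea using only the Python 3 standard library. It assumes you already hold a cookie jar containing the session cookies; the jar used by requests subclasses http.cookiejar.CookieJar, so the one taken from twill above could presumably be passed in directly, but that depends on the twill version. The name download_with_cookies and its parameters are hypothetical, not part of twill or requests.

```python
import shutil
import urllib.request
from http.cookiejar import CookieJar


def download_with_cookies(url, dest_path, cookie_jar, chunk_size=64 * 1024):
    """Stream `url` to `dest_path` in binary mode, sending cookies from `cookie_jar`.

    Hypothetical helper: `cookie_jar` is any http.cookiejar.CookieJar
    (assumed to already contain the cookies set at login).
    """
    # Opener that attaches matching cookies from the jar to each HTTP request
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # HTTP error statuses (4xx/5xx) raise urllib.error.HTTPError here
    with opener.open(url) as response, open(dest_path, "wb") as handle:
        # Copy the body in chunks so the whole file is never held in memory
        shutil.copyfileobj(response, handle, chunk_size)
    return dest_path
```

Usage would look like download_with_cookies(url, 'out.xls', jar), where jar is the cookie jar obtained after logging in.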

2 Comments

where do you get the get_browser() from?
It used to be here: github.com/twill-tools/twill/blob/… but the code has been refactored since this post. I don't know about the current API.
0

Another way is to use a version of twill.commands.save_html modified to open the file in 'wb' mode instead of 'w'; see "Python 2.7 using twill, saving downloaded file properly".
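
The reason the mode matters: an Excel file is binary, and writing it in text mode can alter it (Python 2 on Windows translates newline bytes, and Python 3 text mode does not accept bytes at all). Below is a minimal sketch of the binary write on its own. The helper name save_binary is hypothetical, and how you obtain the raw bytes from twill is left open here, since that depends on the twill version.

```python
def save_binary(content, path):
    """Write raw bytes to `path` with no newline or encoding translation.

    Hypothetical helper: `content` is assumed to be the unmodified
    response body as bytes.
    """
    if not isinstance(content, (bytes, bytearray)):
        # A decoded text string may already have lost or changed bytes
        raise TypeError("expected bytes, got %s" % type(content).__name__)
    # 'wb' writes the bytes exactly as given
    with open(path, "wb") as handle:
        handle.write(content)
    return len(content)
```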
