I need to fill form values on a target page then click a button via Python. I've looked at Selenium and Windmill, but these are testing frameworks - I'm not testing. I'm trying to log into a 3rd party website programatically, then download and parse a file we need to insert into our database. The problem with the testing frameworks is that they launch instances of browsers; I just want a script I can schedule to run daily to retrieve the page I want. Any way to do this?
5 Answers
You are looking for Mechanize
Form submitting sample:
import re
from mechanize import Browser
br = Browser()
br.open("http://www.example.com/")
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__)
response = br.submit() # submit current form
5 Comments
Habaabiai
I'm stuck using Python 2.6 though, so sadly Mechanize isn't an option either. (GopherError dropped in 2.6, looks like).
Philippe F
Mechanize doc is usually a bit terse, but it works really really great !
Philippe F
I think you should insist, try debugging the gopher problem. In python 2.6, gopher support was removed IIRC, so fixing your problem is probably about commenting some import gopherlib and the few spots where gopher is actually used.
Vinko Vrsalovic
@Habaabiai: Mechanize advertises working in 2.6, you could ask a question about your problem with it. Also, you can try urllib2 (which will force you to write more code to submit a form.)
DarkLight
It seems Mechanize does not support python 3 (yet...?), I guess it means it is not maintained anymore (it is the first FAQ at wwwsearch.sourceforge.net/mechanize/faq.html).
Have a look on this example which use Mechanize: it will give the basic idea:
#!/usr/bin/python
import re
from mechanize import Browser
br = Browser()
# Ignore robots.txt
br.set_handle_robots( False )
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]
# Retrieve the Google home page, saving the response
br.open( "http://google.com" )
# Select the search box and search for 'foo'
br.select_form( 'f' )
br.form[ 'q' ] = 'foo'
# Get the search results
br.submit()
# Find the link to foofighters.com; why did we run a search?
resp = None
for link in br.links():
siteMatch = re.compile( 'www.foofighters.com' ).search( link.url )
if siteMatch:
resp = br.follow_link( link )
break
# Print the site
content = resp.get_data()
print content