3

How can I get the input fields from HTML forms on other sites? I want it to return a list of dictionaries such as:

form = [{'name': 'somename', 'type': 'text', 'value': ''}, {'name': 'somename', 'type': 'submit', 'value': 'submit'}]

Sorry for my English.

2
  • Are you trying to parse an HTML file (possibly returned by calling urllib.urlopen on a URL), or is this some Django-based thing? Commented Aug 22, 2010 at 10:27
  • 1
    I am trying to parse forms from other sites. Commented Aug 22, 2010 at 10:29

3 Answers

3

You probably won't be able to retrieve form data submitted by other users on other sites. If you want to use a script to send data to a form, mechanize is one tool that makes this quite easy.


3 Comments

Thanks for the answer, but unfortunately the forms are not static and are different each time, so a full analysis is necessary. Mechanize would not be entirely convenient for that.
In that case, use lxml.html to parse the document, find the form and input tags (possibly using XPath queries), and so on.
Derek, surely the forms are being generated using the <form> tag. This should be all that you need to get started. If the forms are nondeterministic, no script will be able to assist you. If you mean the forms are generated by client-side JavaScript, then browser automation may help.
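
A minimal sketch of the parse-it-yourself approach suggested in these comments, producing the kind of structure the question asks for. It uses the standard library's html.parser rather than lxml.html so that it runs without extra dependencies; that is a substitution, not what the commenters proposed. FormCollector, parse_forms and the sample markup are hypothetical names and data made up for illustration. In practice the HTML string would come from fetching the page first.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect the fields of every <form> as a list of dictionaries."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one list of field dicts per <form>
        self._current = None   # fields of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Start a new form and remember it as the active one.
            self._current = []
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            self._current.append({
                "name": attrs.get("name"),
                # <input> defaults to type "text"; other fields report their tag.
                "type": attrs.get("type", "text") if tag == "input" else tag,
                "value": attrs.get("value") or "",
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def parse_forms(html):
    """Return a list with one entry per form, each a list of field dicts."""
    parser = FormCollector()
    parser.feed(html)
    return parser.forms


# Made-up sample page; a real one would be downloaded first.
sample = """
<form action="/login" method="post">
  <input type="text" name="somename" value="">
  <input type="submit" name="go" value="submit">
</form>
"""
forms = parse_forms(sample)
print(forms)
```

Fields that appear outside any <form> tag are ignored here, and forms built by client-side JavaScript will not show up at all, since only the static markup is parsed.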
2

Yeah, mechanize is sweet!

import mechanize

# Browser
br = mechanize.Browser()
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Inspect all the form elements on http://stackoverflow.com
br.open('http://stackoverflow.com')
for form in br.forms():
    # print() with parentheses works on both Python 2 and Python 3
    print(form)


1

Look at mechanize, lxml.html and BeautifulSoup.

2 Comments

BeautifulSoup is discontinued. Better not to mention it.
BeautifulSoup is also much slower than lxml.html
