My program takes a user input and searches for it on a particular webpage. I then want it to click on a particular link and download the file found there.

Example:

  1. The webpage: http://www.rcsb.org/pdb/home/home.do
  2. The search word: "1AW0"
  3. After you search for the word on the website, it takes you to: http://www.rcsb.org/pdb/explore/explore.do?structureId=1AW0

I want the program to go to the right-hand side of the webpage and download the PDB file from the DOWNLOAD FILES option.

I have managed to write a program using the mechanize module to search for the word automatically, but I have been unable to find a way to click on a link.

My code:

import mechanize

br = mechanize.Browser()
br.open("http://www.rcsb.org/pdb/home/home.do")
# Name of the form that holds the search text area
br.select_form("headerQueryForm")

# "q" is the name of the textarea in the HTML source
br["q"] = "1AW0"
response = br.submit()
print response.read()

Any help or suggestions would be appreciated.

By the way, I am an intermediate Python programmer, and I am trying to learn Jython to try to make this work.

Thanks in advance.

    If it is only about downloading the PDB file for a given protein, why don't you just use http.client (or httplib) to download rcsb.org/pdb/download/…. (Hover over this link to see it completely.) Apparently all download links look exactly the same. Commented Dec 9, 2012 at 6:30

1 Answer

Here's how I would have done it:

'''
Created on Dec 9, 2012

@author: Daniel Ng
'''

import urllib

def fetch_structure(structureid, filetype='pdb'):
    # URL template: the first %s is the file format, the second is the structure ID
    download_url = 'http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=%s&compression=NO&structureId=%s'
    filetypes = ['pdb', 'cif', 'xml']
    if filetype not in filetypes:
        print "Invalid filetype...", filetype
    else:
        # Saved as <structureid>.<filetype> in the current working directory
        filename = '%s.%s' % (structureid, filetype)
        try:
            urllib.urlretrieve(download_url % (filetype, structureid), filename)
        except Exception as e:
            print "Download failed...", e
        else:
            print "Saved to", filename

if __name__ == "__main__":
    fetch_structure('1AW0')
    fetch_structure('1AW0', filetype='xml')
    fetch_structure('1AW0', filetype='png')

Which provides this output:

Saved to 1AW0.pdb
Saved to 1AW0.xml
Invalid filetype... png

It also saves the two files 1AW0.pdb and 1AW0.xml to the current working directory (the script directory in this example).

http://docs.python.org/2/library/urllib.html#urllib.urlretrieve

4 Comments

How can I save and retrieve this file without giving it a hardcoded location? I mean, what if I have to run this program on someone else's computer, retrieve the file, and compute on it?
Not sure I understand... Are you asking how to change the location they are downloaded to?
Yes. If I had to ask for an input and get the file downloaded, I would assign it to a variable, x = "1AW0", and use it as str(x) to get the download. The user would do the same, but the file would be downloaded to the user's PC. I need access to the file, since the later part of the program computes on it. How can I achieve that?
Well, you can just add it in as another variable and ask the user. The urlretrieve function takes the file name as the second argument. Simply prepend a path to the file name and it will save the file there instead of the directory of the script.
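
A minimal sketch of that last suggestion, in case it helps. The helper names target_path and fetch_structure_to and the directory parameter are made up for illustration and are not part of the answer; the URL template is copied from the answer as-is, so it only works as long as the site still serves that address. The import is written so that it should run on either Python 2 or Python 3.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2, as used in the answer

# URL template copied from the answer: file format first, structure ID second.
DOWNLOAD_URL = ('http://www.rcsb.org/pdb/download/downloadFile.do'
                '?fileFormat=%s&compression=NO&structureId=%s')


def target_path(structure_id, filetype='pdb', directory='.'):
    """Return the path the file would be saved to (hypothetical helper)."""
    filename = '%s.%s' % (structure_id, filetype)
    return os.path.join(directory, filename)


def fetch_structure_to(structure_id, filetype='pdb', directory='.'):
    """Download into `directory` and return the saved path (hypothetical helper)."""
    path = target_path(structure_id, filetype, directory)
    # urlretrieve takes the destination file name as its second argument.
    urlretrieve(DOWNLOAD_URL % (filetype, structure_id), path)
    return path
```

Because fetch_structure_to returns the path it saved to, the rest of the program can pass that value straight to open() instead of relying on a hardcoded location. The directory could come from the user (for example via raw_input on Python 2) or from something like tempfile.mkdtemp() if the file is only needed while the program runs.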
