I am currently running this function in a pool to check multiple URLs. It reads an HTML page into a string and matches the progress percentage of a file transfer, like this:
```python
import logging
import re
import urllib2

def check(server):
    logging.info('Fetching {0}.'.format(server))
    # Open page
    response = urllib2.urlopen("http://" + server + "/avicapture.html")
    tall = response.read()  # puts the data into a string
    html = tall.rstrip()
    # Grab progress percentage (digits only, so int() below is safe).
    match = re.search(r'In Progress \((\d+)%\)', html)
```
Then, if there is a match, it returns the percentage as a string to the parent process:
```python
    if match:
        global temp
        global results
        temp = match.group(1)
        results = temp
        servers[server] = temp
        if 98 <= int(temp) <= 99:
            abort(server)
            alertmail(temp, server)
            rem = str(server)
            complete(rem)
            logging.info('{0} completed.'.format(server))
        return str(temp)
```
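Capturing the percentage with a greedy `(.*)` can overshoot when more than one status appears on the same line of HTML, and `int()` then fails on the captured text. The sketch below uses a made-up one-line fragment (not the real avicapture.html) to compare `(.*)` with `(\d+)`, which can only match digits:

```python
import re

# Made-up fragment with two transfers on one line (illustrative only).
SAMPLE = "<td>In Progress (12%)</td><td>In Progress (45%)</td>"

# Greedy (.*) runs on to the LAST "%)" on the line.
greedy = re.search(r'In Progress \((.*)%\)', SAMPLE).group(1)

# (\d+) can only consume digits, so it stops at the first "%)".
digits = re.search(r'In Progress \((\d+)%\)', SAMPLE).group(1)

print(greedy)  # 12%)</td><td>In Progress (45
print(digits)  # 12
```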
Sometimes, however, the page does not say "In Progress" with a percentage; instead it says "Transfer Aborted" or "Ready". How would I structure this so it returns whichever status it finds: In Progress (percentage), Transfer Aborted, or Ready?
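One way to return whichever status appears is a single regular expression with three alternatives, where only the `In Progress` branch has a capture group. This is a minimal sketch assuming the page contains those phrases verbatim; `STATUS_RE` and `get_status` are hypothetical names, and the function takes an HTML string rather than fetching the page, so it can be tried offline:

```python
import re

# Three alternatives; only the "In Progress" branch captures a group.
# \b keeps "Ready" from matching inside words such as "Already".
STATUS_RE = re.compile(r'In Progress \((\d+)%\)|Transfer Aborted|\bReady\b')

def get_status(html):
    """Return the percentage string, 'Transfer Aborted', 'Ready', or None."""
    match = STATUS_RE.search(html)
    if match is None:
        return None
    if match.group(1) is not None:
        return match.group(1)  # e.g. '45' for "In Progress (45%)"
    return match.group(0)      # the literal status text that matched

print(get_status("Status: In Progress (45%)"))  # 45
print(get_status("Status: Transfer Aborted"))   # Transfer Aborted
print(get_status("Status: Ready"))              # Ready
print(get_status("Status: Idle"))               # None
```

`search` returns the leftmost match in the page, so if several transfers are listed, the first one in document order wins, not the most recent one.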
Edit: I forgot to mention that I need it to match the most recent file transfer, based on End Time. (See: http://www.whatdoiknow.net/dump/avicapture_full.html#status )
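To pick the most recent transfer by End Time, one approach is to parse the status table into rows and compare the End Time cells as `datetime` values instead of searching the raw text. This sketch rests on assumptions about the page layout that should be checked against the real avicapture.html: that each transfer is a `<tr>`, that the first row is a header with `Status` and `End Time` columns, and that End Time looks like `2013-05-01 11:30:00` (the hypothetical `END_TIME_FORMAT`). `TableRows`, `latest_status`, and the sample table are illustrative only. It uses the standard-library HTML parser (`html.parser` on Python 3; the module is named `HTMLParser` on Python 2):

```python
from datetime import datetime
from html.parser import HTMLParser

# ASSUMPTION: End Time format on the page; adjust to the real one.
END_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def latest_status(html):
    """Return the Status cell of the row with the newest End Time, or None."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return None
    # ASSUMPTION: the first row is a header naming these columns.
    status_col = rows[0].index("Status")
    end_col = rows[0].index("End Time")
    best = None  # (end_time, status) of the newest row seen so far
    for row in rows[1:]:
        if len(row) <= max(status_col, end_col):
            continue
        try:
            end = datetime.strptime(row[end_col], END_TIME_FORMAT)
        except ValueError:
            continue  # blank or differently formatted End Time
        if best is None or end > best[0]:
            best = (end, row[status_col])
    return None if best is None else best[1]

# Made-up table for illustration, not the real page.
SAMPLE = """
<table>
<tr><th>File</th><th>Status</th><th>End Time</th></tr>
<tr><td>a.avi</td><td>Ready</td><td>2013-05-01 09:00:00</td></tr>
<tr><td>b.avi</td><td>Transfer Aborted</td><td>2013-05-01 11:30:00</td></tr>
<tr><td>c.avi</td><td>Ready</td><td>2013-05-01 10:15:00</td></tr>
</table>
"""

print(latest_status(SAMPLE))  # Transfer Aborted
```

Rows whose End Time does not parse are skipped; if a running transfer has an empty End Time, it would need separate handling (for example, treating it as the newest row). The returned Status cell can then be matched against In Progress (percentage), Transfer Aborted, or Ready as before.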
Partial solution:
```python
    match = re.search(r'In Progress \((\d+)%\)', html)
    match2 = re.search(r'Ready', html)
    match3 = re.search(r'Transfer Aborted', html)
    if match:
        global temp
        temp = match.group(1)
        if 98 <= int(temp) <= 99:
            logging.info('{0} completed.'.format(server))
        return str(temp)
    elif match2:
        temp = "Ready"
        logging.info('{0} is ready.'.format(server))
        return str(temp)
    elif match3:
        temp = "Transfer Aborted"
        logging.info('{0} was Aborted.'.format(server))
        return str(temp)
```
However, this does not address my need to identify the most recent transfer.