I am currently pooling this function to check multiple URLs. It reads an HTML page into a string and matches the progress percentage of a file transfer, like this:

import logging
import re
import urllib2  # Python 2; in Python 3 use urllib.request instead

def check(server):
    logging.info('Fetching {0}.'.format(server))
    # Open page
    response = urllib2.urlopen("http://" + server + "/avicapture.html")
    tall = response.read()  # puts the data into a string
    html = tall.rstrip()
    # Grab progress percentage (raw string keeps the backslashes literal).
    match = re.search(r'.*In Progress \((.*)%\).*', html)

Then, if there is a match, it returns the percentage as a string to the parent process:

    if match:
        global temp
        global results
        temp = match.group(1)
        results = temp
        servers[server] = temp
        if int(temp) >= 98 and int(temp) <= 99:
            abort(server)
            alertmail(temp, server)
            rem = str(server)
            complete(rem)
            logging.info('{0} completed.'.format(server))
        return str(temp)

Sometimes, however, the page will not say "In Progress" with a percentage. It will say "Transfer Aborted" or "Ready" instead. How would I structure this so it returns whichever it finds: In Progress (percentage), Transfer Aborted, or Ready?

Edit: I forgot to mention that I need it to match the most recent file transfer, based off End Time. (See: http://www.whatdoiknow.net/dump/avicapture_full.html#status )

Partial solution:

    match = re.search(r'.*In Progress \((.*)%\).*', html)
    match2 = re.search('.*Ready.*', html)
    match3 = re.search('.*Transfer Aborted.*', html)
    if match:
        global temp
        temp = match.group(1)
        if int(temp) >= 98 and int(temp) <= 99:
            logging.info('{0} completed.'.format(server))
        return str(temp)
    elif match2:
        temp = "Ready"
        logging.info('{0} is ready.'.format(server))
        return str(temp)
    elif match3:
        temp = "Transfer Aborted"
        logging.info('{0} was Aborted.'.format(server))
        return str(temp)

However, this does not address my need to identify the most recent transfer.

  • Can you provide actual example data? Commented Jul 31, 2014 at 15:31
  • Absolutely. This is the page the function grabs: whatdoiknow.net/dump/avicapture_full.html. We are looking at the 'Avi File Status' portion. I basically need it to identify the most recent transfer by date, then check whether the 'Upload Status' column for that transfer contains any of those three strings, and return either the percentage number as I do above, or 'Transfer Aborted', or 'Ready'. Commented Jul 31, 2014 at 15:37

1 Answer

You just need to use | in regex:

match = re.search(r"(In Progress \((.*)%\)|Transfer Aborted|Ready)", html)

With this, match.group(1) will contain the whole matched status (either In Progress (00%), Transfer Aborted, or Ready), while match.group(2) will hold the number 00 (00 is a placeholder) in the first case, or None in the second and third cases.
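
To illustrate how the two groups behave, here is a minimal sketch run against made-up status strings. The helper name `parse_status` and the sample inputs are assumptions for illustration only; they are not taken from the real page.

```python
import re

# Same pattern as above: group 1 is the whole status, group 2 the percentage.
STATUS_RE = re.compile(r"(In Progress \((.*)%\)|Transfer Aborted|Ready)")

def parse_status(text):
    """Return (status, percent) for the first status found, or None."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    # group(2) is None when the "In Progress" branch did not match.
    return match.group(1), match.group(2)
```

Note that re.search only reports the first occurrence in the text, so on a page with several rows this alone does not pick the most recent transfer.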

UPDATE 1: about the need to get the most recent line. The page http://www.whatdoiknow.net/dump/avicapture.html is rather simple HTML, so my suggestion is to use an HTML parsing tool (I recommend beautifulsoup4, docs: http://www.crummy.com/software/BeautifulSoup/bs4/doc/) to parse it into a tree, then find the first row in the table with N/A, take the row before it, and apply the regex to its last column.

UPDATE 2: now that I think about it, there is probably no need to parse the HTML. You can use re.findall (or re.finditer) to get a list of matched tuples of strings (or match objects) and just take the last item from it.
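
A minimal sketch of the findall idea might look like the following. It assumes the status strings appear only in the table's status column and that the rows are listed oldest first; the function name `last_status` is hypothetical. It also uses `\d+` rather than `.*` for the percentage, so a match cannot run past the closing parenthesis when several rows share one line.

```python
import re

STATUS_RE = re.compile(r"(In Progress \((\d+)%\)|Transfer Aborted|Ready)")

def last_status(html):
    """Return the last status found in the page text, or None if there is none."""
    matches = STATUS_RE.findall(html)  # list of (status, percent) tuples
    if not matches:
        return None
    status, percent = matches[-1]
    # findall yields '' (not None) for a group that took no part in the match.
    return percent if percent else status
```

If the words "Ready" or "Transfer Aborted" also occur elsewhere on the page (in a legend or a header, say), the last match may not belong to a table row, so the search would have to be restricted to the table's text first.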

UPDATE 3: Update 1 and Update 2 assume that the table is sorted by date. If it is not, then you'll need to include a date pattern in the regex and take the match with the latest date.
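
For the unsorted case, a rough sketch could look like this. The row layout is an assumption: it supposes each table row sits on its own line and contains an end time formatted like 2014-07-31 15:20:00 somewhere before the status. The real page may format dates differently, so the date pattern and the strptime format would need adjusting. The name `most_recent_status` is hypothetical.

```python
import re
from datetime import datetime

# Hypothetical row layout: an end time, then (later on the same line) the status.
ROW_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"  # group 1: end time
    r".*?"                                    # anything in between, same line
    r"(In Progress \((\d+)%\)|Transfer Aborted|Ready)"  # groups 2 and 3
)

def most_recent_status(html):
    """Return the status of the row with the latest end time, or None."""
    best = None
    for m in ROW_RE.finditer(html):
        end_time = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        status = m.group(3) or m.group(2)  # percentage if present, else the text
        if best is None or end_time > best[0]:
            best = (end_time, status)
    return best[1] if best else None
```

Rows whose end time is N/A do not match the date pattern and are skipped by this sketch, so a transfer that has not finished yet would need separate handling.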

2 Comments

I had not thought of this approach, despite it being so simple. Unfortunately, as written, it returned 'Ready' regardless of what the page actually contained: the transfer said In Progress (7%), and it still returned Ready.
I forgot to mention this in my question. I need it to only match the line of the most recent transfer.
