I'm working on a series of scripts that pull URLs from a database and use the textstat package to calculate the readability of each page based on a set of predefined calculations. The function below takes a URL (from a CouchDB database), calculates the defined readability scores, and then saves the scores back to the same CouchDB document.

The issue I'm having is with error handling. As an example, the Flesch Reading Ease score calculation requires a count of the total number of sentences on the page. If this returns as zero, an exception is thrown. Is there a way to catch this exception, save a note of the exception in the database, and move on to the next URL in the list? Can I do this in the function below (preferred), or will I need to edit the package itself?

I know variations of this question have been asked before. If you know of one that might answer my question, please point me in that direction. My search has been fruitless thus far. Thanks in advance.

def get_readability_data(db, url, doc_id, rank, index):
    readability_data = {}
    readability_data['url'] = url
    readability_data['rank'] = rank
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    headers = { 'User-Agent' : user_agent }
    try:
        req = urllib.request.Request(url, headers=headers)  # pass the User-Agent defined above
        response = urllib.request.urlopen(req)
        content = response.read()
        readable_article = Document(content).summary()
        soup = BeautifulSoup(readable_article, "lxml")
        text = soup.body.get_text()
        try:
            readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
            readability_data['smog_index'] = textstat.smog_index(text)
            readability_data['flesch_kincaid_grade'] = textstat.flesch_kincaid_grade(text)
            readability_data['coleman_liau'] = textstat.coleman_liau_index(text)
            readability_data['automated_readability_index'] = textstat.automated_readability_index(text)
            readability_data['dale_chall_score'] = textstat.dale_chall_readability_score(text)
            readability_data['linear_write_formula'] = textstat.linsear_write_formula(text)
            readability_data['gunning_fog'] = textstat.gunning_fog(text)
            readability_data['total_words'] = textstat.lexicon_count(text)
            readability_data['difficult_words'] = textstat.difficult_words(text)
            readability_data['syllables'] = textstat.syllable_count(text)
            readability_data['sentences'] = textstat.sentence_count(text)
            readability_data['readability_consensus'] = textstat.text_standard(text)
            readability_data['readability_scores_date'] = time.strftime("%a %b %d %H:%M:%S %Y")

            # use the doc_id to make sure we're saving this in the appropriate place
            readability = json.dumps(readability_data, sort_keys=True, indent=4 * ' ')
            doc = db.get(doc_id)
            data = json.loads(readability)
            doc['search_details']['search_details'][index]['readability'] = data
            #print(doc['search_details']['search_details'][index])
            db.save(doc)
            time.sleep(.5)

        except: # catch *all* exceptions
            e = sys.exc_info()[0]
            write_to_page( "<p>---ERROR---: %s</p>" % e )

    except urllib.error.HTTPError as err:
        print(err.code)

This is the error I receive:

Error(ASL): Sentence Count is Zero, Cannot Divide
Error(ASyPW): Number of words are zero, cannot divide
Traceback (most recent call last):
  File "new_get_readability.py", line 114, in get_readability_data
    readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
  File "/Users/jrs/anaconda/lib/python3.5/site-packages/textstat/textstat.py", line 118, in flesch_reading_ease
    FRE = 206.835 - float(1.015 * ASL) - float(84.6 * ASW)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

This is the code that calls the function:

if __name__ == '__main__':
    db = connect_to_db(parse_args())
    print("~~~~~~~~~~" + " GETTING IDs " + "~~~~~~~~~~")
    ids = get_ids(db)
    for i in ids:
        details = get_urls(db, i)
        for d in details:
            get_readability_data(db, d['url'], d['id'], d['rank'], d['index'])
  • You obviously know how to use try/except so I'm having a hard problem understanding what the problem is. Commented Oct 19, 2016 at 15:45
  • check ASL and ASW, one of them might be None Commented Oct 19, 2016 at 15:47
  • Thanks, @MarkRansom. I thought I understood try/except too, but it isn't behaving as I would expect (so there's a knowledge gap somewhere). This exception is being thrown by one of the functions in the textstat package, so I want to know if I have to edit the package itself to get around it, or if there is another option. Commented Oct 19, 2016 at 15:49
  • Do you know the "continue" keyword? Commented Oct 19, 2016 at 15:51
  • @TurtleIzzy ASL is Zero. That is the error, yes. I can't seem to continue the script after the error. Commented Oct 19, 2016 at 15:52

1 Answer


It is generally good practice to keep try: except: blocks as small as possible. I would wrap your textstat functions in some sort of decorator that catches the exception you expect, and returns the function output and the exception caught.

For example:

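import functools

def catchExceptions(exception):  # decorator factory: takes the exception type(s) to catch
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                retval = func(*args, **kwargs)
            except exception as e:
                return None, e  # failure: no value, plus the caught exception
            else:
                return retval, None  # success: the value, and no exception
        return wrapper
    return decorator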

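@catchExceptions(ZeroDivisionError)
def testfunc(x):
    return 11 // x  # integer division, which is what `/` did for ints in Python 2

print(testfunc(0))
print('-----')
print(testfunc(3))

prints (on Python 3.7+; earlier 3.x versions show a trailing comma inside the exception repr):

(None, ZeroDivisionError('integer division or modulo by zero'))
-----
(3, None)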
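
Applied to the question's case, note that the traceback ends in a TypeError (the zero count is reported, and the arithmetic then fails on None), so that is the exception type to hand to the decorator. Below is a minimal sketch under stated assumptions, not a definitive implementation: `catch_exceptions`, `score_text`, `fake_reading_ease`, and the `errors` key are hypothetical names, and the stand-in metric only imitates the failure from the traceback so that the example runs without textstat installed.

```python
import functools


def catch_exceptions(*exceptions):
    """Return a decorator that turns the listed exceptions into (None, error)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs), None
            except exceptions as e:
                return None, e
        return wrapper
    return decorator


def fake_reading_ease(text):
    """Hypothetical stand-in for a textstat metric (not the real formula)."""
    sentences = [s for s in text.split('.') if s.strip()]
    words = text.split()
    # Imitate the traceback: the average is None when there are no sentences.
    asl = len(words) / len(sentences) if sentences else None
    return 206.835 - 1.015 * asl  # raises TypeError when asl is None


def score_text(text, metrics):
    """Compute each metric separately so one failure does not stop the rest."""
    data, errors = {}, {}
    for name, func in metrics.items():
        safe_func = catch_exceptions(TypeError, ZeroDivisionError)(func)
        value, err = safe_func(text)
        if err is None:
            data[name] = value
        else:
            errors[name] = repr(err)  # stored as a string so it is JSON-friendly
    if errors:
        data['errors'] = errors  # the note of what failed for this page
    return data


# Hypothetical metric table; in the real script these would be textstat calls.
metrics = {
    'flesch_reading_ease': fake_reading_ease,
    'total_words': lambda text: len(text.split()),
}

# An empty page records an error note; the loop still reaches the next page.
for page_text in ['', 'One short sentence. And another one.']:
    print(score_text(page_text, metrics))
```

In the question's function, the dictionary returned by something like `score_text` could be saved to the CouchDB document exactly as `readability_data` is now, which keeps the try/except inside one small wrapper and leaves the package itself untouched.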