Python - How to speed up a nested for loop with a function and multiple return values

Question

I am writing Python code to check whether there is a fuzzy match between two strings. If there is a match, I have to store both strings and the average match value. The strings to be compared come from a list that runs into thousands of entries. The issue is that the code takes too long to execute. To speed it up, I looked at other answers here, but none of them had multiple return values from the inner function in the loop. I am looking for an optimized version of this code.

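# Assumption: fuzz comes from the fuzzywuzzy package (not stated in the post).
from fuzzywuzzy import fuzz

def get_match(token, chk_str):
    # Average the four fuzzy scores; each one is in the range 0-100.
    avg_value = (fuzz.ratio(token, chk_str)
                 + fuzz.partial_ratio(token, chk_str)
                 + fuzz.token_sort_ratio(token, chk_str)
                 + fuzz.token_set_ratio(token, chk_str)) / 4
    if int(avg_value) > 70:
        return token, chk_str, int(avg_value)
    else:
        return 0, 0, 0

tokens = ['abc', 'bcd', 'abe', 'efg', 'opq']
valid_list = ['acb', 'abc', 'abf', 'bcd', 'rts', 'xyz']
potential_entry, match_tokens, ag_match = [], [], []
for i in tokens:
    for j in valid_list:
        token, valid_entry, avg_match = get_match(i, j)
        if token != 0:
            potential_entry.append(valid_entry)
            match_tokens.append(token)
            ag_match.append(avg_match)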

Yes, please. I want to check each token in the input against each token in valid_list.
– Sid, Commented Nov 22, 2019 at 10:35

Sayse · Accepted Answer · 2019-11-22 10:52:05Z


The most obvious improvement I can see is to short-circuit the fuzzy checks as soon as a pair is clearly not going to be a valid match.

Instead of computing all four ratios in one line, compute them individually and check each result against a threshold before computing the next one. Start with the ratio you expect to give the clearest answer.
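One way to sketch this early exit without changing which pairs match: after each score, work out the best average still reachable if every remaining scorer returned 100, and stop once even that cannot pass the threshold. The names `get_match_early_exit` and `simple_ratio` are hypothetical, and `simple_ratio` is only a standard-library stand-in; in the question's code the `scorers` list would presumably hold `fuzz.ratio`, `fuzz.partial_ratio`, `fuzz.token_sort_ratio` and `fuzz.token_set_ratio`.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer (0-100) built on difflib, NOT the question's fuzz library."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def get_match_early_exit(token, chk_str, scorers, threshold=70):
    """Return (token, chk_str, avg) on a match, else (0, 0, 0).

    Assumes every scorer returns a value between 0 and 100.
    """
    total = 0.0
    remaining = len(scorers)
    for scorer in scorers:
        total += scorer(token, chk_str)
        remaining -= 1
        # Best average still reachable if all remaining scorers returned 100.
        best_possible = (total + 100 * remaining) / len(scorers)
        if int(best_possible) <= threshold:
            # The pair can no longer pass, so skip the remaining scorers.
            return 0, 0, 0
    return token, chk_str, int(total / len(scorers))
```

Putting the cheapest or most discriminating scorer first in `scorers` gives the early exit the most chances to fire. With a threshold of 70 and four scorers, the exit can only trigger from the second scorer onward, so the gain depends on how many pairs score poorly early.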

Also, consider:

using a single list of objects, so you don't have to append to three separate lists
using sets for your tokens and valid list, so no duplicate checks are done
skipping the int() cast on avg_value in the if statement; it makes no real difference here
adding an explicit i == j check that returns a 100% ratio before running any other checks
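The suggestions in this list can be combined roughly as follows. This is a minimal sketch under assumptions: `Match`, `find_matches` and `average_score` are hypothetical names, and `average_score` is a standard-library stand-in for the averaged `fuzz` ratios from the question.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Match(NamedTuple):
    """One record per match, replacing the three parallel lists."""
    token: str
    valid_entry: str
    avg_match: float


def average_score(a, b):
    """Stand-in (0-100) for the averaged fuzz.* ratios; uses difflib instead."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def find_matches(tokens, valid_list, score=average_score, threshold=70):
    # set() removes duplicates so no pair is scored twice;
    # sorted() only makes the output order deterministic.
    unique_tokens = sorted(set(tokens))
    unique_valid = sorted(set(valid_list))
    matches = []
    for token in unique_tokens:
        for entry in unique_valid:
            # Identical strings are a 100% match, so skip scoring entirely.
            avg = 100 if token == entry else score(token, entry)
            # Compare the float directly; no int() cast is needed here.
            if avg > threshold:
                matches.append(Match(token, entry, avg))
    return matches
```

Calling `find_matches(tokens, valid_list)` returns a single list, and each element exposes `.token`, `.valid_entry` and `.avg_match`.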

edited Nov 22, 2019 at 10:52

answered Nov 22, 2019 at 10:38


1 Comment

Sid Over a year ago

Thanks, @Sayse, for the recommendations. I have removed the duplicates from both lists. When I did not cast the value, I got a float value error. I have also removed the tokens that already match exactly. The code I posted here is just an example, which is why you see the exact-match case. I will try to implement the list-of-objects suggestion.
