I am writing Python code to check whether there is a fuzzy match between two strings. If there is a match, I have to store both strings and the average match value. The strings to be compared come from lists that run into thousands of entries. The issue is that the code takes too long to execute. To speed it up, I looked at the other answers here, but none of them had multiple return values from the inner function in the loop. I am looking for optimized code.
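# Assumption: fuzz comes from fuzzywuzzy (thefuzz and rapidfuzz expose the same four functions).
from fuzzywuzzy import fuzz

def get_match(token, chk_str):
    # Average the four fuzzy scores for this pair of strings.
    avg_value = (fuzz.ratio(token, chk_str) + fuzz.partial_ratio(token, chk_str)
                 + fuzz.token_sort_ratio(token, chk_str) + fuzz.token_set_ratio(token, chk_str)) / 4
    # Report the pair only when the average is above the 70 cut-off.
    if int(avg_value) > 70:
        return token, chk_str, int(avg_value)
    else:
        return 0, 0, 0

tokens = ['abc', 'bcd', 'abe', 'efg', 'opq']
valid_list = ['acb', 'abc', 'abf', 'bcd', 'rts', 'xyz']
# Result lists, filled in parallel: one entry per matching pair.
potential_entry, match_tokens, ag_match = [], [], []
for i in tokens:
    for j in valid_list:
        token, valid_entry, avg_match = get_match(i, j)
        if token != 0:
            potential_entry.append(valid_entry)
            match_tokens.append(token)
            ag_match.append(avg_match)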
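For reference, here is a minimal sketch of one way the nested loop could be restructured. It is not a tuned solution. Two ideas are shown: duplicates are dropped from both lists so each distinct pair is scored only once, and the three values are collected as one tuple per match instead of a `0,0,0` sentinel plus three parallel lists. The names `score` and `find_matches` are hypothetical, and `score` uses `difflib.SequenceMatcher` from the standard library as a stand-in so the snippet runs without third-party packages. It is not the same metric as the average of the four `fuzz` ratios, so its values will differ.

```python
from difflib import SequenceMatcher

THRESHOLD = 70  # same cut-off as in the question


def score(token, candidate):
    # Stand-in scorer (assumption): difflib similarity scaled to 0-100.
    # Swap the body for the average of the four fuzz ratios to keep the
    # original metric.
    return int(SequenceMatcher(None, token, candidate).ratio() * 100)


def find_matches(tokens, valid_list, threshold=THRESHOLD):
    # dict.fromkeys removes duplicates while preserving order, so each
    # distinct (token, candidate) pair is scored exactly once.
    unique_tokens = list(dict.fromkeys(tokens))
    unique_valid = list(dict.fromkeys(valid_list))
    # One (token, candidate, score) tuple per match above the threshold.
    return [
        (token, candidate, value)
        for token in unique_tokens
        for candidate in unique_valid
        if (value := score(token, candidate)) > threshold
    ]


tokens = ['abc', 'bcd', 'abe', 'efg', 'opq']
valid_list = ['acb', 'abc', 'abf', 'bcd', 'rts', 'xyz']
matches = find_matches(tokens, valid_list)

# Split back into three lists only if they are really needed.
match_tokens = [m[0] for m in matches]
potential_entry = [m[1] for m in matches]
ag_match = [m[2] for m in matches]
```

Deduplication only helps when the real lists contain repeated strings. The number of distinct pairs is still the product of the two list sizes, so the cost of the scorer itself is likely to dominate the run time.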