I'm comparing two DataFrame columns in Python. For each element of the first column, I want to find its best match in the second column. The first column contains 19,000 rows, so each of those 19,000 strings has to be compared against all 19,000 rows, with one constraint: the match must be a different string, not the string itself.
I started with a simple comparison, finding the best match for one string in a list, and that worked. Then I passed a whole list as the query to compare both lists, but that raises "TypeError: expected string or bytes-like object", because the query is a list instead of a string. Finally, I tried a loop, but I get the same error. Is there a way to build a list with the expected results? Maybe there is a better way to do this with another library, but so far I have found nothing. Here is my code so far:
# Simple example
from fuzzywuzzy import process
string = "appl"
compare = ["adfad.","apple","asple","tab"]
Ratios = process.extract(string,compare)
print(Ratios)
# [('apple', 89), ('asple', 67), ('tab', 29), ('adfad.', 22)]
highest = process.extractOne(string,compare)
print(highest)
# ('apple', 89)
# DataFrame column (shown as a plain list here)
from fuzzywuzzy import process
dataframecolumn = ["appl","tb"]
compare = ["adfad.","apple","asple","tab"]
Ratios = process.extract(dataframecolumn,compare)
# TypeError: expected string or bytes-like object
# This works one element at a time, but I need the results as a list
highest = process.extractOne(dataframecolumn[0],compare)
print(highest)
# ('apple', 89)
highest = process.extractOne(dataframecolumn[1],compare)
print(highest)
# ('tab', 80)
# Expected result
results = ["apple, 89","tab, 80"]
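# Loop attempt (raises the same error)
myl = ["appl","tb"]
compare = ["adfad.","apple","asple","tab"]
results = []
for x in myl:
    # Note: this passes the whole list myl, not the loop element x,
    # and [1] keeps only the score
    results.append(process.extractOne(myl,compare)[1])
# TypeError: expected string or bytes-like object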