I am primarily hoping to identify ways to speed up this code. I have a set of strings stored in a dataframe, each of which contains several names, and I know the number of names in each string before this step, like so:
print df
description num_people people
'Harry ran with sally' 2 []
'Joe was swinging with sally' 2 []
'Lola Dances alone' 1 []
I am using a dictionary with the keys that I am looking to find in description, like so:
my_dict={'Harry':'1283','Joe':'1828','Sally':'1298', 'Cupid':'1982'}
I then use iterrows to search each string for matches, like so:
import re
for index, row in df.iterrows():
    row.people = [key for key in my_dict if re.findall(key, row.description)]
When run, it ends up with:
print df
description num_people people
'Harry ran with sally' 2 ['Harry','Sally']
'Joe was swinging with sally' 2 ['Joe','Sally']
'Lola Dances alone' 1 ['Lola']
The problem is that this code is still fairly slow, and I have a large number of descriptions and over 1000 keys. Is there a faster way of performing this operation, perhaps by making use of the known number of people in each description?
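One possible direction, as a rough sketch rather than a measured solution: compile all keys into a single alternation pattern, so each description is scanned once instead of once per key. The helper name `find_people` and the sample `descriptions` list are hypothetical, and the sample capitalizes "Sally" because matching here is case-sensitive, as in the loop above. Note that an alternation returns non-overlapping matches, so a key that sits inside another matched key would not be reported separately, which differs from the per-key search.

```python
import re

my_dict = {'Harry': '1283', 'Joe': '1828', 'Sally': '1298', 'Cupid': '1982'}

# Longest keys first, so a longer name wins over a shorter one that is its prefix.
keys = sorted(my_dict, key=len, reverse=True)

# One compiled alternation instead of one re.findall call per key per row.
# re.escape guards against keys that contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(k) for k in keys))


def find_people(descriptions):
    """Return, per description, the distinct keys found in order of first appearance."""
    results = []
    for text in descriptions:
        seen = []
        for match in pattern.findall(text):
            if match not in seen:
                seen.append(match)
        results.append(seen)
    return results


# Hypothetical sample data, capitalized so the case-sensitive keys can match.
descriptions = ['Harry ran with Sally', 'Joe was swinging with Sally', 'Lola Dances alone']
people = find_people(descriptions)
```

With a dataframe, the result could presumably be assigned in one step, for example `df['people'] = find_people(df['description'])`, which also avoids writing to the row copies that iterrows yields. If num_people is reliable, the inner loop could additionally stop once `len(seen)` reaches it, though whether that helps in practice would need to be timed.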