
I read data into a pandas DataFrame. The column "keywords" may, but does not have to, contain comma-separated keywords that I later want to search for in a text. This is easy if I only have one list of keywords to iterate over and look for in the text. However, I need a list for every row. How do I do that?

The input is the following DataFrame (df):

Search  keywords
 1      Smurf, gummybear, Echo
 2      Blue, yellow, red
 3      Apple, Orange, Pear

l_search = df['Search'].tolist()
l_kw = df['keywords'].tolist()

Now I have a list of keyword strings, one comma-separated string per row. I want to split that up into as many lists as I have searches, basically:

i = 1
for s, kw in zip(l_search, l_kw):
   l_kw_i = [] # here the list would be l_kw_1, then l_kw_2, ...
   l_kw_i.append(kw) # append the keyword string that belongs to search s
   i = i+1
# l_kw_1 would now contain "Smurf, gummybear, Echo".

After that I would like to split each list at the commas, so l_kw_1 would then contain "Smurf", "gummybear", "Echo". I would then iterate over the results of each search and the respective list to determine whether at least one keyword appears.

The main problem is creating a variable number of keyword lists, depending on how many searches there are.


1 Answer


The trick is to use a dictionary instead of separately named variables. You can do it in one line using a dictionary comprehension combined with a list comprehension:

import pandas as pd

df = pd.DataFrame({'Search': [1, 2, 3],
                   'keywords': ["Smurf, gummybear, Echo", "Blue, yellow, red", "Apple, Orange, Pear"]})

l_kw = {i:[y for y in x['keywords'].split(',')] for i, x in df.iterrows()}

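Output (the keys are the DataFrame's index labels, and each entry after the first keeps the space that followed the comma):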

{0: ['Smurf', ' gummybear', ' Echo'],
 1: ['Blue', ' yellow', ' red'],
 2: ['Apple', ' Orange', ' Pear']}
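
As a possible refinement, here is a minimal sketch that strips the whitespace around each keyword, tolerates rows without keywords (the question says the column may be empty), and keys the dictionary by the "Search" value rather than the row index. The helper names split_keywords and has_keyword, the fourth row with a missing value, and the case-insensitive substring match are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Same data as above, plus a hypothetical fourth row without keywords
df = pd.DataFrame({
    "Search": [1, 2, 3, 4],
    "keywords": ["Smurf, gummybear, Echo", "Blue, yellow, red",
                 "Apple, Orange, Pear", None],
})


def split_keywords(cell):
    """Return the stripped keywords in a cell; an empty list if the cell is missing."""
    if not isinstance(cell, str):  # covers None and NaN
        return []
    return [kw.strip() for kw in cell.split(",") if kw.strip()]


# One keyword list per search, keyed by the value in the "Search" column
kw_by_search = {
    search: split_keywords(cell)
    for search, cell in zip(df["Search"].tolist(), df["keywords"].tolist())
}


def has_keyword(text, keywords):
    """True if at least one keyword occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


print(kw_by_search[1])
print(has_keyword("I saw a smurf today", kw_by_search[1]))
```

Note that a plain substring match also finds "red" inside "bored"; if whole-word matches are needed, a regular expression with word boundaries would be one alternative.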