
I am very new to Python, so apologies for this basic question. I am trying to match columns of keywords against a list of texts. If any keywords are found in a text, they should be appended to the spreadsheet, which currently ends at the 'Engagement' column.

I currently get the following error message on the second line of the for-loop: TypeError: 'in <string>' requires string as left operand, not float

What's wrong with my code and how should I correct it? Thank you.

import pandas as pd

df_rawdata = pd.read_excel(r'test.xlsx', sheet_name='rawdata')
my_rawdatalist = df_rawdata['Text'].tolist()

df_all_words = pd.read_excel(r'test.xlsx', sheet_name='pet_dict')

keywords_list = set(df_all_words['Animals'].tolist() + df_all_words['Cities'].tolist())

matchlist = []

for rawdata in my_rawdatalist:
    matches = [keyword for keyword in keywords_list if keyword in rawdata]
    matchlist.append("|".join(matches))

print(matchlist)


    Can an element of my_rawdatalist have more than one keyword in it and, if so, what should happen? Commented Apr 24, 2021 at 13:43

2 Answers


I actually don't get why you want to have an empty string there, but maybe this helps you:

(The original answer included a screenshot here; the image was not preserved.)


I think a list comprehension would go a long way toward making this easier. Note that it also lets you deal with a phrase that contains multiple keywords:

my_rawdatalist = [
    "The cat is out",
    "The zoo is fun",
    "The dog is tired",
    "The dog chases the cat"
]
keywords_list = ["cat", "dog", "NaN"]
matchlist = []

for rawdata in my_rawdatalist:
    matches = [keyword for keyword in keywords_list if keyword in rawdata]
    matchlist.append("|".join(matches))

print(matchlist)

This will give you:

['cat', '', 'dog', 'cat|dog']

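If you have "many" keywords, then you can convert your keywords_list to a set(). That removes duplicate keywords, so each one is checked only once per text. The loop still tests every keyword against every text, so the set does not speed up the individual `in` checks.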

keywords_list = set(["cat", "dog", "NaN"])
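A set does not preserve the order in which keywords were added, so the joined matches can come out in a different order than the columns suggest. As a minimal sketch, assuming you want duplicates removed but the original order kept, `dict.fromkeys` can be used instead of `set()`. The names `column_a` and `column_b` and their contents are made up for illustration:

```python
# Hypothetical keyword columns; "dog" appears in both on purpose.
column_a = ["cat", "dog", "NaN"]
column_b = ["dog", "TV"]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops the duplicate "dog" without reordering anything.
keywords_list = list(dict.fromkeys(column_a + column_b))
print(keywords_list)  # ['cat', 'dog', 'NaN', 'TV']

text = "The dog chases the cat on TV"
matches = [keyword for keyword in keywords_list if keyword in text]
print("|".join(matches))  # cat|dog|TV
```

The matches then always appear in keyword order, which makes the output easier to compare between runs.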

If you have multiple columns of keywords (if I understand what you are saying), then I would just combine the keywords from every column into one set.

keywords_list = set(
    ["cat", "dog", "NaN"] ## keywords from column A
    + ["Person", "Woman", "Man", "Camera", "TV"] ## keywords from column B
)

The code should continue to work:

my_rawdatalist = [
    "The cat is out",
    "The zoo is fun",
    "The dog is tired",
    "The dog chases the cat on TV"
]

keywords_list = set(
    ["cat", "dog", "NaN"] ## keywords from column A
    + ["Person", "Woman", "Man", "Camera", "TV"] ## keywords from column B
)

matchlist = []

for rawdata in my_rawdatalist:
    matches = [keyword for keyword in keywords_list if keyword in rawdata]
    matchlist.append("|".join(matches))

print(matchlist)

Gives you (the order of the keywords within an entry may differ, because a set is unordered):

['cat', '', 'dog', 'dog|cat|TV']
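The TypeError reported in the question ('in <string>' requires string as left operand, not float) is most likely caused by empty spreadsheet cells. pandas reads them as NaN, which is a float, so `keyword in rawdata` ends up with a float on the left. The sketch below simulates this with plain lists instead of reading test.xlsx, so the sample values are assumptions. It keeps only string keywords and skips non-string texts:

```python
nan = float("nan")

# Stand-ins for df_all_words['Animals'].tolist() and ['Cities'].tolist();
# the shorter column is padded with NaN, as pandas would do.
animals = ["cat", "dog", nan]
cities = ["Paris", "Rome", "Oslo"]

# Keep only real strings so a float NaN never reaches the `in` test.
keywords_list = {kw for kw in animals + cities if isinstance(kw, str)}

# Stand-in for df_rawdata['Text'].tolist(); the last cell is empty.
my_rawdatalist = ["The cat is in Paris", "The zoo is fun", nan]

matchlist = []
for rawdata in my_rawdatalist:
    if not isinstance(rawdata, str):
        # An empty text cell cannot contain any keyword.
        matchlist.append("")
        continue
    # sorted() makes the order independent of set iteration order.
    matches = sorted(kw for kw in keywords_list if kw in rawdata)
    matchlist.append("|".join(matches))

print(matchlist)  # ['Paris|cat', '', '']
```

With the real DataFrame, the same filtering can be done before building the list, for example with `df_all_words['Animals'].dropna().astype(str).tolist()`.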

3 Comments

Hi, how can this code be modified if there are multiple keyword columns that need to be matched against the same column of text? Turning the keywords into keyword:category dictionaries may work if you only have a couple of words to search for, but what if you have 100 words spread across many columns? Thank you.
I updated the answer to use a set() that might address performance concerns about "many" keywords.
Thank you JonSG, but I keep getting an error message in the for-loop. Can you help? Have updated the code in the example above so that it's easier for you/others to troubleshoot. Thanks again.
