I would like to filter a dataframe so that it returns only the rows that do NOT contain a specific integer, like 2. However, it still needs to return rows that contain integers like 12, 22, or 200.

Example:

import pandas as pd

d = {'num_list': ["1,2,3,10,11,12,13","","4,5,6","11,12,13","2,3,4,12","12,13"]}
searchfor = "2"

df = pd.DataFrame(data=d)

filtered_df = df[~df['num_list'].str.contains(searchfor)]

The dataframe:

                num_list
0      1,2,3,10,11,12,13
1
2                  4,5,6
3               11,12,13
4               2,3,4,12
5                  12,13

Expected result:

                num_list
1
2                  4,5,6
3               11,12,13
5                  12,13

Actual result:

                num_list
1
2                  4,5,6

This code matches the substring "2", which also appears in rows 3 and 5 (inside "12"), so those rows are dropped as well. I'm trying to find the right method to solve this. I'm thinking of changing the column num_list to a list, but I don't know how to filter a dataframe on a list column.

d = {'num_list': [[1,2,3,10,11,12,13],[],[4,5,6],[11,12,13],[2,3,4,12],[12,13]]}
searchfor = 2

df = pd.DataFrame(data=d)

??

The dataframe:

                   num_list
0 [1, 2, 3, 10, 11, 12, 13]
1                        []
2                 [4, 5, 6]
3              [11, 12, 13]
4             [2, 3, 4, 12]
5                  [12, 13]

Is this the right approach? How do I return the rows that do not contain the specific integer 2 (i.e. return rows 1, 2, 3 and 5)? Thanks in advance.

1 Answer

As suggested in this great answer, you can build a boolean mask with the apply function and use it to filter the rows.

import pandas as pd

d = {'num_list': [[1,2,3,10,11,12,13],[],[4,5,6],[11,12,13],[2,3,4,12],[12,13]]}
searchfor = 2
df = pd.DataFrame(data=d)

# Here we create our mask: a boolean Series that is True for each row
# whose list does not contain searchfor, and False otherwise.
mask = df.num_list.apply(lambda x: searchfor not in x)

# Now we can apply the mask to df
df_filtered = df[mask]

Unfiltered DataFrame:

>>> df
                    num_list
0  [1, 2, 3, 10, 11, 12, 13]
1                         []
2                  [4, 5, 6]
3               [11, 12, 13]
4              [2, 3, 4, 12]
5                   [12, 13]

And df_filtered now contains all rows except the ones whose list contains the value in searchfor:

>>> df_filtered
       num_list
1            []
2     [4, 5, 6]
3  [11, 12, 13]
5      [12, 13]
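
If the column is still stored as comma-separated strings, as in the question, the same filter can probably be done without converting to lists by matching searchfor only as a whole comma-delimited item. This is a sketch of an alternative, not part of the approach above; it assumes the values are separated by plain commas with no spaces, and the name pattern is introduced here only for illustration.

```python
import re
import pandas as pd

d = {'num_list': ["1,2,3,10,11,12,13", "", "4,5,6", "11,12,13", "2,3,4,12", "12,13"]}
searchfor = "2"
df = pd.DataFrame(data=d)

# Assumption: items are separated by "," with no surrounding spaces.
# The pattern matches searchfor only when it is bounded by the start of the
# string or a comma on the left, and a comma or the end of the string on the right.
pattern = rf"(?:^|,){re.escape(searchfor)}(?:,|$)"

# True for rows that contain searchfor as a whole item
mask = df['num_list'].str.contains(pattern, regex=True)

# Keep only the rows that do not contain it
filtered_df = df[~mask]
```

With the sample data this keeps rows 1, 2, 3 and 5, because the "2" inside "12" is preceded by a digit rather than by a comma or the start of the string.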

2 Comments

When reading in the dataframe from a file, I also had to convert num_list column from a string to a list and I found this handy. link
Great, thanks for sharing! If my answer worked please don't forget to accept it :)
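
For the string-to-list conversion mentioned in the first comment, one possible sketch is shown here. It is not necessarily the method from the linked post; it assumes each cell is a comma-separated string of integers, and to_int_list is a hypothetical helper name.

```python
import pandas as pd

# Data as it might look after being read from a file: one string per row
df = pd.DataFrame({'num_list': ["1,2,3,10,11,12,13", "", "4,5,6", "11,12,13", "2,3,4,12", "12,13"]})
searchfor = 2

def to_int_list(s):
    # Hypothetical helper: split on commas and convert each item to int.
    # The "if item" check turns an empty string into [] instead of failing on int('').
    return [int(item) for item in s.split(',') if item]

# Convert the string column to a column of integer lists
df['num_list'] = df['num_list'].apply(to_int_list)

# The mask from the answer now works on the converted column
df_filtered = df[df['num_list'].apply(lambda x: searchfor not in x)]
```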