Filtering Pandas DataFrame by Substring Match at Start of Strings [duplicate]

Question

I'm trying to filter out rows in which the data in a specific column starts with a given substring.

I have a pandas.DataFrame as shown below (simplified):

price	DRUG_CODE
123	A12D958
234	B564F3C
...	...

I'm trying to filter out rows in which the DRUG_CODE does not start with the substring B21. However, most of the articles I found online about filtering DataFrames by substring focus on rows that contain the substring anywhere within the cell (at the beginning, middle, or end), e.g. the .str.contains() method. This doesn't match my requirement.

out = df[~df['DRUG_CODE'].str.startswith('B21')] to invert the logic
– mozway, Commented May 3, 2024 at 10:31
@mozway Thank you so much for your help. As a complete beginner in Python, your answer is very helpful and has pointed me in the right direction for further learning :)
– Warren Chen, Commented May 9, 2024 at 2:23

Serge de Gosson de Varennes · Accepted Answer · 2024-05-03 10:05:25Z

You can simply do this:

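import pandas as pd

# Sample data with the same columns as in the question
data = {
    'price': [123, 234, 345],
    'DRUG_CODE': ['A12D958', 'B564F3C', 'B21X456']
}

df = pd.DataFrame(data)

# Keep the rows whose DRUG_CODE does NOT start with 'B21' (~ negates the mask)
filtered_df = df[~df['DRUG_CODE'].str.startswith('B21')]

print(filtered_df)

# Keep the rows whose DRUG_CODE DOES start with 'B21'
filtered_df_with = df[df['DRUG_CODE'].str.startswith('B21')]

print(filtered_df_with)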

which gives

   price DRUG_CODE
0    123   A12D958
1    234   B564F3C
   price DRUG_CODE
2    345   B21X456
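
One caveat worth sketching: if DRUG_CODE can contain missing values, .str.startswith() may return NaN for those rows (depending on the pandas version and column dtype), and the resulting mask may then fail when used for indexing or negated with ~. Passing na=False treats missing values as "does not start with the prefix". The row with a missing code below is an assumption added for illustration; it is not part of the question's data.

```python
import pandas as pd

# Hypothetical data: same columns as above, plus one row with a missing code
# (the missing value is an assumption, not taken from the question).
df = pd.DataFrame({
    'price': [123, 234, 345, 456],
    'DRUG_CODE': ['A12D958', 'B564F3C', 'B21X456', None],
})

# na=False makes missing values count as "does not start with 'B21'",
# so the mask is purely boolean and safe to index with or to negate.
mask = df['DRUG_CODE'].str.startswith('B21', na=False)

starts_with_b21 = df[mask]   # rows whose code starts with 'B21'
not_b21 = df[~mask]          # everything else, including the missing code

print(starts_with_b21)
print(not_b21)
```

Whether a missing code belongs in the "does not start with" group is a judgment call; na=True would put it in the other group instead.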
