Filtering out rows in which the data in a specific column starts with a given substring

I have a pandas.DataFrame as shown below (simplified):

price DRUG_CODE
123 A12D958
234 B564F3C
... ...

I'm trying to filter out rows in which the DRUG_CODE does not start with the substring B21. However, most of the articles I found online about filtering DataFrames by substring focus on rows that contain the substring anywhere in the cell (at the beginning, middle, or end), e.g. with the .str.contains() method. That doesn't match my requirement.

  • out = df[df['DRUG_CODE'].str.startswith('B21')] Commented May 3, 2024 at 10:04
  • out = df[~df['DRUG_CODE'].str.startswith('B21')] to inverse the logic Commented May 3, 2024 at 10:31
  • @mozway thank you so much for your help. As a complete beginner in python, your answer is very helpful and has pointed me in the direction for further learning :) Commented May 9, 2024 at 2:23

1 Answer

Use .str.startswith() to build a boolean mask, and negate it with ~ to keep the rows that do not match:

import pandas as pd

data = {
    'price': [123, 234, 345],
    'DRUG_CODE': ['A12D958', 'B564F3C', 'B21X456']
}

df = pd.DataFrame(data)

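# Rows whose DRUG_CODE does NOT start with 'B21' (~ negates the mask)
filtered_df = df[~df['DRUG_CODE'].str.startswith('B21')]

print(filtered_df)

# Rows whose DRUG_CODE starts with 'B21'
filtered_df_with = df[df['DRUG_CODE'].str.startswith('B21')]

print(filtered_df_with)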

which gives

   price DRUG_CODE
0    123   A12D958
1    234   B564F3C
   price DRUG_CODE
2    345   B21X456
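
One thing to watch for: if the real DRUG_CODE column contains missing values, .str.startswith() returns NaN for those rows, and indexing with such a mask can raise an error. The sketch below assumes that situation (the extra row with a missing code is made up for illustration, it is not from the question) and passes na=False so that missing codes count as "does not start with B21".

```python
import pandas as pd

# Hypothetical data: same columns as above, plus one row with a missing code.
df = pd.DataFrame({
    'price': [123, 234, 345, 456],
    'DRUG_CODE': ['A12D958', 'B564F3C', 'B21X456', None],
})

# na=False treats missing codes as "does not start with B21",
# so the mask contains only True/False and is safe for indexing.
mask = df['DRUG_CODE'].str.startswith('B21', na=False)

starts_with_b21 = df[mask]    # rows whose code starts with B21
everything_else = df[~mask]   # all other rows, including the missing code

print(starts_with_b21)
print(everything_else)
```

Building the mask once and reusing it with ~ avoids repeating the string test. Whether a missing code should land in the "everything else" group depends on your data; pass na=True instead if you want the opposite.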