
Suppose I have a pandas dataframe like this:

         Word      Rating
   0     Bear      1
   1     Yuck      2
   2     Girl      3
   3     Yellow    4

How can I use a regex in pandas to filter out the rows whose Word starts with the letter "y" while keeping the result a DataFrame with the same columns and index? I believe the regex pattern would be r"\b[^y]\w+\b".

Expected output:

         Word    Rating
    0    Bear    1
    2    Girl    3
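A quick check of the pattern from the question, assuming the four-row DataFrame shown above and applying the pattern with Series.str.match (which anchors at the start of each string, like re.match). Because [^y] is case-sensitive, it still matches the capital "Y" in "Yuck" and "Yellow", so on its own it removes nothing here. This is an illustrative sketch, not part of any answer:

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({"Word": ["Bear", "Yuck", "Girl", "Yellow"],
                   "Rating": [1, 2, 3, 4]})

# Pattern from the question; [^y] only excludes a lowercase "y",
# so every word here (all capitalized) matches and every row is kept
kept = df[df.Word.str.match(r"\b[^y]\w+\b")]
print(kept)
```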

3 Answers


Using str.startswith and inverting the mask with ~:

In [1187]: df[~df.Word.str.startswith('Y')]
Out[1187]:
   Word  Rating
0  Bear       1
2  Girl       3
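A self-contained version of the startswith approach, assuming the DataFrame from the question is rebuilt by hand. It shows that the boolean mask keeps the original index labels (0 and 2), which is what "keep the dataframe formatting" amounts to here:

```python
import pandas as pd

df = pd.DataFrame({"Word": ["Bear", "Yuck", "Girl", "Yellow"],
                   "Rating": [1, 2, 3, 4]})

# Boolean mask: True where Word starts with an uppercase "Y"
starts_with_y = df.Word.str.startswith("Y")

# ~ inverts the mask, so only rows NOT starting with "Y" are kept;
# boolean indexing preserves the original index labels
result = df[~starts_with_y]
print(result)
```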

Or, with a regex match:

In [1203]: df[df.Word.str.match('^[^Y]')]
Out[1203]:
   Word  Rating
0  Bear       1
2  Girl       3
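If lowercase "y" words should be dropped as well, Series.str.match also takes a case argument. A minimal sketch, assuming a DataFrame with a couple of lowercase words added for illustration; it matches a leading "y" case-insensitively and inverts the mask rather than relying on a negated character class:

```python
import pandas as pd

# Example data with lowercase words added for illustration
df = pd.DataFrame({"Word": ["Bear", "Yuck", "Girl", "Yellow", "yes", "color"],
                   "Rating": [1, 2, 3, 4, 5, 6]})

# str.match anchors the pattern at the start of each string, so "y"
# with case=False means "starts with y or Y"
starts_with_y = df.Word.str.match("y", case=False)

result = df[~starts_with_y]
print(result)
```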

Regular expressions are not necessary. Just check the first letter:

df[df.Word.str[0] != 'Y']
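A runnable sketch of the first-letter check, assuming a rebuilt DataFrame with one lowercase word added for illustration. It shows the comparison as written (case-sensitive) next to a variant that upper-cases the first letter before comparing:

```python
import pandas as pd

df = pd.DataFrame({"Word": ["Bear", "Yuck", "Girl", "Yellow", "yes"],
                   "Rating": [1, 2, 3, 4, 5]})

# .str[0] takes the first character of each string
first = df.Word.str[0]

# Case-sensitive: drops only words beginning with uppercase "Y"
upper_only = df[first != "Y"]

# Case-insensitive: upper-case the first letter before comparing
any_case = df[first.str.upper() != "Y"]
print(upper_only)
print(any_case)
```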

Use str.lower and str.startswith to catch both uppercase 'Y' and lowercase 'y':

df[~df.Word.str.lower().str.startswith('y')]

Input:

df

     Word  Rating
0    Bear       1
1    Yuck       2
2    Girl       3
3  Yellow       4
4     yes       5
5   color       6

Output:

    Word  Rating
0   Bear       1
2   Girl       3
5  color       6
