Regular Expression to Remove Rows

Question

This is an interesting problem, and I'd like help from experts to understand how to approach it. I have a DataFrame (built by parsing data from Big Iron, which still exists). I now want to remove some rows using a regular expression, but I don't know how that works in pandas.

24 | DRFT.146.856 |    Dollar-  |  (60.00) | DEBITS-  |  0.00 |  CREDITSDRA- | 0.00   
25 |   0616-21.01 |      2407   |  WAYZAT  |   TMCD   | JUNE  |      16,DRA  |2013   
26 |          AND | CORRECTION  |JOURNAL00 |    <DB>  |KLRETY | CATEGORYDRA- |    *   
27 | DRFT.146.867 |    Dollar-  | (200.00) | DEBITS-  |  0.00 |  CREDITSDRA- | 0.00   
28 | DRFT.146.922 |   Dollar-   | (25.00)  |DEBITS-   | 0.00  | CREDITSDRA-  |0.00   
29 | DRFT.146.963 |    Dollar-  | (100.00) | DEBITS-  |  0.00 |  CREDITSDRA- | 0.00   
30 | DRFT.146.964 |    Dollar-  | (100.00) | DEBITS-  |  0.00 |  CREDITSDRA- | 0.00

The rows of concern are 25 and 26, where the data does not follow the pattern of the other rows. Any clues?

Do you just want to remove ids 25 and 26, or is there a pattern behind why you want to remove them?
– HamZa, Commented Jun 24, 2013 at 19:59
I'd suggest filtering them out before putting them in a DataFrame. It looks like certain columns should have easy-to-check patterns or a limited set of valid values. As far as you understand this data, which field do you think you can filter by most effectively?
– Jon Clements, Commented Jun 24, 2013 at 20:00
@HamZa There is a pattern, and that's the problem: I don't know the location of the rows, just the pattern.
– LonelySoul, Commented Jun 24, 2013 at 20:00
@JonClements The problem is that a bunch of complex, "not required" programs already run before the DataFrame is built, and I have very little control over them.
– LonelySoul, Commented Jun 24, 2013 at 20:02
Is there an equivalent of "apply" for rows? apply seems to work on columns only.
– LonelySoul, Commented Jun 24, 2013 at 20:26

Andy Hayden · Accepted Answer · 2013-06-24 20:25:41Z

4

A couple of possible contenders:

In [11]: df[2].str.contains('Dollar')
Out[11]:
0     True
1    False
2    False
3     True
4     True
5     True
6     True
Name: 2, dtype: bool

In [12]: df[3].str.startswith('(')
Out[12]:
0     True
1    False
2    False
3     True
4     True
5     True
6     True
Name: 3, dtype: bool

Doing this kind of thing is always a bit of a dark art (as there is usually a lot of data and some could look very similar to the good data)...

In [13]: df[df[3].str.startswith('(')]
Out[13]:
    0             1        2         3        4       5            6   7
0  24  DRFT.146.856  Dollar-    (60.00)  DEBITS-   0.00  CREDITSDRA-   0
3  27  DRFT.146.867  Dollar-   (200.00)  DEBITS-   0.00  CREDITSDRA-   0
4  28  DRFT.146.922  Dollar-    (25.00)  DEBITS-   0.00  CREDITSDRA-   0
5  29  DRFT.146.963  Dollar-   (100.00)  DEBITS-   0.00  CREDITSDRA-   0
6  30  DRFT.146.964  Dollar-   (100.00)  DEBITS-   0.00  CREDITSDRA-   0
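
Since the question asks specifically about regular expressions, here is a minimal sketch of the same boolean-mask approach driven by a regex instead of a fixed prefix. The frame below is a hypothetical reconstruction of a few rows from the question, and the pattern `DRFT\.\d+\.\d+$` is only an assumption about what a "good" value in column 1 looks like; adjust it to the real data.

```python
import pandas as pd

# Hypothetical reconstruction of four rows from the question; building the
# frame from a list of lists gives integer column labels 0..7, as in the answer.
rows = [
    ["24", "DRFT.146.856", "Dollar-", "(60.00)", "DEBITS-", "0.00", "CREDITSDRA-", "0.00"],
    ["25", "0616-21.01", "2407", "WAYZAT", "TMCD", "JUNE", "16,DRA", "2013"],
    ["26", "AND", "CORRECTION", "JOURNAL00", "<DB>", "KLRETY", "CATEGORYDRA-", "*"],
    ["27", "DRFT.146.867", "Dollar-", "(200.00)", "DEBITS-", "0.00", "CREDITSDRA-", "0.00"],
]
df = pd.DataFrame(rows)

# str.match anchors the pattern at the start of each string; the trailing $
# anchors the end. str.strip() guards against padding left over from parsing.
mask = df[1].str.strip().str.match(r"DRFT\.\d+\.\d+$")

# Boolean indexing keeps only the rows where the mask is True.
kept = df[mask]
print(kept)
```

Matching on a column with a strict format (here column 1) tends to be safer than a loose substring test, because stray rows are less likely to satisfy the whole pattern by accident.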

5 Comments

LonelySoul Over a year ago

That's interesting, Andy. I am trying to find a way to use a regex to determine which rows to keep. It seems I may have more success there.

Andy Hayden Over a year ago

contains and several of the other string methods accept regular expressions.

LonelySoul Over a year ago

Yeah, sounds awesome. Just a quick question: how do I remove the rows where the condition is False?

Andy Hayden Over a year ago

See line [13] that's the easiest way (df = df[df[3].str...])
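
A small sketch of that idea, assuming a made-up column 3 and a hypothetical pattern for parenthesised amounts: indexing with the mask drops the rows where it is False, and `~` inverts the mask if you want to inspect the rejected rows instead. Passing `na=False` to `str.contains` treats missing values as non-matches, so the mask stays purely boolean.

```python
import pandas as pd

# Made-up values standing in for column 3 of the question's data,
# including one missing entry.
df = pd.DataFrame({3: ["(60.00)", "WAYZAT", None, "(25.00)"]})

# Regex for a parenthesised amount such as (60.00); na=False maps missing
# values to False instead of leaving NaN in the mask.
is_amount = df[3].str.contains(r"^\(\d+\.\d{2}\)$", na=False)

good = df[is_amount]   # rows where the condition is True are kept
bad = df[~is_amount]   # ~ flips the mask: rows that would be removed
```

Assigning the result back (`df = df[is_amount]`) is what actually discards the unwanted rows.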

LonelySoul Over a year ago

Thanks a ton for the answer. I am close to the solution now.
