
This is an interesting problem, and I am hoping the experts here can help me understand it and how to solve it. I have a DataFrame that I built while parsing data from Big Iron (yes, mainframes still exist). Now I want to remove some rows using a regular expression, but I don't know how that works in pandas.

24 | DRFT.146.856 | Dollar-    | (60.00)   | DEBITS- | 0.00   | CREDITSDRA-  | 0.00
25 | 0616-21.01   | 2407       | WAYZAT    | TMCD    | JUNE   | 16,DRA       | 2013
26 | AND          | CORRECTION | JOURNAL00 | <DB>    | KLRETY | CATEGORYDRA- | *
27 | DRFT.146.867 | Dollar-    | (200.00)  | DEBITS- | 0.00   | CREDITSDRA-  | 0.00
28 | DRFT.146.922 | Dollar-    | (25.00)   | DEBITS- | 0.00   | CREDITSDRA-  | 0.00
29 | DRFT.146.963 | Dollar-    | (100.00)  | DEBITS- | 0.00   | CREDITSDRA-  | 0.00
30 | DRFT.146.964 | Dollar-    | (100.00)  | DEBITS- | 0.00   | CREDITSDRA-  | 0.00

The rows of concern are 25 and 26, where the data does not follow the pattern of the other rows. Any clues?

  • Do you just want to remove IDs 25 and 26, or is there a pattern that explains why you want to remove them? Commented Jun 24, 2013 at 19:59
  • I'd suggest filtering them out before putting them in a DataFrame. It looks like certain columns should have easy-to-check patterns or a limited set of valid values. As far as you understand this data, which field do you think you can filter on most effectively? Commented Jun 24, 2013 at 20:00
  • @HamZa There is a pattern, and that's the problem: I don't know the location of the bad rows, only the pattern. Commented Jun 24, 2013 at 20:00
  • @JonClements The problem is that a bunch of complex, "not required" programs already run before the DataFrame is built, and I have very little control over them. Commented Jun 24, 2013 at 20:02
  • Is there an equivalent of "apply" for rows? apply seems to work on columns only. Commented Jun 24, 2013 at 20:26
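On the last comment's question: DataFrame.apply does work row-wise when called with axis=1, which passes each row to the function as a Series. A minimal sketch, assuming a hand-typed copy of three of the rows above and a hypothetical validity rule (column 1 looks like a DRFT reference and column 3 looks like a bracketed amount); the regexes are illustrative guesses, not a known spec for this report:

```python
import re

import pandas as pd

# Hand-typed subset of the question's data (first four columns only).
df = pd.DataFrame([
    [24, "DRFT.146.856", "Dollar-", "(60.00)"],
    [25, "0616-21.01", "2407", "WAYZAT"],
    [26, "AND", "CORRECTION", "JOURNAL00"],
])

# Hypothetical rule: both patterns are guesses based on the sample rows.
REF = re.compile(r"DRFT\.\d+\.\d+")
AMOUNT = re.compile(r"\(\d+\.\d{2}\)")


def looks_valid(row):
    # row is a Series indexed by the column labels 0..3
    return bool(REF.fullmatch(str(row[1])) and AMOUNT.fullmatch(str(row[3])))


# axis=1 calls looks_valid once per row and returns a boolean Series.
mask = df.apply(looks_valid, axis=1)
print(df[mask])
```

Row-wise apply calls a Python function once per row, so the vectorized .str methods used in the answer below are usually faster on large frames.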

1 Answer


A couple of possible contenders:

In [11]: df[2].str.contains('Dollar')
Out[11]:
0     True
1    False
2    False
3     True
4     True
5     True
6     True
Name: 2, dtype: bool

In [12]: df[3].str.startswith('(')
Out[12]:
0     True
1    False
2    False
3     True
4     True
5     True
6     True
Name: 3, dtype: bool
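The two masks can be reproduced without the mainframe data. A minimal sketch, assuming a hand-typed DataFrame holding only the first four columns of the rows shown in the question (integer column labels, as in the session above). Combining the masks with & keeps only the rows that pass both checks:

```python
import pandas as pd

# Hand-typed copy of columns 0-3 of the question's rows (an assumption;
# the real frame comes from the mainframe parser and has more columns).
df = pd.DataFrame([
    [24, "DRFT.146.856", "Dollar-", "(60.00)"],
    [25, "0616-21.01", "2407", "WAYZAT"],
    [26, "AND", "CORRECTION", "JOURNAL00"],
    [27, "DRFT.146.867", "Dollar-", "(200.00)"],
    [28, "DRFT.146.922", "Dollar-", "(25.00)"],
    [29, "DRFT.146.963", "Dollar-", "(100.00)"],
    [30, "DRFT.146.964", "Dollar-", "(100.00)"],
])

is_dollar = df[2].str.contains("Dollar")  # same test as In [11]
is_amount = df[3].str.startswith("(")     # same test as In [12]

# Element-wise AND: a row survives only if both tests are True.
both = is_dollar & is_amount
print(df[both])
```

Requiring two independent signals makes it less likely that a malformed row slips through just because one cell happens to look right.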

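Doing this kind of thing is always something of a dark art: there is usually a lot of data, and some bad rows can look very similar to the good ones. Once you have a boolean mask, index the DataFrame with it to keep only the rows where it is True: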

In [13]: df[df[3].str.startswith('(')]
Out[13]:
    0             1        2         3        4       5            6   7
0  24  DRFT.146.856  Dollar-    (60.00)  DEBITS-   0.00  CREDITSDRA-   0
3  27  DRFT.146.867  Dollar-   (200.00)  DEBITS-   0.00  CREDITSDRA-   0
4  28  DRFT.146.922  Dollar-    (25.00)  DEBITS-   0.00  CREDITSDRA-   0
5  29  DRFT.146.963  Dollar-   (100.00)  DEBITS-   0.00  CREDITSDRA-   0
6  30  DRFT.146.964  Dollar-   (100.00)  DEBITS-   0.00  CREDITSDRA-   0
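Note that In [13] returns a new, filtered DataFrame and leaves df itself unchanged; to drop the rows you reassign the result. A minimal sketch on a small hypothetical frame (two good rows, two bad), which also uses ~ to look at the rejected rows before throwing them away:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the real data.
df = pd.DataFrame({
    0: [24, 25, 26, 27],
    3: ["(60.00)", "WAYZAT", "JOURNAL00", "(200.00)"],
})

mask = df[3].str.startswith("(")

# ~ inverts the mask: these are the rows that are about to be dropped.
rejected = df[~mask]

# Keep the True rows; reset_index(drop=True) is optional and just
# renumbers the surviving rows 0, 1, ...
df = df[mask].reset_index(drop=True)

print(rejected)
print(df)
```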

5 Comments

That's interesting, Andy. I am trying to find a way to use a regex to determine which rows to keep. It seems I may have more success with that.
str.contains and several of the other string methods accept regular expressions.
Yeah, sounds awesome. Just a quick question: how do I remove the rows where the condition is False?
See In [13]; that's the easiest way (df = df[df[3].str...]).
Thanks a ton for the answer. I am close to a solution now.
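As the second comment says, str.contains accepts a regular expression. A minimal sketch, assuming column 1 holds DRFT.nnn.nnn references (the exact format is a guess from the sample rows) and that some cells may be missing; na=False makes missing cells count as non-matches so the mask stays purely boolean:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: column 1 of the question's data plus one missing
# cell (np.nan), such as a short, badly parsed line might produce.
refs = pd.Series(["DRFT.146.856", "0616-21.01", "AND", "DRFT.146.867", np.nan])

# Raw string keeps the backslashes intact; ^...$ anchors the whole cell.
pattern = r"^DRFT\.\d{3}\.\d{3}$"

# regex=True is already the default for str.contains; na=False turns
# missing cells into False instead of NaN.
mask = refs.str.contains(pattern, regex=True, na=False)
print(mask.tolist())
```

To filter the DataFrame from the answer the same way: df = df[df[1].str.contains(pattern, na=False)].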
