
I have the following dataframe:

In [48]: df.head(10)
Out[48]: 
                      beat1   beat2   beat3   beat4   beat5   beat6   beat7  
filename                                                                      
M46_MI_RhHy61d.dat   0.7951  0.8554  0.9161  1.0789  0.6664  0.7839  0.6076   
M60_MI_AH53d.dat     0.7818  0.7380  0.8657  0.9980  0.7491  0.9272  0.8781   
M57_Car_AF0489d.dat  1.1040  1.1670  1.7740  1.3080  1.2190  1.0800  1.2390   
F62_MI_AH39d.dat     1.2150  0.9360  0.9890  1.1960  0.8420  1.1530  1.1360   
F81_MI_DM10d.dat     1.0650  1.1190  1.1330  1.2040  1.1220  1.1640  1.0600   
M61_My_508d.dat      0.6963  0.7910  0.6362  0.6938  0.7410  0.7198  0.7060   
M69_MI_554d.dat      1.0400  1.0890  1.0190  0.9600  1.0720  1.0870  1.0100   
F78_MI_548d.dat      1.1410  1.3290  0.8620  0.0000  1.3160  1.2180  1.2870   
F68_MI_AH152d.dat    1.1910  1.1170  1.1030  1.2430  1.0100  0.0000  0.0000   
M46_Myo_484d.dat     0.6799  0.7278  0.6808  0.7059  0.7973  0.6956  0.6685 

In some rows, some (but not necessarily all) of the values are equal to 0, and I don't know in advance which columns they will appear in for a given row. For example, in the dataframe above, the last two values in the second-to-last row are zero. I want to remove such rows from the dataframe. I could do this if I knew which columns the zeros appear in, but that is exactly what I don't know. Any ideas about doing this?

1 Answer

IIUC:

You want to drop any row with a zero in it?

option 1
pd.DataFrame.mask returns a dataframe with np.nan wherever the boolean argument is True. I can then chain dropna, which by default drops any row containing a null value.

df.mask(df == 0).dropna()
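As a self-contained sketch of the same idea on a toy frame (made-up values, not the question's data):

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the question's data;
# row "b" contains a zero and should be dropped.
df = pd.DataFrame(
    {"beat1": [0.79, 1.14, 1.19], "beat2": [0.85, 0.0, 1.11]},
    index=["a", "b", "c"],
)

# mask(cond) replaces values where cond is True with NaN;
# dropna() then removes any row containing a NaN.
out = df.mask(df == 0).dropna()
print(out.index.tolist())  # ['a', 'c']
```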

                      beat1   beat2   beat3   beat4   beat5   beat6   beat7
filename                                                                   
M46_MI_RhHy61d.dat   0.7951  0.8554  0.9161  1.0789  0.6664  0.7839  0.6076
M60_MI_AH53d.dat     0.7818  0.7380  0.8657  0.9980  0.7491  0.9272  0.8781
M57_Car_AF0489d.dat  1.1040  1.1670  1.7740  1.3080  1.2190  1.0800  1.2390
F62_MI_AH39d.dat     1.2150  0.9360  0.9890  1.1960  0.8420  1.1530  1.1360
F81_MI_DM10d.dat     1.0650  1.1190  1.1330  1.2040  1.1220  1.1640  1.0600
M61_My_508d.dat      0.6963  0.7910  0.6362  0.6938  0.7410  0.7198  0.7060
M69_MI_554d.dat      1.0400  1.0890  1.0190  0.9600  1.0720  1.0870  1.0100
M46_Myo_484d.dat     0.6799  0.7278  0.6808  0.7059  0.7973  0.6956  0.6685

option 2
use loc with a boolean mask that is True only where no value in the row is zero

df.loc[(df != 0).all(1)]
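A minimal illustration of the intermediate boolean mask (hypothetical two-column frame):

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 0.0, 2.0], "y": [3.0, 4.0, 5.0]})

# (df != 0) is an elementwise boolean frame; .all(axis=1) collapses
# it to one boolean per row: True only if no value in the row is zero.
keep = (df != 0).all(axis=1)
print(keep.tolist())          # [True, False, True]
print(df.loc[keep].shape[0])  # 2
```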

option 3
numpy buys a lot of efficiency here. Same concept as option 2, but we reconstruct the dataframe from scratch with the filtered array.

v = df.values
mask = (v != 0).all(1)
pd.DataFrame(v[mask], df.index[mask], df.columns)
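A quick sanity check that all three options agree, using a hypothetical frame where two of the three rows contain zeros:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 0.0, 2.0], "y": [3.0, 4.0, 0.0]},
                  index=["a", "b", "c"])

v = df.values
mask = (v != 0).all(1)  # boolean array: True for rows with no zeros
rebuilt = pd.DataFrame(v[mask], df.index[mask], df.columns)

# All three options should produce the same result on this toy data.
assert rebuilt.equals(df.mask(df == 0).dropna())
assert rebuilt.equals(df.loc[(df != 0).all(1)])
print(rebuilt.index.tolist())  # ['a']
```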

naive time testing

[timing plot not reproduced]
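The original answer showed a timing plot here; a rough, machine-dependent way to redo the comparison with the stdlib timeit module (random toy data and a hypothetical zero pattern, so absolute numbers will vary):

```python
import timeit
import numpy as np
import pandas as pd

# Random frame with zeros sprinkled into every 50th row of column 0.
df = pd.DataFrame(np.random.rand(1000, 7))
df.iloc[::50, 0] = 0

opt1 = lambda: df.mask(df == 0).dropna()
opt2 = lambda: df.loc[(df != 0).all(1)]

def opt3():
    v = df.values
    m = (v != 0).all(1)
    return pd.DataFrame(v[m], df.index[m], df.columns)

for name, fn in [("mask/dropna", opt1), ("loc", opt2), ("numpy", opt3)]:
    t = timeit.timeit(fn, number=100)
    print(f"{name}: {t:.4f}s")
```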


3 Comments

Worked! Can you kindly give an explanation also?
In particular, I want to know why df = df[df > 0] doesn't work.
@Peaceful well df[df > 0] kinda does work. It returns the parts of df where df > 0 is True. It doesn't have an answer for where df > 0 is False so you get nulls. df[df > 0].dropna() would also work.
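The behaviour described in that comment can be checked on a tiny made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 0.0], "y": [2.0, 3.0]})

# df[df > 0] keeps the frame's shape but leaves NaN wherever
# the condition is False ...
print(df[df > 0].isna().sum().sum())       # 1
# ... so chaining dropna() removes the offending row.
print(df[df > 0].dropna().index.tolist())  # [0]
```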
