
I have the following dataframe:

In [48]: df.head(10)
Out[48]: 
                      beat1   beat2   beat3   beat4   beat5   beat6   beat7  
filename                                                                      
M46_MI_RhHy61d.dat   0.7951  0.8554  0.9161  1.0789  0.6664  0.7839  0.6076   
M60_MI_AH53d.dat     0.7818  0.7380  0.8657  0.9980  0.7491  0.9272  0.8781   
M57_Car_AF0489d.dat  1.1040  1.1670  1.7740  1.3080  1.2190  1.0800  1.2390   
F62_MI_AH39d.dat     1.2150  0.9360  0.9890  1.1960  0.8420  1.1530  1.1360   
F81_MI_DM10d.dat     1.0650  1.1190  1.1330  1.2040  1.1220  1.1640  1.0600   
M61_My_508d.dat      0.6963  0.7910  0.6362  0.6938  0.7410  0.7198  0.7060   
M69_MI_554d.dat      1.0400  1.0890  1.0190  0.9600  1.0720  1.0870  1.0100   
F78_MI_548d.dat      1.1410  1.3290  0.8620  0.0000  1.3160  1.2180  1.2870   
F68_MI_AH152d.dat    1.1910  1.1170  1.1030  1.2430  1.0100  0.0000  0.0000   
M46_Myo_484d.dat     0.6799  0.7278  0.6808  0.7059  0.7973  0.6956  0.6685 

In some rows, some (but not necessarily all) of the values are equal to 0, and I don't know in advance which columns they will appear in for a given row. For example, in the dataframe above, the last two values in the second-to-last row are zero. I want to remove such rows from the dataframe. I could do this if I knew which columns the zeros appear in, but that is exactly what I don't know. Any ideas about doing this?

1 Answer

IIUC:

You want to drop any row with a zero in it?

option 1
pd.DataFrame.mask returns a dataframe with np.nan wherever the boolean argument is True. I can then chain dropna, which by default drops any row containing a null value.

df.mask(df == 0).dropna()
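As a self-contained sketch of the same idea on a toy frame (made-up values, not the question's data):

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the question's data;
# row "b" contains a zero and should be dropped.
df = pd.DataFrame(
    {"beat1": [0.79, 1.14, 1.19], "beat2": [0.85, 0.0, 1.11]},
    index=["a", "b", "c"],
)

# mask(cond) replaces values where cond is True with NaN;
# dropna() then removes any row containing a NaN.
out = df.mask(df == 0).dropna()
print(out.index.tolist())  # ['a', 'c']
```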

                      beat1   beat2   beat3   beat4   beat5   beat6   beat7
filename                                                                   
M46_MI_RhHy61d.dat   0.7951  0.8554  0.9161  1.0789  0.6664  0.7839  0.6076
M60_MI_AH53d.dat     0.7818  0.7380  0.8657  0.9980  0.7491  0.9272  0.8781
M57_Car_AF0489d.dat  1.1040  1.1670  1.7740  1.3080  1.2190  1.0800  1.2390
F62_MI_AH39d.dat     1.2150  0.9360  0.9890  1.1960  0.8420  1.1530  1.1360
F81_MI_DM10d.dat     1.0650  1.1190  1.1330  1.2040  1.1220  1.1640  1.0600
M61_My_508d.dat      0.6963  0.7910  0.6362  0.6938  0.7410  0.7198  0.7060
M69_MI_554d.dat      1.0400  1.0890  1.0190  0.9600  1.0720  1.0870  1.0100
M46_Myo_484d.dat     0.6799  0.7278  0.6808  0.7059  0.7973  0.6956  0.6685

option 2
use loc with a boolean mask that is True only where no value in the row is zero

df.loc[(df != 0).all(1)]
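A minimal illustration of the intermediate boolean mask (hypothetical two-column frame):

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 0.0, 2.0], "y": [3.0, 4.0, 5.0]})

# (df != 0) is an elementwise boolean frame; .all(axis=1) collapses
# it to one boolean per row: True only if no value in the row is zero.
keep = (df != 0).all(axis=1)
print(keep.tolist())          # [True, False, True]
print(df.loc[keep].shape[0])  # 2
```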

option 3
numpy buys a lot of efficiency here. Same concept as option 2, but we reconstruct the dataframe from scratch with the filtered array.

v = df.values
mask = (v != 0).all(1)
pd.DataFrame(v[mask], df.index[mask], df.columns)
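A quick sanity check that all three options agree, using a hypothetical frame where two of the three rows contain zeros:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 0.0, 2.0], "y": [3.0, 4.0, 0.0]},
                  index=["a", "b", "c"])

v = df.values
mask = (v != 0).all(1)  # boolean array: True for rows with no zeros
rebuilt = pd.DataFrame(v[mask], df.index[mask], df.columns)

# All three options should produce the same result on this toy data.
assert rebuilt.equals(df.mask(df == 0).dropna())
assert rebuilt.equals(df.loc[(df != 0).all(1)])
print(rebuilt.index.tolist())  # ['a']
```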

naive time testing

[timing plot not reproduced]
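The original answer showed a timing plot here; a rough, machine-dependent way to redo the comparison with the stdlib timeit module (random toy data and a hypothetical zero pattern, so absolute numbers will vary):

```python
import timeit
import numpy as np
import pandas as pd

# Random frame with zeros sprinkled into every 50th row of column 0.
df = pd.DataFrame(np.random.rand(1000, 7))
df.iloc[::50, 0] = 0

opt1 = lambda: df.mask(df == 0).dropna()
opt2 = lambda: df.loc[(df != 0).all(1)]

def opt3():
    v = df.values
    m = (v != 0).all(1)
    return pd.DataFrame(v[m], df.index[m], df.columns)

for name, fn in [("mask/dropna", opt1), ("loc", opt2), ("numpy", opt3)]:
    t = timeit.timeit(fn, number=100)
    print(f"{name}: {t:.4f}s")
```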


3 Comments

Worked! Can you kindly give an explanation also?
In particular, I want to know why df = df[df > 0] doesn't work.
@Peaceful well df[df > 0] kinda does work. It returns the parts of df where df > 0 is True. It doesn't have an answer for where df > 0 is False so you get nulls. df[df > 0].dropna() would also work.
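The behaviour described in that comment can be checked on a tiny made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 0.0], "y": [2.0, 3.0]})

# df[df > 0] keeps the frame's shape but leaves NaN wherever
# the condition is False ...
print(df[df > 0].isna().sum().sum())       # 1
# ... so chaining dropna() removes the offending row.
print(df[df > 0].dropna().index.tolist())  # [0]
```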
