Pandas dataframe get column names with values in cell

Question

I am trying to get the names of the columns that have cell values less than 0.2, without repeating a combination of columns. I tried the following to iterate over the column names, without success:

import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])
print(pvals2)
print('---')
pvals2.transpose().join(pvals2, how='outer')

My goal is:

col3 col2 .01
#col2 col3 .01 #NOT INCLUDED (because it is a repeat)

jpp · Accepted Answer · 2018-03-02 20:46:16Z

1

A list comprehension is one way:

import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7], 'col2': [.2, 1, .01], 'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# Keep each column that has at least one value below 0.2
res = [col for col in pvals2 if (pvals2[col] < 0.2).any()]

# ['col2', 'col3']

Getting the values as well, as in your desired output, requires a more precise specification, because a column may have more than one value less than 0.2.
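One possible specification, offered only as a sketch, is to keep every value below the threshold for each selected column, keyed by its row label. The names `threshold` and `below` are illustrative and not part of the original answer.

```python
import pandas as pd

pvals2 = pd.DataFrame(
    {"col1": [1, .2, .7], "col2": [.2, 1, .01], "col3": [.7, .01, 1]},
    index=["col1", "col2", "col3"],
)

threshold = 0.2  # assumed cut-off, taken from the question

# For each column with at least one small value, map row label -> value
below = {
    col: pvals2.loc[pvals2[col] < threshold, col].to_dict()
    for col in pvals2
    if (pvals2[col] < threshold).any()
}

print(below)
```

This still reports both orderings of a symmetric pair, so it does not by itself remove the repeat mentioned in the question.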


Metropolis · Accepted Answer · 2018-03-02 20:51:57Z

0

Iterate through the columns and check if any value meets your conditions:

import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]})

cols_with_small_values = set()
for col in pvals2.columns:
    # Select the column if any of its values is below the threshold
    if any(i < 0.2 for i in pvals2[col]):
        cols_with_small_values.add(col)
        # Also record the column's smallest value
        cols_with_small_values.add(pvals2[col].min())

print(cols_with_small_values)


RESULT: {'col3', 0.01, 'col2'}  (set order may vary)

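any is a Python built-in that returns True when at least one element of an iterable is true. Using a set ensures that each column appears only once.

We call min() on each column (a Series, so this is Series.min()) to get the small value that caused us to select the column. Because the set also stores these values, equal minima collapse into a single entry, so the set does not record which value belongs to which column.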

edited Mar 2, 2018 at 20:51

answered Mar 2, 2018 at 20:40


DJK · Accepted Answer · 2018-03-02 21:47:20Z

0

You could use stack, keep only the values < 0.2, and then drop duplicated values, keeping the last occurrence:

pvals2.stack()[pvals2.stack().lt(.2)].drop_duplicates(keep='last')

col3  col2    0.01
dtype: float64
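Dropping duplicates by value works here because the two orderings of a pair share the same value, but it would also discard two different pairs that happen to share a value. A sketch of an alternative, assuming the matrix is symmetric, is to mask everything except the strict lower triangle before stacking. `np.tril` is a swapped-in technique, not part of this answer, and the names `lower` and `pairs` are illustrative.

```python
import numpy as np
import pandas as pd

pvals2 = pd.DataFrame(
    {"col1": [1, .2, .7], "col2": [.2, 1, .01], "col3": [.7, .01, 1]},
    index=["col1", "col2", "col3"],
)

# Boolean mask that is True only strictly below the diagonal,
# so each unordered pair of labels is visited exactly once
lower = np.tril(np.ones(pvals2.shape, dtype=bool), k=-1)

# Values outside the mask become NaN; the comparison below drops them
pairs = pvals2.where(lower).stack()
pairs = pairs[pairs < 0.2]

print(pairs)
```

With the example data this leaves the single pair (col3, col2) with value 0.01, which matches the goal stated in the question.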


Sean Storey · Accepted Answer · 2018-03-03 12:32:20Z

0

import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# Column minima, keeping only those below the question's 0.2 threshold
pvals2.min().where(lambda x: x < 0.2).dropna()

Output

col2    0.01
col3    0.01
dtype: float64
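The output above gives the column and its minimum but not the row where that minimum occurs. As a sketch, `DataFrame.idxmin()` can supply the row label. The names `mins`, `small`, and `result` are illustrative, and the 0.2 threshold is taken from the question.

```python
import pandas as pd

pvals2 = pd.DataFrame(
    {"col1": [1, .2, .7], "col2": [.2, 1, .01], "col3": [.7, .01, 1]},
    index=["col1", "col2", "col3"],
)

mins = pvals2.min()        # smallest value in each column
small = mins[mins < 0.2]   # keep only columns whose minimum is below 0.2

# idxmin() returns, per column, the row label of the minimum
result = pd.DataFrame({"row": pvals2.idxmin()[small.index], "value": small})

print(result)
```

Both orderings of the pair still appear (col2 with row col3, and col3 with row col2), so removing the repeat needs a further step.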

