Get column names of a data frame based on values from a list in pandas python

Question

I have a dataframe df as follows (only 1 row):

col1    col2    col3    col4    col5
a1       b1     c_d      d1      e10

I have another list, val = [a1, c_d, e10], and I want to get the names of the columns whose values are present in val. In this case the result would be the list colnm = [col1, col3, col5]. I did the same in R using:

names(df)[which((df %in% val) == TRUE)]

But I am not able to figure out how to do this in Python, as I am new to it. Any help will be appreciated. TIA.

Is it always a one-row DataFrame, or can there be multiple rows?

– jezrael, Commented May 19, 2020 at 5:36
In my case there is always one row.

– user3642360, Commented May 19, 2020 at 5:38

jezrael · Accepted Answer · 2020-05-25 06:06:09Z

General solution for multiple rows: test whether at least one value, or all values, in each column are present in val.

You can test membership with DataFrame.isin and then reduce the result per column with DataFrame.any or DataFrame.all:

# added a second row to show the difference
print (df)
  col1 col2 col3 col4 col5
0   a1   b1  c_d   d1  e10
1   a1   d1  c_e   f1  e10

val = ['a1', 'c_d', 'e10']

# test membership
print (df.isin(val))
   col1   col2   col3   col4  col5
0  True  False   True  False  True
1  True  False  False  False  True

# test if at least one value per column is True
print (df.isin(val).any())
col1     True
col2    False
col3     True
col4    False
col5     True
dtype: bool

# test if all values per column are True
print (df.isin(val).all())
col1     True
col2    False
col3    False
col4    False
col5     True
dtype: bool

names = df.columns[df.isin(val).any()]
print (names)
Index(['col1', 'col3', 'col5'], dtype='object')

names = df.columns[df.isin(val).all()]
print (names)
Index(['col1', 'col5'], dtype='object')
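
The steps above can be gathered into one self-contained sketch. The helper name `columns_matching` and its `how` parameter are hypothetical, introduced only for illustration; the DataFrame is rebuilt from the two-row output printed above, and `.tolist()` converts the resulting Index into a plain list, which is what the question asks for.

```python
import pandas as pd

def columns_matching(df, values, how="any"):
    """Return names of columns whose cells are in `values` (hypothetical helper).

    how="any": at least one cell in the column matches.
    how="all": every cell in the column matches.
    """
    mask = df.isin(values)                      # boolean DataFrame, same shape as df
    hit = mask.all() if how == "all" else mask.any()  # one boolean per column
    return df.columns[hit].tolist()

# Same two-row data as in the printed output above
df = pd.DataFrame({
    "col1": ["a1", "a1"],
    "col2": ["b1", "d1"],
    "col3": ["c_d", "c_e"],
    "col4": ["d1", "f1"],
    "col5": ["e10", "e10"],
})
val = ["a1", "c_d", "e10"]

any_cols = columns_matching(df, val, how="any")
all_cols = columns_matching(df, val, how="all")
print(any_cols)  # ['col1', 'col3', 'col5']
print(all_cols)  # ['col1', 'col5']
```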

If the DataFrame has only one row, you can select the first row as a Series with DataFrame.iloc and then test membership with Series.isin:

names = df.columns[df.iloc[0].isin(val)]
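
As a runnable sketch of the one-row case, using the exact data from the question (the DataFrame construction is an assumption about how the question's table was built):

```python
import pandas as pd

# One-row DataFrame as described in the question
df = pd.DataFrame(
    [["a1", "b1", "c_d", "d1", "e10"]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)
val = ["a1", "c_d", "e10"]

# df.iloc[0] is a Series indexed by column name; isin gives one boolean per column
colnm = df.columns[df.iloc[0].isin(val)].tolist()
print(colnm)  # ['col1', 'col3', 'col5']
```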

EDIT: If upgrading to the latest version of pandas does not help, here is one solution that replaces all non-string values in object columns with missing values:

import numpy as np
import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': 'a1'},
    {'id': 3, 'content': 'c_d'},
    {'id': 4, 'content': np.array([4,5])}
]

df = pd.DataFrame(data)

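# True for columns that are not object dtype (these are left untouched)
mask1 = ~df.columns.isin(df.select_dtypes(object).columns)
# True for cells holding a string (in pandas >= 2.1 applymap is deprecated in favor of DataFrame.map)
mask2 = df.applymap(lambda x: isinstance(x, str))

# keep non-object columns and string cells; everything else becomes NaN
df = df.where(mask2 | mask1)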
print (df)
   id content
0   1     NaN
1   2      a1
2   3     c_d
3   4     NaN

val = ['a1', 'c_d', 'e10']
print (df.isin(val))
      id  content
0  False    False
1  False     True
2  False     True
3  False    False
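
For the single-row case from the question, another option is to skip isin entirely and compare cell by cell in plain Python. This is only a sketch under assumptions: the row below is made-up data that mixes strings with a list and a NumPy array, and it has not been checked against the confidential data that raised the error. The `isinstance` check runs first, so non-string cells are never compared or hashed.

```python
import numpy as np
import pandas as pd

# Illustrative one-row data with non-string cells (a list and an array)
row = {
    "col1": "a1",
    "col2": [{"values": 3}],
    "col3": "c_d",
    "col4": np.array([4, 5]),
    "col5": "e10",
}
df = pd.DataFrame([row])
val = ["a1", "c_d", "e10"]

wanted = set(val)
# Keep a column only if its cell is a string and that string is in val
colnm = [
    col
    for col, cell in df.iloc[0].items()
    if isinstance(cell, str) and cell in wanted
]
print(colnm)  # ['col1', 'col3', 'col5']
```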

edited May 25, 2020 at 6:06

answered May 19, 2020 at 5:35

jezrael

6 Comments

user3642360 Over a year ago

Hi. The above code works fine in some cases, but in others I get the following error and I don't know why: SystemError: <built-in method view of numpy.ndarray object at 0x125e48f30> returned a result with an error set

user3642360 Over a year ago

Sorry but I can not share the real data as it is confidential. I will check what's wrong. But is there any other way to do it?

jezrael Over a year ago

@user3642360 - hmmm, it seems problem is this

jezrael Over a year ago

@user3642360 - So the problem is that there are some arrays in the data, which makes pandas isin fail.

user3642360 Over a year ago

Thanks @jezrael. I will look into it.
