
I have a dataframe df as follows (only 1 row):

col1    col2    col3    col4    col5
a1       b1     c_d      d1      e10

I have a list val = ['a1', 'c_d', 'e10']. I want the names of the columns whose values are present in val. In this case the result should be the list colnm = ['col1', 'col3', 'col5']. In R I did this with:

names(df)[which((df %in% val) == TRUE)]

But I can't figure out how to do it in Python, as I am new to it. Any help would be appreciated. Thanks in advance.
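To make the example reproducible, the one-row frame and the list can be built as below. This is a sketch that assumes every cell is a plain string and that the frame uses the default integer index.

```python
import pandas as pd

# One-row frame from the question; every cell is assumed to be a plain string
df = pd.DataFrame({
    'col1': ['a1'],
    'col2': ['b1'],
    'col3': ['c_d'],
    'col4': ['d1'],
    'col5': ['e10'],
})

# Values to look for, as strings
val = ['a1', 'c_d', 'e10']

print(df)
```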

  • Is the DataFrame always one row, or can it have multiple rows? Commented May 19, 2020 at 5:36
  • In my case it is always one row. Commented May 19, 2020 at 5:38

Answer

A general solution for multiple rows: test whether at least one value, or all values, in each column are in val.

Test membership element-wise with DataFrame.isin, then reduce each column with DataFrame.any or DataFrame.all:

# a second row is added to show the difference between any and all
print (df)
  col1 col2 col3 col4 col5
0   a1   b1  c_d   d1  e10
1   a1   d1  c_e   f1  e10

val = ['a1', 'c_d', 'e10']

# element-wise membership test
print (df.isin(val))
   col1   col2   col3   col4  col5
0  True  False   True  False  True
1  True  False  False  False  True

# True if at least one value in the column is in val
print (df.isin(val).any())
col1     True
col2    False
col3     True
col4    False
col5     True
dtype: bool

# True only if every value in the column is in val
print (df.isin(val).all())
col1     True
col2    False
col3    False
col4    False
col5     True
dtype: bool

names = df.columns[df.isin(val).any()]
print (names)
Index(['col1', 'col3', 'col5'], dtype='object')

names = df.columns[df.isin(val).all()]
print (names)
Index(['col1', 'col5'], dtype='object')
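For a self-contained version of the multi-row example, here is a minimal sketch that builds the two-row frame explicitly (the answer does not show how df was created, so the literal values are read off its printed output). It converts the boolean Series to a NumPy array before indexing the column labels and calls .tolist() to get a plain Python list like colnm in the question.

```python
import pandas as pd

# Two-row frame matching the printed df above (second row shows any vs all)
df = pd.DataFrame({
    'col1': ['a1', 'a1'],
    'col2': ['b1', 'd1'],
    'col3': ['c_d', 'c_e'],
    'col4': ['d1', 'f1'],
    'col5': ['e10', 'e10'],
})
val = ['a1', 'c_d', 'e10']

# Element-wise membership: a boolean frame with the same shape as df
mask = df.isin(val)

# Columns where at least one row matches
any_cols = df.columns[mask.any().to_numpy()].tolist()
# Columns where every row matches
all_cols = df.columns[mask.all().to_numpy()].tolist()

print(any_cols)  # ['col1', 'col3', 'col5']
print(all_cols)  # ['col1', 'col5']
```

With a single row, any and all give the same result, which is why the one-row case in the question needs no choice between them.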

If the DataFrame has only one row, you can select that row as a Series with DataFrame.iloc and then test membership with Series.isin:

names = df.columns[df.iloc[0].isin(val)]
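A minimal sketch of the one-row approach on the question's data, assuming plain string cells. It also shows an equivalent list comprehension over Series.items(), which is close in spirit to the R expression and does not call isin at all.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a1'], 'col2': ['b1'], 'col3': ['c_d'],
                   'col4': ['d1'], 'col5': ['e10']})
val = ['a1', 'c_d', 'e10']

# iloc[0] turns the single row into a Series indexed by column name
row = df.iloc[0]

# Series.isin gives one boolean per column; use it to pick the names
colnm = df.columns[row.isin(val).to_numpy()].tolist()
print(colnm)  # ['col1', 'col3', 'col5']

# Same result without isin: keep the labels whose value is in val
colnm_alt = [col for col, value in row.items() if value in val]
print(colnm_alt)  # ['col1', 'col3', 'col5']
```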

EDIT: If upgrading to the latest version of pandas does not help, here is one solution that replaces every non-string value in object columns with a missing value:

import numpy as np
import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': 'a1'},
    {'id': 3, 'content': 'c_d'},
    {'id': 4, 'content': np.array([4,5])}
]

df = pd.DataFrame(data)

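# True for columns that are not object dtype (these are left untouched)
mask1 = ~df.columns.isin(df.select_dtypes(object).columns)
# True for cells that hold a string
# (pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map)
mask2 = df.applymap(lambda x: isinstance(x, str))

# keep strings and non-object columns, set everything else to NaN
df = df.where(mask2 | mask1)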
print (df)
   id content
0   1     NaN
1   2      a1
2   3     c_d
3   4     NaN

val = ['a1', 'c_d', 'e10']
print (df.isin(val))
      id  content
0  False    False
1  False     True
2  False     True
3  False    False
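Another option, offered as a sketch rather than a drop-in replacement, skips isin and tests each cell in plain Python, so lists and arrays in object columns are simply treated as non-matches instead of being passed to isin. The helper in_val is a hypothetical name used only for illustration; the trade-off is a Python-level call per cell, which is slower than isin on large frames.

```python
import numpy as np
import pandas as pd

# Same mixed data as above: lists and arrays next to strings
data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': 'a1'},
    {'id': 3, 'content': 'c_d'},
    {'id': 4, 'content': np.array([4, 5])},
]
df = pd.DataFrame(data)

val = ['a1', 'c_d', 'e10']
val_set = set(val)


def in_val(x):
    # hypothetical helper: only strings can match, anything else is False
    return isinstance(x, str) and x in val_set


# Apply the scalar test to every cell, column by column
mask = df.apply(lambda col: col.map(in_val))
print(mask)

# Column names with at least one match
names = df.columns[mask.any().to_numpy()].tolist()
print(names)  # ['content']
```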
Comments

Hi, the above code works for some cases, but for others I get the following error and I don't know why: SystemError: <built-in method view of numpy.ndarray object at 0x125e48f30> returned a result with an error set
Sorry, but I cannot share the real data as it is confidential. I will check what's wrong. Is there any other way to do it?
@user3642360 - hmmm, it seems the problem is this
@user3642360 - So the problem is that there are some arrays in the data, which makes pandas isin fail.
Thanks @jezrael. I will look into it.