I get a DataFrame from an interface with cryptically named columns. For each column I know a substring of its name, and these substrings are mutually exclusive across all columns.

A simplified example looks like this:

import pandas as pd
df = pd.DataFrame({'d10432first34sf':[1,2,3],'d10432second34sf':[4,5,6]})
df
   d10432first34sf  d10432second34sf
0                1                 4
1                2                 5
2                3                 6

Since I know the column substrings, I can access individual columns in the following way:

df.filter(like='first')
   d10432first34sf
0                1
1                2
2                3

df.filter(like='second')
   d10432second34sf
0                 4
1                 5
2                 6

But now I also need the exact name of each column, which I do not know in advance. How can I get it?

1 Answer

Add .columns:

cols = df.filter(like='first').columns
print (cols)
Index(['d10432first34sf'], dtype='object')

Or, better, use boolean indexing on the columns with str.contains:

cols = df.columns[df.columns.str.contains('first')]
print (cols)
Index(['d10432first34sf'], dtype='object')
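
If a plain string is needed rather than an Index, one possible sketch is to take the single match and verify that the substring really identifies exactly one column. The helper name `column_for` is hypothetical and not part of pandas; `regex=False` is used on the assumption that the known substrings should be matched literally rather than as regular expressions.

```python
import pandas as pd

df = pd.DataFrame({'d10432first34sf': [1, 2, 3], 'd10432second34sf': [4, 5, 6]})

def column_for(frame, substring):
    """Return the one column name containing `substring` (hypothetical helper)."""
    # regex=False: match the substring literally, not as a regular expression
    matches = frame.columns[frame.columns.str.contains(substring, regex=False)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column containing {substring!r}, found {len(matches)}"
        )
    return matches[0]

first_name = column_for(df, 'first')
second_name = column_for(df, 'second')
print(first_name)   # d10432first34sf
print(second_name)  # d10432second34sf
```

Raising on zero or multiple matches is a design choice here: it surfaces the case where the substrings turn out not to be mutually exclusive instead of silently picking one column.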

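The two approaches are not equally fast. On a DataFrame with 3000 rows and 20000 columns, boolean indexing on the columns was roughly ten times faster than filter: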

df = pd.DataFrame({'d10432first34sf':[1,2,3],'d10432second34sf':[4,5,6]})
df = pd.concat([df]*10000, axis=1).reset_index(drop=True)
df = pd.concat([df]*1000).reset_index(drop=True)
df.columns = df.columns + pd.Series(range(10000 * 2)).astype('str')

print (df.shape)
(3000, 20000)

In [267]: %timeit df.filter(like='first').columns
10 loops, best of 3: 117 ms per loop

In [268]: %timeit df.columns[df.columns.str.contains('first')]
100 loops, best of 3: 11.9 ms per loop
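
To repeat the comparison outside IPython, a rough sketch with the standard-library `timeit` module could look like the following. The frame size (1000 rows, 2000 columns) and the variable names are assumptions chosen for illustration and are smaller than the frame timed above; absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed sizes for illustration: 1000 rows, 1000 'first'/'second' column pairs
n_rows, n_pairs = 1000, 1000
names = [f'd10432{kind}34sf{i}' for i in range(n_pairs) for kind in ('first', 'second')]
big = pd.DataFrame(np.zeros((n_rows, len(names)), dtype='int64'), columns=names)

via_filter = big.filter(like='first').columns
via_contains = big.columns[big.columns.str.contains('first')]

# Both approaches select the same labels in the same order
assert via_filter.equals(via_contains)

runs = 10
t_filter = timeit.timeit(lambda: big.filter(like='first').columns, number=runs)
t_contains = timeit.timeit(lambda: big.columns[big.columns.str.contains('first')], number=runs)
print(f'filter:   {t_filter / runs * 1e3:.2f} ms per call')
print(f'contains: {t_contains / runs * 1e3:.2f} ms per call')
```

Note that `filter` returns a new DataFrame containing the selected columns before `.columns` is read, whereas the boolean-indexing version only works on the column labels.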

2 Comments

I think the second method will be much faster for bigger DataFrames with lots of rows
@MaxU - Good idea, I am going to test it.
