Selecting specific columns from a Data Frame

Question

Hi everybody! I have a question. Imagine a DataFrame with columns [a, b, c, e, f, g, h, i, j]. I want to create a second DataFrame containing only columns a and c-g. How can I do this in a single command, without creating a list of all the columns? Right now I'm doing it this way:

columns = ['a', 'c', 'e', 'f', 'g']
df2 = df.loc[:, df.columns.isin(columns)]

I would like to know if there's something more like:

df2 = df.loc[:,'a': 'g']

But excluding the 'b' column.

With this second approach I need two commands: one to select a-g and a second to drop b.

I would like to know if I can select a-g and drop b at the same time.
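For context on the isin approach: Index.isin returns a boolean array marking which column labels appear in the list, and .loc[:, mask] keeps the columns where the mask is True. Prefixing the mask with ~ inverts it, which keeps every column that is *not* in the list. A small sketch using a one-row frame with the column names from the question (the data values are arbitrary):

```python
import pandas as pd

# One-row frame with the column names listed in the question (values are arbitrary)
df = pd.DataFrame([list(range(9))], columns=['a', 'b', 'c', 'e', 'f', 'g', 'h', 'i', 'j'])
columns = ['a', 'c', 'e', 'f', 'g']

mask = df.columns.isin(columns)   # True for a, c, e, f, g
kept = df.loc[:, mask]            # keeps the listed columns
complement = df.loc[:, ~mask]     # ~ inverts the mask: keeps everything else

print(list(kept.columns))        # ['a', 'c', 'e', 'f', 'g']
print(list(complement.columns))  # ['b', 'h', 'i', 'j']
```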

Cameron Riddell · Accepted Answer · 2022-07-07 15:18:49Z

1

The easiest way is to use .loc slice notation, as you demonstrated, along with a call to .drop to remove any specific unwanted columns:

Create data

>>> import pandas as pd
>>> df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])
>>> df
   a  b  c  d  e  f  g  h  i  j
0  0  1  2  3  4  5  6  7  8  9
1  0  1  2  3  4  5  6  7  8  9
2  0  1  2  3  4  5  6  7  8  9
3  0  1  2  3  4  5  6  7  8  9
4  0  1  2  3  4  5  6  7  8  9

`.loc` and dropping

This is fairly straightforward: use .loc to perform your slicing, then drop anything you don't want from the result.

>>> df.loc[:, 'a':'g'].drop(columns='b')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
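One detail worth knowing about the slice above: label-based slicing with .loc includes both endpoints, unlike positional slicing with .iloc, where the stop position is excluded. .drop also accepts a list when more than one column needs to go. A minimal sketch on the same sample data (dropping 'd' as well is only for illustration):

```python
import pandas as pd

# Same sample data as above: 5 identical rows, columns 'a' through 'j'
df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])

# Label slice: both 'a' and 'g' are included
by_label = df.loc[:, 'a':'g'].drop(columns='b')
print(list(by_label.columns))  # ['a', 'c', 'd', 'e', 'f', 'g']

# Positional slice: the stop position 7 is excluded, so 0:7 covers 'a'..'g'
by_position = df.iloc[:, 0:7].drop(columns='b')
print(by_position.equals(by_label))  # True

# .drop takes a list when several columns should be removed
print(list(df.loc[:, 'a':'g'].drop(columns=['b', 'd']).columns))  # ['a', 'c', 'e', 'f', 'g']
```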

Working With the Index

If you want to work as efficiently as possible, you can operate on the column index instead: use Index.slice_indexer along with Index.drop so that you don't create temporary subsets of your data (as the .loc approach above does):

>>> columns = df.columns[df.columns.slice_indexer('a', 'g')].drop('b')
>>> df[columns]
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
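To see what each step of the Index approach produces, here is a sketch that prints the intermediate values. Only the column labels are manipulated until the final df[columns] lookup touches the data.

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])

# slice_indexer translates the label bounds 'a'..'g' into a positional slice
# (start 0, stop 7, since the label 'g' is included)
positions = df.columns.slice_indexer('a', 'g')
print(positions.start, positions.stop)  # 0 7

# Slicing the Index gives the labels 'a' through 'g'; Index.drop then removes 'b'
columns = df.columns[positions].drop('b')
print(list(columns))  # ['a', 'c', 'd', 'e', 'f', 'g']

# Only now is the DataFrame itself indexed
result = df[columns]
print(result.shape)  # (5, 6)
```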

edited Jul 7, 2022 at 15:18

answered Jul 7, 2022 at 15:12

Cameron Riddell

1 Comment

Lucas Cordeiro Romão Over a year ago

Thank you!! The .loc and dropping method was really helpful!

Marco Tartaglione · Accepted Answer · 2022-07-07 14:57:17Z

0

You can use

df2 = df[['a', 'c', 'd', 'e', 'f', 'g']].copy()

or

df2 = df.copy()
del df2['b']  # removes only 'b'; every other column (including h, i, j) is kept
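The second variant copies the frame and then mutates the copy in place with del. If you'd rather skip the explicit copy, DataFrame.drop returns a new frame and leaves the original untouched. A sketch on the sample data from the first answer:

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])

# Option 1: copy, then delete the column in place on the copy
df2 = df.copy()
del df2['b']

# Option 2: drop returns a new DataFrame; df itself is not modified
df3 = df.drop(columns='b')

print(df2.equals(df3))    # True
print('b' in df.columns)  # True, the original still has 'b'
```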

edited Jul 7, 2022 at 14:57

answered Jul 7, 2022 at 14:55

Marco Tartaglione

1 Comment

Lucas Cordeiro Romão Over a year ago

I was wondering if there's a solution like your first one, but without needing to specify each column individually.

ArchAngelPwn · Accepted Answer · 2022-07-07 15:04:01Z

0

There are a couple of ways you could solve this if you don't want to manually write the column names into a list:

import numpy as np

# Firstly, if you only want sequential columns, use np.arange() to build their positions
# (np.arange(2, 5) gives [2, 3, 4], i.e. columns c, d, e in the sample data)
df.iloc[:, np.arange(2, 5).tolist()]

# Secondly, if you want sequential columns but need to remove one in the middle,
# pop that position from the list of ints representing the column positions
column_list = np.arange(2, 5).tolist()
# This removes the element at index 1 of the list (position 3, column d)
column_list.pop(1)
df.iloc[:, column_list]
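Since the question asks for a single command: NumPy's np.r_ can concatenate scalars and slices into one positional index array, so "a, skip b, then c through g" fits in one expression. This is an alternative not taken from the answer above, and it assumes the columns are in the order a, b, c, ... as in the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*'abcdefghij'])

# np.r_[0, 2:7] expands to array([0, 2, 3, 4, 5, 6]): position 0 ('a')
# plus positions 2 through 6 ('c' to 'g'); the slice stop 7 is excluded
positions = np.r_[0, 2:7]
print(positions.tolist())  # [0, 2, 3, 4, 5, 6]

df2 = df.iloc[:, np.r_[0, 2:7]]
print(list(df2.columns))  # ['a', 'c', 'd', 'e', 'f', 'g']
```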

answered Jul 7, 2022 at 15:04

ArchAngelPwn

1 Comment

Lucas Cordeiro Romão Over a year ago

Thank you for your answer. I was thinking of some way to do all this as a one-liner.

sammywemmy · Accepted Answer · 2022-07-08 01:55:49Z

0

One option is select_columns from pyjanitor, which offers an abstraction over column selection: you can pass individual labels and slices together in a single call.

I'll be reusing @CameronRiddell's sample data:

# pip install pyjanitor
import pandas as pd 
import janitor

df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])

# pass in the arguments:
df.select_columns('a', slice('c','g'))

   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6

You can also do this without another library by using the pandas DataFrame.filter method with a regular expression:

df.filter(regex = '[ac-g]')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
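One caveat with filter(regex=...): pandas keeps a label when re.search finds the pattern anywhere in it. That works for single-letter names like these, but with longer names '[ac-g]' also picks up any name that merely contains one of those letters. Anchoring the pattern with ^ and $ restricts it to whole names. The column names below are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with some multi-character column names
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['a', 'b', 'c', 'ab', 'bd'])

# Unanchored: 'ab' and 'bd' also match because they contain 'a' or 'd'
print(list(df.filter(regex='[ac-g]').columns))    # ['a', 'c', 'ab', 'bd']

# Anchored: only names that are exactly one of a, c, d, e, f, g
print(list(df.filter(regex='^[ac-g]$').columns))  # ['a', 'c']
```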

answered Jul 8, 2022 at 1:55

sammywemmy
