2

Hi everybody! I have a question. Imagine a DataFrame with columns [a, b, c, e, f, g, h, i, j]. I want to create a second DataFrame with only columns a and c through g. How can I do this in a single command, without creating a list containing all the columns? For example, this is what I'm writing now:

columns = ['a', 'c', 'e', 'f', 'g']
df2 = df.loc[:, df.columns.isin(columns)]

I would like to know if there's something more like:

df2 = df.loc[:,'a': 'g']

But excluding the 'b' column.

Doing it this second way took me two commands: one to select a through g, and a second to drop b.

I would like to know if I can select a through g and drop b at the same time.

4 Answers

1

The easiest way is to use slice notation with .loc, as you demonstrated, along with a call to .drop to remove any specific unwanted columns:

Create data

>>> import pandas as pd
>>> df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])
>>> df
   a  b  c  d  e  f  g  h  i  j
0  0  1  2  3  4  5  6  7  8  9
1  0  1  2  3  4  5  6  7  8  9
2  0  1  2  3  4  5  6  7  8  9
3  0  1  2  3  4  5  6  7  8  9
4  0  1  2  3  4  5  6  7  8  9

.loc and dropping

Fairly straightforward: use .loc to perform your slicing, then drop anything you don't want from there.

>>> df.loc[:, 'a':'g'].drop(columns='b')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6

Working With the Index

If you want to work as efficiently as possible with the index, you can use Index.slice_indexer along with .drop so that you don't create temporary subsets of your data (like we did above):

>>> columns = df.columns[df.columns.slice_indexer('a', 'g')].drop('b')
>>> df[columns]
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
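
To see why this avoids an intermediate frame, it may help to look at what slice_indexer returns: a plain positional slice into the column Index. A minimal sketch, reusing the sample data above (the variable name `indexer` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*"abcdefghij"])

# slice_indexer translates the labels 'a'..'g' (inclusive) into positions
indexer = df.columns.slice_indexer("a", "g")
print(indexer)  # slice(0, 7, None)

# Slicing and dropping happen on the small Index object, not on the data
columns = df.columns[indexer].drop("b")
print(list(columns))  # ['a', 'c', 'd', 'e', 'f', 'g']

# Only this final step touches the DataFrame itself
df2 = df[columns]
```

The data is only indexed once, at the last line, with the final set of column labels.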

1 Comment

Thank you!! The .loc and dropping method was really helpful!
0

You can use

df2 = df[['a', 'c', 'd', 'e', 'f', 'g']].copy()

or

# keeps every column except 'b' (including h, i, j)
df2 = df.copy()
del df2['b']

1 Comment

I was wondering if there is a solution like your first one, but without the need to specify each column one by one.
0

There are a couple of ways to solve this if you don't want to write the column names into a list by hand.

import numpy as np

# First, if you only want columns that sit next to each other, use np.arange() to build their integer positions (here 2, 3 and 4)
df.iloc[:, np.arange(2, 5).tolist()]

# Second, if you want a sequential run of columns minus one in the middle, build the list of positions and pop the one you don't want
column_list = np.arange(2, 5).tolist()
# pop(1) removes the item at list index 1 (the value 3), leaving [2, 4]
column_list.pop(1)
df.iloc[:, column_list]
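
If a one-liner is the goal, one possible variation on this positional approach (not part of the code above, so treat it as a sketch) is NumPy's np.r_, which concatenates integers and slices into a single array of positions. This assumes the ten-column sample frame used in the other answers, where position 0 is 'a' and positions 2 to 6 are 'c' through 'g':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*"abcdefghij"])

# np.r_[0, 2:7] builds array([0, 2, 3, 4, 5, 6]): position 0 plus positions 2..6
df2 = df.iloc[:, np.r_[0, 2:7]]
print(df2.columns.tolist())  # ['a', 'c', 'd', 'e', 'f', 'g']
```

Because it relies on positions rather than labels, this breaks silently if the column order changes, so the label-based .loc approach is usually the safer choice.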

1 Comment

Thank you for your answer. I was thinking of some way to do all this as a one-liner.
0

One option is with select_columns from pyjanitor, which offers an abstraction.

I'll be reusing @CameronRiddell's sample data:

# pip install pyjanitor
import pandas as pd 
import janitor

df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])

# pass in the arguments:
df.select_columns('a', slice('c','g'))

   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6

You can pull this off without another library using Pandas filter:

df.filter(regex = '[ac-g]')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
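
One caveat: filter applies the pattern with re.search, so an unanchored character class matches any column name that merely contains one of those letters. That is harmless with single-letter names, but with longer names anchoring the pattern is probably safer. A small sketch with made-up column names (`age` and `big` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3, 4]], columns=["a", "b", "c", "age", "big"])

# Unanchored: 'age' and 'big' each contain a letter from [ac-g], so they match too
loose = df.filter(regex="[ac-g]").columns.tolist()
print(loose)  # ['a', 'c', 'age', 'big']

# Anchored: only names that are exactly one letter from the class
strict = df.filter(regex="^[ac-g]$").columns.tolist()
print(strict)  # ['a', 'c']
```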
