Creating new pandas dataframe from certain columns of existing dataframe

Question

I have read a CSV file into a pandas dataframe and want to do some simple manipulations on the dataframe. I cannot figure out how to create a new dataframe based on selected columns from my original dataframe. My attempt:

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']

I would like to create a new dataframe with the columns A and D from the original dataframe.

Pass a list of the columns of interest to sub-select: new_dataset = dataset[['A','D']]. Note that if you intend to operate on a copy, call copy(): new_dataset = dataset[['A','D']].copy()
– EdChum, Commented Jul 11, 2017 at 13:28

jezrael · Accepted Answer · 2017-07-11 13:43:46Z

47

This is called subsetting: pass a list of column names inside []:

dataset = pandas.read_csv('file.csv', names=names)

new_dataset = dataset[['A','D']]

which is the same as:

new_dataset = dataset.loc[:, ['A','D']]
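A minimal sketch of the two selections side by side. The in-memory csv_text below is a hypothetical stand-in for file.csv (assumed to have four columns and no header row), and the pd alias is just the usual convention:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.csv: four columns, no header row.
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ['A', 'B', 'C', 'D']
dataset = pd.read_csv(io.StringIO(csv_text), names=names)

# Double brackets: the outer pair indexes, the inner pair is a list of names.
by_brackets = dataset[['A', 'D']]

# The same selection with .loc: all rows, columns A and D.
by_loc = dataset.loc[:, ['A', 'D']]

print(by_brackets.equals(by_loc))
```

Both forms return a DataFrame with only columns A and D, in the order given in the list.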

If you only need the filtered output, add the usecols parameter to read_csv:

new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
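As a rough illustration of usecols, here is the same call run against an in-memory string. The csv_text contents are made up for the example and stand in for file.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.csv: four columns, no header row.
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ['A', 'B', 'C', 'D']

# names labels all four columns; usecols keeps only A and D while parsing,
# so B and C never end up in the resulting DataFrame.
new_dataset = pd.read_csv(io.StringIO(csv_text), names=names, usecols=['A', 'D'])

print(new_dataset.columns.tolist())
```

This avoids building the full dataframe first, which can matter for wide files.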

EDIT:

If you use only:

new_dataset = dataset[['A','D']]

and then modify the data, you may get this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.

As EdChum pointed out, add copy() to remove the warning:

new_dataset = dataset[['A','D']].copy()
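A small sketch of what the explicit copy gives you. The sample values are invented for illustration:

```python
import pandas as pd

# Made-up data standing in for the CSV contents.
dataset = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# Explicit copy: new_dataset is an independent DataFrame.
new_dataset = dataset[['A', 'D']].copy()

# Modifying the copy does not touch the original.
new_dataset['A'] = new_dataset['A'] * 10

print(dataset['A'].tolist())      # original values are unchanged
print(new_dataset['A'].tolist())  # scaled values live only in the copy
```

With the copy in place, it is unambiguous that later assignments target new_dataset only.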

edited Jul 11, 2017 at 13:43

answered Jul 11, 2017 at 13:28

jezrael


cottontail · Answer · 2023-02-02 05:48:01Z

0

You must pass a list of column names to select columns. Otherwise, df['A','D'] is interpreted as a single tuple key ('A', 'D'), which would only work if df.columns were a MultiIndex.
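To make the tuple-key behavior concrete, here is a hedged sketch with invented data: the same lookup fails on flat columns and succeeds on MultiIndex columns. The names flat, multi and raised_key_error are hypothetical:

```python
import pandas as pd

# Flat columns: df['A', 'D'] looks up the single key ('A', 'D') and fails.
flat = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})
try:
    flat['A', 'D']
    raised_key_error = False
except KeyError:
    raised_key_error = True

# MultiIndex columns: ('A', 'D') is a real column label, so the lookup works
# and returns that one column as a Series.
multi = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([('A', 'D'), ('A', 'E')]),
)
column = multi['A', 'D']

print(raised_key_error)
print(column.tolist())
```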

The most obvious way is df.loc[:, ['A', 'D']], but there are other ways (note how all of them take lists):

df1 = df.filter(items=['A', 'D'])

df1 = df.reindex(columns=['A', 'D'])

df1 = df.get(['A', 'D']).copy()

N.B. items is the first positional argument, so df.filter(['A', 'D']) also works.

Note that filter() and reindex() return a copy as well, so you don't need to worry about getting SettingWithCopyWarning later.
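A quick sketch comparing the three alternatives on invented data. The column name 'Z' is a deliberately missing label, used only to show how filter() and reindex() differ when a requested column does not exist:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

via_filter = df.filter(items=['A', 'D'])
via_reindex = df.reindex(columns=['A', 'D'])
via_get = df.get(['A', 'D']).copy()

# All three give the same two-column result when the columns exist.
print(via_filter.equals(via_reindex), via_filter.equals(via_get))

# They differ for a missing label ('Z' is hypothetical):
# filter() silently skips it, reindex() adds it as an all-NaN column.
filtered_missing = df.filter(items=['A', 'Z'])
reindexed_missing = df.reindex(columns=['A', 'Z'])
print(filtered_missing.columns.tolist())
print(reindexed_missing.columns.tolist())
```

Which behavior is preferable depends on whether a missing column should be ignored or should show up as empty.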

answered Feb 2, 2023 at 5:48

cottontail
