How to delete columns without headers in python pandas read_csv

Question

Currently I read the CSV file, assign the column names up front, and then drop the columns I don't want. Is there any way to do this directly while reading?

# Current Code
import pandas as pd

columns_name = ['station', 'date', 'observation', 'value',
                'other_1', 'other_2', 'other_3', 'other_4']
del_columns_name = ['other_1', 'other_2', 'other_3', 'other_4']
df = pd.read_csv('filename', names=columns_name)
df = df.drop(del_columns_name, axis=1)  # drop returns a new DataFrame, so reassign

I don't see anything wrong. Possibly you could avoid reading those columns in the first place. Note that you need either df.drop(del_columns_name, axis=1, inplace=True) or df = df.drop(del_columns_name, axis=1).
– Anton vBR, Commented May 11, 2018 at 23:45
That's right. But I want to know whether there is a more direct way to do what my 4 lines of code do.
– matcha latte, Commented May 11, 2018 at 23:51
Did one of the below solutions help? Feel free to accept one (tick on left), or ask for clarification.
– jpp, Commented May 16, 2018 at 11:43
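
The point raised in the first comment can be checked with a small sketch. The DataFrame below is made up for illustration and is not the asker's data; it only shows that drop leaves the original frame untouched unless the result is reassigned.

```python
import pandas as pd

# Hypothetical three-column frame, used only to illustrate drop semantics
df = pd.DataFrame({"station": [1, 2], "value": [10, 20], "other_1": [0, 0]})

# drop returns a new DataFrame by default; df itself is not modified
trimmed = df.drop(columns=["other_1"])

print(list(df.columns))       # the original still contains 'other_1'
print(list(trimmed.columns))  # the returned frame does not
```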

jpp · Accepted Answer · 2018-05-11 23:46:34Z

2

One way is to use your two lists to resolve the indices and column names required.

Then use usecols and names arguments for pd.read_csv to specify column indices and names respectively.

idx, cols = list(zip(*((i, x) for i, x in enumerate(columns_name)
                       if x not in del_columns_name)))

df = pd.read_csv('filename', usecols=idx, names=cols, header=None)

As explained in the docs, you should also specify header=None explicitly when no header exists.

Explanation

Use a generator expression to iterate over columns_name and keep only the items that are not in del_columns_name.
Use enumerate to extract indices.
Use zip to create separate tuples for indices and column names.
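
The three steps above can be unrolled into separate statements, which may be easier to follow than the one-liner. This is only a sketch: the in-memory sample data and the intermediate name pairs are assumptions made for illustration, standing in for the real file.

```python
import io

import pandas as pd

columns_name = ['station', 'date', 'observation', 'value',
                'other_1', 'other_2', 'other_3', 'other_4']
del_columns_name = ['other_1', 'other_2', 'other_3', 'other_4']

# Steps 1 and 2: pair each name with its position and keep only wanted columns
pairs = [(i, x) for i, x in enumerate(columns_name)
         if x not in del_columns_name]

# Step 3: transpose the pairs into one tuple of indices and one tuple of names
idx, cols = zip(*pairs)
print(idx)   # (0, 1, 2, 3)
print(cols)  # ('station', 'date', 'observation', 'value')

# Made-up sample standing in for 'filename'; it has no header row
sample = io.StringIO("1,2018-01-01,TMAX,5,a,b,c,d\n"
                     "2,2018-01-02,TMIN,3,e,f,g,h\n")
df = pd.read_csv(sample, usecols=list(idx), names=list(cols), header=None)
print(df)
```

Note that zip(*pairs) cannot be unpacked into two names if every column is excluded, so this form assumes at least one column is kept.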

answered May 11, 2018 at 23:46

jpp


2 Comments

Anton vBR Over a year ago

I like the expression. It seems a bit overkill for the small example above, but it is almost necessary if you have a more complex system.

jpp Over a year ago

@AntonvBR, Yeah, I'm not really sure where the column names come from. They could come from a static config file, for example, in which case you may be forced into something like this.

Anton vBR · Accepted Answer · 2018-05-11 23:52:07Z

2

I think you could even specify the indices right away. In this case you are interested in [0, 1, 2, 3]. Consider this example, which also parses dates.

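import io

import pandas as pd

cols = ['station', 'date', 'observation', 'value']

data = '''\
1, 2018-01-01, 1, 1, 1, 1, 1, 1
2, 2018-01-02, 2, 2, 2, 2, 2, 2'''

# io.StringIO replaces pd.compat.StringIO, which newer pandas versions no longer provide
file = io.StringIO(data)
# skipinitialspace strips the blank after each comma so the date column parses cleanly
df = pd.read_csv(file, names=cols, usecols=[0, 1, 2, 3], parse_dates=[1],
                 header=None, skipinitialspace=True)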

print(df)

Returns:

   station       date  observation  value
0        1 2018-01-01            1      1
1        2 2018-01-02            2      2
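
A related variant, not taken from either answer: usecols also accepts column names, as long as they appear in the list passed to names. The sketch below assumes the same eight-column layout as the question and uses made-up in-memory data; the name keep is introduced here for illustration.

```python
import io

import pandas as pd

columns_name = ['station', 'date', 'observation', 'value',
                'other_1', 'other_2', 'other_3', 'other_4']
keep = ['station', 'date', 'observation', 'value']  # hypothetical helper list

# Made-up sample without a header row
sample = io.StringIO("1,2018-01-01,1,1,1,1,1,1\n"
                     "2,2018-01-02,2,2,2,2,2,2\n")

# names labels all eight columns; usecols then selects four of them by label
df = pd.read_csv(sample, names=columns_name, usecols=keep,
                 header=None, parse_dates=['date'])
print(df.dtypes)
```

Selecting by name avoids hard-coding positions, at the cost of still having to list a name for every column in the file.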

answered May 11, 2018 at 23:52

Anton vBR
