Drop columns with no header on csv import with pandas

Question

Here is a sample csv:

|  Header A |      | Unnamed: 1 |  Header D |
|-----------|------|------------|-----------|
| a1        | b1   | c1         | d1        |
| a2        | b2   | c2         | d2        |

If I import it with pandas.read_csv, it turns into this:

  Header A Unnamed: 1 Unnamed: 1.1 Header D
0      a1         b1           c1       d1
1      a2         b2           c2       d2

My goal is to drop all columns with empty headers, in this case the second column. I cannot filter on the column names pandas assigns, because the file might also contain non-empty headers starting with Unnamed, like the third column in the example.

The columns are not known beforehand, so I have no control over them.

I have tried the following args with read_csv, but have not had any luck with them:

prefix: it just does not work!
usecols: empty headers already have a name by the time they are passed to usecols, which makes it unusable for me.
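The usecols observation above can be checked with a small sketch. It assumes the sample from the question written out as comma-separated text, and uses a hypothetical callable named record that only logs the names pandas passes to it:

```python
import io

import pandas as pd

# The sample from the question as raw CSV text; the second header is blank.
sample = "Header A,,Unnamed: 1,Header D\na1,b1,c1,d1\na2,b2,c2,d2\n"

seen = []  # names pandas hands to the usecols callable


def record(name):
    """Hypothetical helper: log each name and keep every column."""
    seen.append(name)
    return True


df = pd.read_csv(io.StringIO(sample), usecols=record)

# Deduplicate while preserving order, in case a name is passed more than once.
print(list(dict.fromkeys(seen)))
```

If the blank header reached the callable as an empty string it could be filtered there, but the logged names should all be non-empty, with the blank one already replaced by an Unnamed placeholder.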

I have looked at some other answers on SO, like the ones below, but none of them cover my case:

How to get rid of `Unnamed:` column in a pandas dataframe

Remove Unnamed columns in pandas dataframe

You have a column in the CSV file with the name Unnamed: 1 that you want to keep? Are you writing this CSV file beforehand?
– roganjosh, Commented Mar 21, 2019 at 22:01
There might be a column in the CSV starting with Unnamed, but I do not know beforehand. I would like to cover all possible cases.
– kaveh, Commented Mar 21, 2019 at 22:03
@Wen-Ben I didn't. It's an example that shows it's possible to have such column names!
– kaveh, Commented Mar 21, 2019 at 22:23

roganjosh · Accepted Answer · 2019-03-21 22:08:35Z

2

The only way I can think of is to "peek" at the headers beforehand and get the indices of the non-empty headers. Then it's not a case of dropping the columns, but of never including them in the original df.

import csv

import pandas as pd

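# Peek at the raw header row before pandas assigns placeholder names.
with open('test.csv', newline='') as infile:
    reader = csv.reader(infile)
    headers = next(reader)

# Positions of the columns whose header is not empty.
header_indices = [i for i, item in enumerate(headers) if item]

# Load only those columns, so the unnamed ones never enter the DataFrame.
df = pd.read_csv('test.csv', usecols=header_indices)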
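For a version that can be run without an existing test.csv, here is a minimal sketch of the same peek-then-usecols idea. The function name read_csv_named_columns and the temporary file are assumptions for illustration. Two details go beyond the answer: whitespace-only headers are treated as empty, and the original header names are restored afterwards, on the assumption that pandas may have renamed a real Unnamed: 1 column while de-duplicating it against its own placeholder.

```python
import csv
import os
import tempfile

import pandas as pd


def read_csv_named_columns(path):
    """Read a CSV, keeping only columns whose header cell is non-empty."""
    # Peek at the raw header row before pandas can rename anything.
    with open(path, newline="") as infile:
        headers = next(csv.reader(infile))

    # Positions of headers that contain something other than whitespace
    # (treating whitespace-only headers as empty is an assumption).
    keep = [i for i, name in enumerate(headers) if name.strip()]

    df = pd.read_csv(path, usecols=keep)
    # Restore the names exactly as they appear in the file, in case pandas
    # altered them while de-duplicating against its generated placeholders.
    df.columns = [headers[i] for i in keep]
    return df


# Recreate the sample from the question in a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("Header A,,Unnamed: 1,Header D\na1,b1,c1,d1\na2,b2,c2,d2\n")
    sample_path = tmp.name

df = read_csv_named_columns(sample_path)
os.remove(sample_path)
print(df)
```

Restoring the names this way would produce duplicate column labels if the file itself repeats a header, so that step may need adjusting for such files.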

answered Mar 21, 2019 at 22:08

roganjosh


2 Comments

kaveh Over a year ago

Thanks. That was short and efficient.

roganjosh Over a year ago

@Bazingaa thanks mate, took a while to get there :)

Lior Cohen · Answer · 2019-03-21 22:09:57Z

0

Read your columns into a list with df.columns.
Create a tf_list of True/False values based on your logic (search for None, Unnamed, etc.).
filter_df = df.loc[:, tf_list]

answered Mar 21, 2019 at 22:09

Lior Cohen


1 Comment

roganjosh Over a year ago

This wouldn't work. The damage would already have been done, because pandas would have filled the blank headers with "Unnamed: x", and they would be indistinguishable from a column that has that style of name in the file.
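The objection in this comment can be illustrated with a short sketch. It assumes the question's sample as comma-separated text and picks "starts with Unnamed:" as the filtering logic for tf_list, which is only one possible reading of the answer:

```python
import io

import pandas as pd

# The sample from the question; the second header is blank, while the
# third is a real header that happens to be called "Unnamed: 1".
sample = "Header A,,Unnamed: 1,Header D\na1,b1,c1,d1\na2,b2,c2,d2\n"
df = pd.read_csv(io.StringIO(sample))

# Boolean mask in the style of tf_list: keep names not starting with "Unnamed:".
tf_list = [not str(col).startswith("Unnamed:") for col in df.columns]
filter_df = df.loc[:, tf_list]

print(list(df.columns))         # names after pandas has filled the blank
print(list(filter_df.columns))  # names that survive the mask
```

Both the blank column and the genuinely named one match the pattern, so the mask removes the c1/c2 data along with the unwanted column.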
