0

Here is a sample csv:

|  Header A |      | Unnamed: 1 |  Header D |
|-----------|------|------------|-----------|
| a1        | b1   | c1         | d1        |
| a2        | b2   | c2         | d2        |

If I import it with pandas.read_csv, it turns into this:

  Header A Unnamed: 1 Unnamed: 1.1 Header D
0      a1         b1           c1       d1
1      a2         b2           c2       d2

My goal is dropping all the columns with empty headers, in this case the second column, but I cannot use the assigned column names by pandas to filter them, because there might also be non-empty columns starting with Unnamed, like the third column in the example.

Columns are not known before hand, so I do not have any control over them.

I have tried the following args with read_csv, but have not had any luck with them:

  • prefix: it just does not work!
  • usecols: Empty headers already have a name when they are passed to usecols, which makes it unusable to me.

I have looked at some other answers on SO, like the ones below, but none of them cover my case:

How to get rid of `Unnamed:` column in a pandas dataframe

Remove Unnamed columns in pandas dataframe

4
  • You have a column in the CSV file with a name Unnamed: 1 that you want to keep? Are you writing this CSV file beforehand? Commented Mar 21, 2019 at 22:01
  • The might be a column in the csv starting with Unnamedm but I do not know before hand. I would like to cover all possible cases. Commented Mar 21, 2019 at 22:03
  • why you named your csv columns like this ...Unnamed: 1 Commented Mar 21, 2019 at 22:07
  • @Wen-Ben I didn't. It's an example that shows it's possible to have such column names! Commented Mar 21, 2019 at 22:23

2 Answers 2

2

The only way I can think of is to "peek" at the headers beforehand and get the indices of non-empty headers. Then it's not a case of dropping them, but not including them in the original df.

import csv

import pandas as pd

with open('test.csv') as infile:
    reader = csv.reader(infile)
    headers = next(reader)

header_indices = [i for i, item in enumerate(headers) if item]

df = pd.read_csv('test.csv', usecols=header_indices)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks. That was short and efficient.
@Bazingaa thanks mate, took a while to get there :)
0
  1. Read your columns to list with df.columns
  2. create a tf_list with True/False based on your logic (search for None, Unnamed etc)
  3. filter_df = df.loc[:, tf_list]

1 Comment

This wouldn't work. The damage would already have been done because Pandas would have filled the blank columns with "Unnamed: x" and it would be indistinguishable from a column that also shared that style of name in the file

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.