138

I have a data file from columns A-G like below but when I am reading it with pd.read_csv('data.csv') it prints an extra unnamed column at the end for no reason.

colA    ColB    colC    colD    colE    colF    colG    Unnamed: 7
44      45      26      26      40      26      46        NaN
47      16      38      47      48      22      37        NaN
19      28      36      18      40      18      46        NaN
50      14      12      33      12      44      23        NaN
39      47      16      42      33      48      38        NaN

I have seen my data file various times but I have no extra data in any other column. How I should remove this extra column while reading ? Thanks

2
  • 1
    Your first column is probably the index col see related: stackoverflow.com/questions/36519086/… Commented May 15, 2017 at 15:43
  • 2
    I just had the same issue. I examined my data file.. and found that there was an extra separator at the end of the header row (row 0). Commented Sep 2, 2021 at 17:56

4 Answers 4

347
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

In [162]: df
Out[162]:
   colA  ColB  colC  colD  colE  colF  colG
0    44    45    26    26    40    26    46
1    47    16    38    47    48    22    37
2    19    28    36    18    40    18    46
3    50    14    12    33    12    44    23
4    39    47    16    42    33    48    38

NOTE: very often there is only one unnamed column Unnamed: 0, which is the first column in the CSV file. This is the result of the following steps:

  1. a DataFrame is saved into a CSV file using parameter index=True, which is the default behaviour
  2. we read this CSV file into a DataFrame using pd.read_csv() without explicitly specifying index_col=0 (default: index_col=None)

The easiest way to get rid of this column is to specify the parameter pd.read_csv(..., index_col=0):

df = pd.read_csv('data.csv', index_col=0)
Sign up to request clarification or add additional context in comments.

2 Comments

For some reason, the above response was unsuccessful in my case, but this link resolved it for me. Using match instead of contains df2.loc[:,~df2.columns.str.match("Unnamed")]
The index_col=0 comment is very useful - most occurrences of Unnamed columns are for "Unnamed: 0", which is likely to be an index. Although not the case in this question.
41

First, find the columns that have 'unnamed', then drop those columns. Note: You should Add inplace = True to the .drop parameters as well.

df.drop(df.columns[df.columns.str.contains('unnamed',case = False)],axis = 1, inplace = True)

Comments

15

The pandas.DataFrame.dropna function removes missing values (e.g. NaN, NaT).

For example the following code would remove any columns from your dataframe, where all of the elements of that column are missing.

df.dropna(how='all', axis='columns')

1 Comment

From Review: Welcome to Stack Overflow! Try to provide a nice description about how your solution works. See: How do I write a good answer?. Thanks
8

The approved solution doesn't work in my case, so my solution is the following one:

    ''' The column name in the example case is "Unnamed: 7"
 but it works with any other name ("Unnamed: 0" for example). '''

        df.rename({"Unnamed: 7":"a"}, axis="columns", inplace=True)

        # Then, drop the column as usual.

        df.drop(["a"], axis=1, inplace=True)

Hope it helps others.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.