322

I have a situation where sometimes, when I read a CSV into a DataFrame, I get an unwanted index-like column named Unnamed: 0.

file.csv

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

The CSV is read with this:

pd.read_csv('file.csv')

   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9

This is very annoying! Does anyone have an idea on how to get rid of this?

1 Comment
    Don't assume the CSV file was necessarily written out either in pandas or by the OP. (Often, the CSV came from some other user/tool/script, so they can't control its format.) Hence the read_csv(..., index_col=[0]) workaround is the main thing, rather than "Do pd.to_csv(..., index=False)". Commented Dec 15, 2020 at 4:10

13 Answers

406

It's the index column. Pass index=False to df.to_csv(...) so that an unnamed index column isn't written out in the first place; see the to_csv() docs.

Example:

In [37]:
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

5 Comments

A lot of times the datasets you get from elsewhere already contain this column so it doesn't really help knowing how to produce the "right" dataset using the right parameters. Is there a way to eliminate this column when you load it when it's already there?
@CalvinKu Unfortunately there is no skipcols arg for read_csv. After reading in the CSV you could just do df = df.drop(columns=df.columns[0]), or you could read the column names first and then pass everything but the first column: cols = pd.read_csv( ....., nrows=1).columns, then re-read with df = pd.read_csv(....., usecols=cols[1:]). This avoids the overhead of reading a superfluous column and then dropping it afterwards.
It's better practice to include the square brackets in read_csv(..., index_col=[0]). Also avoids tickling the whole pre-0.16.1 deprecated index_col = False bug.
@smci I saw similar solution posted by @cs95. Just wondering whey index_col=[0] is better than index_col=0?
@MrRobot: I posted the link above which tells you why: index_col=False was deprecated way back in 0.16.1, and Python can't really tell the difference between bool False and int 0, so index_col=0 is deprecated too. Put list brackets around the zero: [0].
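The two-pass trick described in the comments above (read the header first, then re-read everything except the first column) can be sketched like this, using an in-memory CSV for illustration:

```python
import io
import pandas as pd

# A hypothetical CSV that already contains the unwanted index column.
csv_data = ",A,B,C\n0,1,2,3\n1,4,5,6\n2,7,8,9\n"

# Pass 1: read only the header row to discover the column names.
cols = pd.read_csv(io.StringIO(csv_data), nrows=1).columns

# Pass 2: re-read, keeping every column except the first ("Unnamed: 0").
df = pd.read_csv(io.StringIO(csv_data), usecols=cols[1:])
print(df.columns.tolist())  # ['A', 'B', 'C']
```

With a file on disk you would pass the filename twice instead of the StringIO buffer; the second read never parses the superfluous column at all.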
128

This is usually caused by your CSV having been saved along with an (unnamed) index (RangeIndex).

(The fix would actually need to be done when saving the DataFrame, but this isn't always an option.)

Workaround: read_csv with index_col=[0] argument

IMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0] argument to pd.read_csv, this reads in the first column as the index. (Note the square brackets).

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')
pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by passing index=False when the output CSV was created in pandas, provided your DataFrame has no meaningful index to begin with:

df.to_csv('file.csv', index=False)

But as mentioned above, this isn't always an option.


Stopgap Solution: Filtering with str.match

If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]
 
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

4 Comments

Thanks a lot! That index_col=[0] fix easily solved this annoying problem of 'unnamed:0' and spares code from verbose reinventing the wheel.
To get rid of Unnamed columns, you can also use a regex such as df.drop(df.filter(regex="Unname"), axis=1, inplace=True)
The stopgap solution was relevant to my issue. Great answer!
I think the stopgap solution is the right one. If I have no assurance that the Unnamed column exists, I need to deal with both cases, and removing the first column every time is not an option.
38

To get rid of all Unnamed columns, you can also use a regex such as df.drop(df.filter(regex="Unname"), axis=1, inplace=True)

2 Comments

This worked really well for me as str.match above created nans for my column names that were date formatted.
I had the same problem with my data column names that had integers, this code solved the problem.
12

You can do either of the following with 'Unnamed' Columns:

  1. Delete unnamed columns
  2. Rename them (if you want to use them)

Method 1: Delete Unnamed Columns

# Delete columns one by one, by name, e.g. the column 'Unnamed: 0'
df.drop('Unnamed: 0', axis=1, inplace=True)

# Delete all Unnamed columns in a single line of code using a regex
df.drop(df.filter(regex="Unnamed"), axis=1, inplace=True)

Method 2: Rename Unnamed Columns

df.rename(columns = {'Unnamed: 0':'Name'}, inplace = True)

If you want to write out with a blank header as in the input file, just choose 'Name' above to be ''.

where the OP's input data 'file.csv' was:

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

# read file
df = pd.read_csv('file.csv')
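The blank-header rename suggested above can be sketched as a full round trip (in memory here, for illustration): rename Unnamed: 0 to an empty string, then write without the index to reproduce the original blank header.

```python
import io
import pandas as pd

# Hypothetical round trip: a CSV whose first header cell is blank.
df = pd.read_csv(io.StringIO(",A,B,C\n0,1,2,3\n1,4,5,6\n2,7,8,9\n"))

# Rename 'Unnamed: 0' back to an empty string...
df = df.rename(columns={'Unnamed: 0': ''})

# ...so that writing without the index reproduces the blank header.
out = df.to_csv(index=False)
print(out.splitlines()[0])  # ',A,B,C'
```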

1 Comment

Only Method 2 worked for me. Thanks so much. Made my deadline!
11

Another case where this might happen is if your data was improperly written to the CSV with each row ending in a trailing comma. This will leave you with an unnamed column Unnamed: x at the end of your data when you try to read it into a DataFrame.
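A small sketch of this failure mode, using a made-up CSV with trailing commas:

```python
import io
import pandas as pd

# Hypothetical CSV where every row ends with a stray trailing comma.
bad_csv = "A,B,C,\n1,2,3,\n4,5,6,\n"

df = pd.read_csv(io.StringIO(bad_csv))
print(df.columns.tolist())  # ['A', 'B', 'C', 'Unnamed: 3']

# The trailing column is entirely NaN; drop it by name pattern.
df = df.loc[:, ~df.columns.str.startswith('Unnamed')]
print(df.columns.tolist())  # ['A', 'B', 'C']
```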

2 Comments

I used usecols=range(0,10) to cut off the unnamed column
An example of this is musical_instruments_store.csv containing the header color,item followed by the row brown,piano,. Either you have a missing column on the first line (e.g. price, which is null in this case, so the header should have been color,item,price), or there is a trailing comma on the second line and the data needs to be cleaned (it should be brown,piano instead).
4

Simply delete that column using: del df['column_name']

2 Comments

This is probably the easiest way
@AnshumanKumar: no it's not, the easiest way is simply to use read_csv(..., index_col=[0]). Generally with pandas, trying to hack around an issue instead of taking 5 min to read the documentation then experimenting till you solve it only causes/defers further problems. Admittedly the pandas doc is unclear, incomplete, and lagging the code (welcome to open source!). That's what github is for.
4

Simply do this:

df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

Comments

4

Alternatively:

df = df.drop(columns=['Unnamed: 0'])

Comments

3
from IPython.display import display
import pandas as pd
import io


# Read the file, treating the first (unnamed) column as the index.
df = pd.read_csv('file.csv', index_col=[0])
# Round-trip through an in-memory CSV without the index to discard it entirely.
df = pd.read_csv(io.StringIO(df.to_csv(index=False)))
display(df.head(5))

1 Comment

index_col=[0] worked for me
3

In my experience, there are many reasons you might not want to set that column as the index with index_col=[0], as so many people suggest above. For example, it might contain jumbled index values, because the data were saved to CSV after being indexed or sorted without df.reset_index(drop=True), leading to instant confusion.

So if you know the file has this column and you don't want it, as per the original question, the simplest 1-line solutions are:

df = pd.read_csv('file.csv').drop(columns=['Unnamed: 0'])

or

df = pd.read_csv('file.csv',index_col=[0]).reset_index(drop=True)

Comments

1

I think this is the case where the column you are referring to is the index. Anyway, I would suggest this more general solution:

df = pd.read_csv("file.csv", usecols=lambda col: not col.startswith("Unnamed"))

Comments

0

A solution that is agnostic to whether the index has been written or not when utilizing df.to_csv() is shown below:

df = pd.read_csv(file_name)
if 'Unnamed: 0' in df.columns:
    df.drop('Unnamed: 0', axis=1, inplace=True)

If an index was not written, then index_col=[0] will utilize the first column as the index which is behavior that one would not want.

Comments

0

My issue was slightly different. The first header was unnamed, which caused all my data afterwards to shift left, making it misaligned, and made the last column all NaN.

I resolved it using the following:

# Read only the header row to get the real column names, dropping the first (unnamed) one.
df = pd.read_csv('filename.csv', nrows=1)
columns = list(df.columns)[1:]
# Re-read the data without the header row, assigning the corrected names.
df = pd.read_csv('filename.csv', skiprows=1, header=None, names=columns)

Comments
