
I have a pandas.DataFrame that I wish to export to a CSV file. However, pandas seems to write some of the values as float instead of int. I couldn't find how to change this behavior.

Building a data frame:

import pandas
df = pandas.DataFrame(columns=['a','b','c','d'], index=['x','y','z'], dtype=int)
x = pandas.Series([10,10,10], index=['a','b','d'], dtype=int)
y = pandas.Series([1,5,2,3], index=['a','b','c','d'], dtype=int)
z = pandas.Series([1,2,3,4], index=['a','b','c','d'], dtype=int)
df.loc['x']=x; df.loc['y']=y; df.loc['z']=z

View it:

>>> df
    a   b    c   d
x  10  10  NaN  10
y   1   5    2   3
z   1   2    3   4

Export it:

>>> df.to_csv('test.csv', sep='\t', na_rep='0', dtype=int)
>>> for l in open('test.csv'): print l.strip('\n')
        a       b       c       d
x       10.0    10.0    0       10.0
y       1       5       2       3
z       1       2       3       4

Why do the tens have a dot zero?

Sure, I could just stick this function into my pipeline to reconvert the whole CSV file, but it seems unnecessary:

def lines_as_integer(path):
    handle = open(path)
    yield handle.next()
    for line in handle:
        line = line.split()
        label = line[0]
        values = map(float, line[1:])
        values = map(int, values)
        yield label + '\t' + '\t'.join(map(str,values)) + '\n'
handle = open(path_table_int, 'w')
handle.writelines(lines_as_integer(path_table_float))
handle.close()
Comments

  • you should import pandas as pd :) Commented Jun 13, 2013 at 16:52
  • @Andy Why should I do that? Namespaces are a great idea... until you abbreviate them all and it becomes unreadable. Commented Sep 22, 2015 at 15:08
  • @AndyHayden Longer to type, but definitely easier to read. To a novice stumbling on the code, pd signifies Police Department. Or worse if he speaks French. Commented Nov 18, 2015 at 16:31
  • It's just a convention - use it or don't - it depends on who your audience is likely to be. For many pandas users the convention is to use pd, just as in the UK the convention is to drive on the left. It's not a problem until you have to share the same stretch of road. Commented Nov 1, 2016 at 17:11
  • I don't think that analogy is adequate, because driving on the left is incompatible with driving on the right. However, using the full package name works fine for a veteran who knows about the abbreviation standard, while the opposite is not true (a novice is baffled by pd). Commented Feb 24, 2019 at 15:51

9 Answers


The answer I was looking for was a slight variation of what @Jeff proposed in his answer. The credit goes to him. This is what solved my problem in the end for reference:

import pandas
df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])
df = df.fillna(0)
df = df.astype(int)
df.to_csv('test.csv', sep='\t')

4 Comments

  • This gets around having any floats, but you lose the NaN info. Perhaps fill NA with -9999 or some value that you know is not 'real' in your data set (a sketch of this follows these comments).
  • You may refer to my answer below to preserve NaN.
  • How to do that only for one column? My df has mixed types, strings and numbers.
  • If your data are natural numbers (nonnegative integers), using df.fillna(-1) is an option.
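
A minimal sketch of the sentinel idea from the comments above; -9999 is an assumed placeholder, so pick any value you know cannot occur in your real data:

import numpy as np
import pandas as pd

# Toy frame with a missing value in column 'c', as in the question.
df = pd.DataFrame({'a': [10, 1], 'c': [np.nan, 2]}, index=['x', 'y'])

# Fill NaN with an out-of-range sentinel before the cast, so missing entries
# stay identifiable in the exported file.
df.fillna(-9999).astype(int).to_csv('test.csv', sep='\t')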

This is a "gotcha" in pandas (Support for integer NA), where integer columns with NaNs are converted to floats.

This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
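
A short, hedged illustration of both the upcast and the dtype=object workaround (not part of the original answer):

import numpy as np
import pandas as pd

s = pd.Series([10, 3, 7], index=['x', 'y', 'z'], dtype=int)
print(s.dtype)                                  # int64

# Introducing a missing label forces NaN in, and the column is upcast to float.
print(s.reindex(['x', 'y', 'z', 'w']).dtype)    # float64

# With object dtype the existing values stay Python ints next to the NaN,
# at the cost of speed and memory.
print(s.astype(object).reindex(['x', 'y', 'z', 'w']).dtype)   # object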

2 Comments

  • So no way to get them as integers without reparsing the whole file? How about if I use df.fillna()?
  • Use dtype=object (rather than int) when creating x and df.

The problem is that you are assigning by rows, but dtypes are grouped by column, so everything gets cast to object dtype, which is not a good thing: you lose all efficiency. So one way is to convert, which will coerce to float/int dtype as needed.

As we answered in another question, if you construct the frame all at once (or construct it column by column), this step is not needed; see the sketch after the session below.

In [23]: def convert(x):
   ....:     try:
   ....:         return x.astype(int)
   ....:     except:
   ....:         return x
   ....:     

In [24]: df.apply(convert)
Out[24]: 
    a   b   c   d
x  10  10 NaN  10
y   1   5   2   3
z   1   2   3   4

In [25]: df.apply(convert).dtypes
Out[25]: 
a      int64
b      int64
c    float64
d      int64
dtype: object

In [26]: df.apply(convert).to_csv('test.csv')

In [27]: !cat test.csv
,a,b,c,d
x,10,10,,10
y,1,5,2.0,3
z,1,2,3.0,4
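
For reference, here is a sketch of the all-at-once construction mentioned above, using the values from the question; only 'c', which genuinely contains a NaN, comes out as float64:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [10, 1, 1], 'b': [10, 5, 2], 'c': [np.nan, 2, 3], 'd': [10, 3, 4]},
    index=['x', 'y', 'z'],
)
print(df.dtypes)                              # a, b, d -> int64; c -> float64
df.to_csv('test.csv', sep='\t', na_rep='0')   # the int columns are written without '.0'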

11 Comments

  • But then there are .0s in the c column... :s
  • Because it's a float! No choice there (well, you CAN pass float_format='%.0f' to to_csv, but that could lead to loss of precision).
  • But, but... if you use dtype=object (e.g. in x and df via the OP's construction, which I agree is not the best way), then the 2, 3 and 10s are all ints... it's almost always not worth worrying about anyway. This seems just like the transpose of the OP's effort :s
  • Yep... keep stressing that having object dtype for numbers is bad... maybe we should put in a PerformanceWarning if that occurs (e.g. like in this case)...
  • If they have gone out of their way to choose dtype=object, though, surely they deserve what they get (if they don't, they'd get a float). A better solution would be for numpy to support NaNs in integer arrays... ;)

The simplest solution is to use float_format in to_csv():

df.to_csv('test.csv', sep='\t', na_rep=0, float_format='%.0f')

But this applies to all float columns. BTW: Using your code on pandas 1.1.5, all of my columns are float.

Output:

    a   b   c   d
x   10  10  0   10
y   1   5   2   3
z   1   2   3   4

Without float_format:

    a   b   c   d
x   10.0    10.0    0    10.0
y    1.0     5.0    2.0   3.0
z    1.0     2.0    3.0   4.0
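
A possible variant, not from the answer: '%g' also drops the trailing '.0' on whole numbers while leaving genuine fractional values intact (it formats to six significant digits by default and switches to scientific notation for very large values):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [10.0, 1.0], 'c': [np.nan, 2.5]}, index=['x', 'y'])
print(df.to_csv(sep='\t', na_rep='0', float_format='%g'))   # 10 and 1, not 10.0 and 1.0; 2.5 stays 2.5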

1 Comment

  • This is by far the best and most precise answer; it does exactly what was asked for in the question. Should get more upvotes. Solved my (same) problem, thanks!

If you want to preserve the NaN info in the CSV you export, do the following. P.S.: I'm concentrating on column 'c' in this case.

df['c'] = df['c'].fillna('')       # fill NaN with an empty string
df['c'] = df['c'].astype(str)      # convert the column to string
>>> df
    a   b    c     d
x  10  10         10
y   1   5    2.0   3
z   1   2    3.0   4

df['c'] = df['c'].str.split('.')   # split the float string into a list on '.'
>>> df
    a   b    c          d
x  10  10   ['']       10
y   1   5   ['2','0']   3
z   1   2   ['3','0']   4

df['c'] = df['c'].str[0]           # select the 1st element from the list
>>> df
    a   b    c   d
x  10  10       10
y   1   5    2   3
z   1   2    3   4

Now, if you export the DataFrame to CSV, column 'c' will not have float values and the NaN info is preserved.
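
A different route, not used in this answer: pandas' nullable integer dtype (the capital-I 'Int64', available since pandas 0.24) keeps the column as integers and still round-trips the missing value:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [10, 1, 1], 'b': [10, 5, 2],
                   'c': [np.nan, 2, 3], 'd': [10, 3, 4]},
                  index=['x', 'y', 'z'])

df['c'] = df['c'].astype('Int64')   # nullable extension dtype: values stay ints, NaN becomes <NA>
df.to_csv('test.csv', sep='\t')     # 'c' is written as an empty field, 2 and 3, with no '.0'

On read-back, the column can be restored with read_csv(..., dtype={'c': 'Int64'}).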

1 Comment

  • This solution is nice, but it assumes you know which column has missing data, which is rarely the case (a short sketch for finding such columns follows).
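
In response to the comment above, a quick hedged sketch for locating the columns that actually contain missing values before deciding where to apply the conversion:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [10, 1], 'c': [np.nan, 2], 'd': [10, 3]}, index=['x', 'y'])
print(list(df.columns[df.isna().any()]))   # ['c'], so only 'c' needs special handling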

You can use astype() to specify the data type for each column.

For example:

import pandas
df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])

df = df.astype({"a": int, "b": complex, "c" : float, "d" : int})

Comments


Just write it out as string to csv:

df.to_csv('test.csv', sep='\t', na_rep='0', dtype=str)

2 Comments

  • It does not work at all: TypeError: to_csv() got an unexpected keyword argument 'dtype'.
  • If it doesn't work, use astype() to convert the data type first.

You can convert your DataFrame to a NumPy array as a workaround:

import numpy as np

np.savetxt(savepath, np.array(df).astype(int), fmt='%i', delimiter=',',
           header='PassengerId,Survived', comments='')

Comments


Here is yet another solution:

df['IntColumnWithNAValues'] = df['IntColumnWithNAValues'].fillna(0)   # fill with a value that is outside your real range
df['IntColumnWithNAValues'] = df['IntColumnWithNAValues'].astype(int)
df['IntColumnWithNAValues'] = df['IntColumnWithNAValues'].replace(0, '')

CSV files don't differentiate between NA and '' (an empty string), since CSV is a plain-text format, so you keep your missing fields while converting the non-null values to int.

You can do this for every column that you want; if you have lots of columns it might be tedious (see the loop sketch below).
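
A hedged sketch of applying the same three steps to several columns at once; int_cols is an assumed list of the integer-like columns in your frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [10, 1], 'c': [np.nan, 2], 'd': [10, 3]}, index=['x', 'y'])

int_cols = ['a', 'c', 'd']          # assumed: the columns that should be written as ints
for col in int_cols:
    # as in the answer, 0 is assumed to be outside the real value range
    df[col] = df[col].fillna(0).astype(int).replace(0, '')

df.to_csv('test.csv', sep='\t')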

Comments
