How to insert a second header row in pandas df for csv write

Question

I have a very large pandas df that I am writing out to CSV. I need to add a second header row containing the data types. The code below works but produces an unexpected, empty third row in the CSV:

#! /usr/bin/env python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

# get count of header columns, add REAL for each one
types_header_for_insert = list(df.columns.values)
for idx, val in enumerate(types_header_for_insert):
    types_header_for_insert[idx] = 'REAL'

# count number of index columns, then add STRING for each one
index_count = len(df.index.names)
for idx in range(0, index_count):
    df.reset_index(level=0, inplace=True)
    types_header_for_insert.insert(0, 'STRING')

# insert the new types column
df.columns = pd.MultiIndex.from_tuples(zip(df.columns, types_header_for_insert))

print df.columns.values

df.to_csv("./test.csv", index=False)

output:

index,A,B
STRING,REAL,REAL
,,
0,1,2
1,3,4

How can I get rid of this extra blank row? Where does it come from?

James · Accepted Answer · 2016-01-25 13:22:16Z

3

In the end I used a workaround: (a) write the original headers to the CSV, then (b) replace the headers with the second header line and append the whole df to the same file:

# write the header to the file only
pd.DataFrame(data=[df.columns]).to_csv("outfile.csv", header=False, index=False)

# now replace header
types_header_for_insert = list(df.columns.values)
for idx, val in enumerate(df.columns.values):
    # map each column's dtype to the type label written in the second header row
    if df[val].dtype == 'float64':
        types_header_for_insert[idx] = 'REAL'
    elif df[val].dtype == 'int64':
        types_header_for_insert[idx] = 'INTEGER'
    else:
        types_header_for_insert[idx] = 'STRING'

df.columns = types_header_for_insert

# append the whole df with new header
df.to_csv("outfile.csv", mode="a", float_format='%.3f', index=False)
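The same two-step idea can also be done in one pass by writing both header lines to an open file handle and then letting to_csv write only the data rows. This is a minimal sketch, assuming Python 3 and a reasonably recent pandas; the helper names sql_type and write_with_types are hypothetical and not part of pandas, and the in-memory buffer stands in for a real output file:

```python
import csv
import io

import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def sql_type(dtype):
    """Map a pandas dtype to the type labels used in the answer above."""
    if is_float_dtype(dtype):
        return "REAL"
    if is_integer_dtype(dtype):
        return "INTEGER"
    return "STRING"


def write_with_types(df, handle, **to_csv_kwargs):
    """Write column names, then a row of type labels, then the data rows."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(df.columns)
    writer.writerow([sql_type(dtype) for dtype in df.dtypes])
    # header=False: both header rows were already written above
    df.to_csv(handle, header=False, index=False, **to_csv_kwargs)


df = pd.DataFrame({"A": [1, 3], "B": [2.0, 4.5], "C": ["x", "y"]})
buffer = io.StringIO()  # stands in for open("outfile.csv", "w", newline="")
write_with_types(df, buffer, float_format="%.3f")
print(buffer.getvalue())
```

Because the df's own columns are never modified, the original header stays available for later processing, and the file is opened only once.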

edited Jan 25, 2016 at 13:22

answered Jan 25, 2016 at 11:57

James

jezrael · Accepted Answer · 2016-01-25 11:28:37Z

2

I think it is a bug; see the open pandas issue 6618.

A small trick may help: add types_header_for_insert to the data as a row before the first row:

#! /usr/bin/env python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

# get count of header columns, add REAL for each one
types_header_for_insert = list(df.columns.values)
for idx, val in enumerate(types_header_for_insert):
    types_header_for_insert[idx] = 'REAL'

# count number of index columns, then add STRING for each one
index_count = len(df.index.names)
for idx in range(0, index_count):
    df.reset_index(level=0, inplace=True)
    types_header_for_insert.insert(0, 'STRING')

# insert the new types column
#df.columns = pd.MultiIndex.from_tuples(zip(df.columns, types_header_for_insert))

#set new value to dataframe
df.loc[-1]  = types_header_for_insert

#sort index 
df = df.sort_index()
print df
#     index     A     B
#-1  STRING  REAL  REAL
# 0       0     1     2
# 1       1     3     4

print df.to_csv(index=False)
#index,A,B
#STRING,REAL,REAL
#0,1,2
#1,3,4

EDIT

For a large df you can use append instead, which avoids the sort:

#empty df with column from df
df1 = pd.DataFrame(columns = df.columns)
#create series from types_header_for_insert
s = pd.Series(types_header_for_insert, index=df.columns)
print s
#index    STRING
#A          REAL
#B          REAL
#dtype: object

df1 = df1.append(s, ignore_index=True).append(df, ignore_index=True)
print df1
#    index     A     B
#0  STRING  REAL  REAL
#1       0     1     2
#2       1     3     4

print df1.to_csv(index=False)
#index,A,B
#STRING,REAL,REAL
#0,1,2
#1,3,4
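DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the snippet above fails on current releases. A minimal sketch of the same idea with pd.concat, assuming Python 3 and that the index has already been reset as in the code above (the name types_row is just illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB")).reset_index()
types_header_for_insert = ["STRING", "REAL", "REAL"]

# One-row frame holding the type labels, with the same columns as df.
types_row = pd.DataFrame([types_header_for_insert], columns=df.columns)

# Stack the type row on top of the data; no sort is needed.
df1 = pd.concat([types_row, df], ignore_index=True)

csv_text = df1.to_csv(index=False)
print(csv_text)
```

Note that the combined columns become object dtype because they mix strings and numbers, so this is best done as the last step before writing.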

edited Jan 25, 2016 at 11:28

answered Jan 22, 2016 at 21:40

jezrael

1 Comment

James Over a year ago

Yes, works but the sort operation is not efficient on a large table with a more complex multikey index (takes 30 mins to sort for my dataframe). In this case it may be more efficient to create a new dataframe with a single row and the same headers then merge, instead of append and sort.

Parfait · Accepted Answer · 2016-01-23 01:44:27Z

0

In Python 3, MultiIndex.from_tuples() fails with "object of type 'zip' has no len()". However, wrapping the zip in list() works and, for me, produces no blank row. Consider trying it in Python 2:

df.columns = pd.MultiIndex.from_tuples(list(zip(df.columns, types_header_for_insert)))

print df.columns.values

df.to_csv("./test.csv", index=False)

#   index    A    B
#  STRING REAL REAL
#       0    1    2
#       1    3    4

Alternatively, to avoid zip altogether, build the tuples with a list comprehension:

data = [df.columns, types_header_for_insert]
newcolumns = [tuple(i[j] for i in data) for j in range(min(len(l) for l in data))]
df.columns = pd.MultiIndex.from_tuples(newcolumns)

print df.columns.values

df.to_csv("./test.csv", index=False)

#   index    A    B
#  STRING REAL REAL
#       0    1    2
#       1    3    4
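Another way to build the same two-level header without zip is MultiIndex.from_arrays, which takes one list per level. This is a minimal sketch, assuming Python 3; it only shows how the columns are constructed, and whether the blank row appears in the CSV may still depend on the pandas version, as the comments below report for 0.16.1:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
types_header_for_insert = ["REAL", "REAL"]

# Level 0 is the original column names, level 1 is the type labels.
df.columns = pd.MultiIndex.from_arrays([df.columns, types_header_for_insert])

print(df.columns.tolist())
```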

answered Jan 23, 2016 at 1:44

Parfait

2 Comments

James Over a year ago

The first approach with list(zip()) still gives me the blank line in pandas 0.16.1; for various reasons, I'm not able to update at this point. @jezrael points to a known bug as the cause: issue 6618.

James Over a year ago

No luck with the second approach either: avoiding zip still gives the empty third line ",," as in my first code snippet. Which pandas version was this with?
