
Running Python 3.8.1, 64 bit, on Windows 10.

I have a csv file with two columns. The first column has numeric values only on some rows (the other cells are empty), while the second column has a numeric value on every row.

column_1 column_2
         200
13       201
         202
         203
         204
         205
129      206
16       207
         208

I read the csv file (shown above) with Pandas:

import pandas as pd

df = pd.read_csv("old.csv")

I make modifications to the Pandas dataframe and then write it to a new csv file, without the index column:

df.to_csv("new.csv", sep=',', encoding='utf-8', index=False)

The result is a csv file that has zeros in place of the original empty cells.

column_1,column_2
0,200
13,201
0,202
0,203
0,204
0,205
129,206
16,207
0,208

My question: how do I modify my script so that it writes empty cells instead of zeros (0) to the csv file (i.e. on the rows where the column_2 value is 200, 202, 203, 204, 205 or 208)?
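
For reference, a minimal round-trip sketch (assuming old.csv is a comma-separated version of the data shown above, and that no step fills the missing values): pandas reads the empty cells as NaN and to_csv writes NaN back as empty fields by default, although the column becomes float, so 13 would be written as 13.0 unless the column is converted to a nullable integer dtype as in the answer below.

import pandas as pd

# Hypothetical round trip: empty cells in column_1 are read as NaN
# (the column becomes float64) and to_csv writes NaN as an empty field by default.
df = pd.read_csv("old.csv")

# ... modifications that keep the missing values as NaN ...

df.to_csv("new.csv", sep=',', encoding='utf-8', index=False)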

2 Comments
  • "I make modifications to the Pandas dataframe": do you replace the missing values with 0? pandas only writes a 0 if one exists; missing values are written as nothing (so the last row comes out as ,208). Commented Feb 20, 2020 at 11:13
  • @jezrael I am replacing every empty value with "" (empty string), and after that I print df.head(50) and it correctly shows the dataframe with empty cells. After that, I write the dataframe to csv and the zeros appear, which I don't want. Commented Feb 20, 2020 at 11:55

1 Answer


You can convert the 0 values to missing values with Series.mask and, to keep the integers, convert the output to the nullable Int64 dtype (available in pandas 0.24+):

df = pd.DataFrame({'column_1': [0, 13, 0, 0, 0, 0, 129, 16, 0],
                   'column_2': [200, 201, 202, 203, 204, 205, 206, 207, 208]})
print (df)
   column_1  column_2
0         0       200
1        13       201
2         0       202
3         0       203
4         0       204
5         0       205
6       129       206
7        16       207
8         0       208

df['column_1'] = df['column_1'].mask(df['column_1'].eq(0)).astype('Int64')
print (df)
   column_1  column_2
0       NaN       200
1        13       201
2       NaN       202
3       NaN       203
4       NaN       204
5       NaN       205
6       129       206
7        16       207
8       NaN       208

df.to_csv("new.csv", sep=',', encoding='utf-8', index=False)

column_1,column_2
,200
13,201
,202
,203
,204
,205
129,206
16,207
,208

Another idea is to replace the zeros with empty strings:

df['column_1'] = df['column_1'].mask(df['column_1'].eq(0), '')
print (df)
  column_1  column_2
0                200
1       13       201
2                202
3                203
4                204
5                205
6      129       206
7       16       207
8                208

df.to_csv("new.csv", sep=',', encoding='utf-8', index=False)

column_1,column_2
,200
13,201
,202
,203
,204
,205
129,206
16,207
,208
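
As a variation (not part of the original answer), the zeros can also be replaced directly and the column converted to the nullable integer dtype, letting to_csv handle the missing values:

import numpy as np
import pandas as pd

# Variation on the approach above: replace the zeros with NaN, then convert to the
# nullable Int64 dtype so 13 stays 13 rather than 13.0 in the output file.
df['column_1'] = df['column_1'].replace(0, np.nan).astype('Int64')

# to_csv writes missing values as empty fields by default; na_rep makes that explicit.
df.to_csv("new.csv", sep=',', encoding='utf-8', index=False, na_rep='')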

4 Comments

I would like the missing values to be empty strings, not zeros. Is this possible with your suggestion?
@jeppoo1 - in my solution the 0 values are converted to NaNs, so when written to the file they become empty strings. Is it not working for you?
@jeppoo1 - I added sample data; it works perfectly for me.
Thank you! Your sample code is very good. I just noticed that, once again, Excel is not behaving as expected and shows zeros in the empty cells... If I open the csv in Notepad or Notepad++ the cells are empty as expected. So I believe my original solution also works; I just got misled by Excel. Very annoying, thank you Microsoft!
