Python: Read CSV and write to file using a custom format

Question

I have this .csv file ...

id,first_name,last_name,email,date,opt-in
1,Jimmy,Reyes,[email protected],12/29/2016,FALSE
2,Doris,Wood,[email protected],04/22/2016,
3,Steven,Miller,[email protected],07/31/2016,FALSE
4,Earl,Parker,[email protected],01-08-17,FALSE
5,Barbara,Cruz,[email protected],12/30/2016,FALSE

I want to read the CSV file shown above, transform the data, and write it to another text file, which should look like this ...

1,<tab>"first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
2,<tab>"first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016","opt-in"="0"

Also, if the opt-in value is empty, it should print "0".

Here is my code so far ....

import csv
import time

# Do the reading
with open('my-scripts/mock.csv', 'r') as f1:
 #next(f1, None)  # skip the headers
 reader = csv.reader(f1)
 new_rows_list = []
 for row in reader:
   if row[5] == '':
      new_row = [row[0],'\t',row[1], row[2], row[3], row[4], '0']
      new_rows_list.append(new_row)
   else:
      new_row = [row[0],'\t',row[1], row[2], row[3], row[4], row[5]]
      new_rows_list.append(new_row)   
 f1.close()   # <---IMPORTANT

# Do the writing
newfilename = 'my-scripts/ftp_745198_'+str(int(time.time()))
with open(newfilename, 'w', newline='') as f2:
 writer = csv.writer(f2, quoting=csv.QUOTE_NONNUMERIC)
 writer.writerows(new_rows_list)
 f2.close()

The above code generates the output below, which is not exactly what I want. I can't figure out how to print the column names in each row, as shown in the desired output above.

"id","  ","first_name","last_name","email","date","opt-in"
"1","   ","Jimmy","Reyes","[email protected]","12/29/2016","FALSE"
"2","   ","Doris","Wood","[email protected]","04/22/2016","0"
"3","   ","Steven","Miller","[email protected]","07/31/2016","FALSE"
"4","   ","Earl","Parker","[email protected]","01-08-17","FALSE"
"5","   ","Barbara","Cruz","[email protected]","12/30/2016","FALSE"

New CSV

id,first_name,last_name,email,date,opt-in,unique_code
1,Jimmy,Reyes,[email protected],12/29/2016,FALSE,ER45DH
2,Doris,Wood,[email protected],04/22/2016,,MU34T3
3,Steven,Miller,[email protected],07/31/2016,FALSE,G34FGH
4,Earl,Parker,[email protected],01-08-17,FALSE,ASY67J
5,Barbara,Cruz,[email protected],12/30/2016,FALSE,NHG67P

New expected output

ER45DH<tab>"id"="1","first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
MU34T3<tab>"id"="2","first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016","opt-in"="0"

I will really appreciate any help/ideas/pointers.

Thanks

@glibdud The file I am opening for reading has about 50 columns and about 150K rows. To be on the safe side, I close it explicitly so that I don't get any out-of-memory errors, although I read somewhere that this is no longer required because the file is closed by default.
– Slyper, Commented Mar 23, 2017 at 12:33
@PuneetSharma Yeah, as long as you use the with open... construct, it should be closed automatically when you exit the block.
– glibdud, Commented Mar 23, 2017 at 12:39
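A small sketch of the point made in these comments, assuming a throwaway sample file created just for the demonstration (the path and contents are hypothetical). It suggests that the `with` block closes the file on exit and that `csv.reader` hands over one row at a time, so an explicit `close()` should not be needed for memory reasons.

```python
import csv
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "mock.csv")
with open(path, "w", newline="") as out:
    out.write("id,first_name\n1,Jimmy\n2,Doris\n")

row_count = 0
with open(path, "r", newline="") as f1:
    reader = csv.reader(f1)
    for row in reader:          # rows are read lazily, one at a time
        row_count += 1
    closed_inside = f1.closed   # still open while inside the with block

closed_after = f1.closed        # the with block has closed the file by now
print(closed_inside, closed_after, row_count)
```

Note that collecting every row into a list, as the question's code does, keeps the whole file in memory regardless of when the file handle is closed; writing each row out as it is read avoids that.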

Karthikeyan KR · Accepted Answer · 2017-03-23 13:56:02Z

1

Initially extract the header as a new list.
Then append header with each row elements as a string.
Write it to the file.

Please try this code,

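import csv

# Both files are closed automatically when their `with` blocks end.
with open('newfilename.csv', 'w') as f2:
    with open('mycsvfile.csv', mode='r', newline='') as infile:
        reader = csv.reader(infile)
        for i, rows in enumerate(reader):
            if i == 0:
                header = rows  # the first row holds the column names
            else:
                if rows[5] == '':
                    rows[5] = '0'  # an empty opt-in value becomes "0"
                # Build '<id>\t"%s=%%s",...' with one placeholder pair per remaining column
                pat = rows[0] + '\t' + '''"%s=%%s",''' * (len(header) - 1) + '\n'
                # The first % fills in the column names, the second % fills in the values
                f2.write(pat % tuple(header[1:]) % tuple(rows[1:]))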

Output:

1   "first_name=Jimmy","last_name=Reyes","[email protected]","date=12/29/2016","opt-in=FALSE",
2   "first_name=Doris","last_name=Wood","[email protected]","date=04/22/2016","opt-in=0",
3   "first_name=Steven","last_name=Miller","[email protected]","date=07/31/2016","opt-in=FALSE",
4   "first_name=Earl","last_name=Parker","[email protected]","date=01-08-17","opt-in=FALSE",
5   "first_name=Barbara","last_name=Cruz","[email protected]","date=12/30/2016","opt-in=FALSE",

Please let me know if you have any questions.

answered Mar 23, 2017 at 13:56


7 Comments

Slyper Over a year ago

Thanks @Karthikeyan KR. I changed the following line to produce output according to my specifications and all worked great ... Many thanks for your help. pat = rows[0]+'\t'+'''"%s"="%%s",'''*(len(header)-1)+'\n'

Slyper Over a year ago

Quick question. How do I not add a comma after the last value ?

Karthikeyan KR Over a year ago

Please change to the following pat = rows[0]+'\t'+'''"%s"="%%s",'''*(len(header)-2)+'''"%s"="%%s"\n'''
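As an alternative to adjusting the repeat count in the pattern, here is a minimal sketch that builds each pair separately and joins them with commas, so no comma follows the last value. The helper name `format_row` and the sample data are hypothetical, not part of the original answer.

```python
def format_row(header, row):
    """Return the id, a tab, then "name"="value" pairs joined by commas."""
    pairs = ['"%s"="%s"' % (name, value)
             for name, value in zip(header[1:], row[1:])]
    # str.join puts the separator only between items, never after the last one
    return row[0] + '\t' + ','.join(pairs)

# Hypothetical shortened header and row, for illustration only.
header = ['id', 'first_name', 'last_name', 'opt-in']
row = ['1', 'Jimmy', 'Reyes', 'FALSE']
line = format_row(header, row)
print(line)
```

Because `zip` pairs the names with the values, the same helper should work for any number of columns without changing the pattern.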

Slyper Over a year ago

Thanks, Now I am getting a weird error ...

  File "my-scripts/filecon.py", line 21, in <module>
    f2.write(pat % tuple(header[1:]) % tuple(rows[1:]))
  File "c:\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 112-113: character maps to <undefined>

Karthikeyan KR Over a year ago

It's a Unicode error. Your data probably contains characters that the default cp1252 encoding cannot represent, so try writing the output as UTF-8.
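One way to apply this in Python 3, sketched under the assumption that the output file is what triggers the error (the traceback points at `f2.write` and the cp1252 codec): pass `encoding='utf-8'` to `open()` instead of relying on the platform default. The path and the sample line below are hypothetical.

```python
import os
import tempfile

# Hypothetical output path and sample line, used only for illustration.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
line = '2\t"first_name"="Dóris","last_name"="Wóźniak"\n'

# An explicit encoding avoids falling back to the platform default (cp1252 in the traceback).
with open(out_path, "w", encoding="utf-8", newline="") as f2:
    f2.write(line)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, "r", encoding="utf-8", newline="") as check:
    round_trip = check.read()
```

If the input CSV is itself UTF-8, opening it with the same `encoding` argument is probably needed as well.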


Gang YIN · Accepted Answer · 2017-03-23 10:59:15Z

1

You could keep the header in a list, then use that list (first_name, etc.) to match the elements in the following lines (Jimmy, etc.) and generate the output you want ("first_name"="Jimmy").

answered Mar 23, 2017 at 10:59


2 Comments

Slyper Over a year ago

Thanks @Gang. Yes, I was thinking of that approach, but I have around 50 columns. Doing it manually is a bit tedious, so I was thinking there must be a quicker, more efficient way...?

Gang YIN Over a year ago

@PuneetSharma, as long as the number of columns is fixed, a loop should do the job as you expect, I think.

Zealseeker · Accepted Answer · 2017-03-23 12:55:43Z

1

Firstly, save the header into a variable. For example:

for i,row in enumerate(reader):
    if i == 0:
        header = row
    else:
        new_row = [row[0],'\t'] + ['%s=%s' % (header[j],row[j]) for j in range(1,6)]
        ....
...

Secondly, code such as [row[1], row[2], row[3], row[4], row[5]] can be simplified to [row[i] for i in range(1,6)] (a list comprehension), or simply to the slice row[1:6].

Thirdly, string formatting is a good tool: print('"%s"="%s"' % (header[1],row[1])) will output "first_name"="Jimmy"

Use this knowledge and consider how to make it work.
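Putting these pieces together for the question's second layout (unique_code first, then every other column as a pair) might look like the sketch below. It swaps in `csv.DictReader`, which none of the answers use, and the function name `convert`, its parameters, and the shortened sample data are all hypothetical.

```python
import csv
import io

def convert(infile, outfile, key="unique_code", empty_default="0"):
    """Write one line per CSV row: the key value, a tab, then "col"="value" pairs."""
    reader = csv.DictReader(infile)  # takes the column names from the first row
    for record in reader:
        pairs = []
        for name in reader.fieldnames:
            if name == key:
                continue  # the key column goes before the tab instead
            value = record[name]
            if name == "opt-in" and value == "":
                value = empty_default  # an empty opt-in value becomes "0"
            pairs.append('"%s"="%s"' % (name, value))
        outfile.write(record[key] + "\t" + ",".join(pairs) + "\n")

# In-memory stand-ins for the real files, with fewer columns than the question's CSV.
sample = io.StringIO(
    "id,first_name,opt-in,unique_code\n"
    "1,Jimmy,FALSE,ER45DH\n"
    "2,Doris,,MU34T3\n"
)
result = io.StringIO()
convert(sample, result)
print(result.getvalue())
```

With real files, `infile` and `outfile` would be handles from `open(...)`; each row is written as soon as it is read, so the loop does not depend on the number of columns or hold the whole file in memory.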

edited Mar 23, 2017 at 12:55

answered Mar 23, 2017 at 12:06


2 Comments

Slyper Over a year ago

Thanks @Zealseeker, I tried your suggestions, I am getting this error, header = row.split(',') AttributeError: 'list' object has no attribute 'split'

Zealseeker Over a year ago

@PuneetSharma Sorry for misleading you. row is already the list of header fields, so you don't need to split it; header = row is enough. Then you can use header[1] and header[2] to get first_name and last_name. I'll revise my answer.
