1

I have this .csv file ...

id,first_name,last_name,email,date,opt-in
1,Jimmy,Reyes,[email protected],12/29/2016,FALSE
2,Doris,Wood,[email protected],04/22/2016,
3,Steven,Miller,[email protected],07/31/2016,FALSE
4,Earl,Parker,[email protected],01-08-17,FALSE
5,Barbara,Cruz,[email protected],12/30/2016,FALSE

I want to read the CSV file shown above, transform the data, and write the result to another text file, which should look like this:

1,<tab>"first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
2,<tab>"first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016","opt-in"="0"

Also, if the opt-in value is empty, it should print "0".

Here is my code so far ....

import csv
import time

# Do the reading
with open('my-scripts/mock.csv', 'r') as f1:
 #next(f1, None)  # skip the headers
 reader = csv.reader(f1)
 new_rows_list = []
 for row in reader:
   if row[5] == '':
      new_row = [row[0],'\t',row[1], row[2], row[3], row[4], '0']
      new_rows_list.append(new_row)
   else:
      new_row = [row[0],'\t',row[1], row[2], row[3], row[4], row[5]]
      new_rows_list.append(new_row)   
 f1.close()   # <---IMPORTANT

# Do the writing
newfilename = 'my-scripts/ftp_745198_'+str(int(time.time()))
with open(newfilename, 'w', newline='') as f2:
 writer = csv.writer(f2, quoting=csv.QUOTE_NONNUMERIC)
 writer.writerows(new_rows_list)
 f2.close()

The code above generates the output below, which is not quite what I want. I can't figure out how to print the column name next to each value in every row, as shown in the desired output above.

"id","  ","first_name","last_name","email","date","opt-in"
"1","   ","Jimmy","Reyes","[email protected]","12/29/2016","FALSE"
"2","   ","Doris","Wood","[email protected]","04/22/2016","0"
"3","   ","Steven","Miller","[email protected]","07/31/2016","FALSE"
"4","   ","Earl","Parker","[email protected]","01-08-17","FALSE"
"5","   ","Barbara","Cruz","[email protected]","12/30/2016","FALSE"

New CSV

id,first_name,last_name,email,date,opt-in,unique_code
1,Jimmy,Reyes,[email protected],12/29/2016,FALSE,ER45DH
2,Doris,Wood,[email protected],04/22/2016,,MU34T3
3,Steven,Miller,[email protected],07/31/2016,FALSE,G34FGH
4,Earl,Parker,[email protected],01-08-17,FALSE,ASY67J
5,Barbara,Cruz,[email protected],12/30/2016,FALSE,NHG67P

New expected output

ER45DH<tab>"id"="1","first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
MU34T3<tab>"id"="2","first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016","opt-in"="0"

I would really appreciate any help, ideas, or pointers.

Thanks

5 Comments
  • Do you need the literal text '<tab>' printed, or an actual tab character? Commented Mar 23, 2017 at 11:00
  • I need tab '\t' Commented Mar 23, 2017 at 11:04
  • Curious, what makes you say the f1.close() is important? Commented Mar 23, 2017 at 12:22
  • @glibdud The file I am opening for reading has about 50 columns and about 150K rows, so just to be on the safe side I close it explicitly to avoid out-of-memory errors. I did read somewhere that this is no longer required, as the file is closed by default. Commented Mar 23, 2017 at 12:33
  • @PuneetSharma Yeah, as long as you use the with open... construct, it should be closed automatically when you exit the block. Commented Mar 23, 2017 at 12:39
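
The behaviour described in the last comment can be checked with a minimal sketch like the one below. The temporary file and its contents are made up for illustration and are not part of the original question.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", newline="") as f:
    f.write("id,opt-in\n1,FALSE\n")

with open(path, "r", newline="") as f:
    first_line = f.readline()
    closed_inside = f.closed   # the file is still open inside the block

closed_after = f.closed        # the with statement closed it on exit
print(closed_inside, closed_after)
```

Closing the file releases the file handle. In the question's code, memory use is more likely driven by `new_rows_list`, which holds every row until the writing step.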

3 Answers

1
  • First, extract the header as a new list.

  • Then combine each header name with the corresponding row element as a string.

  • Write the result to the file.

Please try this code:

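import csv

with open('newfilename.csv', 'w') as f2:
    with open('mycsvfile.csv', mode='r') as infile:
        reader = csv.reader(infile)
        for i, rows in enumerate(reader):
            if i == 0:
                header = rows  # the first row holds the column names
            else:
                if rows[5] == '':
                    rows[5] = '0'  # an empty opt-in becomes "0"
                # One "name=%s", slot per column after the id;
                # %%s survives the first formatting pass as %s.
                pat = rows[0] + '\t' + '''"%s=%%s",''' * (len(header) - 1) + '\n'
                print(pat)
                f2.write(pat % tuple(header[1:]) % tuple(rows[1:]))
# No explicit close() is needed: both with blocks close their files.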

Output:

1   "first_name=Jimmy","last_name=Reyes","[email protected]","date=12/29/2016","opt-in=FALSE",
2   "first_name=Doris","last_name=Wood","[email protected]","date=04/22/2016","opt-in=0",
3   "first_name=Steven","last_name=Miller","[email protected]","date=07/31/2016","opt-in=FALSE",
4   "first_name=Earl","last_name=Parker","[email protected]","date=01-08-17","opt-in=FALSE",
5   "first_name=Barbara","last_name=Cruz","[email protected]","date=12/30/2016","opt-in=FALSE",

Please let me know if you have any questions.

7 Comments

Thanks @Karthikeyan KR. I changed the following line to produce output according to my specifications and all worked great ... Many thanks for your help. pat = rows[0]+'\t'+'''"%s"="%%s",'''*(len(header)-1)+'\n'
Quick question: how do I avoid adding a comma after the last value?
Please change it to the following: pat = rows[0]+'\t'+'''"%s"="%%s",'''*(len(header)-2)+'''"%s"="%%s"\n'''
Thanks, Now I am getting a weird error ... File "my-scripts/filecon.py", line 21, in <module> f2.write(pat % tuple(header[1:]) % tuple(rows[1:])) File "c:\Python36\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 112-113: character maps to <undefined>
It's a Unicode error. Your data probably contains characters that the default cp1252 encoding cannot represent, so try encoding the output as UTF-8.
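
One way to apply the UTF-8 suggestion on Python 3 is to pass `encoding` to `open()` when writing. This is a minimal sketch, not the asker's actual script: the output path and the sample line are made up, and it assumes the failure comes from the output file using the platform default encoding, as the traceback suggests.

```python
import os
import tempfile

# Hypothetical output path; the real script builds its own file name.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Made-up line containing a character (U+0141) that cp1252 cannot encode.
line = '1\t"first_name"="Zo\u00eb","last_name"="\u0141ukasz"\n'

# An explicit encoding avoids falling back to the platform default.
with open(out_path, "w", encoding="utf-8", newline="") as f2:
    f2.write(line)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, "r", encoding="utf-8", newline="") as f:
    round_trip = f.read()
```

If the input CSV is also UTF-8 (an assumption; its encoding is not stated in the thread), the same `encoding="utf-8"` argument would go on the reading `open()` call as well.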
1

You could keep the header in a list, then use that list (first_name, etc.) to match the elements in the following lines (Jimmy, etc.) and generate the output you want ("first_name"="Jimmy").
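
The idea above can be sketched with `zip`, which pairs names with values for any number of columns. The helper name `format_row` and the inline sample (with the email column left out) are assumptions made for illustration, and the line layout of id, tab, then pairs follows the accepted answer's output.

```python
import csv
import io

# Inline sample standing in for the real file (hypothetical, email omitted).
sample = io.StringIO(
    "id,first_name,last_name,date,opt-in\n"
    "1,Jimmy,Reyes,12/29/2016,FALSE\n"
    "2,Doris,Wood,04/22/2016,\n"
)

def format_row(header, row):
    """Build id<TAB>"name"="value",... from one header list and one data row."""
    pairs = []
    for name, value in zip(header[1:], row[1:]):
        if name == "opt-in" and value == "":
            value = "0"  # an empty opt-in is written as "0"
        pairs.append('"%s"="%s"' % (name, value))
    return row[0] + "\t" + ",".join(pairs)

reader = csv.reader(sample)
header = next(reader)  # the first row holds the column names
lines = [format_row(header, row) for row in reader]

for line in lines:
    print(line)
```

Because the pairs are joined with `","`, there is no trailing comma to trim, and nothing has to change when the file has 50 columns instead of 5.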

2 Comments

Thanks @Gang. Yes, I was thinking of that approach, but I have around 50 columns. Doing it manually is a bit tedious, so I was thinking there must be a quicker, more efficient way...?
@PuneetSharma, as long as the number of columns is fixed, a loop should do the job as you expect, I think.
1

Firstly, save the header into a variable. For example:

for i,row in enumerate(reader):
    if i == 0:
        header = row
    else:
        new_row = [row[0],'\t'] + ['%s=%s' % (header[j],row[j]) for j in range(1,6)]
        ....
...

Secondly, code such as [row[1], row[2], row[3], row[4], row[5]] can be simplified into [row[i] for i in range(1,6)] (a list comprehension), or simply the slice row[1:6].

Thirdly, string formatting is a good tool: print('"%s"="%s"'% (header[1],row[1])) will output "first_name"="Jimmy"

Use this knowledge and consider how to make it work.
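
Putting those pieces together for the updated CSV from the question (the one with a `unique_code` column) might look like the sketch below. It swaps in `csv.DictReader` rather than the index-based loop above, and the inline sample (email column omitted) and the in-memory output buffer are assumptions for illustration.

```python
import csv
import io

# Inline sample standing in for the updated CSV (hypothetical, email omitted).
sample = io.StringIO(
    "id,first_name,last_name,date,opt-in,unique_code\n"
    "1,Jimmy,Reyes,12/29/2016,FALSE,ER45DH\n"
    "2,Doris,Wood,04/22/2016,,MU34T3\n"
)

reader = csv.DictReader(sample)
# Every column except the key goes into the name="value" list, in file order.
columns = [name for name in reader.fieldnames if name != "unique_code"]

out = io.StringIO()  # stands in for the real output file
for record in reader:
    if record["opt-in"] == "":
        record["opt-in"] = "0"  # an empty opt-in is written as "0"
    pairs = ",".join('"{}"="{}"'.format(name, record[name]) for name in columns)
    out.write(record["unique_code"] + "\t" + pairs + "\n")

result = out.getvalue()
print(result, end="")
```

Looking values up by column name means the key column can sit anywhere in the file, which is convenient when there are around 50 columns.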

2 Comments

Thanks @Zealseeker. I tried your suggestions, but I am getting this error: header = row.split(',') AttributeError: 'list' object has no attribute 'split'
@PuneetSharma Sorry for misleading you. row is already the header as a list, so you don't need to split it; header=row is enough. Then you can use header[1], header[2] to get first_name and last_name. I'll revise my answer.
