python csv copy column

Question

I have a file containing the following:

first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,[email protected],finance,21230
john,king,jking,[email protected],human resource,31230

I want to copy the "email" column into a new column "email2", and then replace gmail.com with hotmail.com in email2.

I'm new to Python, so I need help from experts. I tried a few scripts, but if there is a better way to do it, please let me know. The original file contains 60000 rows.

with open('c:\\Python27\\scripts\\colnewfile.csv', 'rb') as fp_in1, open('c:\\Python27\\scripts\\final.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])

Here is the final script. The only problem is that it does not write the complete output file: it shows only 61409 rows, whereas the input file has 61438 rows.

inFile = 'c:\\Python27\\scripts\\in-093013.csv'
outFile = 'c:\\Python27\\scripts\\final.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fp_in1, delimiter=",")
    for col in reader:
        del col[6:]
        writer.writerow(col)
    headers = next(reader)
    writer.writerow(headers + ['email2'])
    for row in reader:
        if len(row) > 3:
            email = email.split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])

Martijn Pieters · Accepted Answer · 2013-10-11 18:41:26Z

1

If you call next() on the reader you get one row at a time; use that to copy over the headers. Copying the email column is easy enough:

import csv

infilename = r'c:\Python27\scripts\colnewfile.csv'
outfilename = r'c:\Python27\scripts\final.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

    for row in reader:
        if len(row) > 3:
            # only rows with at least 4 columns have an email to copy;
            # empty or too-short rows are skipped
            email = row[3].split('@', 1)[0] + '@hotmail.com'
            writer.writerow(row + [email])

This code splits the email address on the first @ sign, takes the first part of the split and adds @hotmail.com after it:

>>> '[email protected]'.split('@', 1)[0]
'example'
>>> '[email protected]'.split('@', 1)[0] + '@hotmail.com'
'[email protected]'

The above produces:

first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,[email protected],finance,21230,[email protected]
john,king,jking,[email protected],human resource,31230,[email protected]

for your sample input.
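The script above uses the Python 2 binary file modes ('rb' and 'wb') and swaps whatever domain follows the @ sign. As a minimal sketch for Python 3, assuming only gmail.com addresses should change (as the question describes), something like the following could work. The names `swap_gmail_domain` and `add_email2` and the sample rows are hypothetical, chosen for illustration only.

```python
import csv
import io


def swap_gmail_domain(email, old="gmail.com", new="hotmail.com"):
    """Swap the domain only when it equals `old`; otherwise return email as is."""
    local, sep, domain = email.partition("@")
    if sep and domain.lower() == old:
        return local + "@" + new
    # No '@' at all, or a different domain: leave the value untouched.
    return email


def add_email2(fp_in, fp_out):
    """Copy CSV rows from fp_in to fp_out, appending an email2 column."""
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out)

    headers = next(reader)              # first row holds the column names
    email_idx = headers.index("email")  # find the column by name, not position
    writer.writerow(headers + ["email2"])

    for row in reader:
        if len(row) <= email_idx:
            continue  # skip blank or truncated rows
        writer.writerow(row + [swap_gmail_domain(row[email_idx])])


# In-memory demo with made-up rows, including one blank line.
sample = io.StringIO(
    "first_name,last_name,uid,email\n"
    "john,smith,jsmith,jsmith@gmail.com\n"
    "\n"
    "john,king,jking,jking@example.org\n"
)
result = io.StringIO()
add_email2(sample, result)
print(result.getvalue())
```

With real files in Python 3, both files would be opened in text mode with `newline=""`, for example `open(infilename, newline="")` and `open(outfilename, "w", newline="")`, which is what the `csv` module documentation recommends.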

edited Oct 11, 2013 at 18:41

answered Oct 11, 2013 at 18:35

Martijn Pieters


5 Comments

user2820987 Over a year ago

Here is the error I'm getting: email = row[3] IndexError: list index out of range

Martijn Pieters Over a year ago

@user2820987: then you have empty rows in your input file, or at the very least rows that are too short. I'll adjust.

Martijn Pieters Over a year ago

Most likely the last line is empty; the rest of the data was successfully written.

user2820987 Over a year ago

inFile = 'c:\\Python27\\scripts\\in-093013.csv' outFile = 'c:\\Python27\\scripts\\final.csv' with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1: writer = csv.writer(fp_out1, delimiter=",") reader = csv.reader(fp_in1, delimiter=",") for col in reader: del col[6:] writer.writerow(col) headers = next(reader) writer.writerow(headers + ['email2']) for row in reader: if len(row) > 3: email = email.split('@', 1)[0] + '@hotmail.com' writer.writerow(row + [email])

Martijn Pieters Over a year ago

You cannot loop over an open reader object twice; if you want to remove a column from the output, do so in the one loop where you add a column as well.
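To illustrate that last comment: a csv reader is an iterator, so once the first loop has consumed it there is nothing left for next() or a second loop. A minimal sketch of doing both jobs in a single pass might look like this. The function name `trim_and_add_email2`, its `keep` and `email_idx` parameters, and the sample data are assumptions made for the example, not part of the original scripts.

```python
import csv
import io


def trim_and_add_email2(rows, keep=6, email_idx=3):
    """Yield each row cut to `keep` columns with an email2 value appended."""
    rows = iter(rows)
    headers = next(rows, None)
    if headers is None:
        return  # empty input: nothing to yield
    yield headers[:keep] + ["email2"]

    for row in rows:
        if len(row) <= email_idx:
            continue  # blank or short row: there is no email to copy
        email2 = row[email_idx].split("@", 1)[0] + "@hotmail.com"
        # Trim extra columns and add the new one in the same iteration.
        yield row[:keep] + [email2]


# Made-up input with a seventh column that should be dropped.
data = io.StringIO(
    "first_name,last_name,uid,email,dep_code,dep_name,extra\n"
    "john,king,jking,jking@gmail.com,human resource,31230,x\n"
)
for out_row in trim_and_add_email2(csv.reader(data)):
    print(out_row)
```

Each yielded list can be passed straight to `writer.writerow`, so the input is read exactly once.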

yardstick17 · Accepted Answer · 2017-01-29 08:31:19Z

1

This can be done very cleanly using pandas. Here it goes:

In [1]: import pandas as pd

In [3]: df = pd.read_csv('your_csv_file.csv')

In [4]: def rename_email(row):
   ...:     return row.email.replace('gmail.com', 'hotmail.com')
   ...:

In [5]: df['email2'] = df.apply(rename_email, axis=1)

In [6]: # axis=1 (or 'columns'): apply the function to each row

In [7]: df
Out[7]:
  first_name last_name     uid             email        dep_code  dep_name              email2
0       john     smith  jsmith  [email protected]         finance     21230  [email protected]
1       john      king   jking   [email protected]  human resource     31230   [email protected]

In [8]: df.to_csv('new_update_email_file.csv', index=False)  # index=False keeps the row index out of the file
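For a file with tens of thousands of rows, a row-by-row `apply` is usually slower than a column-wise string operation. As a hedged alternative sketch, the same column could be built with the `.str.replace` accessor. The two-row sample below is invented for illustration and stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
csv_text = (
    "uid,email\n"
    "jsmith,jsmith@gmail.com\n"
    "jking,jking@example.org\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Vectorised replace over the whole column; regex=False treats
# 'gmail.com' as literal text rather than as a pattern.
df["email2"] = df["email"].str.replace("gmail.com", "hotmail.com", regex=False)

# index=False keeps the DataFrame's row index out of the CSV output.
print(df.to_csv(index=False))
```

Note that this replaces the text gmail.com wherever it occurs in the value, so it assumes the string only ever appears as the domain.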

edited Jan 29, 2017 at 8:31

answered Jan 29, 2017 at 8:25

yardstick17
