I have a file containing the following:

first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,[email protected],finance,21230
john,king,jking,[email protected],human resource,31230

I want to copy the "email" column into a new column "email2", and then replace gmail.com with hotmail.com in the email2 column.

I'm new to Python and would appreciate some help. I tried a few scripts, but if there is a better way to do it, please let me know. The original file contains 60000 rows.

with open('c:\\Python27\\scripts\\colnewfile.csv', 'rb') as fp_in1, open('c:\\Python27\\scripts\\final.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])

Here is the final script. The only problem is that it does not write the complete output file: the output has only 61409 rows, whereas the input file has 61438 rows.

inFile = 'c:\Python27\scripts\in-093013.csv'
outFile = 'c:\Python27\scripts\final.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fp_in1, delimiter=",")
    for col in reader:
        del col[6:]
        writer.writerow(col)
    headers = next(reader)
    writer.writerow(headers + ['email2'])
    for row in reader:
        if len(row) > 3:
            email = email.split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])

2 Answers

If you call next() on the reader you get one row at a time; use that to copy over the headers. Copying the email column is easy enough:

import csv

infilename = r'c:\Python27\scripts\colnewfile.csv'
outfilename = r'c:\Python27\scripts\final.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

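    for row in reader:
        if len(row) > 3:
            # make sure there are at least 4 columns
            email = row[3].split('@', 1)[0] + '@hotmail.com'
            # write inside the check so short or empty rows are skipped
            # instead of raising NameError or reusing a stale email value
            writer.writerow(row + [email])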

This code splits the email address on the first @ sign, takes the first part of the split and adds @hotmail.com after it:

>>> 'example@gmail.com'.split('@', 1)[0]
'example'
>>> 'example@gmail.com'.split('@', 1)[0] + '@hotmail.com'
'example@hotmail.com'

The above produces:

first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,[email protected],finance,21230,[email protected]
john,king,jking,[email protected],human resource,31230,[email protected]

for your sample input.
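
Note that splitting on @ rewrites every address to hotmail.com, whatever its original domain was. If only gmail.com addresses should change, something along these lines might work. This is a rough Python 3 sketch, not the code above: the helper names swap_gmail and add_email2 are made up for illustration, the sample rows are placeholder data, and it reads from in-memory text so it can be run as-is.

```python
import csv
import io


def swap_gmail(address, old="gmail.com", new="hotmail.com"):
    """Return address with its domain changed from old to new.

    Addresses on any other domain, or without an @, are returned unchanged.
    """
    local, sep, domain = address.rpartition("@")
    if sep and domain.lower() == old:
        return local + "@" + new
    return address


def add_email2(fp_in, fp_out, email_index=3):
    """Copy CSV rows from fp_in to fp_out, appending an email2 column."""
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out, lineterminator="\n")
    writer.writerow(next(reader) + ["email2"])  # header row
    for row in reader:
        if len(row) > email_index:  # skip blank or truncated rows
            writer.writerow(row + [swap_gmail(row[email_index])])


# Placeholder data standing in for the real file (hypothetical rows)
sample = io.StringIO(
    "first_name,last_name,uid,email\n"
    "ann,lee,alee,alee@gmail.com\n"
    "\n"
    "bob,ray,bray,bray@example.org\n"
)
result = io.StringIO()
add_email2(sample, result)
print(result.getvalue())
```

With real files on Python 3 you would open both in text mode with newline='' (as the csv module documentation recommends) rather than 'rb' and 'wb', and pass the file objects to add_email2.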

5 Comments

Here is the error I'm getting: email = row[3] IndexError: list index out of range
@user2820987: then you have empty rows in your input file, or at the very least rows that are too short. I'll adjust.
Most likely the last line is empty; the rest of the data was successfully written.
inFile = 'c:\\Python27\\scripts\\in-093013.csv' outFile = 'c:\\Python27\\scripts\\final.csv' with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1: writer = csv.writer(fp_out1, delimiter=",") reader = csv.reader(fp_in1, delimiter=",") for col in reader: del col[6:] writer.writerow(col) headers = next(reader) writer.writerow(headers + ['email2']) for row in reader: if len(row) > 3: email = email.split('@', 1)[0] + '@hotmail.com' writer.writerow(row + [email])
You cannot loop over an open reader object twice; if you want to remove a column from the output, do so in the one loop where you add a column as well.
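
To illustrate the single-loop point, here is a minimal Python 3 sketch that drops everything past the first six columns and appends email2 in the same pass. The function name trim_and_add_email2, its keep and email_index parameters, and the sample rows are assumptions made up for the example; it runs on in-memory text rather than the real file.

```python
import csv
import io


def trim_and_add_email2(fp_in, fp_out, keep=6, email_index=3):
    """Keep the first `keep` columns and append email2, in one pass."""
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out, lineterminator="\n")

    headers = next(reader)  # the reader is consumed only once
    writer.writerow(headers[:keep] + ["email2"])

    for row in reader:
        if len(row) <= email_index:
            continue  # blank or truncated row: nothing to copy
        email2 = row[email_index].split("@", 1)[0] + "@hotmail.com"
        writer.writerow(row[:keep] + [email2])


# Placeholder input with one extra trailing column (hypothetical data)
source = io.StringIO(
    "first_name,last_name,uid,email,dep_code,dep_name,extra\n"
    "ann,lee,alee,alee@gmail.com,finance,21230,x\n"
)
target = io.StringIO()
trim_and_add_email2(source, target)
print(target.getvalue())
```

Slicing with row[:keep] builds a new list instead of mutating the row with del, so the trimming and the extra column happen in the same writerow call.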

This can be done very cleanly using pandas. Here it goes:

In [1]: import pandas as pd

In [3]: df = pd.read_csv('your_csv_file.csv')

In [4]: def rename_email(row):
   ...:     return row.email.replace('gmail.com', 'hotmail.com')
   ...:

In [5]: df['email2'] = df.apply(rename_email, axis=1)

In [6]: """axis = 1 or ‘columns’: apply function to each row"""

In [7]: df
Out[7]:
  first_name last_name     uid             email        dep_code  dep_name              email2
0       john     smith  jsmith  [email protected]         finance     21230  [email protected]
1       john      king   jking   [email protected]  human resource     31230   [email protected]

In [8]: df.to_csv('new_update_email_file.csv', index=False)  # index=False keeps the row index out of the file
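
As a possible variation, the row-wise apply can probably be replaced by the vectorized string method on the column, which tends to be faster on tens of thousands of rows. The sketch below assumes a reasonably recent pandas (one where Series.str.replace accepts regex=False) and uses made-up placeholder rows read from an in-memory string instead of the real file.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the real input file (hypothetical rows)
csv_text = (
    "first_name,last_name,uid,email\n"
    "ann,lee,alee,alee@gmail.com\n"
    "bob,ray,bray,bray@example.org\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Literal (non-regex) substring replacement applied to the whole column at once
df["email2"] = df["email"].str.replace("gmail.com", "hotmail.com", regex=False)

# With no path given, to_csv returns the CSV text; index=False omits the row index
out = df.to_csv(index=False)
print(out)
```

The email column itself is left untouched, and addresses that do not contain gmail.com are copied to email2 unchanged.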
