5

I have 2 csv files. I need to replace a column in one file with a column from the other file but they have to stay sorted according to an ID column.

Here's an example:

file1:

ID, transect, 90mdist                                      
1, a, 10,                                                  
2, b, 20,                                                
3, c, 30,     

file2:

ID, transect, 90mdist                                
1, a, 50                                                   
2, b, 70                                                     
3, c, 90          

basically I created a new file with the correct 90mdist and I need to insert it into the old file but it has to line up with the same ID #.

It's my understanding that Python treats csv files as a string. so I can either use a dictionary or convert the data into a list and then change it? which way is best?

Any help would be greatly appreciated!!

3
  • in you example file2 is already corrected files, isn't it? so you could just rename the file. Commented Jul 21, 2009 at 14:25
  • Since the two files have the same set of ID values and columns do you mean you want to totally overwrite the first with the second? I'm not sure what you mean by 'insert' then -- maybe the example data is badly chosen and you can edit to clarify (desired output as well as inputs)? Commented Jul 21, 2009 at 14:28
  • exact duplicate stackoverflow.com/questions/1019200/… Commented Jul 21, 2009 at 15:11

4 Answers 4

7

The CSV Module in the Python Library is what you need here.

It allows you to read and write CSV files, treating lines a tuples or lists of items.

Just read in the file with the corrected values, store the in a dictionary keyed with the line's ID.

Then read in the second file, replacing the relevant column with the data from the dict and write out to a third file.

Done.

Sign up to request clarification or add additional context in comments.

1 Comment

+1: Write to a third file. Don't try to update a file in place.
2

Try this:

from __future__ import with_statement

import csv

def twiddle_csv(file1, file2):
    def mess_with_record(record):
        record['90mdist'] = 2 * int(record['90mdist']) + 30
    with open(file1, "r") as fin:
        with open(file2, "w") as fout:
            fields = ['ID', 'transect', '90mdist']
            reader = csv.DictReader(fin, fieldnames=fields)
            writer = csv.DictWriter(fout, fieldnames=fields)
            fout.write(",".join(fields) + '\n')
            reader.next()   # Skip the column header
            for record in reader:
                mess_with_record(record)
                writer.writerow(record)

if __name__ == '__main__':
    twiddle_csv('file1', 'file2')

A couple of caveats:

  • DictReader seems to use the first row as data, even if it matches the fields. Call reader.next() to skip.
  • Data rows cannot have trailing commas. They will be interpreted as empty columns.
  • DictWriter does not appear to write out the column headers. DIY.

1 Comment

Thank you for the sample. (At least for csv.reader and csv.writer) the python csv module requires you to open files in binary mode or you will get accidental newlines (see bugs.python.org/issue7198 ).
0

Once you have your csv lists, one easy way to replace a column in one matrix with another would be to transpose the matrices, replace the row, and then transpose back your edited matrix. Here is an example with your data:

csv1 = [['1', 'a', '10'], ['2', 'b', '20'], ['3', 'c', '30']]
csv2 = [['1', 'a', '50'], ['2', 'b', '70'], ['3', 'c', '90']]

# transpose in Python is zip(*myData)
transposedCSV1, transposedCSV2 = zip(*csv1), zip(*csv2)
print transposedCSV1
>>> [['1', '2', '3'], ['a', 'b', 'c'], ['10', '20', '30']]

csv1 = transposedCSV1[:2] + [transposedCSV2[2]]
print csv1
>>> [['1', '2', '3'], ['a', 'b', 'c'], ['50', '70', '90']]

csv1 = zip(*csv1)
print csv1
>>> [['1', 'a', '50'], ['2', 'b', '70'], ['3', 'c', '90']]

Comments

-1

If you're only doing this as a one-off, why bother with Python at all? Excel or OpenOffice Calc will open the two CSV files for you, then you can just cut and paste the column from one to the other.

If the two lists of IDs are not exactly the same then a simple VB macro would do it for you.

1 Comment

I like the first approach (using excel/calc), but writing a VB macro is the same effort as writing a small python script.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.