
I am trying to create a clean CSV file by merging some of the variables from an old file and appending them to a new CSV file.

I have no problem running the code the first time; I get the output I want. But whenever I try to append data for a new variable (i.e. a new column), it appends the variable to the bottom of the file and the output is wonky.

I have basically been running the same code for each variable, only changing groupvariables to my desired variables and opening the file with f2 = open('outputfile.csv', "ab"), i.e. with "ab" for append instead of "wb". Any help would be appreciated.

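import csv

# csv_f (the open input file, read line by line) and headers (its variable
# names, minus the ID column) are set up earlier; not shown here.
groupvariables = ['x', 'y']

f2 = open('outputfile.csv', "wb")   # Python 2: csv files are opened in binary mode
writer = csv.writer(f2, delimiter=",")
writer.writerow(("ID", "Diagnosis"))

for line in csv_f:
    line = line.rstrip('\n')
    columns = line.split(",")
    tempname = columns[0]       # the ID
    tempindvar = columns[1:]    # all other variables

    # collect the non-missing values of the grouped variables
    templist = []
    for j in groupvariables:
        tempvar = tempindvar[headers.index(j)]
        if tempvar != ".":
            templist.append(tempvar)

    newList = list(set(templist))

    if len(newList) > 1:
        output = 'nomatch'      # the variables disagree
    elif len(newList) == 0:
        output = "."            # every value is missing
    else:
        output = newList[0]     # all non-missing values agree

    tempoutrow = (tempname, output)
    writer.writerow(tempoutrow)

f2.close()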

  • Not a real answer, but if you're looking to do anything significant with tabular data, including grouping and serializing to/from CSV, consider looking into a library like pandas. Commented Dec 18, 2013 at 2:48
  • It's unclear from the code you've provided what you're trying to accomplish (because it doesn't match your description very well). Please provide an SSCCE with sample data. Commented Dec 18, 2013 at 3:08
  • Opening a file with mode='a' (append) adds data starting at the end of the file, i.e. as new lines/rows. Adding a column of data to a CSV file generally requires appending something to every line of the original file and rewriting it completely. Commented Dec 18, 2013 at 3:13
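A small demonstration of what the last comment describes, using a hypothetical two-row file in a temporary directory. It is written for Python 3, where text mode "a" with newline="" plays the role that "ab" plays for the csv module in Python 2. The second write lands below the first one instead of beside it.

```python
import csv
import os
import tempfile

# Hypothetical demo file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "outputfile.csv")

# First run: write the ID and Diagnosis columns.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Diagnosis"])
    writer.writerows([["1", "a"], ["2", "b"]])

# Second run in append mode: the new variable's values become new rows
# at the end of the file, not a new column next to the old ones.
with open(path, "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "NewVar"])
    writer.writerows([["1", "c"], ["2", "d"]])

with open(path, newline="") as f:
    rows = list(csv.reader(f))

print(rows)
# [['ID', 'Diagnosis'], ['1', 'a'], ['2', 'b'], ['ID', 'NewVar'], ['1', 'c'], ['2', 'd']]
```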

2 Answers


CSV is a line-based file format, so the only way to add a column to an existing CSV file is to read it into memory and overwrite it entirely, adding the new column to each line.

If all you want to do is add lines, though, appending will work fine.
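A minimal sketch of the read-everything-then-overwrite approach in Python 3. The function name add_column, the values_by_id mapping (keyed on the first field, assumed to be an ID), and the "." default for missing values are all hypothetical choices for illustration; it also assumes the file has a header row.

```python
import csv
import os
import tempfile


def add_column(path, header, values_by_id, missing="."):
    """Rewrite the CSV at `path` with one extra column (hypothetical helper).

    `values_by_id` maps the first field of each data row to the new value;
    rows without an entry get `missing`. Assumes a header row exists.
    """
    # Read the whole file into memory first...
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    # ...extend every line, header included...
    rows[0].append(header)
    for row in rows[1:]:
        row.append(values_by_id.get(row[0], missing))

    # ...then overwrite the original file completely.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


# Usage on a small hypothetical file.
path = os.path.join(tempfile.mkdtemp(), "outputfile.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["ID", "Diagnosis"], ["1", "a"], ["2", "b"]])

add_column(path, "NewVar", {"1": "c"})

with open(path, newline="") as f:
    result = list(csv.reader(f))
print(result)  # [['ID', 'Diagnosis', 'NewVar'], ['1', 'a', 'c'], ['2', 'b', '.']]
```

Because the whole file is held in memory, this is simplest for files that comfortably fit in RAM.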


2 Comments

Well, reading all of it into memory is one way. Another is to write to a tempfile and then os.rename() it afterward.
I have solved this problem many times with @dstromberg's approach: read file A in batches of lines, transform each batch in memory, and append it to file B. This keeps memory use bounded by the number of lines in the batch window.
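A sketch of the temp-file variant from the comments, assuming Python 3.3+: rows are streamed one at a time into a temporary file in the same directory, which is then swapped into place. It uses os.replace rather than os.rename because os.replace also overwrites an existing destination on Windows. The helper name add_column_streaming, its parameters, and the "." default are hypothetical.

```python
import csv
import os
import tempfile


def add_column_streaming(path, header, values_by_id, missing="."):
    """Add a column without loading the whole file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_name = None
    try:
        # Temp file in the same directory, so the final swap stays on one filesystem.
        with open(path, newline="") as src, tempfile.NamedTemporaryFile(
                "w", newline="", dir=directory, suffix=".csv", delete=False) as dst:
            tmp_name = dst.name
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader) + [header])
            for row in reader:  # one row in memory at a time
                writer.writerow(row + [values_by_id.get(row[0], missing)])
        # Both files are closed here; move the finished file over the original.
        os.replace(tmp_name, path)
    except BaseException:
        # Don't leave a half-written temp file behind.
        if tmp_name is not None and os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


# Usage on a small hypothetical file.
path = os.path.join(tempfile.mkdtemp(), "outputfile.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["ID", "Diagnosis"], ["1", "a"], ["2", "b"]])

add_column_streaming(path, "NewVar", {"2": "d"})

with open(path, newline="") as f:
    result = list(csv.reader(f))
print(result)  # [['ID', 'Diagnosis', 'NewVar'], ['1', 'a', '.'], ['2', 'b', 'd']]
```

Processing a larger batch of rows per write would follow the same pattern; one row at a time is simply the smallest batch window.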

Here is something that might help. I assumed the first field of each row in each CSV file is a primary key for the record and can be used to match rows between the two files. The code below reads the records from one file and stores them in a dictionary, then reads the records from the other file, appends their values to the matching entries, and writes out a new file. You can adapt this example to better fit your actual problem.

import csv
# using python3

db = {}
with open('t1.csv', newline='') as f:
    for key, *values in csv.reader(f):
        db[key] = values

with open('t2.csv', newline='') as f:
    for key, *values in csv.reader(f):
        # append t2's values to the matching t1 record (or start a new one)
        db.setdefault(key, []).extend(values)

# csv.writer handles quoting, so fields containing commas survive intact
with open('combo.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for key in sorted(db):
        writer.writerow([key] + db[key])
