Replacing and deleting columns from a csv using python

Question

Here is the code I am writing:

import csv
import openpyxl

def read_file(fn):
    rows = []

    with open(fn) as f:
        reader = csv.reader(f, quotechar='"',delimiter=",")
        for row in reader:
            if row:                     
                rows.append(row)
    return rows 


replace = {x[0]:x[1:] for x in read_file("replace.csv")}


delete = set( (row[0] for row in read_file("delete.csv")) )  


result = []

input_file="input.csv"
with open(input_file) as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[7] in delete:
                continue                                   
            elif row[7] in replace:

                result.append(replace[row[7]])   
            else:
                result.append(row)                       



with open ("done.csv", "w+", newline="") as f:
    w = csv.writer(f,quotechar='"', delimiter= ",")
    w.writerows(result)

Here are my files:

input.csv:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","

This is a 13-column CSV. I am interested only in the 8th and 11th fields.

This is my replace.csv:

"aaaaa","11111","22222"

delete.csv:

ccccc

What I am doing is comparing the first column of replace.csv (line by line) with the 8th column of input.csv. If they match, the 8th column of input.csv is replaced with the second column of replace.csv, and the 11th column of input.csv with the third column of replace.csv. For delete.csv, the 8th column of input.csv is compared against it line by line, and if a match is found the entire row is deleted. Any line that matches neither replace.csv nor delete.csv is written out as it is. So my desired output is:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-",11111,"-","-",22222,"-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","

But when I run this code, it gives me output like this:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
11111,22222

Where am I going wrong? I posted an earlier question about this program, and since the input file has changed I am trying to adapt it: https://stackoverflow.com/a/54388144/9279313

Possible duplicate of How to delete and replace columns in a csv file by comparing it to other csv files in python?
– Andreas, Commented Jan 31, 2019 at 6:18
@Andreas That post is also mine, and I have referenced it here. I am changing that program because the input files have changed, so I need help.
– Anuj Kulkarni, Commented Jan 31, 2019 at 6:19
@SafeDev If the 8th column of input.csv equals the 1st column of replace.csv, then the 8th column of input.csv should become the 2nd column of replace.csv and the 11th column of input.csv should become the 3rd column of replace.csv. That is what I am trying to do.
– Anuj Kulkarni, Commented Jan 31, 2019 at 6:27

Vikramd · Accepted Answer · 2019-01-31 08:58:20Z

2

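@anuj I think SafeDev's solution is optimal, but if you don't want to use pandas, a small change to your code is enough. Your loop appends replace[row[7]] (only the two replacement values) to result instead of the updated row, which is why the output line is just 11111,22222. Update the two fields in place and append the whole row: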

for row in reader:
    if row:
        if row[7] in delete:
            continue                                   
        elif row[7] in replace:
            key = row[7]
            row[7] = replace[key][0]
            row[10]= replace[key][1]
            result.append(row)
        else:
            result.append(row)

Hope this solves your issue.
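For reference, here is a minimal self-contained sketch of the whole script with that change applied. The function name `apply_rules` and its `key_idx`/`target_idx` parameters are illustrative names, not part of the original code, and the sample data from the question is inlined with `io.StringIO` so the sketch runs without any files on disk.

```python
import csv
import io


def apply_rules(rows, replace, delete, key_idx=7, target_idx=10):
    """Drop rows whose key is in `delete`; patch two fields for keys in `replace`."""
    result = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[key_idx]
        if key in delete:
            continue  # delete the entire row
        if key in replace:
            # Overwrite only the two fields of interest and keep the rest of the row.
            row[key_idx], row[target_idx] = replace[key][0], replace[key][1]
        result.append(row)
    return result


# Sample data from the question, inlined (assumption: real code would open the files).
input_text = '''c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
'''
replace_text = '"aaaaa","11111","22222"\n'
delete_text = "ccccc\n"

rows = list(csv.reader(io.StringIO(input_text)))
replace = {r[0]: r[1:] for r in csv.reader(io.StringIO(replace_text)) if r}
delete = {r[0] for r in csv.reader(io.StringIO(delete_text)) if r}

result = apply_rules(rows, replace, delete)

out = io.StringIO()
csv.writer(out).writerows(result)
print(out.getvalue())
```

Note that `csv.writer` by default quotes only the fields that need it, so the "-" fields come out unquoted while the "," field stays quoted. If every field should be quoted, passing `quoting=csv.QUOTE_ALL` to the writer is one option.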

answered Jan 31, 2019 at 8:58

Vikramd


SafeDev · Accepted Answer · 2019-01-31 06:54:46Z

1

It's actually quite simple. Instead of writing it from scratch, just use the pandas library, which makes it easier to handle any dataset. This is how you would do it:

EDIT:

import pandas as pd

input_csv = pd.read_csv('input.csv')
replace_csv = pd.read_csv('replace.csv', header=None)
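# delete.csv has no header row either; without header=None its first line
# would be consumed as column names instead of data
delete_csv = pd.read_csv('delete.csv', header=None)

r_lst = replace_csv.iloc[:, 0].tolist()
d_lst = delete_csv.iloc[:, 0].tolist()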

input2_csv = pd.DataFrame.copy(input_csv)
for i, row in input_csv.iterrows():
    if row['c8'] in r_lst:
        input2_csv.loc[i, 'c8'] = replace_csv.iloc[r_lst.index(row['c8']), 1]
        input2_csv.loc[i, 'c11'] = replace_csv.iloc[r_lst.index(row['c8']), 2]
    if row['c8'] in d_lst:
        input2_csv = input2_csv[input2_csv.c8 != row['c8']]

input2_csv.to_csv('output.csv', index=False)

This can be made more general by turning it into a function that takes the column names as parameters, replacing the hard-coded 'c8' and 'c11'.
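A rough sketch of that idea follows. Note that it swaps pandas for the standard library's `csv.DictReader`, which also addresses columns by header name, so it is not the code above turned into a function. The name `replace_and_delete` and its parameters are hypothetical, and the sample data is inlined rather than read from files.

```python
import csv
import io


def replace_and_delete(reader, replace, delete, key_col, target_col):
    """Yield rows from a csv.DictReader, filtered and patched by column name."""
    for row in reader:
        key = row[key_col]
        if key in delete:
            continue  # drop the whole row
        if key in replace:
            # replace maps key -> (new key_col value, new target_col value)
            row[key_col], row[target_col] = replace[key]
        yield row


# Sample data from the question, inlined (assumption: real code would open the files).
input_text = '''c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
'''
replace = {"aaaaa": ("11111", "22222")}
delete = {"ccccc"}

reader = csv.DictReader(io.StringIO(input_text))
rows = list(replace_and_delete(reader, replace, delete, "c8", "c11"))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Because the columns are passed in by name, the same function works for a file with a different layout as long as it has a header row.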

edited Jan 31, 2019 at 6:54

answered Jan 31, 2019 at 6:19

SafeDev


8 Comments

Anuj Kulkarni Over a year ago

In the delete part I don't want the data to be hardcoded; I want to take the input from a file. Since these are dummy files I have provided less data. The actual data may contain multiple lines.

Anuj Kulkarni Over a year ago

Also, the pandas library will add numerical indexes, which is undesired. This code also strips the commas, and all the columns get appended into one column.

SafeDev Over a year ago

I was originally confused by the question, but I believe this approach should better fit your needs.

Anuj Kulkarni Over a year ago

Sorry if it was confusing. It includes different tasks and I tried my best to simplify it.

SafeDev Over a year ago

Does this answer your question, though?
