Here is the code I am writing:

import csv
import openpyxl

def read_file(fn):
    rows = []

    with open(fn) as f:
        reader = csv.reader(f, quotechar='"',delimiter=",")
        for row in reader:
            if row:                     
                rows.append(row)
    return rows 


replace = {x[0]:x[1:] for x in read_file("replace.csv")}


delete = set( (row[0] for row in read_file("delete.csv")) )  


result = []

input_file="input.csv"
with open(input_file) as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[7] in delete:
                continue                                   
            elif row[7] in replace:

                result.append(replace[row[7]])   
            else:
                result.append(row)                       



with open ("done.csv", "w+", newline="") as f:
    w = csv.writer(f,quotechar='"', delimiter= ",")
    w.writerows(result)

Here are my files:

input.csv:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","

This is a 13-column CSV. I am interested only in the 8th and 11th fields.

This is my replace.csv:

"aaaaa","11111","22222"

delete.csv:

ccccc

I compare the first column of replace.csv, line by line, with the 8th column of input.csv. If they match, I replace the 8th column of input.csv with the second column of replace.csv and the 11th column of input.csv with the third column of replace.csv. For delete.csv, I compare its values with the 8th column of input.csv in the same way, and if a match is found I delete the entire row. Any line that matches neither replace.csv nor delete.csv should be written out unchanged. So my desired output is:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-",11111,"-","-",22222,"-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","

But when I run this code, it gives me output like this:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
11111,22222

Where am I going wrong? I am adapting a program that I asked about in an earlier question, because the input file has changed: https://stackoverflow.com/a/54388144/9279313

  • Possible duplicate of How to delete and replace columns in a csv file by comparing it to other csv files in python? Commented Jan 31, 2019 at 6:18
  • @Andreas That post is also mine; I have referenced it here. I am making changes to that program because the input files have changed, so I need help. Commented Jan 31, 2019 at 6:19
  • Does this answer your question? @AnujKulkarni Commented Jan 31, 2019 at 6:21
  • @SafeDev This is what I am trying to do: if (8th column of input.csv == 1st column of replace.csv), then 8th column of input.csv = 2nd column of replace.csv and 11th column of input.csv = 3rd column of replace.csv. Commented Jan 31, 2019 at 6:27

2 Answers

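@anuj I think SafeDev's solution is optimal, but if you don't want to use pandas, you only need a small change to your code. Your version appends replace[row[7]], which is just the two replacement values, instead of the updated row. Update the 8th and 11th fields in place and append the whole row: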

for row in reader:
    if row:
        if row[7] in delete:
            continue                                   
        elif row[7] in replace:
            key = row[7]
            row[7] = replace[key][0]
            row[10]= replace[key][1]
            result.append(row)
        else:
            result.append(row)  

Hope this solves your issue.
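
For reference, here is a minimal sketch of the whole script with that change applied, using only the csv module. The function names load_rows, apply_rules and run are my own and not from the question. It assumes every non-empty row has at least 11 fields and that each replace.csv row has exactly three fields. The default writer quotes only the fields that need it, so the quoting may not match the input exactly.

```python
import csv

def load_rows(path):
    """Return the non-empty rows of a CSV file as lists of strings."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row]

def apply_rules(rows, replace, delete, key_col=7, other_col=10):
    """Drop rows whose key is in `delete`; patch rows whose key is in `replace`.

    `replace` maps a key to a (new_key_value, new_other_value) pair.
    key_col and other_col are zero-based, so 7 and 10 are the 8th and 11th fields.
    """
    result = []
    for row in rows:
        key = row[key_col]
        if key in delete:
            continue                    # skip the whole row
        if key in replace:
            row = list(row)             # copy so the caller's row is untouched
            row[key_col], row[other_col] = replace[key]
        result.append(row)
    return result

def run(input_path, replace_path, delete_path, output_path):
    """Read the three files, apply the rules, and write the result."""
    replace = {r[0]: (r[1], r[2]) for r in load_rows(replace_path)}
    delete = {r[0] for r in load_rows(delete_path)}
    rows = apply_rules(load_rows(input_path), replace, delete)
    with open(output_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Example call with the file names from the question:
# run("input.csv", "replace.csv", "delete.csv", "done.csv")
```

The header line passes through unchanged because "c8" appears in neither replace.csv nor delete.csv.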


It's actually quite simple. Instead of writing it from scratch, just use the pandas library, which makes it easier to handle any dataset. This is how you would do it:

EDIT:

import pandas as pd

input_csv = pd.read_csv('input.csv')
replace_csv = pd.read_csv('replace.csv', header=None)
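# delete.csv has no header row, so read it the same way as replace.csv
delete_csv = pd.read_csv('delete.csv', header=None)

# First column of each file holds the keys to match against 'c8'
r_lst = replace_csv.iloc[:, 0].tolist()
d_lst = delete_csv.iloc[:, 0].tolist()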

input2_csv = pd.DataFrame.copy(input_csv)
for i, row in input_csv.iterrows():
    if row['c8'] in r_lst:
        input2_csv.loc[i, 'c8'] = replace_csv.iloc[r_lst.index(row['c8']), 1]
        input2_csv.loc[i, 'c11'] = replace_csv.iloc[r_lst.index(row['c8']), 2]
    if row['c8'] in d_lst:
        input2_csv = input2_csv[input2_csv.c8 != row['c8']]

input2_csv.to_csv('output.csv', index=False)

This process can be made more flexible by turning it into a function that takes the column names as parameters, replacing 'c8' and 'c11' with those two parameters.
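
A rough sketch of such a function is below. The name apply_rules and the parameters key_col and other_col are hypothetical, and it swaps the row-by-row loop for isin and map lookups, so treat it as one possible shape and not a drop-in replacement. It assumes the replace table has three columns (key, new key value, new other value) and the delete table has one.

```python
import pandas as pd

def apply_rules(input_df, replace_df, delete_df, key_col="c8", other_col="c11"):
    """Return a copy of input_df with rows deleted or patched according to key_col.

    replace_df: column 0 = key, column 1 = new key_col value, column 2 = new other_col value.
    delete_df:  column 0 = keys whose rows should be dropped.
    """
    # Drop every row whose key appears in the delete list.
    to_delete = delete_df.iloc[:, 0].tolist()
    out = input_df[~input_df[key_col].isin(to_delete)].copy()

    # Build key -> replacement lookups from the headerless replace table.
    keys = replace_df.iloc[:, 0].tolist()
    new_key = dict(zip(keys, replace_df.iloc[:, 1].tolist()))
    new_other = dict(zip(keys, replace_df.iloc[:, 2].tolist()))

    # Snapshot the original keys so both columns are looked up by the same value.
    original = out[key_col].copy()
    mask = original.isin(keys)
    out.loc[mask, other_col] = original[mask].map(new_other)
    out.loc[mask, key_col] = original[mask].map(new_key)
    return out

# Usage sketch with the file names from the question; dtype=str keeps
# values such as 11111 as text instead of converting them to integers:
# out = apply_rules(pd.read_csv("input.csv", dtype=str),
#                   pd.read_csv("replace.csv", header=None, dtype=str),
#                   pd.read_csv("delete.csv", header=None, dtype=str))
# out.to_csv("output.csv", index=False)
```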

8 Comments

In the delete part I don't want the data to be hardcoded; I want to take the input from a file. Since these are dummy files I have provided less data. The actual data may contain multiple lines.
Also, the pandas library will add numerical indexes, which is undesired. This code also strips the commas, and all the columns get appended to one column.
I was originally confused by the question, but I believe this approach should better fit your needs.
Sorry if it was confusing. It includes different tasks and I tried my best to simplify it.
Does this answer your question though?
