How to Delete Rows from a CSV in Python

Question

I'm trying to compare two CSV files (fileA and fileB), and remove any rows from fileA that are not found in fileB. I want to be able to do this without creating a third file. I thought I could do this using the csv writer module, but now I'm second-guessing myself.

Currently, I'm using the following code to record my comparison data from file B:

import csv

removal_list = set()
with open('fileB', 'rb') as file_b:
    reader1 = csv.reader(file_b)
    next(reader1)  # skip the header row
    for row in reader1:
        removal_list.add((row[0], row[2]))

This is where I'm stuck and do not know how to delete the rows:

with open('fileA', 'ab') as file_a:
    with open('fileB', 'rb') as file_b:
        writer = csv.writer(file_a)
        reader2 = csv.reader(file_b)
        next(reader2)
        for row in reader2:
            if (row[0], row[2]) not in removal_list:
                # If row was not present in file B, Delete it from file A.
                # stuck here:  writer.<HowDoIRemoveRow>(row)

sqlite is a flat-file based database and the drivers for it are included in modern versions of Python. It might be a better option considering what you are trying to do.
– Burhan Khalid, Commented Apr 29, 2013 at 5:14
Sorry for the silly question, but this will create an exact copy of fileB, won't it?
– G M, Commented Jul 5, 2016 at 14:40

jamylak · Accepted Answer · 2013-04-29 05:15:12Z

8

This solution uses fileinput with inplace=True, which moves the original file to a backup and redirects standard output into a new file with the original name, so whatever you print becomes the file's new contents. You can't remove rows from a file, but you can rewrite it with only the ones you want.

if the keyword argument inplace=1 is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.

fileA

h1,h2,h3
a,b,c
d,e,f
g,h,i
j,k,l

fileB

h1,h2,h3
a,b,c
1,2,3
g,h,i
4,5,6

import fileinput, sys, csv

with open('fileB', 'rb') as file_b:
    r = csv.reader(file_b)
    next(r) #skip header
    seen = {(row[0], row[2]) for row in r}

f = fileinput.input('fileA', inplace=True) # sys.stdout is redirected to the file
print next(f), # write header as first line

w = csv.writer(sys.stdout)
for row in csv.reader(f):
    if (row[0], row[2]) in seen: # write it if it's in B
        w.writerow(row)

fileA

h1,h2,h3
a,b,c
g,h,i

edited Apr 29, 2013 at 5:15

answered Apr 29, 2013 at 5:04

jamylak


6 Comments

David Cain Over a year ago

A subtle improvement not addressed in the explanation: this code uses a set, a far more optimal data structure for answering "is this data present?" than a list (which must be iterated over each time).

jamylak Over a year ago

@David Op also used a set though

David Cain Over a year ago

D'oh. S/he clearly did. Well, small bit of advice- don't call it a removal "list", or bone-headed people like me will get confused as to the variable's type. =)

justin Over a year ago

What version of Python? I don't believe this syntax is 2.4 compatible.

jamylak Over a year ago

@justin You tagged it as 2.7? You can just use set((row[0], row[2]) for row in r) instead


Lennart Regebro · Answer · 2013-04-29 04:51:35Z

3

CSV is not a database format. It is read and written as a whole. You can't remove rows in the middle. So the only way to do this without creating a third file is to read in the file completely in memory and then write it out, without the offending rows.

But in general it's better to use a third file.
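
One way to follow the "third file" advice and still end up with a single fileA is to write the kept rows to a temporary file and then rename it over the original. The sketch below assumes Python 3; the function name filter_via_temp_file and the keys and key_cols parameters are illustrative and not from the answer itself.

```python
import csv
import os
import tempfile


def filter_via_temp_file(path_a, keys, key_cols=(0, 2)):
    """Keep the header and the rows of path_a whose key is in keys.

    keys is assumed to be a set of tuples built from fileB, like the
    removal_list set in the question; key_cols is a hypothetical parameter.
    """
    directory = os.path.dirname(os.path.abspath(path_a))
    # Create the temp file next to path_a so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as tmp, open(path_a, newline='') as src:
            reader = csv.reader(src)
            writer = csv.writer(tmp)
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
            for row in reader:
                if tuple(row[i] for i in key_cols) in keys:
                    writer.writerow(row)
        # Swap the filtered copy in for the original only after it is fully written.
        os.replace(tmp_path, path_a)
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind on failure
        raise
```

Because rows are streamed one at a time, memory use stays small even for large files, and the original is only replaced once the new copy is complete. Note that mkstemp creates the file with restrictive permissions, so the rewritten file may not keep the original's mode.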

answered Apr 29, 2013 at 4:51

Lennart Regebro


David Cain · Answer · 2013-04-29 04:58:24Z

3

As Lennart described, you can't modify a CSV file in-place as you iterate over it.

If you're really opposed to creating a third file, you might want to look into using a string buffer with StringIO, the idea being that you build up the new desired contents of file A in memory. At the end of your script, you can write the contents of the buffer over file A.

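import csv
from cStringIO import StringIO

# removal_list is the set of (row[0], row[2]) keys built from fileB in the question
new_a_buf = StringIO()
writer = csv.writer(new_a_buf)
with open('fileA', 'rb') as file_a:
    reader2 = csv.reader(file_a)
    writer.writerow(next(reader2))  # keep the header row
    for row in reader2:
        # Keep only the rows of file A whose key also appears in file B
        if (row[0], row[2]) in removal_list:
            writer.writerow(row)

# At this point, the contents (new_a_buf) exist in memory
with open('fileA', 'wb') as file_a:
    file_a.write(new_a_buf.getvalue())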
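
The cStringIO module exists only in Python 2. On Python 3, the same buffer idea could be sketched with io.StringIO as below. The function name filter_with_buffer and the keys and key_cols parameters are illustrative assumptions, with keys playing the role of the removal_list set from the question.

```python
import csv
import io


def filter_with_buffer(path_a, keys, key_cols=(0, 2)):
    """Build the new contents of path_a in memory, then overwrite the file.

    keys is assumed to be a set of tuples taken from fileB;
    key_cols is a hypothetical parameter naming the key columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    with open(path_a, newline='') as file_a:
        reader = csv.reader(file_a)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)  # keep the header row
        for row in reader:
            if tuple(row[i] for i in key_cols) in keys:
                writer.writerow(row)

    # The whole filtered file now lives in memory; write it back over path_a.
    with open(path_a, 'w', newline='') as file_a:
        file_a.write(buf.getvalue())
```

As with the Python 2 version, the entire output is held in memory, so this is only reasonable when fileA is comfortably smaller than available RAM.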

edited Apr 29, 2013 at 4:58

answered Apr 29, 2013 at 4:53

David Cain


3 Comments

Burhan Khalid Over a year ago

A word of caution here: you may exhaust the available memory for your system if your input files are large.

jamylak Over a year ago

You may as well just write to a different file and rename it at the end, that is what my solution does

David Cain Over a year ago

@jamylak, I completely agree with you. And that's exactly what I would do in this situation. I just figured this would be useful in that it technically meets what the asker is looking for.
