
I have a csv file containing links to webpages. I'm collecting data from each link and saving it to a separate csv file.
When I have to resume from the point where I left off, I currently have to manually delete the already-processed entries from the csv file and then run the code again.
I went through the documentation for the csv module, but couldn't find any function that serves this purpose.
I also went through other questions on Stack Overflow and elsewhere, but none of them helps.
Is there a way to delete rows the way I want to?

Here is what I have right now:

import pandas as pd

df = pd.read_csv("All_Links.csv")

for i in df.index:
    try:
        url = df.loc[i, 'MatchLink']  # .ix is deprecated; .loc does the same lookup

        # code to process the data in the link

        # made sure that processing has finished
        # Now need to delete that row
    except Exception:
        break
  • Deleting content from the middle of a file can only be accomplished by reading the file and writing back everything except the line(s) you want to skip. You can read in all of the lines of a CSV, splice the array, and then write the array back out to a file, but that accomplishes the same thing with greater memory requirements. Commented Aug 17, 2013 at 8:06
  • Have you considered using df.drop(i)? Look at the api doc: pandas.pydata.org/pandas-docs/stable/generated/… Commented Aug 20, 2013 at 18:17
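Following the second comment, a row can be removed from the in-memory DataFrame with drop (a minimal sketch with made-up data; note that dropping from the DataFrame does not by itself change the csv file on disk):

import pandas as pd

# a tiny stand-in frame (hypothetical data)
df = pd.DataFrame({'MatchLink': ['http://a', 'http://b', 'http://c']})

# drop the row whose index label is 0; only the in-memory
# DataFrame changes, not the file it was read from
df = df.drop(0)

print(list(df['MatchLink']))

To make the deletion stick, the remaining frame still has to be written back with to_csv.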

2 Answers


If you want to write back to the csv file only the data that hasn't been processed yet (that is, delete only the data that has been processed), you can modify your algorithm to:

import pandas as pd

df = pd.read_csv("All_Links.csv")

for i in df.index:
    try:
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
        # row i is done, so keep only the rows after it
        df.iloc[i + 1:].to_csv('All_Links.csv', index=False)
    except Exception:
        break

But this rewrites the file on every iteration; it may be better to remember where you stopped and write once, after the loop finishes:

import pandas as pd

df = pd.read_csv("All_Links.csv")

unprocessed_from = len(df)  # if the loop finishes, nothing is left over
for i in df.index:
    try:
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
    except Exception:
        # something broke, so row i was not processed
        unprocessed_from = i
        break

# Now write the rest of the unprocessed lines back to the csv file
df.iloc[unprocessed_from:].to_csv('All_Links.csv', index=False)



Since you are already reading the whole file into the dataframe, you can just start iterating from the point where you left off. Let's say you left off at i=23; you can do:

import pandas as pd

df = pd.read_csv("All_Links.csv")

last_line_number = 23
for i in df.index[last_line_number:]:
    try:
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
    except Exception:
        break

This is the simplest way. Something more robust would be to use two files: one for lines still to be processed and one for lines already processed.
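The two-file idea could be sketched like this (filenames are hypothetical, and the demo input is created in-line; on restart, rows already logged in the "done" file are skipped, and a row is logged only after it succeeds):

import csv
import os

TODO = 'to_process.csv'    # hypothetical input: one link per row
DONE = 'processed.csv'     # hypothetical log of finished links

# demo input (in practice this file already exists)
with open(TODO, 'w', newline='') as f:
    csv.writer(f).writerows([['http://a'], ['http://b'], ['http://c']])

# count rows finished in a previous run, so we can skip them
done = 0
if os.path.exists(DONE):
    with open(DONE, newline='') as f:
        done = sum(1 for _ in csv.reader(f))

with open(TODO, newline='') as src, open(DONE, 'a', newline='') as out:
    writer = csv.writer(out)
    for n, row in enumerate(csv.reader(src)):
        if n < done:
            continue            # already handled in an earlier run
        # ... fetch and process row[0] here ...
        writer.writerow(row)    # mark as done only after success
        out.flush()             # so a crash mid-loop loses nothing

This never rewrites the large input file; it only appends one small row per processed link.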

4 Comments

Thanks for the answer; yes, that's one way to do it. But I'd wait to see if someone can answer the original question, i.e. "How could I delete the row", which would be best for my application.
Unfortunately, with text files the only way is to write a new file, or to overwrite the existing one with the lines you want each time. This is expensive. There is no way to delete just one line.
:-/ yeah, you are right. There are about 100,000 rows, and processing takes place in a single loop; any file handling inside the loop makes it super expensive. So I think @viktor's method is the best I can do.
yes, that's a practical solution which should be performant enough, and it is more complete than mine.
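Given the cost concern raised above, one alternative (not from the thread; filenames and helpers are hypothetical) is to persist only the index of the last processed row in a tiny checkpoint file, so the 100,000-row csv is never rewritten at all:

import os

CHECKPOINT = 'last_index.txt'   # hypothetical one-line checkpoint file

def load_checkpoint():
    """Return the index to resume from (0 on a fresh start)."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip()) + 1
    return 0

def save_checkpoint(i):
    """Record that row i finished; rewriting one tiny file is cheap."""
    with open(CHECKPOINT, 'w') as f:
        f.write(str(i))

start = load_checkpoint()
for i in range(start, 5):       # stand-in for df.index[start:]
    # ... process row i here ...
    save_checkpoint(i)

On restart the loop picks up right after the last row that was checkpointed, and the original csv stays untouched.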
