
A script reads links from a CSV file and scrapes some info from the corresponding webpages. Some links don't work and the script fails on them. I've added a try/except, but this messes up my output, since I need the same number of output rows as in the original file.

import csv
import urllib2
import lxml.html

# 'reader' is a csv.reader over the input file
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except:
        continue

Is there a way to delete a row from the CSV file when its link is faulty? Something like:

for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except:
        DELETE_THE_ROW  # pseudocode: drop this row from the CSV
        continue
  • Why do you "need the same number of output rows as in the original file"? Commented Oct 3, 2014 at 15:42

2 Answers


The best approach would be to create a new CSV file and write to it only the rows whose links are valid.

f = open('another_csv.csv', 'w+')
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        print >>f, ','.join(row)
    except:
        # could log the faulty links in another file here
        continue
f.close()
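A minimal sketch of that logging idea, assuming illustrative file names (bad_links.txt and original.csv are my inventions, not from the question):

import csv
import urllib2
import lxml.html

# Sketch: write good rows to one file and failed URLs to another.
# The file names are arbitrary choices for illustration.
reader = csv.reader(open('original.csv', 'rb'))
bad = open('bad_links.txt', 'w')
f = open('another_csv.csv', 'w+')
for row in reader:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        print >>f, ','.join(row)
    except:
        print >>bad, url  # record the URL that failed
        continue
f.close()
bad.close()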

You can rename the new CSV to the original name, or keep both.
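For the rename, a sketch (file names are illustrative; on Windows, os.rename raises an error if the destination exists, so remove the old file first):

import os

# Replace the original file with the cleaned copy. On POSIX systems
# os.rename overwrites an existing destination; this loses the rows
# with faulty links, so keep a backup if you still need them.
os.rename('another_csv.csv', 'original.csv')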


3 Comments

That works, but with some complications. Since there are commas in the original file (like in article headlines), the new file with ',' delimiter is super messed up. Is there a way to circumvent this problem?
Here you go: print >>f, '"' + '","'.join(row) + '"'
Or you could use csv.writer directly, as @Yann mentioned. It'll quote only those fields that contain a comma. Quoting all fields also increases the file size.
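To illustrate that quoting behavior: csv.writer's default QUOTE_MINIMAL dialect quotes only fields containing the delimiter, a quote character, or a newline, so embedded commas survive a round trip.

import csv

# The default quoting mode is csv.QUOTE_MINIMAL: only fields that
# need quoting get quoted. 'wb' because Python 2's csv module
# expects binary-mode files.
with open('another_csv.csv', 'wb') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['Headline, with a comma', 'http://example.com'])
# File contents: "Headline, with a comma",http://example.com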

If all goes well, why don't you write the good rows to another file?

writer = csv.writer(out_file_handle)
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except:
        continue
    else:
        writer.writerow(row)
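The snippet assumes reader and out_file_handle already exist; one plausible setup (the file names are illustrative, and Python 2's csv module wants binary-mode files):

import csv
import urllib2
import lxml.html

# Illustrative setup for the handles the answer assumes.
in_file_handle = open('original.csv', 'rb')
out_file_handle = open('good_rows.csv', 'wb')

reader = csv.reader(in_file_handle)
writer = csv.writer(out_file_handle)
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except Exception:  # narrower than a bare except: lets Ctrl-C through
        continue       # skip rows whose link could not be fetched
    else:
        writer.writerow(row)

in_file_handle.close()
out_file_handle.close()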

