Scrapy CSV crawling

Question

I'm trying to crawl some rows from a CSV file using CSVFeedSpider. The structure of the file is: id | category | price. I need to crawl only the rows that have a specific category, "paid". This is what I do:

class Outillage_spider(CSVFeedSpider):
    name = 'domain.com'
    allowed_domains = ['domain.com', 'www.domain.com']
    start_urls = ('http://www.domain.com/file.csv',)

    delimiter = ';'
    headers = ['name', 'category', 'price']

    def parse_row(self, response, row):
        categories = ['Bosch','Dolmar','Fein','Hitachi','Karcher','Leman','Makita','SDMO','Ski']
        if row['category'] in categories:
            res = {}
            res['name'] = row['name']
            res['price'] = row['price']
            return load_product(res, response)
        else:
            return None

And this is what I got:

      File "/home/rolikoff/web/scrapy_projects/local/lib/python2.7/site-packages/Scrapy-0.14.1-py2.7.egg/scrapy/contrib/spiders/feed.py", line 129, in parse_rows
    raise TypeError('You cannot return an "%s" object from a spider' % type(ret).__name__)
exceptions.TypeError: You cannot return an "NoneType" object from a spider

I think it happens when parse_row() returns None, but I'm not sure how to change the function. Do you have any ideas?

Thanks, Dmitry

reclosedev · Accepted Answer · 2012-02-01 15:46:32Z

1

Try returning an empty list or tuple instead of None:

else:
    return []

And make sure that load_product returns a list, tuple, Item, or Request.
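A minimal sketch of that fix, written as a plain function so it runs without Scrapy. The names here are assumptions for illustration: `select_row` stands for the body of parse_row, and `load_product` is a hypothetical stand-in that returns a list, since the question does not show the real one.

```python
CATEGORIES = ['Bosch', 'Dolmar', 'Fein', 'Hitachi', 'Karcher',
              'Leman', 'Makita', 'SDMO', 'Ski']


def load_product(res, response):
    # Hypothetical stand-in: the question does not show the real load_product.
    # It wraps the result in a list, one of the accepted return types.
    return [dict(res)]


def select_row(response, row):
    # Same filtering logic as parse_row in the question, minus the spider class.
    if row['category'] in CATEGORIES:
        res = {'name': row['name'], 'price': row['price']}
        return load_product(res, response)
    # Return an empty list instead of None for rows that are skipped.
    return []


kept = select_row(None, {'name': 'Drill', 'category': 'Bosch', 'price': '99'})
skipped = select_row(None, {'name': 'Saw', 'category': 'Other', 'price': '10'})
```

With this shape, a skipped row produces an empty list rather than None, so the caller always receives something it can iterate over.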

answered Feb 1, 2012 at 15:46


Arthur Neves · Accepted Answer · 2012-02-01 15:46:00Z

1

As far as I know, you have to yield items from within parse_row. For example, this is a spider that I wrote for crawling podcast URLs: https://github.com/arthurnn/podcast/blob/master/podcast/spiders/itunes_spider.py

I would remove the else. Try this out:

    if row['category'] in categories:
        res = {}
        res['name'] = row['name']
        res['price'] = row['price']
        yield load_product(res, response)

However, this only applies to a normal spider. For a CSVFeedSpider, read my edit below:

EDIT

In this case you have to return a BaseItem, a list, or a tuple. You can see this in the implementation of CSVFeedSpider: http://dev.scrapy.org/browser/scrapy/contrib/spiders/feed.py?rev=1516
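The kind of check that raises this error can be imitated without Scrapy. The sketch below is a simplified stand-in, not the library's code: `Item` is a placeholder class, `check_result`, `rejected`, and `gen` are hypothetical names, and the error message is copied from the traceback in the question.

```python
class Item(dict):
    """Placeholder for Scrapy's item type; an assumption for this sketch."""


def check_result(ret):
    # A single item is wrapped in a list so the caller can always iterate.
    if isinstance(ret, Item):
        ret = [ret]
    # Anything that is not a list or tuple (None, a generator, ...) is rejected.
    if not isinstance(ret, (list, tuple)):
        raise TypeError('You cannot return an "%s" object from a spider'
                        % type(ret).__name__)
    return ret


def rejected(value):
    # Demo helper: return the error message, or None when the value is accepted.
    try:
        check_result(value)
    except TypeError as exc:
        return str(exc)
    return None


def gen():
    # A function containing yield returns a generator object when called.
    yield Item()
```

Under these assumptions, `rejected(None)` and `rejected(gen())` both report an error, while `rejected([])` and `rejected(())` return None. That matches the two tracebacks in this thread: neither None nor a generator passes a list-or-tuple check.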

edited Feb 1, 2012 at 15:46

answered Feb 1, 2012 at 15:37


2 Comments

KennyPowers Over a year ago

It doesn't allow using yield:

File "/home/rolikoff/web/scrapy_projects/local/lib/python2.7/site-packages/Scrapy-0.14.1-py2.7.egg/scrapy/contrib/spiders/feed.py", line 129, in parse_rows
    raise TypeError('You cannot return an "%s" object from a spider' % type(ret).__name__)
exceptions.TypeError: You cannot return an "generator" object from a spider

KennyPowers Over a year ago

OK, I've found the solution: replace return None with return (). Thanks anyway!
