I'm trying to crawl some rows from a CSV file using CSVFeedSpider. The structure of the file is: id | category | price. I need to crawl only the rows that have a specific category, "paid". This is what I do:

class Outillage_spider(CSVFeedSpider):
    name = 'domain.com'
    allowed_domains = ['domain.com', 'www.domain.com']
    start_urls = ('http://www.domain.com/file.csv',)

    delimiter = ';'
    headers = ['name', 'category', 'price']

    def parse_row(self, response, row):
        categories = ['Bosch','Dolmar','Fein','Hitachi','Karcher','Leman','Makita','SDMO','Ski']
        if row['category'] in categories:
            res = {}
            res['name'] = row['name']
            res['price'] = row['price']
            return load_product(res, response)
        else:
            return None

And this is the error I get:

  File "/home/rolikoff/web/scrapy_projects/local/lib/python2.7/site-packages/Scrapy-0.14.1-py2.7.egg/scrapy/contrib/spiders/feed.py", line 129, in parse_rows
    raise TypeError('You cannot return an "%s" object from a spider' % type(ret).__name__)
exceptions.TypeError: You cannot return an "NoneType" object from a spider

I think it happens when parse_row() returns None, but I'm not sure how to change the function. Do you have any ideas?

Thanks, Dmitry

2 Answers

Try returning an empty list or tuple instead of None:

else:
    return []

Also make sure that load_product returns a list, tuple, Item or Request.
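
To illustrate why an empty list works where None does not, here is a minimal sketch that runs without Scrapy. Everything in it is an assumption made for illustration: check_row_result is a hypothetical stand-in that mimics the type check implied by the traceback (it is not Scrapy's actual code), a plain dict stands in for an Item, and the sample rows and shortened category list are made up.

```python
# Hypothetical stand-in for the check behind the traceback in the question;
# this is NOT Scrapy's real implementation, only a sketch of the idea.
def check_row_result(ret):
    # A single item (a dict here, standing in for an Item) gets wrapped in a list.
    if isinstance(ret, dict):
        ret = [ret]
    # Anything that is not a list or tuple by now is rejected, including None.
    if not isinstance(ret, (list, tuple)):
        raise TypeError('You cannot return an "%s" object from a spider'
                        % type(ret).__name__)
    return list(ret)


CATEGORIES = ['Bosch', 'Makita']  # shortened list, for illustration only


def parse_row(row):
    if row['category'] in CATEGORIES:
        return {'name': row['name'], 'price': row['price']}
    return []  # empty list instead of None: nothing to emit for this row


# Made-up sample rows, shaped like the dicts a CSV iterator would produce.
rows = [
    {'name': 'drill', 'category': 'Bosch', 'price': '99'},
    {'name': 'mower', 'category': 'Other', 'price': '150'},
]

kept = []
for row in rows:
    kept.extend(check_row_result(parse_row(row)))

print(kept)
```

Only the row with a matching category ends up in kept. Calling check_row_result(None) in this sketch raises the same kind of TypeError as in the question, while an empty list or tuple passes through and simply contributes nothing.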

As far as I know, you have to yield items from within parse_row. For example, this is a spider I wrote for crawling podcast URLs: https://github.com/arthurnn/podcast/blob/master/podcast/spiders/itunes_spider.py

I would remove the else. Try this:

    if row['category'] in categories:
        res = {}
        res['name'] = row['name']
        res['price'] = row['price']
        yield load_product(res, response)

However, you are not using a normal spider. For a CSVFeedSpider, read my edit below:

EDIT

In this case you have to return a BaseItem, a list, or a tuple. If you look at the implementation of CSVFeedSpider (http://dev.scrapy.org/browser/scrapy/contrib/spiders/feed.py?rev=1516), you will see that

2 Comments

It doesn't allow using yield: File "/home/rolikoff/web/scrapy_projects/local/lib/python2.7/site-packages/Scrapy-0.14.1-py2.7.egg/scrapy/contrib/spiders/feed.py", line 129, in parse_rows raise TypeError('You cannot return an "%s" object from a spider' % type(ret).__name__) exceptions.TypeError: You cannot return an "generator" object from a spider
OK, I've found the solution: replace return None with return (). Thanks anyway!
