I am saving the output of a web crawl with Scrapy to a CSV file. The crawling itself seems to be working correctly, but I am not happy with the format of the output saved in the CSV file. I crawl 20 web pages, where each page contains 100 job titles and their respective URLs, so I am expecting the output to look like this:
url1, title1
url2, title2
...
...
url1999, title1999
url2000, title2000
However, the actual output in the CSV file looks like this:
url1 url2 ... url100, title1 title2 ... title100
url101 url102 ... url200, title101 title102 ... title200
...
url1901 url1902 ... url2000, title1901 title1902 ... title2000
My Spider code is:
import scrapy

class TextPostItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()

class MySpider(scrapy.Spider):
    name = "craig_spider"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]

    def parse(self, response):
        number = 0
        for page in range(0, 20):
            yield scrapy.Request("http://sfbay.craigslist.org/search/npo?=%s" % number, callback=self.parse_item, dont_filter=True)
            number += 100

    def parse_item(self, response):
        item = TextPostItem()
        item['title'] = response.xpath("//span[@class='pl']/a/text()").extract()
        item['link'] = response.xpath("//span[@class='pl']/a/@href").extract()
        return item
The command I use to export to CSV is:
scrapy crawl craig_spider -o craig.csv -t csv
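I suspect the issue is that extract() returns every match on the page at once, so each yielded item ends up holding 100 titles and 100 links, and the CSV exporter writes each list into a single cell. Would yielding one item per result be the right fix? Something like this untested sketch is what I have in mind:

    def parse_item(self, response):
        # Untested idea: loop over each result span and yield one item per
        # link/title pair, so each CSV row holds a single url and title.
        for result in response.xpath("//span[@class='pl']"):
            item = TextPostItem()
            item['title'] = result.xpath("a/text()").extract()
            item['link'] = result.xpath("a/@href").extract()
            yield item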
Any suggestions? Thanks.