How can I get text with Python Scrapy?

Question

import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.japandict.com']
    start_urls = ['https://www.japandict.com/lists/jlpt5k']
    
           
    def parse(self, response):
        kanjiler = response.xpath("//div[@class='row']/div/div/div")
        for kanji in kanjiler:
            kanjiicon= kanji.xpath("//div[@class='row']/div/div/div/a/div/span")
            yield{
                'kanjiicon': kanjiicon
            }

I created the spider above. I want to get kanjiicon as text, but when I use the .get() or .extract() methods, it returns empty.
How can I fix that?

score 0 · Accepted Answer · 2021-08-22 00:12:18Z

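I'm getting output with the code below. Inside the loop, the XPath starts with .// so it searches only within the current kanji selector (an expression starting with //, as in the question, searches the whole document even when called on a sub-selector), and /text() returns the character itself rather than the matched element.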

CODE:

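import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.japandict.com']
    start_urls = ['https://www.japandict.com/lists/jlpt5k']

    def parse(self, response):
        # Container elements, one per kanji entry on the list page
        kanjiler = response.xpath('//*[@class="d-inline-block w-100 text-muted"]')
        for kanji in kanjiler:
            # ".//" keeps the search inside this entry; "/text()" selects the text node.
            # get(default='') avoids calling .replace() on None when nothing matches.
            kanjiicon = (
                kanji.xpath('.//*[@class="xlarge text-normal me-4"]/text()')
                .get(default='')
                .replace('\n', '')
                .strip()
            )

            yield {
                'kanjiicon': kanjiicon
            }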

Output:

{'kanjiicon': '右'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '雨'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '円'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '下'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '何'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '火'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '外'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '学'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '間'}

Tom · score 0 · Accepted Answer · 2021-08-21 23:27:42Z

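You need to decode the bytes as UTF-8; ASCII doesn't cover Japanese characters. Note that this applies only to bytes objects such as response.body: in Python 3, Scrapy's .get() already returns a str, which has no .decode() method.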

Try something like:

kanjiicon = kanjiicon.decode('utf-8')
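A short sketch of where decoding does and doesn't apply, in plain Python 3, using one of the characters from the output above as an illustrative value: decoding turns UTF-8 bytes into a str, while the str that .get() returns has no .decode() at all.

```python
# Illustrative values: "右" is one of the characters in the scraped output.
kanji_str = "右"                          # Scrapy's .get() already returns a str
kanji_bytes = kanji_str.encode("utf-8")   # raw bytes, like response.body

print(kanji_bytes)                                # b'\xe5\x8f\xb3'
print(kanji_bytes.decode("utf-8") == kanji_str)   # True: bytes -> str round trip

# In Python 3 a str has no .decode() method, so calling it on .get()'s
# result raises AttributeError instead of fixing anything.
print(hasattr(kanji_str, "decode"))               # False

# ASCII cannot represent Japanese characters at all.
try:
    kanji_str.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc.reason)                             # ordinal not in range(128)
```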
