CSS Selector HTML with Scrapy Python

Question

I am trying to build a web crawler to pull some information from Yahoo Finance as a personal project. However, on the analysis page of Yahoo Finance I can't pull a particular value. The HTML seems complicated to me; could I get some guidance?

import scrapy

# Assumed ticker list (not shown in the original post); the crawl log requests 'M'.
tkrs = ['M']

class yhcrawler(scrapy.Spider):
    name = 'yahoo'

    start_urls = [f'https://ca.finance.yahoo.com/quote/{t}/analysis?p={t}' for t in tkrs]

    def parse(self, response):
        filename = 'stock_growths.csv'

        l = response.css('div#YDC-Col1>div>div>div>div>div>section>table>tbody>tr>td#431::text').extract()
        print(l)

This is the selector I am trying:

l = response.css('div#YDC-Col1>div>div>div>div>div>section>table>tbody>tr>td#431::text').extract()

and I am getting an empty result:

2021-04-18 15:12:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ca.finance.yahoo.com/quote/M/analysis?p=M> (referer: None)
[]

The value I am trying to get is on the highlighted line: -11.82%.

You should specify the exact value of an item on that site so that others can help you.
– SIM, Commented Apr 18, 2021 at 22:57
I don't know which ticker you are using, so the value in the image is useless. Which value do you wish to grab, if you consider this link? Beware that the values there are not static, so specify it by field name, such as Current Year, Next Year, etc.
– SIM, Commented Apr 19, 2021 at 4:10

SIM · Accepted Answer · 2021-04-19 07:38:48Z

Try this:

import scrapy

class YahoofinanceSpider(scrapy.Spider):
    name = 'yahoofinance'
    start_urls = ['https://ca.finance.yahoo.com/quote/aapl/analysis?p=aapl']

    # Send a browser-like User-Agent with every request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    }

    def start_requests(self):
        for start_url in self.start_urls:
            yield scrapy.Request(start_url, headers=self.headers)

    def parse(self, response):
        # Find the cell labelled 'Next 5 Years' and take the text of the cells after it in the same row.
        item = response.xpath("//td[./span][contains(.,'Next 5 Years')]/following-sibling::td/text()").getall()
        yield {"item": item}
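To see what the XPath in `parse` selects, here is a minimal standard-library sketch that mimics it step by step. It does not use Scrapy, and the table fragment, its placeholder percentages, and the helper name `cells_after_label` are all made up for illustration; the real Yahoo Finance markup may well differ. Unlike the XPath, which collects matches from every matching cell, this sketch stops at the first matching row.

```python
import xml.etree.ElementTree as ET

# Made-up fragment shaped like a growth-estimates table; the numbers are placeholders.
SAMPLE = """
<table>
  <tbody>
    <tr><td><span>Current Year</span></td><td>1.0%</td><td>2.0%</td></tr>
    <tr><td><span>Next 5 Years (per annum)</span></td><td>3.0%</td><td>4.0%</td></tr>
  </tbody>
</table>
"""

def cells_after_label(markup, label):
    """Return the text of the cells that follow the labelled cell in its row."""
    root = ET.fromstring(markup)
    for row in root.iter("tr"):
        cells = row.findall("td")
        for index, cell in enumerate(cells):
            # td[./span]: the cell must have a direct <span> child.
            has_span = cell.find("span") is not None
            # contains(., label): the label appears somewhere in the cell's text.
            if has_span and label in "".join(cell.itertext()):
                # following-sibling::td/text(): the later cells in the same row.
                return [(c.text or "").strip() for c in cells[index + 1:]]
    return []

values = cells_after_label(SAMPLE, "Next 5 Years")
print(values)
```

Locating the cell by its label rather than by a long chain of `div>div>...` steps keeps the lookup tied to the field name, which is what the comments on the question ask for, since the values themselves change over time.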
