Newest 'web-scraping' Questions

-3 votes

0 answers

35 views

What are the best ways to enhance the Python code of a Scrapy spider? [closed]

I want to enhance the following Python code so that it can print the transcript completely, and without extra spaces. def parse_item(self, response): # Getting the article box that ...

Mo Bilal

61

asked 12 hours ago

2 votes

0 answers

60 views

How to stop/kill achieved Scrapy spider instance within RStudio

I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogic purposes, I need to run a first crawl on the first page, ...

Didier mac cormick

227

asked yesterday

Advice

0 votes

1 reply

26 views

How to fetch realTime news Data feed

I want to know how to get a live news data feed (Indian) with no or minimal latency (30-40 s). I tried some RSS feeds, but they all deliver the data with some latency, so what I ...

its m

49

asked Nov 18 at 16:50

-1 votes

0 answers

32 views

Want to crawl Facebook Groups to fetch members data using python [closed]

I used a script to automate the member data collection, but it gets detected and denied permission. Is there any solution? I tried to bypass the bot detection but was not able to.

Mohammad Raza

11

asked Nov 14 at 6:19

1 vote

0 answers

79 views

Invoke-WebRequest URL encoding

I want to retrieve content from a web page. I tried the above method, but the error still occurs when the query string contains Chinese characters. Code: $json = Get-Content -Encoding utf8 -Path "./...

Akira

33

asked Nov 12 at 3:07
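
Errors of the kind described in the Invoke-WebRequest question often come from non-ASCII characters being placed in a query string without percent-encoding. The question itself uses PowerShell; purely as an illustration of the encoding step, here is a minimal Python sketch. The base URL, the parameter name `q`, and the helper `build_url` are hypothetical and do not come from the question.

```python
from urllib.parse import quote, urlencode


def build_url(base: str, params: dict[str, str]) -> str:
    """Return base plus a query string percent-encoded as UTF-8."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    return f"{base}?{urlencode(params, quote_via=quote)}"


# Hypothetical endpoint and parameter, for illustration only
url = build_url("https://example.com/search", {"q": "中文"})
print(url)  # https://example.com/search?q=%E4%B8%AD%E6%96%87
```

Whether the server in the question expects UTF-8 or another encoding is not stated, so the choice of UTF-8 here is an assumption.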

0 votes

1 answer

201 views

Fetch data from https://www.sofascore.com/?

This is my Python code, run on Ubuntu, to try to fetch and extract data from https://www.sofascore.com/. I created this test code before using it on an E2 device in my plugin. # python3 -m venv venv # source venv/...

RR-EB

55

asked Nov 4 at 0:15

0 votes

1 answer

58 views

Scrapy handle status 202

I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines... I'm getting a 202 status in the responses to some spider requests, so the page content is not available yet ...

Manu310

178

asked Oct 28 at 11:27
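
A 202 Accepted status means the server has accepted the request but has not finished processing it, so one general approach to the situation in the Scrapy question is to request the page again after a delay. The sketch below is plain Python and deliberately does not use Scrapy's API. The function `fetch_until_ready`, its `fetch` callable, and the retry limits are hypothetical names chosen for illustration, and the stubbed responses stand in for a real HTTP call.

```python
import time
from typing import Callable, Tuple


def fetch_until_ready(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    delay: float = 1.0,
) -> Tuple[int, str]:
    """Call fetch() until it returns a status other than 202."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status != 202:
            # Any other status (success or error) is handed back to the caller
            return status, body
        time.sleep(delay)  # wait before asking again
    raise TimeoutError(f"still 202 after {max_attempts} attempts")


# Stub standing in for a real HTTP request: two 202s, then the page
_responses = iter([(202, ""), (202, ""), (200, "<html>ready</html>")])
status, body = fetch_until_ready(lambda: next(_responses), delay=0)
print(status, body)  # 200 <html>ready</html>
```

Inside a Scrapy spider the same idea would have to be expressed through Scrapy's own request and retry mechanisms, which this sketch does not attempt to reproduce.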

0 votes

0 answers

34 views

Docsearch Typesense scraper only finds records on Docusaurus landing page

Problem: I’m using Docusaurus with Typesense and the docsearch-typesense-scraper to index my documentation site. Everything runs fine: the sitemap is found, and the scraper produces records. However, ...

Erwin

1

asked Oct 20 at 12:59

2 votes

1 answer

37 views

Cannot access 'iwe-autocomplete' element in html with selenium

This is the website: https://sa.ucla.edu/ro/public/soc. There is a dropdown menu for selecting the subject area, where I need to type a subject and I will receive ...

Rohit Kasturi

23

asked Oct 16 at 10:05

-1 votes

1 answer

59 views

Selenium script marks all search results as “not found” because details load only after clicking a link [closed]

I’m using Python + Selenium + ChromeDriver to check a list of titles (from a CSV file) against an online library catalog. My script searches each title and tries to determine if a specific library has ...

huda

1

asked Oct 11 at 13:50

-2 votes

1 answer

117 views

Webscrape links to download files based on word in page HTML

I am webscraping WHO pages using the following code: pacman::p_load(rvest, httr, stringr, purrr) download_first_pdf_from_handle <- function(handle_id) { ...

flâneur

321

asked Oct 5 at 4:24

1 vote

1 answer

124 views

Scraping archived content [closed]

I am a bit new to webscraping and trying to build a scraper to collect the title, text, and date from this archived page: from selenium import webdriver from selenium.webdriver.chrome.service import ...

Kaitlin

83

asked Sep 30 at 13:49

0 votes

0 answers

251 views

Scraping Instagram Likes at Bulk

My goal is to find out if a given user has liked any post of another profile. So the following question has to be answered: Has the user X liked any post on the profile Y in the past 24 months. For ...

a6i09per5f

300

asked Sep 24 at 20:44

3 votes

1 answer

152 views

How to clean inconsistent address strings in Python?

I'm working on a web scraping project in Python to collect data from a real estate website. I'm running into an issue with the addresses, as they are not always consistent. I've already handled simple ...

Adamzam15

41

asked Sep 11 at 12:13
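
The address-cleaning question is truncated, so the exact inconsistencies it deals with are unknown. As a rough sketch of what a normalization pass for scraped addresses can look like, the Python below collapses whitespace, tidies comma spacing, and expands a few street-type abbreviations. The function `clean_address`, the `ABBREVIATIONS` table, and the sample address are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; a real project would need a fuller list
ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road"}


def clean_address(raw: str) -> str:
    """Normalize spacing and expand known street-type abbreviations."""
    text = re.sub(r"\s*,\s*", ", ", raw)      # no space before a comma, one after
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    words = []
    for token in text.split(" "):
        core = token.rstrip(".,")             # "st.," -> "st"
        suffix = "," if token.endswith(",") else ""
        expanded = ABBREVIATIONS.get(core.lower())
        words.append(expanded + suffix if expanded else token)
    return " ".join(words)


print(clean_address("  12  main st. ,Springfield "))
# 12 main Street, Springfield
```

This naive lookup would also turn "St. Louis" into "Street Louis", so a position-aware rule or a dedicated address-parsing library is likely needed for real data.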

-1 votes

3 answers

230 views

Unable to scrape 2nd table from Fbref.com for players table

I would like to scrape the 2nd table on the page at https://fbref.com/en/comps/9/2023-2024/stats/2023-2024-Premier-League-Stats in Google Colab. But pd.read_html only gives me ...

rian patel

1

asked Sep 6 at 11:36

2 votes

2 answers

188 views

Extracting html table and turn into tibble or data.frame in R

Using the following code: library(rvest) read_html("https://gainblers.com/mx/quinielas/progol-revancha/", encoding = "UTF-8")|> html_elements(xpath= '//*[@id="...

Alejandro Carrera

603

asked Sep 5 at 18:48

1 vote

2 answers

110 views

Selenium select from dropdown menu

I'm a bit new to Selenium and am trying to build a webscraper that can select a dropdown menu and then select specific options from the menu. I've built the following code and it was working at one ...

Kaitlin

83

asked Sep 4 at 11:33

1 vote

1 answer

231 views

Trouble scraping dynamic lottery results table – inconsistent parsing

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...

Zuryab

11

asked Aug 27 at 10:50

0 votes

0 answers

63 views

Disable assignment of window.location in Selenium

I'm trying to extract data from a website using Selenium. On random occasions, the page will do a client-side redirect with window.location. How can I disable this? I've tried redefining the property ...

anon

697

asked Aug 23 at 21:02

0 votes

1 answer

112 views

Python Selenium find nested element [closed]

On this page I want to parse a few elements. I would like to get the text in the circles and sometimes use an attribute value to click. That code returns nothing. With this code I want to get all attribute ...

Rok Golob

19

asked Aug 22 at 6:57

0 votes

1 answer

206 views

Pytube consistently fails with HTTP Error 400: Bad Request also on latest version

I am trying to use pytube (v15.0.0) to fetch the titles of YouTube videos. However, for every video I try, my script fails with the same error: HTTP Error 400: Bad Request. I have already updated ...

Rohit Hake

1

asked Aug 14 at 9:42

1 vote

2 answers

292 views

How to download protected PDF (ViewDocument) using Selenium or requests?

I'm trying to download a protected PDF from the New York State Courts NYSCEF website using Python. The URL looks like this: https://iapps.courts.state.ny.us/nyscef/ViewDocument?docIndex=...

Daremitsu

655

asked Aug 4 at 10:28

-2 votes

2 answers

150 views

R Web Scraping - Data is Incomplete (Yahoo Finance)

I am using the following code. It successfully targets the correct url and node text. However, the data that is returned is incomplete as some of the fields (like previous close and open) are blank or ...

Brad Horn

685

asked Jul 30 at 18:30

0 votes

2 answers

169 views

Extracting The SGF Data From This Webpage

I would like to scrape the problems from these Go (board game) books, and convert them into SGFs, if they aren't in that format already. For now, I would be satisfied with only taking the problems ...

psygo

7,853

asked Jul 26 at 2:28

1 vote

0 answers

47 views

Pyppeteer returns None or empty content when scraping Digikala product page

I'm trying to scrape a product page from Digikala using Pyppeteer because the site is heavily JavaScript-rendered. Here is my render class: import asyncio from pyppeteer import launch from pyppeteer....

Ali Motamed

31

asked Jul 19 at 5:44
