-3 votes
0 answers
35 views

I want to enhance the following Python code so that it prints the transcript completely and without extra spaces (see the sketch below this entry). def parse_item(self, response): # Getting the article box that ...
Mo Bilal
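A minimal sketch of one way to collapse the extra whitespace, not the asker's actual spider: the `div.article-box` selector and the URL are placeholders, and the real code would keep whatever rules already feed `parse_item`.

```python
import scrapy


class TranscriptSpider(scrapy.Spider):
    name = "transcript"
    start_urls = ["https://example.com/transcripts"]  # placeholder URL

    def parse(self, response):
        return self.parse_item(response)

    def parse_item(self, response):
        # Grab every text node inside the (assumed) article box ...
        fragments = response.css("div.article-box ::text").getall()
        # ... then split on any run of whitespace and re-join with single spaces,
        # which removes the stray newlines, tabs, and doubled spaces.
        transcript = " ".join(" ".join(fragments).split())
        yield {"transcript": transcript}
```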
2 votes
0 answers
60 views

I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogic purposes, I need to run a first crawl on the first page, ...
Didier mac cormick
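For the tutorial entry above, a small self-contained spider that only parses the first page of https://quotes.toscrape.com/ (it simply never follows the "next" link); the CSS classes are the ones the practice site uses, but verify them against the live page.

```python
import scrapy


class QuotesFirstPageSpider(scrapy.Spider):
    name = "quotes_first_page"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Only the first page is parsed: no request is issued for the "next" link,
        # which keeps the first crawl of the tutorial self-contained.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }
```

It can be run as a single file with `scrapy runspider quotes_spider.py -o quotes.json`, without generating a full Scrapy project.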
Advice
0 votes
1 reply
26 views

I want to know how I can get live Indian news feed data with no or minimal latency (30-40 s). I tried using some RSS feeds, but they all provide the data with some latency, so what I ...
its m
  • 49
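RSS is pull-based, so latency is bounded by how often the feed is polled and how often the publisher refreshes it; a rough sketch of a 30-second polling loop with deduplication is below. The feed URL is a placeholder.

```python
import time

import feedparser  # pip install feedparser

FEED_URL = "https://example.com/india-news/rss"  # placeholder feed URL
seen_ids = set()

while True:
    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        # Deduplicate on the entry id (falling back to the link) so each item
        # is printed only once across polling cycles.
        entry_id = entry.get("id") or entry.get("link")
        if entry_id and entry_id not in seen_ids:
            seen_ids.add(entry_id)
            print(entry.get("title"), entry.get("link"))
    time.sleep(30)  # poll interval roughly matching the 30-40 s latency target
```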
-1 votes
0 answers
32 views

I used a script to automate the member data collection, but it gets detected and permission is denied. Any solution for that? I tried to bypass the bot detection but was not able to.
Mohammad Raza
1 vote
0 answers
79 views

I want to retrieve content from a web page. However, I tried the above method, but the error still occurs when the query string contains Chinese characters. Code: $json = Get-Content -Encoding utf8 -Path "./...
Akira
  • 33
0 votes
1 answer
201 views

This is my Python code, run on Ubuntu, to try to fetch and extract data from https://www.sofascore.com/. I created this test code before using it on an E2 device in my plugin. # python3 -m venv venv # source venv/...
RR-EB
  • 55
0 votes
1 answer
58 views

I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines... I'm getting a 202 status from some spider requests' responses, hence the page content is not available yet ...
Manu310
  • 178
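A sketch of one approach for the 202 responses above, assuming the server eventually returns the finished page on a re-request: because 202 is a 2xx code, Scrapy normally hands it straight to the callback as a success, so adding it to `RETRY_HTTP_CODES` makes the built-in retry middleware ask again.

```python
import scrapy


class PatientSpider(scrapy.Spider):
    name = "patient"
    start_urls = ["https://example.com/slow-page"]  # placeholder URL

    custom_settings = {
        # Treat 202 as retryable instead of a finished response.
        "RETRY_HTTP_CODES": [202, 500, 502, 503, 504, 522, 524, 408, 429],
        "RETRY_TIMES": 5,
        "DOWNLOAD_DELAY": 2,  # give the server time to finish producing the content
    }

    def parse(self, response):
        # By the time this runs, the response should be a non-202 page (or the
        # request has exhausted its retries).
        yield {"url": response.url, "status": response.status}
```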
0 votes
0 answers
34 views

Problem: I’m using Docusaurus with Typesense and the docsearch-typesense-scraper to index my documentation site. Everything runs fine: the sitemap is found, and the scraper produces records. However, ...
Erwin
  • 1
2 votes
1 answer
37 views

This is the website https://sa.ucla.edu/ro/public/soc (the screenshot shows the search box). There is a dropdown menu for selecting the subject area, where I need to type the subject, and I will receive ...
Rohit Kasturi
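A hedged Selenium sketch for the UCLA page above; the element ID and CSS selector below are guesses (the page's real autocomplete widget must be inspected first), so treat this as the shape of a solution rather than working code.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://sa.ucla.edu/ro/public/soc")
wait = WebDriverWait(driver, 15)

# Type the subject into the (assumed) autocomplete box and pick the first suggestion.
subject_box = wait.until(
    EC.element_to_be_clickable((By.ID, "select_filter_subject"))  # hypothetical ID
)
subject_box.send_keys("Computer Science")
first_suggestion = wait.until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "ul.dropdown-menu li"))  # hypothetical
)
first_suggestion.click()
subject_box.send_keys(Keys.ENTER)
```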
-1 votes
1 answer
59 views

I’m using Python + Selenium + ChromeDriver to check a list of titles (from a CSV file) against an online library catalog. My script searches each title and tries to determine if a specific library has ...
huda
  • 1
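An outline of the title-checking loop described above, with everything site-specific left as a placeholder (catalog URL, search box name, result selector, and the assumed "title" CSV column); the asker's real selectors would replace these.

```python
import csv

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
results = []

with open("titles.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):                        # assumes a "title" column
        title = row["title"]
        driver.get("https://catalog.example.org/search")  # placeholder catalog URL
        box = wait.until(
            EC.presence_of_element_located((By.NAME, "q"))  # hypothetical field name
        )
        box.send_keys(title, Keys.ENTER)
        try:
            hits = wait.until(EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, "div.result-item")))      # hypothetical selector
            found = any(title.lower() in h.text.lower() for h in hits)
        except TimeoutException:
            found = False                                 # no results rendered in time
        results.append({"title": title, "found": found})

driver.quit()
print(results)
```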
-2 votes
1 answer
117 views

I am scraping WHO pages using the following code: pacman::p_load(rvest, httr, stringr, purrr) download_first_pdf_from_handle <- function(handle_id) { ...
flâneur
  • 321
1 vote
1 answer
124 views

I am a bit new to web scraping and am trying to build a scraper to collect the title, text, and date from this archived page: from selenium import webdriver from selenium.webdriver.chrome.service import ...
Kaitlin
  • 83
0 votes
0 answers
251 views

My goal is to find out if a given user has liked any post of another profile. So the following question has to be answered: has user X liked any post on profile Y in the past 24 months? For ...
a6i09per5f
3 votes
1 answer
152 views

I'm working on a web scraping project in Python to collect data from a real estate website. I'm running into an issue with the addresses, as they are not always consistent. I've already handled simple ...
Adamzam15
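A small generic sketch of the kind of normalization that helps with inconsistent addresses, independent of the asker's site: lowercase, strip punctuation, collapse whitespace, and expand a few common abbreviations so near-duplicates compare equal. The abbreviation list is illustrative, not exhaustive.

```python
import re

# Common street-type abbreviations; extend this map for the data actually seen.
ABBREVIATIONS = {
    r"\bst\b": "street",
    r"\bave\b": "avenue",
    r"\bblvd\b": "boulevard",
    r"\brd\b": "road",
    r"\bapt\b": "apartment",
}


def normalize_address(raw: str) -> str:
    addr = raw.lower().strip()
    addr = re.sub(r"[,\.]+", " ", addr)   # drop commas and periods
    addr = re.sub(r"\s+", " ", addr)      # collapse runs of whitespace
    for pattern, replacement in ABBREVIATIONS.items():
        addr = re.sub(pattern, replacement, addr)
    return addr.strip()


# Both variants normalize to "123 main street apartment 4".
print(normalize_address("123 Main St., Apt 4"))
print(normalize_address("123  main street apt 4"))
```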
-1 votes
3 answers
230 views

I would like to scrape the 2nd table on the page seen below, from the link https://fbref.com/en/comps/9/2023-2024/stats/2023-2024-Premier-League-Stats, on Google Colab. But pd.read_html only gives me ...
rian patel
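One commonly reported explanation for the fbref case above (an assumption, not verified here) is that tables after the first are wrapped in HTML comments, which pd.read_html skips; parsing the comments separately usually exposes them. Note that fbref also rate-limits aggressive requests.

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

url = "https://fbref.com/en/comps/9/2023-2024/stats/2023-2024-Premier-League-Stats"
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
soup = BeautifulSoup(html, "html.parser")

# Tables in the visible HTML, plus any hidden inside HTML comments.
tables = pd.read_html(StringIO(html))
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if "<table" in comment:
        tables.extend(pd.read_html(StringIO(comment)))

print(len(tables))          # should now be more than the single table seen before
print(tables[-1].head())
```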
2 votes
2 answers
188 views

Using the following code: library(rvest) read_html("https://gainblers.com/mx/quinielas/progol-revancha/", encoding = "UTF-8")|> html_elements(xpath= '//*[@id="...
Alejandro Carrera
1 vote
2 answers
110 views

I'm a bit new to Selenium and am trying to build a webscraper that can select a dropdown menu and then select specific options from the menu. I've built the following code and it was working at one ...
Kaitlin
  • 83
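For the dropdown question above, a minimal sketch of driving a real <select> element with Selenium's Select helper plus an explicit wait; the URL and the element name are placeholders, and if the menu is a custom JavaScript widget rather than a <select>, Select will not apply.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/report-form")            # placeholder URL

# Wait for the dropdown to exist before wrapping it in Select; a missing explicit
# wait is a common reason code that "worked at one point" becomes flaky later.
dropdown_el = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.NAME, "month"))   # hypothetical element name
)
dropdown = Select(dropdown_el)
dropdown.select_by_visible_text("January")               # or select_by_value("1")

driver.quit()
```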
1 vote
1 answer
231 views

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...
Zuryab
  • 11
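A hedged sketch for the intermittent captures above: wait explicitly for the results rows to be present before reading them, instead of reading whatever happens to be rendered at page load. The `table.results` selector and the URL are assumptions.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/lottery-results")        # placeholder URL

# Wait for populated rows instead of reading the table the instant the page loads;
# partially rendered tables are a frequent cause of inconsistent captures.
rows = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.results tbody tr"))
)
for row in rows:
    numbers = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    print(numbers)

driver.quit()
```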
0 votes
0 answers
63 views

I'm trying to extract data from a website using Selenium. On random occasions, the page will do a client-side redirect with window.location. How can I disable this? I've tried redefining the property ...
anon
  • 697
0 votes
1 answer
112 views

On this page I want to parse a few elements. I would like to get the text in the circles and sometimes use an attribute value to click. That code does not return anything. With this code I want to get all attribute ...
Rok Golob
0 votes
1 answer
206 views

I am trying to use pytube (v15.0.0) to fetch the titles of YouTube videos. However, for every video I try, my script fails with the same error: HTTP Error 400: Bad Request. I have already updated ...
Rohit Hake
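Not a fix for pytube itself, but one commonly suggested workaround when pytube breaks against YouTube changes is to fetch the metadata with yt-dlp's Python API instead; the video URL below is a placeholder.

```python
from yt_dlp import YoutubeDL  # pip install yt-dlp


def video_title(url: str) -> str:
    # Only metadata is fetched; nothing is downloaded or written to disk.
    with YoutubeDL({"quiet": True, "skip_download": True}) as ydl:
        info = ydl.extract_info(url, download=False)
    return info.get("title", "")


print(video_title("https://www.youtube.com/watch?v=VIDEO_ID"))  # placeholder URL
```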
1 vote
2 answers
292 views

I'm trying to download a protected PDF from the New York State Courts NYSCEF website using Python. The URL looks like this: https://iapps.courts.state.ny.us/nyscef/ViewDocument?docIndex=...
Daremitsu
  • 655
-2 votes
2 answers
150 views

I am using the following code. It successfully targets the correct url and node text. However, the data that is returned is incomplete as some of the fields (like previous close and open) are blank or ...
Brad Horn
  • 685
0 votes
2 answers
169 views

I would like to scrape the problems from these Go (board game) books, and convert them into SGFs, if they aren't in that format already. For now, I would be satisfied with only taking the problems ...
psygo
  • 7,853
1 vote
0 answers
47 views

I'm trying to scrape a product page from Digikala using Pyppeteer because the site is heavily JavaScript-rendered. Here is my render class: import asyncio from pyppeteer import launch from pyppeteer....
Ali Motamed
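A stripped-down Pyppeteer sketch, separate from the asker's render class, that loads a JavaScript-heavy page, waits for network activity to settle, and returns the rendered HTML; the Digikala product URL is a placeholder, and the site may still require extra headers or block headless browsers.

```python
import asyncio

from pyppeteer import launch


async def render(url: str) -> str:
    # Launch headless Chromium, load the page, and wait for network activity to
    # settle so client-side rendering has a chance to finish.
    browser = await launch(headless=True, args=["--no-sandbox"])
    try:
        page = await browser.newPage()
        await page.goto(url, {"waitUntil": "networkidle2", "timeout": 60000})
        return await page.content()
    finally:
        await browser.close()


# Placeholder product URL; substitute the real Digikala page being scraped.
html = asyncio.get_event_loop().run_until_complete(
    render("https://www.digikala.com/product/dkp-0000000/")
)
print(len(html))
```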
