Newest 'web-scraping' Questions - Page 2

0 votes

0 answers

60 views

How to extract debundled JavaScript files via CDP or Playwright

I've been trying to programmatically extract the original, debundled JavaScript source files behind a web app that uses Webpack bundles and source maps. While Chrome DevTools clearly shows the ...

NoOne

19

asked Jul 16 at 7:49

0 votes

0 answers

72 views

How do I complete the checkout bot at the payment process that uses tokens? (using Python requests)

I'm working on a Python bot that monitors a Shopify webshop, adds a product to the cart, and tries to continue to checkout. I'm using requests.Session() and BeautifulSoup to handle the stock check, ...

Denzel

13

asked Jul 11 at 14:12

0 votes

1 answer

176 views

How can I speed up my Selenium scraper using multiprocessing in Python? [closed]

I'm scraping a large list of URLs (1.2 million) using Selenium + BeautifulSoup with Python's multiprocessing.Pool. I want to scale it up to scrape faster, ideally without hitting system resource ...

SolidOpt

113

asked Jul 10 at 6:52

0 votes

0 answers

76 views

Puppeteer scraper - bot detection

I have a scraper running on Puppeteer and Node.js. When I change the headless mode from false to true, the bot gets detected. I would like it to work with false since the deployment will be on Railway....

Juliaano

1

asked Jul 6 at 18:27

-1 votes

2 answers

197 views

How to use Python to download a PDF file from a link (not button!)

How can I use Python to download a PDF file from a "Download PDF" link on a web page and save it to a local folder? If I move the cursor to the link, right-click, and choose "Inspect", I get ...

ylin321

1

asked Jul 2 at 2:49

0 votes

0 answers

52 views

Can I connect to an authenticated proxy using undetected-chromedriver + seleniumwire?

The proxy I'm trying to connect to is ScraperAPI's proxy, and this is the method their documentation provides for connecting through seleniumwire. However, it doesn't work and still uses my IP. API_KEY = '...

Christian

1

asked Jun 25 at 20:51

0 votes

1 answer

54 views

Selenium can’t click on StoryWeaver book cards after login (macOS, Python)

I’m automating downloads of StoryWeaver books with Selenium+Python. After logging in, I land on the level-page and can see the story cards in the UI—but my script can’t click any of them to navigate ...

Mohammad Malik

11

asked Jun 25 at 6:03

1 vote

1 answer

62 views

Web Scraping with Python: Selenium TimeoutException when logging in to React-rendered StoryWeaver site in headless Chrome

I’m trying to automate downloading StoryWeaver PDFs by: 1. Navigating to the React homepage at https://storyweaver.org.in/en/ 2. Clicking Log in (which opens a React modal) 3. Filling out my email/...

Mohammad Malik

11

asked Jun 24 at 7:45

1 vote

0 answers

95 views

How to use Java Selenium ChromeDriver to pass Cloudflare bot checking?

I know this question has been asked many times, but everything I found uses Python, which I am totally unfamiliar with. I also followed this article to add request headers, and this to ...

Kunto Fullstack

423

asked Jun 22 at 17:26

0 votes

1 answer

57 views

Clickable elements throw error "could not be scrolled into view" in Selenium

I am trying to scrape https://www.anytimemailbox.com/s/new-york-42-broadway. I checked https://stackoverflow.com/a/61343018/21294350 and used driver.execute_script("window.scrollTo(0, document.body....

An5Drama

774

asked Jun 20 at 10:23

2 votes

1 answer

59 views

Facing this issue when loading data using pd.read_csv from a url

SSLCertVerificationError Traceback (most recent call last) File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py:1344, in AbstractHTTPHandler....

Deepanshu Kumar

21

asked Jun 14 at 0:22

-4 votes

1 answer

831 views

Best ways to feed the Ollama LLM with a high data load

I am developing a chatbot for my university that will use a wiki with curriculum information for courses and other relevant data. One of the challenges is optimizing the use of Ollama to process the ...

user27403331

13

asked Jun 10 at 16:41

0 votes

0 answers

108 views

How do I use VBA to scrape the Merriam Webster site and output definitions, synonyms, antonyms, and examples to Excel?

Apologies in advance as my code so far is a mess - I'm very lost. I have an Excel doc with a list of vocabulary terms. I'm trying to create something that will go to the Merriam Webster site, search ...

Saber

33

asked Jun 9 at 1:42

-6 votes

1 answer

90 views

Retrieve icon via Google Sheets IMPORTXML function on the Tankathon website (draft prospect)

I've tried to retrieve the icons "+" and "-" from https://www.tankathon.com/players/jase-richardson using IMPORTXML in Google Sheets. The problem is I've tried BF1 = https://...

Ma Poub

11

asked Jun 8 at 22:33

-1 votes

2 answers

94 views

How can I scrape content that's loaded dynamically on Sainsbury's product pages?

I'm trying to build a scraper that extracts nutritional information from each product page on Sainsbury's (e.g., scraping energy values out of https://www.sainsburys.co.uk/gol-ui/product/sainsburys-...

Siddharth Gianchandani

11

asked Jun 6 at 14:32

0 votes

1 answer

36 views

Python-Selenium: Loading the third page from bvc.com.co shows blank screen

I'm trying to scrape some data from bvc.com.co (the Colombian Stock Exchange webpage). However, whenever the third stock loads, the screen goes blank and the target expected_condition cannot be ...

Ivan Castro

615

asked Jun 5 at 17:30

1 vote

0 answers

73 views

Why are location details different after scraping data from Google Maps?

This web scraper extracts business data from Google Maps and saves it to an Excel file, but in the Excel sheets the latitude and longitude always differ from the actual values shown ...

mihir soni

11

asked Jun 4 at 9:33

0 votes

0 answers

23 views

WinHTTrack is renaming file types from a mirrored website. How can I get WinHTTrack to retain the original file types?

I am trying to mirror levels found on the https://megamaker.webmeka.io/ site using winhttrack. I am excluding all pages other than the index pages, the level page and the mmlv level downloads. ...

Neil McLean

1

asked May 31 at 7:31

0 votes

1 answer

60 views

Octoparse: Dealing with Infinite Scroll AND Load More Button

Long story short I am waiting for a company to provide product data but it is taking them months to get back to me. I've decided to try and scrape the data from their site myself to get things moving ...

Nemo

21

asked May 29 at 18:41

-1 votes

2 answers

118 views

Trying to scrape sectional times from Racingtv.com [closed]

I'm trying to scrape sectional times for horse races from RacingTV (e.g., https://www.racingtv.com/results/2025-05-11/leopardstown/1310) using Python and Selenium, and I need the output to be ...

user30640245

1

asked May 26 at 15:00

0 votes

0 answers

54 views

Apify Update breaks python debugging capabilities

I'm working with Apify for web scraping, and I recently updated from apify-cli 0.21.6 to 0.21.7. I use the Python SDK, and for debugging I use pdb.set_trace() or breakpoint(). When I updated apify-cli the ...

Cristobal Sarome

826

asked May 21 at 1:19

1 vote

0 answers

48 views

How to prevent a non-link web element from opening in a new tab

I'm using Python Selenium to find an element, click it, and have the new page stay open in the same window instead of opening in a new tab. I'm trying to be careful with how I word this because I it's ...

JimmyG

657

asked May 20 at 15:53

-4 votes

1 answer

63 views

scrapy webcrawler refuses to crawl http on localhost [closed]

I had a small web crawler written with Scrapy, and since I didn't want to run it against the real site during development, I used a local mirror. The mirror was served with python -m http.server 8000 ...

Anton

132

asked May 19 at 3:11

2 votes

2 answers

92 views

Encountering an error while using Selenium to search for a stock on Google Finance

I tried to search for a stock on Google Finance as shown below. from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.common.keys import Keys from selenium.webdriver....

TRUE

43

asked May 17 at 13:06

0 votes

0 answers

42 views

HtmlUnit get nested form for authentication

I've been trying to use HtmlUnit to authenticate to a page that uses Okta. The HTML looks like this: I can use getElementById to get the okta-login-container div, but I can't get anything inside of it. ...

carlos palma

864

asked May 15 at 19:47
