51,744 questions
0
votes
0
answers
60
views
How to extract debundled JavaScript files via CDP or Playwright
I've been trying to programmatically extract the original, debundled JavaScript source files behind a web app that uses Webpack bundles and source maps. While Chrome DevTools clearly shows the ...
0
votes
0
answers
72
views
How do I complete the payment step of a checkout bot when the process uses tokens? (using Python requests)
I'm working on a Python bot that monitors a Shopify webshop, adds a product to the cart, and tries to continue to checkout. I'm using requests.Session() and BeautifulSoup to handle the stock check, ...
0
votes
1
answer
176
views
How can I speed up my Selenium scraper using multiprocessing in Python? [closed]
I'm scraping a large list of URLs (1.2 million) using Selenium + BeautifulSoup with Python's multiprocessing.Pool. I want to scale it up to scrape faster, ideally without hitting system resource ...
0
votes
0
answers
76
views
Puppeteer scraper - bot detection
I have a scraper running on Puppeteer and Node.js. When I change the headless mode from false to true, the bot gets detected. I would like it to work with false since the deployment will be on Railway....
-1
votes
2
answers
197
views
How to use Python to download a PDF file from a link (not a button!)
How can I use Python to download a PDF file from a "Download PDF" link on a web page and save it to a local folder? If I move the cursor to the link, right-click, and choose "Inspect", I get ...
0
votes
0
answers
52
views
Can I connect to an authenticated proxy using undetected-chromedriver + seleniumwire?
The proxy I'm trying to connect to is ScraperAPI's proxy. This is the method their documentation provides for connecting through seleniumwire; however, it doesn't work and still uses my IP.
API_KEY = '...
0
votes
1
answer
54
views
Selenium can’t click on StoryWeaver book cards after login (macOS, Python)
I’m automating downloads of StoryWeaver books with Selenium+Python. After logging in, I land on the level-page and can see the story cards in the UI—but my script can’t click any of them to navigate ...
1
vote
1
answer
62
views
Web Scraping with Python: Selenium TimeoutException when logging in to React-rendered StoryWeaver site in headless Chrome
I’m trying to automate downloading StoryWeaver PDFs by:
1. Navigating to the React homepage at https://storyweaver.org.in/en/
2. Clicking Log in (which opens a React modal)
3. Filling out my email/...
1
vote
0
answers
95
views
How to use Java Selenium ChromeDriver to pass Cloudflare bot checking?
I know this question has been asked many times, but everything I found uses Python, which I am totally unfamiliar with.
I also followed this article to add request headers, and this to ...
0
votes
1
answer
57
views
Clickable elements throw error "could not be scrolled into view" in Selenium
I am trying to scrape https://www.anytimemailbox.com/s/new-york-42-broadway. I checked https://stackoverflow.com/a/61343018/21294350 and used driver.execute_script("window.scrollTo(0, document.body....
2
votes
1
answer
59
views
SSLCertVerificationError when loading data using pd.read_csv from a URL
SSLCertVerificationError Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py:1344, in AbstractHTTPHandler....
-4
votes
1
answer
831
views
Best ways to feed the Ollama LLM with a high data load
I am developing a chatbot for my university that will use a wiki with curriculum information for courses and other relevant data. One of the challenges is optimizing the use of Ollama to process the ...
0
votes
0
answers
108
views
How do I use VBA to scrape the Merriam Webster site and output definitions, synonyms, antonyms, and examples to Excel?
Apologies in advance as my code so far is a mess - I'm very lost.
I have an Excel doc with a list of vocabulary terms. I'm trying to create something that will go to the Merriam Webster site, search ...
-6
votes
1
answer
90
views
Retrieve icons from the Tankathon website (draft prospect) with IMPORTXML in Google Sheets
I'm trying to retrieve the "+" and "-" icons from https://www.tankathon.com/players/jase-richardson
using IMPORTXML in Google Sheets.
The problem is that I've tried
BF1 = https://...
-1
votes
2
answers
94
views
How can I scrape content that's loaded dynamically on Sainsbury's product pages?
I'm trying to build a scraper that extracts nutritional information from each product page on Sainsbury's (e.g., scraping energy values out of https://www.sainsburys.co.uk/gol-ui/product/sainsburys-...
0
votes
1
answer
36
views
Python-Selenium: Loading the third page from bvc.com.co shows blank screen
I'm trying to scrape some data from bvc.com.co (the Colombian Stock Exchange webpage). But whenever the third stock loads, the screen goes blank and the target expected_condition cannot be ...
1
vote
0
answers
73
views
Why are location details different after scraping data from Google Maps?
This web scraper collects business data from Google Maps and saves it to an Excel file, but in the Excel sheets the latitude and longitude always differ from the actual ones shown ...
0
votes
0
answers
23
views
WinHTTrack renames file types when mirroring a website. How do I get WinHTTrack to retain the original file types?
I am trying to mirror levels found on the https://megamaker.webmeka.io/ site using WinHTTrack.
I am excluding all pages other than the index pages, the level page, and the mmlv level downloads.
...
0
votes
1
answer
60
views
Octoparse: Dealing with Infinite Scroll AND Load More Button
Long story short, I am waiting for a company to provide product data, but it is taking them months to get back to me. I've decided to try and scrape the data from their site myself to get things moving ...
-1
votes
2
answers
118
views
Trying to scrape sectional times from Racingtv.com [closed]
I'm trying to scrape sectional times for horse races from RacingTV (e.g., https://www.racingtv.com/results/2025-05-11/leopardstown/1310) using Python and Selenium, and I need the output to be ...
0
votes
0
answers
54
views
Apify Update breaks python debugging capabilities
I'm working with Apify for web scraping, and I recently updated from apify-cli 0.21.6 to 0.21.7.
I use the Python SDK, and for debugging I use pdb.set_trace() or breakpoint().
When I updated apify-cli the ...
1
vote
0
answers
48
views
How to prevent a non-link web element from opening in a new tab
I'm using Python Selenium to find an element, click it, and have the new page stay open in the same window instead of opening in a new tab. I'm trying to be careful with how I word this because it's ...
-4
votes
1
answer
63
views
Scrapy web crawler refuses to crawl HTTP on localhost [closed]
I had a small web crawler written with Scrapy, and since I didn't want to run it against the real site during development, I used a local mirror. The mirror was served with python -m http.server 8000 ...
2
votes
2
answers
92
views
Encountering an error while using Selenium to search for a stock on Google Finance
I tried to search for a stock on Google Finance as shown below.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver....
0
votes
0
answers
42
views
HtmlUnit get nested form for authentication
I've been trying to use HtmlUnit to authenticate to a page that uses Okta. The HTML looks like this:
I can use getElementById to get the okta-login-container div, but I can't get anything inside of it. ...