Questions tagged [web-scraping]
Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
609 questions
3
votes
1
answer
126
views
Multi-Page Web Scraping Code Using Selenium with Multithreading
I have written a web scraping script using Selenium to crawl blog content from multiple URLs. The script processes URLs in batches of 1000 and uses multithreading with the ThreadPoolExecutor to ...
4
votes
1
answer
105
views
Scraping the calendar of some public libraries from their websites
I've been learning some Haskell as an amateur (to be precise: I started programming with this language, and it has been a year or less since I started seriously). So far, I have realised only small ...
3
votes
2
answers
100
views
Validating a web crawler's page visits with a decorator
I am writing a crawler that is going to end up in production and I was trying to come up with a way to validate its page visits. It scrapes ASP.NET pages, so each scraping process involves a few ...
3
votes
1
answer
110
views
Scraping website with Python and Selenium to collect data from dynamic website
Summary:
The code scrapes the website and collects the data to store it in CSV. It also downloads selected information that is available for download in PDF format. The details and the entire code are ...
2
votes
1
answer
80
views
A Selenium web scraper to package NBA data
I'm building a Selenium web scraper for basketball-reference.com that takes a player name and returns data in either a JSON format or Pandas DataFrame object. The class in question is one of many that ...
5
votes
1
answer
220
views
Scraping the Divar.ir
I've written code to scrape Divar, which is the Iranian equivalent of eBay. I have a few questions:
Am I doing the error handling and logging ok?
Is there a better way to optimize this code? (note ...
4
votes
2
answers
215
views
Enum to deserialize HTML sizes from JSON with serde
I added an enum for my web scraper to deserialize data from a JSON field that represents an HTML image size, which can either be an unsigned int like 1080 or a ...
2
votes
0
answers
77
views
Simplified HTML parsing for LEGO features
The goal is to extract the Features section from a LEGO product page. In the Features section, usually there's a header (...
3
votes
1
answer
84
views
HTTP scraper for Python Package
I'm trying to make my first Python package as a learning experience. There are a lot of things that I suspect I am doing poorly, but this post is specifically about my HttpRequest class. I made this ...
4
votes
2
answers
287
views
Test generator I made for practice
I made this generator to practice importing from other modules and writing more readable code. What could I have done better and what did I do wrong?
File called test_generator.py
...
3
votes
1
answer
234
views
A simple web scraper for nature.com news articles
I have created a simple web scraper that fetches news article previews from nature.com and saves each article to a file containing the article preview text.
I am learning independently, so I would ...
2
votes
1
answer
231
views
Beginner Python Web scraping
I'm new to programming and chose Python. I'm learning on my own.
Currently I'm preparing code for a portfolio on GitHub.
I will be grateful for any code review, especially regarding OOP: ...
6
votes
1
answer
865
views
Python web scraper for Amazon customer reviews
I'm new to web scraping and tried building a web scraper for Amazon customer reviews. The program works fine as is but I wanted to improve the design and get some feedback. The basic idea was to ...
3
votes
2
answers
244
views
Saving Scraped Data to a File
When scraping data and saving it to a file, which method is more efficient?
open the file first, scrape, and save the data all ...
5
votes
1
answer
380
views
Grabbing YouTube thumbnails from URLs
I wrote some C# code to grab YouTube thumbnails from URLs. I originally wrote it in Python and took some time converting it to C#. I am VERY new to C# and did this with minimal help.
...
2
votes
1
answer
119
views
Web-scraping bountied questions from Stack Exchange sites
I recently built my first web scraper in Python and decided to use Stack Overflow (SO) and Stack Exchange (SE) as test websites.
Code
...
1
vote
1
answer
144
views
Parsing: Collecting data from a book site
A task:
Collect data from the site in the following format:
book; user; book_rating; comment_rating; publication_date; comment
For one book at once several pages of ...
3
votes
1
answer
239
views
Scrape PokeDex and display in tkinter
Hi, I am new here and I just completed my first working version of a Pokédex app with a GUI using tkinter. I used Selenium to scrape the data from pokemondb.net, and then used pandas to clean up the ...
9
votes
2
answers
2k
views
Python script to scrape and parse the Stanford Encyclopedia of Philosophy
I wrote the following script to parse an SEP article and call pandoc to convert it to EPUB. I'd love your feedback.
There are no functions, but I didn't think it was worth adding any. Also, there is no test ...
1
vote
0
answers
64
views
Iterate tables from table id from href links until no table with specific table id is found
I am scraping the following web page (which is my root URL for scraping tables): https://www.iso.org/standards-catalogue/browse-by-ics.html
What I am trying to achieve is to parse the ...
1
vote
1
answer
150
views
Design Pattern: Builder - BeautifulSoup directory navigation and scraping
I wrote a class on top of BeautifulSoup using the builder design pattern that allows for the navigation of the necp data directory.
There are a couple navigation ...
1
vote
0
answers
271
views
Web scraper for pokemon prices using Selenium
I made this scraper to pull the prices of Pokémon cards off tcgcollector, using a CSV file from the same site, because for some reason they don't export prices. I'm looking for any kind of noobie ...
3
votes
0
answers
832
views
A simple Python script that crawls information about YouTube playlists and your watch history
You will need to follow this guide.
This is a YouTube crawler that collects information about YouTube playlists. It uses the YouTube Data API v3 and crawls the title, URL, description, count and videos ...
3
votes
1
answer
124
views
Acupuncture database builder
The following code builds a rudimentary acupuncture database by collecting data from the web.
I would like to hear suggestions about improvements to the database structure, code organization, web-...
4
votes
2
answers
409
views
Scraping campsite availability from a webpage using VBA with Selenium
I wrote some code to extract the information from a table, but it takes an extremely long time.
The table is in the format of a calendar.
I need the information on an Excel sheet with column 1 as the ...