6,084 questions
0
votes
0
answers
47
views
Issue With Jsoup Document Selector
I'm using Java Spring Boot with jsoup, and I recently upgraded jsoup to version 1.21.1.
My code builds a search query and runs it against the document:
Elements targetElements = document.select(...
3
votes
1
answer
55
views
Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working
I'm rather new to Beautiful Soup and I'm having trouble splitting some HTML correctly by looking only at HTML line breaks and ignoring other HTML elements, such as changes in font color.
The ...
0
votes
0
answers
96
views
parse marked customize for list
I've read the docs at https://marked.js.org/using_pro#renderer, but they have no example for the list I want to customize.
more detail https://github.com/markedjs/marked/blob/master/src/Tokens.ts#L137 as the ...
3
votes
1
answer
91
views
Why isn't the end tag included in an ASIDE.OuterHTML
My intent was to give advice on the question "Delete everything between two strings (inclusive)": use the HTMLDocument parser instead of a text-based replace command.
But somehow the OuterHTML ...
1
vote
0
answers
31
views
Passing CSRF token through Dart html parsing
I'm making an app where students can log in to their portal website and see their data. However, I'm having trouble authenticating users. When I did this project on another website I used ...
-1
votes
2
answers
83
views
Parser in Python returns an empty list (I guess it's an HTML class selection issue)
The idea is: I want to collect the name and price of every flat on the website as a list.
I've made a simple parser in Python, but it looks like I can't get any values, since it returns an ...
0
votes
1
answer
47
views
Duplicate extra data when webscraping fbref.com
I am trying to scrape the league table for the EPL, but I am getting duplicate links as well as links to teams that are not even in the Premier League, which makes no sense.
Here ...
1
vote
1
answer
159
views
How do I get to the root directory in C++?
I am building a web-server. I am trying to build a function handler that parses the index.html file in the root directory. It works but when I go to the website on my localhost 127.0.0.1:8080 I get ...
0
votes
1
answer
24
views
Code will not scroll down playlist to parse song names
Using Beautiful Soup and Selenium in Python, I am trying to scroll down a list of songs in a playlist to parse the song names. However, the code will not get past the first 30 songs and scroll down ...
-7
votes
1
answer
120
views
Replace the querystring of an href declaration in an <a> tag
I want to replace the following hyperlinks dynamically
from
<a href="/xsearch2?q=some search/21">21</a>
to
<a href="/xsearch2?q=some search&page=21">21</a&...
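The rewrite described in this excerpt can be sketched with a regular expression. This is only an illustration in Python, assuming every link has exactly the shape shown in the example (`/xsearch2?q=<query>/<page number>`); the names `HREF_PATTERN` and `add_page_param` are hypothetical, and for arbitrary markup a real HTML parser would be more robust than a regex.

```python
import re

# Hypothetical pattern: matches only hrefs shaped like the example in the
# excerpt, i.e. /xsearch2?q=<query>/<page number>.
HREF_PATTERN = re.compile(r'href="(/xsearch2\?q=[^"]*)/(\d+)"')


def add_page_param(html: str) -> str:
    """Turn the trailing '/<number>' of each matching href into '&page=<number>'."""
    return HREF_PATTERN.sub(r'href="\1&page=\2"', html)


print(add_page_param('<a href="/xsearch2?q=some search/21">21</a>'))
# prints: <a href="/xsearch2?q=some search&page=21">21</a>
```

Links that do not match the pattern are left unchanged.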
-1
votes
1
answer
30
views
Divs not being detected with BeautifulSoup
I am trying to parse https://rateyourmusic.com/release/album/tyler-the-creator/igor/reviews/1/
I can access the divs that have class_=review_body if I download the HTML files locally onto my system. ...
0
votes
0
answers
73
views
jsoup converting '&' to '&amp;' when I set the Element
I am trying to parse an HTML input using jsoup (v1.18.1), extract elements, extract each attribute value and replace as follows:
> with &gt;
< with &lt;
The method I'm feeding this code ...
0
votes
0
answers
24
views
PHP Simple HTML DOM Parser not Returning Anything [duplicate]
I'm trying to use the PHP Simple HTML DOM Parser for the first time from here - https://simplehtmldom.sourceforge.io/docs/1.9/index.html
Unfortunately, I'm having an issue where it's not returning ...
-2
votes
1
answer
54
views
JavaScript for bookmarklet data extraction from an html monthly calendar schedule
I have a bookmarklet and JavaScript with which I am extracting data from an HTML table on a website.
For the most part the script works fine; however, it parses the date wrong. The date, in the HTML ...
0
votes
1
answer
58
views
Xml Parse code working fine at my end, does not work at client region over same Html
I have written Apps Script code for HTML parsing using XmlParse. It works fine at my end: my browser and system language are both English, as is my Google Account's. But when I shared the same ...
2
votes
1
answer
131
views
How to handle self-closing tags without end-slash in html.parser.HTMLParser
By default it seems that html.parser.HTMLParser cannot handle self-closing tags correctly if they are not terminated using /. E.g. it handles <img src="asfd"/> fine, but it ...
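For context, the standard-library parser reports `<img src="x">` as a start tag only and never emits a matching end-tag event, while `<img src="x"/>` goes through `handle_startendtag`. A minimal sketch of one possible workaround follows, assuming the goal is a balanced start/end event stream; the `VoidAwareParser` class, the `VOID_ELEMENTS` set, and `parse_events` are illustrative names, not part of `html.parser`.

```python
from html.parser import HTMLParser

# HTML void elements: they never have content or a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class VoidAwareParser(HTMLParser):
    """Records (kind, tag) events and closes void elements immediately."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag in VOID_ELEMENTS:
            # <img ...> without a trailing slash: synthesize the end event.
            self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # <img ... />: emit exactly one start/end pair.
        self.events.append(("start", tag))
        self.events.append(("end", tag))

    def handle_endtag(self, tag):
        # Ignore stray closers such as </img>; void elements are already closed.
        if tag not in VOID_ELEMENTS:
            self.events.append(("end", tag))


def parse_events(html):
    parser = VoidAwareParser()
    parser.feed(html)
    parser.close()
    return parser.events


print(parse_events('<p><img src="a"><img src="b"/></p>'))
```

Both spellings of the `img` tag then produce the same pair of events, so downstream code that tracks nesting depth stays balanced.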
-1
votes
2
answers
30
views
code not running when webscraping weather data
I am trying to scrape earthquake weather data from USGS and my code runs up to the print(soup) line but nothing after that
import requests
from bs4 import BeautifulSoup
url="https://earthquake....
0
votes
1
answer
51
views
Populating Spreadsheet(s) from email html table
I am not a programmer but I've been digging through the weeds to figure something out on my own and I'm stuck. I have a google spreadsheet with multiple sheets that I need to populate with content ...
1
vote
0
answers
57
views
Error: peg$SyntaxError: Expected Character but "&" Found While Parsing SVG Path Data in JavaScript
I am working with an SVG file and converting it to JSON using the svgson library. Additionally, I am using the svg-path-to-polygons library to decode the d attribute in the path element. However, I am ...
0
votes
0
answers
534
views
"unstructured" and langchain's "HTMLHeaderTextSplitter" ignore "pre" and/or "code" HTML tags
I want to read a webpage and split it into chunks to feed a vector database in a RAG pipeline. This webpage has Python code examples on it, but I cannot create chunks with that code text, it is ...
0
votes
1
answer
305
views
Get Errors on HTML Content Using Jsoup for Java
I am building an application that receives HTML content as strings. I need to verify that these HTML strings are well-formed, meaning I want to parse them and detect lines with errors.
During my ...
-1
votes
1
answer
634
views
Web scraping 2nd table player stats from fbref.com [duplicate]
I was hoping for help here. I'm trying to scrape the second table, of player goal and shot creation stats, on FBref for the MLS, but my script is bringing in the first table of team statistics ...
1
vote
1
answer
442
views
How to create vnodes from a string with html tags in vue 3
Research
So I've found this answer on how to create a vnode list from a simple SVG with one path layer and how to transform that in Vue2.
I could not find any good solutions for Vue 3, so I scaffolded ...
1
vote
1
answer
247
views
Problem: How to scrape dynamically loaded data table in Python?
Python novice here. I have been learning how to scrape from various baseball sites (Fangraphs, Statcast, Rotowire). I have had success with a few different methods, but the Park Factors table on ...
0
votes
0
answers
72
views
How can we remove or escape garbage values like =3D or =3D&"e while parsing an mhtml or html page using jsoup?
I'm trying to extract data from offline saved HTML pages using jsoup, but while parsing the HTML document through my Java code, I'm getting some garbage values like =3D, =3d&"e, etc. Is ...
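As background, `=3D` is the quoted-printable encoding of `=`, which MHTML files commonly use for their HTML parts. Decoding that transfer encoding before handing the text to an HTML parser makes such sequences disappear. The sketch below shows the idea in Python with the standard `quopri` module, assuming the part really is quoted-printable and UTF-8; the function name `decode_quoted_printable` is illustrative, and the same decoding step would have to happen on the Java side before jsoup sees the content.

```python
import quopri


def decode_quoted_printable(raw: bytes, charset: str = "utf-8") -> str:
    """Decode a quoted-printable body (e.g. an MHTML HTML part) to text.

    Assumes the bytes really are quoted-printable; the charset is a guess
    and should come from the part's Content-Type header when available.
    """
    return quopri.decodestring(raw).decode(charset, errors="replace")


encoded = b'<a href=3D"page.html" class=3D"nav">link</a>'
print(decode_quoted_printable(encoded))
# prints: <a href="page.html" class="nav">link</a>
```

A full MHTML file also contains MIME headers and boundaries, so in practice the HTML part has to be extracted first; this sketch covers only the decoding step.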