Newest 'html-parsing' Questions

0 votes

0 answers

47 views

Issue With Jsoup Document Selector

I'm using Java Spring Boot and jsoup, and I recently upgraded jsoup to version 1.21.1. My code creates a search query and searches for it in the document: Elements targetElements = document.select(...

user613

243

asked Nov 18 at 9:04

3 votes

1 answer

55 views

Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working

I'm rather new to Beautiful Soup and I'm having trouble splitting some HTML correctly by looking only at HTML line breaks and ignoring other HTML elements, such as changes in font color. The ...

James Brian

33

asked Aug 30 at 17:29

0 votes

0 answers

96 views

parse marked customize for list

I've seen the docs at https://marked.js.org/using_pro#renderer, and they have no example for the list. I want to customize it in more detail (https://github.com/markedjs/marked/blob/master/src/Tokens.ts#L137), as the ...

zummon

996

asked Apr 4 at 0:57

3 votes

1 answer

91 views

Why isn't the end tag included in an ASIDE.OuterHTML

My intent was to give advice on the question "Delete everything between two strings (inclusive)": use the HTMLDocument parser instead of a text-based replace command. But somehow the OuterHTML ...

iRon

24.4k

asked Mar 3 at 9:46

1 vote

0 answers

31 views

Passing CSRF token through Dart html parsing

I'm making an app where students can log in to their portal website and it shows their data; however, I'm having trouble authenticating users. When I did this project on another website I used ...

abtlb

11

asked Feb 12 at 9:54

-1 votes

2 answers

83 views

Parser in Python returns an empty list (I guess it's an HTML class selection issue)

The idea is: I want to collect the name of the flat and its price as a list for every flat on the website. I've made a simple parser in Python, but it looks like I can't get any values, since it returns an ...

Danny Mxxre

1

asked Jan 18 at 16:45

0 votes

1 answer

47 views

Duplicate extra data when webscraping fbref.com

I am trying to scrape the league table for the EPL, but when I do that I get duplicate links as well as links to teams that are not even in the Premier League, which makes no sense. Here ...

Vignesh

27

asked Dec 26, 2024 at 22:39

1 vote

1 answer

159 views

How do I get to the root directory in C++?

I am building a web-server. I am trying to build a function handler that parses the index.html file in the root directory. It works but when I go to the website on my localhost 127.0.0.1:8080 I get ...

Codemon

11

asked Dec 21, 2024 at 17:49

0 votes

1 answer

24 views

Code will not scroll down playlist to parse song names

Using BeautifulSoup and Selenium in Python, I am trying to scroll down a list of songs in a playlist to parse the song names. The code, however, will not get past the first 30 songs and scroll down ...

BouckleyBoy

11

asked Dec 9, 2024 at 4:57

-7 votes

1 answer

120 views

Replace the querystring of an href declaration in an <a> tag

I want to replace the following hyperlinks dynamically from <a href="/xsearch2?q=some search/21">21</a> to <a href="/xsearch2?q=some search&page=21">21</a&...

KTH Clips

1

asked Dec 1, 2024 at 2:40

-1 votes

1 answer

30 views

Divs not being detected with BeautifulSoup

I am trying to parse https://rateyourmusic.com/release/album/tyler-the-creator/igor/reviews/1/. I can access the divs that have class_=review_body if I download the HTML files locally onto my system. ...

Nate

1

asked Nov 21, 2024 at 7:04

0 votes

0 answers

73 views

jsoup converting '&' to '&amp' when I set the Element

I am trying to parse an HTML input using jsoup (v1.18.1), extract elements, extract each attribute value, and replace as follows: > with &gt, and < with &lt. The method I'm feeding this code ...

Pallavi

1

asked Sep 12, 2024 at 20:10

0 votes

0 answers

24 views

PHP Simple HTML DOM Parser not Returning Anything [duplicate]

I'm trying to use the PHP Simple HTML DOM Parser for the first time from here - https://simplehtmldom.sourceforge.io/docs/1.9/index.html Unfortunately, I'm having an issue where it's not returning ...

Lewis Hardisty

121

asked Aug 29, 2024 at 18:45

-2 votes

1 answer

54 views

JavaScript for bookmarklet data extraction from an html monthly calendar schedule

I have a bookmarklet and JavaScript with which I am extracting data from an HTML table on a website. For the most part the script works fine; however, it parses the date incorrectly. The date, in the HTML ...

SystemWorks

181

asked Aug 28, 2024 at 8:47

0 votes

1 answer

58 views

Xml Parse code working fine at my end, does not work at client region over same Html

I have written Apps Script code for HTML parsing using XmlParse. It works fine at my end; my browser and system language are both English, as is my Google Account's. But when I shared the same ...

Amna Irfan

3

asked Aug 10, 2024 at 18:07

2 votes

1 answer

131 views

How to handle self-closing tags without end-slash in html.parser.HTMLParser

By default, it seems that html.parser.HTMLParser cannot handle self-closing tags correctly if they are not terminated with /. E.g. it handles <img src="asfd"/> fine, but it ...

flawr

11.7k

asked Aug 4, 2024 at 12:41

-1 votes

2 answers

30 views

code not running when webscraping weather data

I am trying to scrape earthquake weather data from USGS, and my code runs up to the print(soup) line but nothing after that: import requests from bs4 import BeautifulSoup url="https://earthquake....

Lumko Mtengwane

1

asked Jul 18, 2024 at 16:44

0 votes

1 answer

51 views

Populating Spreadsheet(s) from email html table

I am not a programmer, but I've been digging through the weeds to figure something out on my own and I'm stuck. I have a Google spreadsheet with multiple sheets that I need to populate with content ...

notobella designs

1

asked Jul 5, 2024 at 15:56

1 vote

0 answers

57 views

Error: peg$SyntaxError: Expected Character but "&" Found While Parsing SVG Path Data in JavaScript

I am working with an SVG file and converting it to JSON using the svgson library. Additionally, I am using the svg-path-to-polygons library to decode the d attribute in the path element. However, I am ...

HEMAL

430

asked Jun 26, 2024 at 9:53

0 votes

0 answers

534 views

"unstructured" and langchain's "HTMLHeaderTextSplitter" ignores "pre" and/or "code" HTML tags

I want to read a webpage and split it into chunks to feed a vector database in a RAG pipeline. This webpage has Python code examples on it, but I cannot create chunks with that code text; it is ...

Abraham Martín Expósito

29

asked May 29, 2024 at 11:02

0 votes

1 answer

305 views

Get Errors on HTML Content Using Jsoup for Java

I am building an application that receives HTML content as strings. I need to verify that these HTML strings are well-formed, meaning I want to parse them and detect lines with errors. During my ...

Juan Rojas

75

asked May 26, 2024 at 0:44

-1 votes

1 answer

634 views

Web scraping 2nd table player stats from fbref.com [duplicate]

Was hoping for help here. I'm trying to web scrape this second table of player goal and shot creation stats on FB Ref for the MLS, but my script is bringing in the first table of team statistics ...

user15039720

1

asked May 13, 2024 at 23:22

1 vote

1 answer

442 views

How to create vnodes from a string with html tags in vue 3

Research: So I've found this answer on how to create a vnode list from a simple SVG with one path layer and how to transform that in Vue 2. I could not find any good solutions for Vue 3, so I scaffolded ...

Nebulosar

1,887

asked Apr 15, 2024 at 15:12

1 vote

1 answer

247 views

Problem: How to scrape dynamically loaded data table in Python?

Python novice here. I have been learning how to scrape from various baseball sites (Fangraphs, Statcast, Rotowire). I have had success with a few different methods, but the Park Factors table on ...

gredow1979

11

asked Apr 10, 2024 at 20:58

0 votes

0 answers

72 views

How can we remove or escape garbage value like =3D or =3D&&quote while parsing mhtml or html page using jsoup?

I'm trying to extract data from offline saved HTML pages using jsoup, but while parsing the HTML document through my Java code, I'm getting some garbage values like =3D, =3d&&quote, etc. Is ...

Kunanj Pradhan

11

asked Mar 27, 2024 at 7:00
