Use loop and find html element's values JavaScript

Question

I want to use vanilla JS to loop through a string of HTML text and get the text of certain elements. With jQuery I can do something like this:

var str1 = "<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
$.each($(str1).find('h2'), function(index, value) {
  console.log($(value).text());
});

As I understand it, $(str1) parses the HTML string into elements, and we can then use .text() to get the text of an element (here, each h2). But I want to do this within my Node app on the backend rather than on the client side, because it would be more efficient (?), and it would also be nice not to rely on jQuery.

Some context: I'm working on a blogging app. I want to build a table of contents as an object on the server side.

What exactly is your question? What are you trying to achieve?
– Olian04, Commented Jan 2, 2018 at 23:41
Why would you be having DOM nodes on the server where there is no DOM?
– Scott Marcus, Commented Jan 2, 2018 at 23:46
Well, jQuery should work on the backend, but if you don't want to rely on it, you would probably have to use a set of regular expressions to find each element tag or, the easiest way, run it through a document parser. You can check the npm site for such parsers.
– Ezekiel, Commented Jan 2, 2018 at 23:46
Cheerio can do this, but afaik it doesn't allow some things such as class manipulation. github.com/cheeriojs/cheerio
– simon, Commented Jan 2, 2018 at 23:49

Mulan · Accepted Answer · 2018-01-03 21:03:29Z

This is another way using .innerHTML, but it relies on the built-in iterable protocol.

Here are the operations we'll need, their types, and the built-in that provides each one:

Create an HTML element from a text
String -> HTMLElement – provided by set Element#innerHTML
Get the text contents of an HTML element
HTMLElement -> String – provided by get Node#textContent
Find nodes matching a query selector
(HTMLElement, String) -> NodeList – provided by Element#querySelectorAll
Transform a list of nodes to a list of text
(NodeList, HTMLElement -> String) -> [String] – provided by Array.from

// html2elem :: String -> HTMLElement
const html2elem = html =>
  {
    const elem = document.createElement ('div')
    elem.innerHTML = html
    return elem.childNodes[0]
  }

// findText :: (String, String) -> [String]
const findText = (html, selector) =>
  Array.from (html2elem(html).querySelectorAll(selector), e => e.textContent)

// str :: String  
const str =
  "<div><h1>MAIN HEADING</h1><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";

console.log (findText (str, 'h2'))
// [
//   "This is a heading1",
//   "This is a heading2"
// ]
// :: [String]

console.log (findText (str, 'h1'))
// [
//   "MAIN HEADING"
// ]
// :: [String]
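
The Array.from step in findText works on any iterable or array-like value, not only on a NodeList, and its second argument is applied to each item while the array is built. A minimal sketch of just that step, using a made-up array-like object (fakeNodeList is hypothetical and only stands in for the result of querySelectorAll), so it runs without a DOM:

```javascript
// A stand-in array-like object, shaped like a NodeList of two elements.
// The objects and their textContent values are made up for illustration.
const fakeNodeList = {
  0: { textContent: "This is a heading1" },
  1: { textContent: "This is a heading2" },
  length: 2,
};

// The second argument maps each item while the new array is being built.
const texts = Array.from(fakeNodeList, (e) => e.textContent);

console.log(texts);
```

This is why findText needs no explicit loop: the conversion to an array and the mapping to text happen in one call.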

slevy1 · Accepted Answer · 2018-01-11 20:40:15Z

The best way to parse HTML is to use the DOM. If all you have is a string of HTML, then (according to another Stack Overflow member) you can create a "dummy" DOM element and assign the string to its innerHTML so that you can work with it through the DOM, as follows:

var el = document.createElement('html');
// A quoted string literal cannot span lines, so the pieces are concatenated.
el.innerHTML = "<html><head><title>aTitle</title></head>" +
  "<body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>" +
  "</body></html>";

Now you have a couple of ways to access the data using the DOM, as follows:

var el = document.createElement('html');
el.innerHTML = "<html><head><title>aTitle</title></head><body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div></body></html>";

// One way: getElementsByTagName returns a live HTMLCollection
var h2s = el.getElementsByTagName("h2");
for (var i = 0, max = h2s.length; i < max; i++) {
    console.log(h2s[i].textContent);
    // Print a blank separator after the last heading
    if (i == max - 1) console.log("\n");
}

// And another: querySelectorAll returns a static NodeList
var elementList = el.querySelectorAll("h2");
for (i = 0, max = elementList.length; i < max; i++) {
    console.log(elementList[i].textContent);
}

You may also use a regular expression, as follows:

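var str = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';

// Capture the text between <h2> and </h2>; "g" lets exec() resume after each match
var re = /<h2>([^<]*?)<\/h2>/g;
var match;
var m = [];
// exec() returns null once there are no more matches, which ends the loop
while ((match = re.exec(str)) !== null) {
    // match is [fullMatch, capturedText]; pop() takes the captured text
    m.push(match.pop());
}
console.log(m);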

The regex matches an opening h2 tag, then any run of characters other than "<" (captured in a group), then a closing h2 tag. The "*?" quantifier matches zero or more of those characters lazily, that is, as few as possible.

Per Ryan of Stack Overflow:

exec with a global regular expression is meant to be used in a loop, as it will still retrieve all matched subexpressions.

The critical part of the regex is the "g" flag, as per MDN. It makes exec() resume from the end of the previous match, so repeated calls find every match in the string. In each loop iteration, match is an array holding the full match followed by the captured group. match.pop() removes the last element, which is the captured text, and that text is pushed onto m, so m ultimately contains all the captured text values.
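
Since the question asks for a table of contents built in Node without jQuery, the same regex idea can be sketched with String.prototype.matchAll, which also requires the "g" flag. This is only a sketch under a strong assumption: headings contain plain text with no nested tags. The name extractHeadings and the { level, text } object shape are hypothetical, chosen for illustration; for arbitrary HTML a real parser is the safer choice.

```javascript
// Hypothetical helper: collect h1..h6 headings from an HTML string.
// Assumes each heading holds plain text only (no nested tags).
const extractHeadings = (html) =>
  Array.from(
    // ([1-6]) captures the heading level; \1 requires the same level
    // in the closing tag; ([^<]*) captures the heading text.
    html.matchAll(/<h([1-6])[^>]*>([^<]*)<\/h\1>/gi),
    ([, level, text]) => ({ level: Number(level), text: text.trim() })
  );

const html =
  "<div><h1>MAIN HEADING</h1><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";

const toc = extractHeadings(html);
console.log(toc);
```

Each entry keeps the heading level next to its text, so nested sections could be derived from the levels later.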
