5

I need to split a string like this:

<p>foo</p><p>bar</p>

into an array containing "foo" and "bar".

I thought a regex could help me, but it seems I don't understand regexes well enough. This is my attempt:

var inputText = "<p>foo</p><p>bar</p>";
splittedSelection = inputText.split("/<p>|<\/p>/g");

But all I get is an array with a single entry, which is identical to inputText.

I made a little fiddle for you.

Thanks for any help.

6
  • 1
    You're not using a regex here, you're using a string. splittedSelection = inputText.split(/<p>|<\/p>/g); Commented Aug 3, 2017 at 15:10
  • 2
    stackoverflow.com/questions/1732348/… Commented Aug 3, 2017 at 15:10
  • Thanks for that, @epascarello. Everybody go click that link Commented Aug 3, 2017 at 15:13
  • 1
    Do not parse HTML with Regex Commented Aug 3, 2017 at 15:16
  • Please take a look at @baao's answer :) Commented Aug 3, 2017 at 15:21

6 Answers

2

You should pass a regex literal, /<p>|<\/p>/g, instead of wrapping it in quotes, because a quoted argument is treated as a plain string separator. However, this will produce ["", "foo", "", "bar", ""], which is undesirable, so you can .filter() out the empty results, like this:

var inputText = "<p>foo</p><p>bar</p>";

var splittedSelection = inputText.split(/<p>|<\/p>/g).filter(function(value) {
  // Filter out empty results
  return value !== "";
});

document.getElementById("bar").innerHTML += "0: " + splittedSelection[0] + "\n" + "1: " + splittedSelection[1] + "\n";
<div id="bar">
</div>
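
To make the difference between the two kinds of separator concrete, here is a minimal sketch that runs without a page. The variable names (withString, withRegex, cleaned) are only illustrative and not part of the original snippet.

```javascript
const inputText = "<p>foo</p><p>bar</p>";

// A quoted separator is searched for literally; the text "/<p>|<\/p>/g"
// never occurs in the input, so the result is one untouched entry.
const withString = inputText.split("/<p>|<\\/p>/g");

// A regex literal splits on every <p> or </p>. Empty strings appear
// wherever two separators touch or a separator sits at either end.
const withRegex = inputText.split(/<p>|<\/p>/);

// Drop the empty entries.
const cleaned = withRegex.filter(function (value) {
  return value !== "";
});

console.log(withString); // [ '<p>foo</p><p>bar</p>' ]
console.log(withRegex);  // [ '', 'foo', '', 'bar', '' ]
console.log(cleaned);    // [ 'foo', 'bar' ]
```

The g flag is left out in this sketch because split already splits on every occurrence of the separator.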


1

You can start from something like this:

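  1. .+ matches one or more characters, so it handles different tag names and attributes
  2. .+? makes that quantifier lazy, so it stops at the first closing > instead of the last one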

const text = "<p>foo</p><p>bar</p>";

const re = /<.+?>(.+?)<\/.+?>/g;

console.log(text.split(re).filter(t => t));
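
If it helps to see what the lazy quantifier changes, here is a small sketch comparing the greedy and lazy forms on the same input. The names greedy, lazy and splitParts are made up for this illustration. It also shows why the split above returns the inner text at all: when the separator regex contains a capture group, split inserts the captured text into the result.

```javascript
const sample = "<p>foo</p><p>bar</p>";

// Greedy: .+ consumes as much as possible, so a single match runs from
// the first "<" to the last ">".
const greedy = sample.match(/<.+>/g);

// Lazy: .+? consumes as little as possible, so each tag is its own match.
const lazy = sample.match(/<.+?>/g);

// With a capture group in the separator, split() keeps the captured text.
const splitParts = sample.split(/<.+?>(.+?)<\/.+?>/g);

console.log(greedy);     // [ '<p>foo</p><p>bar</p>' ]
console.log(lazy);       // [ '<p>', '</p>', '<p>', '</p>' ]
console.log(splitParts); // [ '', 'foo', '', 'bar', '' ]
```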

1 Comment

"Lazy quantifier" = "By adding the ? after the +, we tell it to repeat as few times as possible, so the first match it comes across, is where we want to stop the matching." – lazy vs. greedy stackoverflow.com/a/2301298/1066234
0

ES6 based answer:

const regex = /<[^>]*>/gi;
let string = '<p>foo</p><p>bar</p>';
let result = string.split(regex).filter(e => e);
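
For readers wondering what the pattern does: <[^>]*> matches a "<", then any run of characters other than ">", then the closing ">", so it also covers tags with attributes. A rough sketch with a made-up input (the attributes and the names tagPattern, withAttributes and parts are assumptions for illustration only):

```javascript
const tagPattern = /<[^>]*>/g;

// Hypothetical input with attributes on the tags.
const withAttributes = '<p class="intro">foo</p><p id="second">bar</p>';

// Split on every tag and drop the empty strings between adjacent tags.
const parts = withAttributes.split(tagPattern).filter(part => part !== "");

console.log(parts); // [ 'foo', 'bar' ]
```

Note that this splits on every tag, not only on <p>, and an attribute value containing ">" would end the match early.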


0

Assuming this runs on the client, you can use jQuery instead of a regex.

var inputText = "<p>foo</p><p>bar</p>";
var splittedSelection = $('<div>'+inputText+'</div>').find("p").map(function() {
  return $(this).text();
}).get(); // .get() turns the jQuery object into a plain array
$.each(splittedSelection, function(i,item) {
  $("#bar").append(i+": " +item + "<br/>");
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<div id="bar"></div>


0

Forget about the answers that try to fix your regex. Don't do it with regex.

Instead, get the elements and map their textContent to an array:

let res = Array.from(document.getElementsByTagName('p')).map(e => e.textContent);
console.log(res);
<p>foo</p><p>bar</p>

If you only have this string and it is not part of the document, create an element and assign the string to its innerHTML so the browser parses it (you don't even need to append the element to the DOM):

let s = "<p>foo</p><p>bar</p>";
let el = document.createElement('div');
el.innerHTML = s;

let res = Array.from(el.getElementsByTagName('p')).map(e => e.textContent);
console.log(res);

If you're doing this in Node, you can use cheerio:

const cheerio = require('cheerio')
let html = "<p>foo</p><p>bar</p>";
const $ = cheerio.load(html);
let res = [];
$('p').each((i,e) => res.push($(e).text()));
console.log(res);

If you are doing this in any other environment, chances are extremely high that there's a DOM/XML/HTML parser available, too.

3 Comments

This is like offering apples to someone who's asking for milk, isn't it? What if this task has to be done in Node.js?
No it isn't, @Hitmands. It's explaining to someone who is doing it wrong how to do it right. If you asked me how to jump from a bridge, I'd also tell you not to do it rather than answer your original question. I've added a version for node...
All of us are aware that regex shouldn't be used as a parser, but he is asking for that... You can add a comment with a suggestion to better handle the problem, but answers should be answers...
0

Another solution with regex:

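// Lookbehind and lookahead on the literal tags, so only the text
// between <p> and </p> is matched (character classes such as [<p>]
// would match single characters and break on text containing "p")
let regex = /(?<=<p>)(.*?)(?=<\/p>)/g
  , inputText = "<p>foo</p><p>bar</p>";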

let array = inputText.match(regex).filter(i => i);
  
console.log(array);
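
A variation on the same idea, as a sketch only: assuming an engine that supports String.prototype.matchAll (ES2020), you can capture the text between the tags and read group 1 from each match. The names paragraphPattern, source and texts are made up for this example.

```javascript
// Capture whatever sits between <p> and </p>; the lazy .*? stops at
// the first closing tag.
const paragraphPattern = /<p>(.*?)<\/p>/g;
const source = "<p>foo</p><p>bar</p>";

// matchAll yields one match object per paragraph; index 1 is the capture group.
const texts = Array.from(source.matchAll(paragraphPattern), match => match[1]);

console.log(texts); // [ 'foo', 'bar' ]
```

Unlike the filtered versions, this keeps an empty string for an empty paragraph, and like every regex here it does not handle nested tags or attributes.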
