
I would like to extract all JavaScript code from a string using preg_match_all.

$pattern = '~<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>~su';
$success = preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

array(0 => '<script>alert("Hallo Welt 1");</script>');

The result also includes the surrounding <script> tags. I would like to get only the code between them, without the tags.

My sample online regex with sample code.

  • Classic XY problem. Most likely, regex is not your tool. Commented Mar 19, 2019 at 0:44
  • Add a real capture group (…) around the inner part, and use result set [1]. Commented Mar 19, 2019 at 1:33
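The capture-group suggestion could look roughly like this. It is a sketch, not the commenter's exact code: the pattern is an assumption that keeps the question's unrolled loop but inserts `[^>]*>` after `<script\b`, so that group 1 starts right after the opening tag. The sample `$str` is hypothetical. Regex on HTML keeps its usual caveats; for example, a literal `</script>` inside a JavaScript string would end the match early.

```php
<?php
// Hypothetical sample input with two script blocks.
$str = '<script>alert("Hallo Welt 1");</script>'
     . '<div>Hallo Welt</div>'
     . '<script type="text/javascript">alert("Hallo Welt 2");</script>';

// Assumed pattern: [^>]* skips the attributes, and group 1 captures everything
// between the end of the opening tag and the next "</script>".
$pattern = '~<script\b[^>]*>([^<]*(?:(?!</script>)<[^<]*)*)</script>~';

// With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches
// (tags included) and $matches[1] holds only the captured script bodies.
$count = preg_match_all($pattern, $str, $matches);

$scripts = $matches[1];
print_r($scripts);
```

If you keep `PREG_SET_ORDER` as in the question, the same bodies are available as `$set[1]` for each `$set` in `$matches`.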

1 Answer


Regex is the wrong tool for parsing XML/HTML. You should use a DOM parser instead. XPath is a query language specialized for navigating DOM structures.

$html = <<<_EOS_
<script>alert("Hallo Welt 1");</script>
<div>Hallo Welt</div>
<script type ="text/javascript">alert("Hallo Welt 2");</script>
<div>Hallo Welt 2</div>
<script type ="text/javascript">
              alert("Hallo Welt 2");
</script>
_EOS_;

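$doc = new DOMDocument();
// Wrap the fragment in a doctype and an <html> element before parsing it.
$doc->loadHTML("<!DOCTYPE html><html>$html</html>");
$xpath = new DOMXPath($doc);
// Select only the text nodes inside each <script> element, i.e. the JavaScript code.
$scripts = $xpath->query('//script/text()');

foreach ($scripts as $script) {
  var_dump($script->data);
}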
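For comparison, here is a minimal sketch of the same DOM approach without XPath. `getElementsByTagName('script')` returns each `<script>` element, and its `textContent` holds the code. Collecting the results into an array instead of calling `var_dump`, and silencing libxml warnings with `libxml_use_internal_errors(true)`, are assumptions added for illustration. They are not part of the answer.

```php
<?php
$html = <<<_EOS_
<script>alert("Hallo Welt 1");</script>
<div>Hallo Welt</div>
<script type="text/javascript">alert("Hallo Welt 2");</script>
_EOS_;

// Assumption: buffer libxml parser warnings instead of printing them,
// which helps with real-world, possibly malformed markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html>$html</html>");
libxml_clear_errors();

$code = [];
foreach ($doc->getElementsByTagName('script') as $script) {
    // textContent concatenates all text (and CDATA) children of the element.
    $code[] = $script->textContent;
}

print_r($code);
```

XPath remains the more flexible choice when you need to filter, for example only `//script[@type="text/javascript"]`.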

1 Comment

Thanks for this practical solution… I think it's the best way.
