3

How do I write a pattern to use with PHP's preg_match function to check whether a string contains script tags?

1
  • By 'script tags' do you mean things like <script>, <?, <?php, <% and so forth? Commented Sep 21, 2009 at 12:30

3 Answers

4

For security reasons? Basically, you can't. Here are some of the ways script can get in without a <script> tag, which I learned about doing this in the past:

  • <a href="javascript:something">...</a>
  • <p onmouseover="something">
  • There are a number of URL schemes that are equivalent to javascript: in different browsers, like jscript:, mocha:, and livescript:. Most are undocumented.
  • Old versions of Netscape treated certain bytes (0x94 and 0x95, I think?) as equivalent to <>. Hopefully there's nothing like this in modern browsers.
  • VBScript.

MySpace tried to do this, and the result was the "Samy is my hero" worm which took down the service for a day or so, among numerous other security disasters on their part.

So if you want to accept a limited subset of HTML that only includes text and formatting, you have to whitelist, not blacklist. You have to whitelist tags, attributes, and if you want to allow links, URL schemes. There are a few existing libraries out there for doing this, but I don't know which ones to recommend in PHP.
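
As a rough illustration of the whitelist idea, here is a minimal sketch built on PHP's DOM extension. The constants ALLOWED_TAGS and ALLOWED_SCHEMES and the functions hasAllowedScheme, cleanNode, and sanitizeHtml are hypothetical names made up for this example, not part of any library. The sketch assumes UTF-8 input and takes the strictest policy: anything not listed is removed together with its contents. Treat it as a starting point, not a vetted sanitizer.

```php
<?php
// Hypothetical whitelists for illustration: tag name => attributes permitted on it.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'a' => ['href'],
];
const ALLOWED_SCHEMES = ['http', 'https', 'mailto'];

function hasAllowedScheme(string $url): bool
{
    // Remove control characters and spaces for the check only, since browsers
    // tolerate some of them inside a scheme name.
    $compact = preg_replace('/[\x00-\x20]+/', '', $url);
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $m)) {
        return false; // no scheme (relative URL): rejected here to stay conservative
    }
    return in_array(strtolower($m[1]), ALLOWED_SCHEMES, true);
}

function cleanNode(DOMNode $node): void
{
    // Copy the child list first, because children are removed while looping.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is kept; saveHTML() escapes it on output
        }
        if (!$child instanceof DOMElement || !array_key_exists($child->nodeName, ALLOWED_TAGS)) {
            $node->removeChild($child); // comments and unlisted tags go, contents included
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $ok = in_array($attr->nodeName, ALLOWED_TAGS[$child->nodeName], true)
                && ($attr->nodeName !== 'href' || hasAllowedScheme($attr->value));
            if (!$ok) {
                $child->removeAttributeNode($attr);
            }
        }
        cleanNode($child);
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // untrusted markup is rarely valid
    // The meta tag asks libxml to read the input as UTF-8; the div is a known
    // container so the fragment can be located again after parsing.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="sanitize-root">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = (new DOMXPath($doc))->query('//div[@id="sanitize-root"]')->item(0);
    if ($root === null) {
        return '';
    }
    cleanNode($root);

    // Serialize only what is left inside the container.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling sanitizeHtml('<p onmouseover="x()">Hi <b>there</b><script>alert(1)</script></p>') should keep the paragraph and the bold text while dropping the event handler and the script element. Links survive only if their href starts with one of the listed schemes.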

1

Don't use regular expressions to process XML/HTML. Use PHP's DOM classes instead; they should be much more reliable than any regex you will find:

libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
if ($xpath->query('//script')->length > 0) {
    // document contains script tags
}
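
The same DOM check can be wrapped into a true/false helper that also works on a bare fragment or plain string. This is only a sketch: containsScript is a hypothetical name, and the extra XPath branch for on* event-handler attributes is my addition, not something the code above does. It still does not look at javascript: URLs or other script-capable URL schemes.

```php
<?php
// Hypothetical helper for illustration; not part of PHP's DOM extension.
function containsScript(string $html): bool
{
    if (trim($html) === '') {
        return false; // loadHTML() does not accept an empty string
    }

    $previous = libxml_use_internal_errors(true); // fragments trigger parser warnings
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Match <script> elements, plus any attribute whose name starts with "on"
    // (onclick, onmouseover, ...). The HTML parser lowercases names, so
    // <SCRIPT> and ONCLICK are covered as well.
    $hits = $xpath->query('//script | //@*[starts-with(name(), "on")]');

    return $hits->length > 0;
}
```

With this, containsScript('<SCRIPT SRC="x.js"></SCRIPT>') and containsScript('<p onmouseover="x()">hi</p>') both return true, while plain text such as 'Hello world' returns false.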

1 Comment

The question simply says "string", which does not necessarily imply that there is a document structure...

0

Are you trying to escape them? If so, try the following (not tested):

$string=str_replace(array("&", "<", ">"), array("&amp;", "&lt;", "&gt;"), $string);

This way, a surprise will be waiting for your attackers.
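
For comparison, PHP's built-in htmlspecialchars does the same escaping and handles quotes as well. A minimal sketch, where escapeForHtml is a hypothetical wrapper name and UTF-8 input is assumed:

```php
<?php
// Hypothetical wrapper for illustration; htmlspecialchars() itself is built in.
function escapeForHtml(string $string): string
{
    // ENT_QUOTES escapes both double and single quotes, which matters when the
    // result is placed inside an attribute value.
    return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

echo escapeForHtml('<script>alert("hi")</script> & more'), PHP_EOL;
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; more
```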

4 Comments

I'm trying to check whether it contains a script tag (true/false).
Then this might work: preg_match('#<script.*?</script>#is', $string);
Or simply htmlspecialchars.
It's a shame to use str_replace when htmlspecialchars is available :)
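
For the plain true/false check, a regex version could look like the sketch below. hasScriptElement is a hypothetical name. Note two details: when / is the pattern delimiter, the slash in </script> has to be escaped, so # is used as the delimiter here; and the s modifier is needed for . to match across line breaks. This only detects literal script elements, not event-handler attributes or javascript: URLs.

```php
<?php
// Hypothetical helper for illustration.
function hasScriptElement(string $string): bool
{
    // '#' delimiters avoid escaping the slash in </script>.
    // i = case-insensitive, s = dot also matches newlines.
    // \b keeps the pattern from matching longer tag names such as <scripts>.
    return preg_match('#<script\b[^>]*>.*?</script\s*>#is', $string) === 1;
}

var_dump(hasScriptElement("<SCRIPT type=\"text/javascript\">\nalert(1)\n</SCRIPT>")); // bool(true)
var_dump(hasScriptElement('<p onmouseover="x()">hi</p>'));                           // bool(false)
```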
