How do I write a pattern to use with PHP's preg_match function to check whether a string contains script tags?
3 Answers
For security reasons? Basically, you can't. Here are some things I learned doing this in the past:
- Scripts can run without any script tag: <a href="javascript:something">...</a>, <p onmouseover="something">
- There are a number of URL schemes that are equivalent to javascript: in different browsers, like jscript:, mocha:, and livescript:. Most are undocumented.
- Old versions of Netscape treated certain bytes (0x94 and 0x95, I think?) as equivalent to < and >. Hopefully there's nothing like this in modern browsers.
- VBScript.
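To make the first point concrete, here is a small sketch of a naive script-tag check. The function name hasScriptTag and the sample strings are made up for illustration; the point is only that a pattern looking for script tags says nothing about the other vectors.

```php
<?php
// Hypothetical naive check: does the input contain an opening <script tag?
function hasScriptTag(string $html): bool
{
    return preg_match('/<\s*script\b/i', $html) === 1;
}

$samples = [
    '<script>alert(1)</script>',               // caught by the pattern
    '<a href="javascript:alert(1)">click</a>', // missed: script hidden in a URL scheme
    '<p onmouseover="alert(1)">hover</p>',     // missed: script in an event handler attribute
];

foreach ($samples as $sample) {
    echo (hasScriptTag($sample) ? 'flagged ' : 'passed  '), $sample, PHP_EOL;
}
```

Only the first sample is flagged; the other two pass the check even though a browser would still run their script.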
MySpace tried to do this, and the result was the "Samy is my hero" worm which took down the service for a day or so, among numerous other security disasters on their part.
So if you want to accept a limited subset of HTML that only includes text and formatting, you have to whitelist, not blacklist. You have to whitelist tags, attributes, and if you want to allow links, URL schemes. There are a few existing libraries out there for doing this, but I don't know which ones to recommend in PHP.
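A rough sketch of what whitelisting tags, attributes, and URL schemes could look like with PHP's DOM classes. The allowed tags, the URL pattern, and the function names (copyAllowed, sanitizeHtml) are assumptions chosen for illustration, and character encoding handling is left out. Treat it as a starting point for understanding the idea, not as a vetted sanitizer.

```php
<?php
// Hypothetical allowlist for this sketch: tag name => attributes kept on that tag.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'a' => ['href'],
];
// Only absolute URLs with these schemes survive; any other href is dropped.
const ALLOWED_URL_PATTERN = '#^(?:https?://|mailto:)#i';

function copyAllowed(DOMNode $source, DOMNode $target, DOMDocument $out): void
{
    foreach ($source->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Re-created as a text node, so the serializer escapes it on output.
            $target->appendChild($out->createTextNode($child->nodeValue));
            continue;
        }
        if (!$child instanceof DOMElement) {
            continue; // comments, processing instructions, etc. are dropped
        }
        $name = strtolower($child->nodeName);
        if ($name === 'script' || $name === 'style') {
            continue; // drop the element together with its content
        }
        if (!array_key_exists($name, ALLOWED_TAGS)) {
            copyAllowed($child, $target, $out); // unwrap: keep children, lose the tag
            continue;
        }
        $copy = $out->createElement($name);
        foreach (ALLOWED_TAGS[$name] as $attribute) {
            $value = $child->getAttribute($attribute);
            if ($attribute === 'href' && !preg_match(ALLOWED_URL_PATTERN, $value)) {
                continue; // scheme not on the allowlist
            }
            if ($value !== '') {
                $copy->setAttribute($attribute, $value);
            }
        }
        $target->appendChild($copy);
        copyAllowed($child, $copy, $out);
    }
}

function sanitizeHtml(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $in = new DOMDocument();
    // The wrapper keeps the parser input non-empty.
    $in->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = new DOMDocument();
    $root = $out->appendChild($out->createElement('div'));
    if ($in->documentElement !== null) {
        copyAllowed($in->documentElement, $root, $out);
    }
    $result = '';
    foreach ($root->childNodes as $node) {
        $result .= $out->saveHTML($node);
    }
    return $result;
}

echo sanitizeHtml('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'), PHP_EOL;
echo sanitizeHtml('<a href="javascript:alert(1)">bad</a> <a href="https://example.com/">ok</a>'), PHP_EOL;
```

The output is rebuilt from scratch rather than by deleting things from the input, so anything the allowlist does not mention never reaches the result. Requiring an explicit scheme match also rejects relative URLs, which is stricter than many sites need but avoids guessing how a browser will interpret an odd-looking href.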
Comments
Don't use regular expressions to process XML/HTML. Use PHP's DOM classes instead; they are much more reliable than any regex you will find:
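// Collect libxml parse warnings internally instead of emitting them for malformed HTML
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);
if ($xpath->query('//script')->length > 0) {
    // document contains script tags
}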
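The same XPath approach can also count inline event handler attributes such as onmouseover, which never involve a script tag. This is a sketch only: the function name countScriptVectors is made up, it assumes the parser reports attribute names in lowercase, and it detects rather than sanitizes, so it will not catch every vector.

```php
<?php
// Hypothetical helper: counts script elements and on* attributes in an HTML fragment.
function countScriptVectors(string $html): array
{
    $counts = ['script_tags' => 0, 'event_handlers' => 0];
    if ($html === '') {
        return $counts; // loadHTML() rejects empty input on PHP 8
    }

    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $counts['script_tags'] = $xpath->query('//script')->length;
    // Any attribute whose name starts with "on" (onclick, onmouseover, ...).
    $counts['event_handlers'] = $xpath->query('//@*[starts-with(name(), "on")]')->length;

    return $counts;
}

print_r(countScriptVectors('<p onmouseover="x()">hi</p><script>alert(1)</script>'));
```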
1 Comment
Are you trying to escape them? If so, try the following (not tested):
$string = str_replace(array("&", "<", ">"), array("&amp;", "&lt;", "&gt;"), $string);
This way, a surprise will be waiting for your attackers.
4 Comments
htmlspecialchars.
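A minimal example of the built-in function, assuming the goal is to display user input as plain text rather than to allow any markup. The sample string is made up; ENT_QUOTES additionally escapes single quotes, which matters when the value ends up inside an attribute.

```php
<?php
$input = '<script>alert("x")</script> & more';

// Escapes &, <, >, and both kinds of quotes.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $safe, PHP_EOL;
// prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; &amp; more
```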