0

How can I convert the code inside the <code> and <pre> tags to HTML entities?

<code class="php"> <div> a div.. </div> </code>

<pre class="php">
<div> a div.. </div>
</pre>

<div> this should be ignored </div>
10
  • Depends on the context. Where does the code reside? Inside a string? Commented Apr 2, 2011 at 23:04
  • Yes, it's a PHP string variable. Commented Apr 2, 2011 at 23:05
  • @Alexandra this is tough, because you'd need to parse the structure first to tell apart the parts you need to entity-encode from those you don't. Why is it mixed that way in the first place? Can you influence how this is generated? Commented Apr 2, 2011 at 23:07
  • I can't. This is the output when a visitor posts a comment, and I want them to be able to post HTML too. Commented Apr 2, 2011 at 23:09
  • @Alexandra You can't just let visitors post HTML to your site — this enables XSS attacks and allows bots to post really nasty spam that is invisible to regular visitors, but visible to search engine bots. Commented Apr 2, 2011 at 23:16
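
To make the XSS concern concrete, here is a minimal sketch of the safest baseline: escaping the entire comment before output. The sample comment text is made up for illustration; `htmlspecialchars` is the standard PHP function.

```php
<?php
// Hypothetical visitor comment containing a script tag.
$comment = 'Nice post! <script>alert("xss")</script>';

// Escape every HTML-significant character so the browser renders it as text.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

With everything escaped, the script tag is displayed as text instead of being executed. Any scheme that lets part of the comment through unescaped has to be at least as careful about what it allows.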

4 Answers

2

OK, I've been playing with this for a while. The result may not be the best or most direct solution (and, frankly, I disagree with your approach entirely if arbitrary users are going to be submitting the input), but it appears to "work". And, most importantly, it doesn't use regexes for parsing XML. :)

Faking the input

<?php

$str = <<<EOF
<code class="php"> <div> a div.. </div> </code>

<pre class="php">
<div> a div.. </div>
</pre>

<div> this should be ignored </div>
EOF;

?>

Code

<?php

function recurse($doc, $parent) { // objects are passed by handle, so no & is needed
   if (!$parent->hasChildNodes())
      return;

   foreach ($parent->childNodes as $elm) {

      if ($elm->nodeName == "code" || $elm->nodeName == "pre") {
         $content = '';
         while ($elm->hasChildNodes()) { // a `for`/`foreach` over childNodes would skip nodes, because removeChild() mutates the live list
             $child = $elm->childNodes->item(0);
             $content .= $doc->saveXML($child);
             $elm->removeChild($child);
         }
         $elm->appendChild($doc->createTextNode($content));
      }
      else {
         recurse($doc, $elm);
      }
   }
}

// Load in the DOM (remembering that XML requires one root node)
$doc = new DOMDocument();
$doc->loadXML("<document>" . $str . "</document>");

// Iterate the DOM, finding <code /> and <pre /> tags:
recurse($doc, $doc->documentElement);

// Output the result
foreach ($doc->childNodes->item(0)->childNodes as $node) {
   echo $doc->saveXML($node);
}

?>

Output

<code class="php"> &lt;div&gt; a div.. &lt;/div&gt; </code>

<pre class="php">
&lt;div&gt; a div.. &lt;/div&gt;
</pre>

<div> this should be ignored </div>

Proof

You can see it working here.

Note that it doesn't explicitly call htmlspecialchars; the DOMDocument object handles the escaping itself.
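
The escaping behaviour can be seen in isolation with a minimal sketch. The element and the sample text below are made up for illustration; the point is only that text placed in a text node is entity-encoded when the node is serialized.

```php
<?php
// Build a <code> element whose only child is a text node containing markup.
$doc = new DOMDocument();
$code = $doc->createElement('code');
$code->appendChild($doc->createTextNode('<div> a & b </div>'));
$doc->appendChild($code);

// Serializing the element escapes the text node's special characters.
$escaped = $doc->saveXML($code);

echo $escaped, "\n";
```

This is why the answer's code serializes the children to a string first and then re-inserts that string as a single text node.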

I hope that this helps. :)


5 Comments

My code also appears to work... Yours breaks as easily as mine. PS: I wasted 16 minutes less than you did.
@webarto: Thanks for your unconstructive comment. Could you provide an example of input that breaks my code, so that I can fix it? PS it's not a waste if you're on Stack Overflow to help people. And it's 12 minutes.
I have just noticed that HTML appears to be escaped twice. This is now fixed.
It is not unconstructive (codepad.org/FulWwCbC): it raises a fatal error if you add just one closing tag that was never opened. Anyway, your code is fancy, but in real situations it is worthless, just like mine.
@webarto: Of course, valid XML is required. That is a pre-requisite that I suppose I should mention in the answer.
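
One way to handle the valid-XML prerequisite is to check the result of `loadXML` before walking the tree. The sketch below is an assumption about how that could look, not part of the original answer; `loadFragment` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns a DOMDocument, or null when $str is not well-formed XML.
function loadFragment(string $str): ?DOMDocument {
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // XML requires a single root node, so wrap the fragment as the answer does.
    $ok = $doc->loadXML('<document>' . $str . '</document>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

$good = loadFragment('<code class="php"><div>ok</div></code>');
$bad  = loadFragment('<code class="php"><div>oops</code></div>');

var_dump($good !== null, $bad === null);
```

When the helper returns null, the caller could fall back to escaping the whole comment rather than failing.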
2

You can use jQuery. This will encode everything inside any element with the class code.

$(".code").each(
    function () {
        // Read the inner markup and write it back as plain text, which escapes it.
        $(this).text($(this).html());
    }
);

The fiddle: http://jsfiddle.net/mazzzzz/qnbLL/

6 Comments

+1 I'd recommend this approach as long as the resulting HTML is not insecure.
This question is about PHP, not JavaScript.
But how can you get hacked if the code is escaped? Stack Overflow does the same thing...
@Alexandra: No, it doesn't. Stack Overflow accepts a strict subset of HTML. You reject a strict subset of HTML.
@Tomalak But I only do that to the stuff inside CODE. On the rest of the comment I'm stripping tags, just like SO.
2

PHP

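// Find every <code class="php"> or <pre class="php"> block (the tag text must match exactly).
if (preg_match_all('#\<(code|pre) class\=\"php\"\>(.*?)\</(code|pre)\>#is', $html, $code)) {
    // $code[2] holds the captured contents; the other groups only hold tag names.
    foreach ($code[2] as $value) {
        $html = str_replace($value, htmlentities($value, ENT_QUOTES), $html);
    }
}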

HTML

<code class="php"> &lt;div&gt; a div.. &lt;/div&gt; </code>

<pre class="php">
&lt;div&gt; a div.. &lt;/div&gt;
</pre>

<div> this should be ignored </div>

Have you ever heard of BB code? http://en.wikipedia.org/wiki/BBCode
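
A minimal sketch of the BBCode idea, assuming a single hypothetical `[code]` tag: escape the entire comment first, then convert only the known tag into markup. `renderComment` is an illustrative name, not an existing function.

```php
<?php
// Hypothetical helper: escape everything, then turn [code]...[/code] into <code> elements.
function renderComment(string $raw): string {
    // After this call no visitor-supplied HTML survives as markup.
    $escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // Square brackets are untouched by htmlspecialchars, so the tags are still recognizable.
    return preg_replace('#\[code\](.*?)\[/code\]#is', '<code class="php">$1</code>', $escaped);
}

$out = renderComment('See [code]<div> a div.. </div>[/code] and <div>this</div>');

echo $out, "\n";
```

Because the only markup in the output is what the function itself emits, there is no need to tell apart which visitor-supplied tags to encode.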

10 Comments

This isn't very flexible. You require the tags to have a very strict text layout.
I don't understand. It breaks just as easily as yours; a non-strict layout is just a hacking attempt.
@webarto: No, it breaks more easily. Mine requires valid XML; yours requires an exact text match. The user could not write more spaces inside the tag (which is otherwise valid), or add more parameters in the tag (which is also otherwise valid). Yours also matches <code>...</pre>.
By exact match you mean <code class="php"></code>? Yes. But it should be strict. Can I insert inline CSS or JS in your code or pre tag? (Which doesn't make sense anyway, because you can use HTML outside in this case.) Yes, it does match <code>...</pre>. But valid XML is a prerequisite :)
@webarto: <code class="php"><pre class="php">A</pre>B</code> is valid XML, but your code would (I believe) fail to escape B in it.
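
The nested case from the last comment can be checked directly. This sketch reuses the pattern from the answer unchanged; the input is the example from the comment with `B` wrapped in a `<b>` tag (an addition for illustration) so that there is something to escape.

```php
<?php
// Pattern copied from the answer above.
$pattern = '#\<(code|pre) class\=\"php\"\>(.*?)\</(code|pre)\>#is';

// Nested input: <pre> inside <code>, followed by more markup.
$html = '<code class="php"><pre class="php">A</pre><b>B</b></code>';

preg_match_all($pattern, $html, $m);

$matchCount = count($m[0]); // number of blocks the pattern found
$captured   = $m[2][0];     // contents of the first block

echo $matchCount, "\n";
echo $captured, "\n";
```

The lazy `(.*?)` stops at the first closing tag, `</pre>`, so the single match captures `<pre class="php">A` and the `<b>B</b>` after it is never encoded.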
1

This is somewhat related: you do not have to use GeSHi, but I wrote a bit of code in "Advice for implementing simple regex (for bbcode/geshi parsing)" that should help you with the problem.

It can be tweaked to not use GeSHi; it would just take a bit of tinkering. Hope it helps ya.
