Securing a snippet by dropping eval() from input-file parsing

Question

I have a template-esque system that can load bulk templates (more than one template entry in a single file) and store them accordingly. The problem is that the current approach uses preg_replace() and eval(), and it is very error-prone. For example, a single misplaced character can break the regular-expression step and cause a parse error:

Parse error: syntax error, unexpected '<' in tsys.php: eval()'d code

The code that does the loading is the following:

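// Escape backslashes and single quotes so the template text can be placed
// inside single-quoted PHP string literals, and strip newlines
$this->_buffer = str_replace( array('\\', '\'', "\n"), array('\\\\', '\\\'', ''), $this->_buffer);

// Turn each BEGIN/END block into a PHP statement of the form
// $this->_tstack['name'] = '...'; (the END name is not checked against BEGIN)
$this->_buffer = preg_replace('#<!--- BEGIN (.*?) -->(.*?)<!--- END (.*?) -->#', "\n" . '$this->_tstack[\'\\1\'] = \'\\2\';', $this->_buffer);

// Execute the generated PHP code
eval($this->_buffer);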

An example bulk template file looks like this:

<!--- BEGIN foo -->
<p>Some HTML code</p>
<!--- END foo -->

<!--- BEGIN bar -->
<h1>Some other HTML code</h1>
<!--- END bar -->

When the code is run on this input, $this->_tstack receives two elements:

array (
  'foo' => "<p>Some HTML code</p>",
  'bar' => "<h1>Some other HTML code</h1>",
);

This is the expected behavior, but I am looking for a way to drop the need for eval().
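For comparison, here is a minimal sketch of an eval()-free loader that builds the same array without regular expressions, by scanning the file line by line. It assumes, unlike the original regex, that every BEGIN and END marker sits alone on its own line in exactly the form shown above; the function name parse_bulk_template_lines() is hypothetical.

```php
<?php
// Hypothetical eval()-free loader: scans the text line by line and collects
// everything between "<!--- BEGIN name -->" and "<!--- END name -->".
// Assumption: each marker is alone on its own line (surrounding spaces allowed).
function parse_bulk_template_lines($text)
{
    $templates = array();
    $current = null;   // name of the block being read, or null between blocks
    $body = array();   // lines collected for the current block

    // Normalise line endings, then walk the file one line at a time
    $lines = explode("\n", str_replace(array("\r\n", "\r"), "\n", $text));
    foreach ($lines as $line) {
        $trimmed = trim($line);
        if ($current === null) {
            // Outside a block: only a BEGIN marker matters, anything else is skipped
            if (strpos($trimmed, '<!--- BEGIN ') === 0 && substr($trimmed, -4) === ' -->') {
                $current = substr($trimmed, 12, -4); // text between the markers
                $body = array();
            }
        } elseif ($trimmed === '<!--- END ' . $current . ' -->') {
            // Matching END marker: store the block's lines (without the markers)
            $templates[$current] = implode("\n", $body);
            $current = null;
        } else {
            $body[] = $line; // content line, kept verbatim
        }
    }
    return $templates;
}
```

Because nothing is executed, stray text between blocks is simply skipped instead of producing a parse error; note that a block with no matching END marker is silently dropped.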

@ircmaxell You store the HTML in these files, and this code is supposed to load them into the internal _tstack container, from which you can print templates to the screen. What I am looking for is an approach to parsing multi- (or bulk-) template files without eval().
– Whisperity, Commented Jul 13, 2012 at 15:01
@ircmaxell For more insight: after a template is loaded, you can prepare it (templates can contain variable placeholders that are filled with values), add it to output buffers and, if needed, print it to the screen. The templates hold the HTML that the system will output.
– Whisperity, Commented Jul 13, 2012 at 21:28

Whisperity · Accepted Answer · 2012-07-15 11:37:55Z

Well, here goes. Given $template contains:

<!--- BEGIN foo -->
    <p>Some HTML code</p>
<!--- END foo -->

<!--- BEGIN bar -->
    <h1>Some other HTML code</h1>
<!--- END bar -->

Then:

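$values = array();
// key: the block name; value: everything up to the END marker with the same name.
// The s modifier lets . match newlines; i makes the markers case-insensitive.
$pattern = '#<!--- BEGIN (?P<key>\S+) -->(?P<value>.+?)<!--- END (?P=key) -->#si';
if ( preg_match_all($pattern, $template, $matches, PREG_SET_ORDER) ) {
    // PREG_SET_ORDER gives one array per block, with the named groups as keys
    foreach ($matches as $match) {
        $values[$match['key']] = trim($match['value']);
    }
}
var_dump($values);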

Results in:

array(2) {
  ["foo"]=>
  string(21) "<p>Some HTML code</p>"
  ["bar"]=>
  string(29) "<h1>Some other HTML code</h1>"
}

If whitespace must be preserved, remove the trim() call.
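The loop above can be wrapped in a small function that either trims or preserves whitespace. This is a sketch: parse_bulk_template() and its $trim parameter are hypothetical names, not part of the original answer.

```php
<?php
// Hypothetical wrapper around the answer's preg_match_all() loop.
// $trim = true reproduces the answer; false keeps the block's whitespace as-is.
function parse_bulk_template($template, $trim = true)
{
    $values = array();
    $pattern = '#<!--- BEGIN (?P<key>\S+) -->(?P<value>.+?)<!--- END (?P=key) -->#si';
    if (preg_match_all($pattern, $template, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $values[$match['key']] = $trim ? trim($match['value']) : $match['value'];
        }
    }
    return $values;
}
```

Inside the template class, something like $this->_tstack = parse_bulk_template($this->_buffer); would presumably replace the whole str_replace()/preg_replace()/eval() sequence, assuming _buffer holds the raw file contents.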

edited Jul 15, 2012 at 11:37

Whisperity

answered Jul 13, 2012 at 15:50

Dan Lugg

5 Comments

Whisperity

This worked, with some slight modifications. Because template names can contain whitespace, I needed to change (?P<key>\S+) to (?P<key>\S.+). I have also edited the answer so that it is valid PHP code, because there were some syntax errors. Thank you for the answer.

Dan Lugg

@Whisperity Sorry, that was valid PHP 5.4 code, not 5.3. Your edits make it backward compatible; thank you :) Also, it's probably best to narrow the allowed character class to what you need. If names can only contain whitespace and "word characters" (letters, digits, and underscores), then [\w\s]+ should suffice; use [\w\s-]+ to allow hyphens too.

Whisperity

I have just updated to PHP 5.4.4, but when I tested the code I only had 5.3.8. Off topic, but should I try to keep the project backward compatible, or drop that and rewrite it as new code?

Dan Lugg

@Whisperity That depends on far too many factors. Backward compatibility is obviously a massive asset (read: necessity) for libraries and applications made publicly available for use in the wild. However, if this is purely an in-house solution, I would leverage the features offered by the latest release and start fresh. Again, there are too many other factors; you'll have to decide for yourself.

Whisperity

@Bracketwroks I don't think I'll bother switching from array() to [], but I will drop mysql_ in favour of mysqli_. Anyway, before we go further off topic, I want to thank you for the solution. I really need to learn regular expressions better.
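A sketch of the answer's pattern using the narrower key class suggested in these comments ([\w\s-]+), so that names may contain spaces and hyphens but not arbitrary characters. The function name is hypothetical, and the exact allowed set is an assumption to adjust to your own naming rules.

```php
<?php
// Hypothetical variant of the answer's pattern with the key limited to word
// characters, whitespace and hyphens ([\w\s-]+), as suggested in the comments.
function parse_bulk_template_named($template)
{
    $values = array();
    $pattern = '#<!--- BEGIN (?P<key>[\w\s-]+) -->(?P<value>.+?)<!--- END (?P=key) -->#si';
    if (preg_match_all($pattern, $template, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $values[$match['key']] = trim($match['value']);
        }
    }
    return $values;
}
```

Since > is not in the class, the key cannot extend past its own BEGIN marker, whereas \S.+ (with the s modifier) can initially run across the whole file and only finds the right name through backtracking. Names containing other characters, such as a dot, are simply not matched.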

Florent · Accepted Answer · 2012-07-09 09:53:34Z

You can use preg_match_all to do that:

// Remove CR and NL so each block sits on a single line
$buffer = str_replace(array("\r", "\n"), '', $this->_buffer);

// Grab each named block; the lazy .*? stops at the first matching end marker
$matches = array();
preg_match_all('/\?\?\? BOT (?P<group>[^ ]+) \?\?\?(?P<content>.*?)!!! EOT \1 !!!/', $buffer, $matches);

// Build the stack (name => content)
$stack = array_combine($matches['group'], $matches['content']);

print_r($stack) then outputs:

Array
(
    [foo] => <p>Some HTML code</p>
    [bar] => <h1>Some other HTML code</h1>
)

answered Jul 9, 2012 at 9:53

Florent

1 Comment

Whisperity

The method did work, but since I asked, the templates were changed to a different, more HTML-like format (see the updated question). I am not sure how to modify your preg_match_all() line so that the system no longer gets the Warning: array_combine() [function.array-combine]: Both parameters should have at least 1 element error and FALSE as $stack.
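One way to adapt the preg_match_all()/array_combine() approach to the HTML-like markers, sketched under the assumption that names contain no spaces (as with the original [^ ]+ class). The function name build_stack() is hypothetical. The explicit empty check matters on PHP versions before 5.4, where array_combine() emits a warning and returns FALSE for two empty arrays, which matches the error quoted above.

```php
<?php
// Hypothetical adaptation of the preg_match_all() + array_combine() answer
// to the HTML-like "<!--- BEGIN name -->" / "<!--- END name -->" markers.
function build_stack($buffer)
{
    // Remove CR and NL, as the original answer does
    $buffer = str_replace(array("\r", "\n"), '', $buffer);

    // Lazy .*? stops at the END marker whose name matches the BEGIN marker
    $matches = array();
    preg_match_all('/<!--- BEGIN (?P<group>[^ ]+) -->(?P<content>.*?)<!--- END (?P=group) -->/', $buffer, $matches);

    // Before PHP 5.4, array_combine() warns and returns FALSE for empty
    // arrays, so handle "no blocks found" explicitly
    if (empty($matches['group'])) {
        return array();
    }
    return array_combine($matches['group'], $matches['content']);
}
```

If the template lines are indented, the leading spaces stay in the content because only CR and NL are removed; wrapping the values in trim() would strip them.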