
I'm writing an RSS-to-JSON parser, and as part of that I need to run htmlentities() on any tags found inside the description tag. Currently I'm trying to use preg_replace(), but I'm struggling with it. My current (non-working) code looks like this:

$pattern[0] = "/\<description\>(.*?)\<\/description\>/is";
$replace[0] = '<description>'.htmlentities("$1").'</description>';
$rawFeed = preg_replace($pattern, $replace, $rawFeed);

If you have a more elegant solution, please share it as well. Thanks.

2 Answers


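Simple: use preg_replace_callback(). Your version can't work because htmlentities("$1") is evaluated once, before preg_replace() even runs, so it only encodes the literal string "$1" and the replacement ends up as '<description>$1</description>', which changes nothing. A callback, by contrast, is called for every match and receives the captured text: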

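// Called once per match; $match[1] is the text captured between the description tags.
function _handle_match($match)
{
    return '<description>' . htmlentities($match[1]) . '</description>';
}

// The /s modifier lets "." span newlines; the lazy (.*?) stops at the first closing tag.
$pattern = "/\<description\>(.*?)\<\/description\>/is";
$rawFeed = preg_replace_callback($pattern, '_handle_match', $rawFeed);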

It accepts any callback type, so class methods work too.
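For reference, here is a sketch of the same approach with an anonymous function (PHP 5.3+) and with a class method passed as an array callable. The sample $rawFeed string and the DescriptionEncoder class are hypothetical, and passing ENT_QUOTES and 'UTF-8' to htmlentities() assumes the feed is UTF-8.

```php
<?php
// Hypothetical feed fragment with raw (unescaped) HTML inside <description>.
$rawFeed = '<item><title>Hi</title>'
    . '<description><p>Hello <b>world</b> & co</p></description></item>';

// Same idea as above; /s lets "." match newlines inside a description.
$pattern = '/<description>(.*?)<\/description>/is';

// An anonymous function works as the callback, so no named helper is needed.
$encoded = preg_replace_callback($pattern, function (array $match) {
    return '<description>' . htmlentities($match[1], ENT_QUOTES, 'UTF-8') . '</description>';
}, $rawFeed);

// A class method works too, passed as [object, 'methodName'] (hypothetical class).
class DescriptionEncoder
{
    public function handle(array $match)
    {
        return '<description>' . htmlentities($match[1], ENT_QUOTES, 'UTF-8') . '</description>';
    }
}

$viaMethod = preg_replace_callback($pattern, [new DescriptionEncoder(), 'handle'], $rawFeed);

echo $encoded, "\n";
```

Before PHP 5.4, htmlentities() defaulted to ISO-8859-1, so spelling out the charset avoids wrongly encoded multibyte characters on older setups.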


1 Comment

How would you change the pattern to match the content of all nested nodes? Thanks.

A more elegant solution would be to parse the feed with SimpleXML, or with a third-party library such as XML_Feed_Parser or Zend_Feed.

Here is a SimpleXML example:

<?php
$rss = file_get_contents('http://rss.slashdot.org/Slashdot/slashdot');
$xml = simplexml_load_string($rss);

foreach ($xml->item as $item) {
    echo "{$item->description}\n\n";
}
?>

Keep in mind that RSS, RDF, and Atom feeds are structured differently, which is why it can make sense to use one of the libraries mentioned above.
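Since the goal is JSON output, here is a minimal sketch for an RSS 2.0 feed, where items sit under <channel> rather than directly under the root element as the example above assumes. The inline sample feed is hypothetical and stands in for file_get_contents(), and picking only title and description for the JSON is an assumption about what you need.

```php
<?php
// Hypothetical RSS 2.0 sample: <item> elements are children of <channel>.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <title>First</title>
      <description><![CDATA[<p>Hello <b>world</b></p>]]></description>
    </item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes (this matters if you
// json_encode() the SimpleXMLElement itself; the string casts below work either way).
$xml = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    exit("Could not parse feed\n");
}

$items = [];
foreach ($xml->channel->item as $item) {
    // Casting to string yields the text content, including the HTML from the CDATA block.
    $items[] = [
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
    ];
}

$json = json_encode($items);
echo $json, "\n";
```

Because the HTML arrives as a single text value, no entity encoding is needed before building the JSON.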

2 Comments

I am actually using SimpleXML, but the problem is that any HTML embedded inside the description tag is also parsed into objects, which is why I'm entity-encoding it first.
Then your feed is broken. Good feeds wrap HTML and similar markup in CDATA sections.
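If you can't fix the feed and its descriptions contain raw HTML, one workaround (a sketch, not something proposed in the answers above) is to let SimpleXML parse the item anyway and serialize the children of <description> back into a string through DOM, then encode that. The sample feed is hypothetical, and this assumes the embedded HTML is well-formed XML; if it isn't, the whole feed fails to parse.

```php
<?php
// Hypothetical feed fragment whose <description> holds raw, un-CDATA'd HTML.
$rss = '<rss version="2.0"><channel><item>'
     . '<description><p>Hello <b>world</b></p></description>'
     . '</item></channel></rss>';

$xml = simplexml_load_string($rss);
if ($xml === false) {
    exit("Could not parse feed\n");
}

foreach ($xml->channel->item as $item) {
    // Serialize each child node back to markup, giving the inner HTML of <description>.
    $html = '';
    foreach (dom_import_simplexml($item->description)->childNodes as $node) {
        $html .= $node->ownerDocument->saveXML($node);
    }
    echo htmlentities($html, ENT_QUOTES, 'UTF-8'), "\n";
}
```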
