3

I have a problem with an RSS feed.

When I write <title>This is a title </title>

The title appears nicely in the feed.

But when I do $title = "this is a title";

<title><![CDATA['$title']]></title>

The title doesn't appear at all.


It still doesn't work. I generate my RSS feed dynamically and it looks like this:

$item_template="
      <item>
         <title>[[title]]</title>
         <link>[[link]]</link>
         <description><![CDATA[[[description]]]]></description>
         <pubDate>[[date]]</pubDate>
      </item>
      ";

and in a loop:

$s.=str_replace(
array("[[title]]","[[link]]","[[description]]","[[date]]"),
array(htmlentities($row["title"]),$url,$description,$date),
$item_template);

The problem is specifically when the title has a euro sign. Then it shows up in my rss validator like:

Â\x80

More detailed information:

OK, I have been struggling with this for the last few days and I can't find a solution, so I will start a bounty. Here is more information:

  • The information that goes in the feed is stored in a Latin-1 database (which I administer).
  • The problem appears when there is a euro sign in the database, no matter whether it is stored as € or &euro;.
  • The euro sign sometimes appears as weird characters or as Â\x80.
  • I am trying to solve the problem on the feed side, not on the reader side.
  • The complete code can be found over here: codedump
  • Next: sometimes, when the euro sign cannot be parsed, the item (either the title or the description) is shown empty. So if you look at the source when showing the feed in a browser, you'll find <title></title>.

If more information is needed, please ask.

2
  • By the way, don't mix English and German. "datum" sounds cool, but is the singular of data. You are looking for "date". Greetings from Münster(Westf.) Commented May 4, 2009 at 15:41
  • Can you give us the exact value of the database value (as in base64_encode($row["title"])?) Why do you think this value contains a Euro sign? (I.e. how did you enter it, does it show up as "€" anywhere else?) Commented May 7, 2009 at 22:17

6 Answers

15
+50

The problem is your outputting code; change

echo '<title><![CDATA[$title]]></title>';

to

echo '<title><![CDATA[' . $title . ']]></title>';

As a side note, please mind the following: Do not answer your own question with a follow-up, but edit the original one. Do not use regexps for no good reason. Do not guess.

Instead, do what you should have done all along: wrap the title in htmlentities and be done, as in:

echo '<title>' . htmlentities($title, ENT_NOQUOTES, [encoding]) . '</title>';

Replace [encoding] with the character encoding you are using. Most likely, this is 'UTF-8'. This is necessary because htmlentities in PHP before 5.4 assumes ISO-8859-1 by default, and there is no way to express e.g. the euro sign in that encoding. For further information, please refer to this well-written introduction.
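A minimal sketch of that call with the encoding spelled out, assuming the title is already valid UTF-8. The sample title is invented for illustration, and the second call only shows what happens when the declared encoding does not match the bytes.

```php
<?php
// Hypothetical sample title; "\xE2\x82\xAC" is the euro sign encoded as UTF-8.
$title = "Price: 5 \xE2\x82\xAC";

// Matching encoding: the three bytes are read as one character.
$utf8 = htmlentities($title, ENT_NOQUOTES, 'UTF-8');

// Mismatched encoding: each byte is treated as its own ISO-8859-1 character,
// so no euro entity can come out of it.
$latin1 = htmlentities($title, ENT_NOQUOTES, 'ISO-8859-1');

echo '<title>' . $utf8 . '</title>', "\n"; // <title>Price: 5 &euro;</title>
```

Note that &euro; is an HTML entity, so whether a given feed reader accepts it inside XML is a separate question.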

I also suggest you read about XML. Start with the second chapter.


5 Comments

I can still see your answer, but thanks for the effort. I'm sorry, but my answer was incomplete and I should have seen that right away. You also need to specify a character encoding. Edited the answer.
Well, still problems with the euro sign. $s.=str_replace( array("[[title]]","[[link]]","[[description]]","[[datum]]"), array(htmlentities($row["title"],ENT_NOQUOTES,"UTF-8",false),$url,$description,$datum), $item_template); This gives back an empty title.
@sanders: Why are you setting the fourth parameter (double_encode) of htmlentities? Also, please check that your encoding is really UTF-8.
I removed the fourth parameter. I return it as UTF-8, like return utf8_encode($s); but still no result.
@sanders Sorry for the long delay. The problem is the encoding of $s. Try $row["title"] = "\xe2\x82\xac"; /* € in UTF-8*/ . This should yield <title>&euro;</title>. Then look how your $s differs from € in UTF-8 and trace it to the original problem.
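The tracing step suggested in the last comment could look roughly like this. It is only a sketch: it assumes the mbstring extension is available, and it assumes the "latin 1" data actually holds the euro sign as the single Windows-1252 byte 0x80 (ISO-8859-1 proper has no euro sign). The helper name describe_bytes is made up.

```php
<?php
// Hypothetical helper: show the raw bytes of a string and whether they are valid UTF-8.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8');
    return bin2hex($s) . ($valid ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

$raw = "\x80"; // assumed database content: euro sign as one Windows-1252 byte

// Read as ISO-8859-1, byte 0x80 becomes the control character U+0080 (bytes C2 80),
// which would match the "Â\x80" seen in the validator.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Read as Windows-1252, the same byte becomes the real euro sign (bytes E2 82 AC).
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo describe_bytes($raw), "\n";      // 80 (not valid UTF-8)
echo describe_bytes($asLatin1), "\n"; // c280 (valid UTF-8)
echo describe_bytes($asCp1252), "\n"; // e282ac (valid UTF-8)

// Without ENT_IGNORE or ENT_SUBSTITUTE, htmlentities() returns an empty string
// for input that is not valid in the declared encoding, which would explain
// an empty <title></title>.
$emptyTitle = htmlentities($raw, ENT_NOQUOTES, 'UTF-8');
```

Comparing bin2hex($row["title"]) against these three patterns should show which conversion, if any, the data still needs.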
3

Use htmlspecialchars() instead of htmlentities().

RSS/Atom feeds are not HTML, so you can’t use HTML entities in them. XML has only five entities defined by default, so you can’t use &euro;. Since you’re using UTF-8, use a literal euro sign without conversion (no htmlentities), but escape the other sensitive characters (htmlspecialchars).

That would be completely valid RSS/XML. If this doesn’t solve the problem, then it lies somewhere else (please provide the generated raw source of the RSS for more help).
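As a rough illustration of this advice, assuming the title is already valid UTF-8 (the sample value is invented):

```php
<?php
// Hypothetical title with a literal euro sign plus characters XML treats specially.
$title = "Tom & Jerry <live> for 5 \xE2\x82\xAC";

// htmlspecialchars() escapes &, < and > here (ENT_NOQUOTES leaves quotes alone);
// the euro sign passes through unchanged as UTF-8 bytes.
$escaped = htmlspecialchars($title, ENT_NOQUOTES, 'UTF-8');

echo '<title>' . $escaped . '</title>', "\n";
// <title>Tom &amp; Jerry &lt;live&gt; for 5 €</title>
```

Only entities that XML itself predefines end up in the output, so no HTML-specific entity such as &euro; is needed.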


1

Which programming language or environment do you use? For instance, in PHP the single quotes prevent evaluating the variables inside.

Otherwise, in this case you don't really need those quotes. Maybe you were confused by the array syntax of PHP.

So you'd better write:

<title><![CDATA[$title]]></title>
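A small sketch of the quoting difference in PHP, using the title from the question:

```php
<?php
$title = 'this is a title';

// Single quotes: no interpolation, the text "$title" is emitted literally.
$single = '<title><![CDATA[$title]]></title>';

// Double quotes: $title is replaced by its value.
$double = "<title><![CDATA[$title]]></title>";

echo $single, "\n"; // <title><![CDATA[$title]]></title>
echo $double, "\n"; // <title><![CDATA[this is a title]]></title>
```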


0

I do not understand why you should use an encoding function. When a third party takes your content, they will have no idea how to decode that string. I think that you should:

  • use CDATA for tags that may break the XML
  • use well-defined libraries to write XML. For PHP: DOMDocument or XMLWriter (http://php.net/manual/en/book.xmlwriter.php)
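A minimal sketch of the library approach with XMLWriter, assuming the xmlwriter and simplexml extensions are enabled and the input strings are valid UTF-8. The item values are invented, and this is not the asker's actual feed code.

```php
<?php
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('item');

// writeElement() escapes &, < and > in the text for us.
$writer->writeElement('title', "Tom & Jerry for 5 \xE2\x82\xAC");

// writeCdata() wraps the content in a CDATA section.
$writer->startElement('description');
$writer->writeCdata("<b>Only</b> 5 \xE2\x82\xAC");
$writer->endElement(); // description

$writer->endElement(); // item
$writer->endDocument();
$xml = $writer->outputMemory();

// Parse the result back to confirm it is well-formed and the title survived.
$item = simplexml_load_string($xml);
echo (string) $item->title, "\n";
```

The writer takes care of escaping, so the template and str_replace step is no longer responsible for producing well-formed XML.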


-1

I believe the RSS Profile does not allow it: this document states that title holds character data, which is further defined as follows.


-1

This article may be helpful for information about the euro sign and support in various contexts. Some of the suggestions from that article include using &#8364; or &euro; or just replacing the sign with the word "euro." Good luck!
