3

I have some HTML code that I store in a java.lang.String variable. I write that variable to a file on the filesystem, setting the encoding to UTF-8 when writing. When I open that file, everything looks great, e.g. → shows up as a right arrow.

However, if the same String (containing the same content) is used by a JSP page to render content in a browser, characters such as → show up as a question mark (?).

When storing content in the String variable, I make sure that I use:

String myStr = new String(bytes, charset);

instead of just:

String myStr = "<html><head/><body>&rarr;</body></html>";

Can someone please tell me why the String content gets written to the filesystem perfectly but does not render in the jsp/browser?

Thanks.

5
  • 1
    It looks like you didn't provide the correct charset in the headers of your page. Try changing the encoding in your browser to UTF-8. Commented Nov 16, 2009 at 23:18
  • 1
    You shouldn't change the encoding in the browser. You should rather instruct the browser to use the right encoding by setting the response encoding accordingly. Commented Nov 16, 2009 at 23:20
  • If, as your other comment suggests, you don't see the characters correctly on the server side, then the next thing to examine is how you get to that byte array. Are you reading a file? Do you set the encoding? I'm taking for granted that charset equals "UTF-8". Commented Nov 16, 2009 at 23:26
  • I have a class that generates HTML code and stores it in a String variable. The HTML code is first stored in a String (no encoding defined), then I get the byte[] from this String and create a new String from that byte[] along with the correct encoding. Commented Nov 16, 2009 at 23:40
  • That double conversion to String is completely unnecessary. String stores its state internally as 16-bit Unicode (UTF-16); all you are doing is a conversion to and from a byte array. Assuming you call getBytes() with "UTF-8" as the encoding (if not, that is your problem right there), let's focus on the first String. How are you generating it? Are you reading from any binary source (a file, a byte array, anything else)? And more importantly, why? That is what JSPs do for you. Commented Nov 16, 2009 at 23:59
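
To illustrate the point made in the comments above, here is a minimal sketch of the String → byte[] → String round trip. The class and method names are made up for the example, and US-ASCII merely stands in for any charset that has no right arrow (such as a platform default); the poster's actual charset is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the question.
class RoundTripDemo {
    // Encode a String to bytes and decode it again with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String html = "<body>\u2192</body>"; // \u2192 is the right arrow

        // UTF-8 can represent the arrow, so the round trip changes nothing.
        System.out.println(roundTrip(html, StandardCharsets.UTF_8).equals(html)); // true

        // US-ASCII cannot, so getBytes() substitutes '?' and the arrow is lost.
        System.out.println(roundTrip(html, StandardCharsets.US_ASCII)); // <body>?</body>
    }
}
```

With UTF-8 the round trip is a no-op, so it cannot repair anything. If any step along the way uses a charset without the arrow, the character is already a literal ? before the JSP ever sees it.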

3 Answers

4

but does not render in the jsp/browser?

You need to set the response encoding as well. In a JSP you can do this using

<%@ page pageEncoding="UTF-8" %>

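Unless the contentType attribute names a different charset, this also makes the container encode the response as UTF-8 and announce that in the Content-Type response header. The HTML counterpart is the following meta tag in the <head>, which browsers consult only when the HTTP header does not specify a charset: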

<meta http-equiv="content-type" content="text/html; charset=utf-8">
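
Why the response encoding matters can be sketched without a servlet container. In the example below an OutputStreamWriter stands in for the container's response writer (that substitution is an assumption made for illustration, as are the class and method names), and ISO-8859-1 is used because it is the default JSP response encoding when nothing else is configured.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the Writer is a stand-in for the response writer.
class ResponseEncodingDemo {
    // Encode the body the way a character writer with the given charset would.
    static byte[] encode(String body, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Render bytes as space-separated hex pairs for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // ISO-8859-1 has no right arrow: the writer emits '?' (0x3F) instead.
        System.out.println(hex(encode("\u2192", StandardCharsets.ISO_8859_1))); // 3F
        // UTF-8 encodes the arrow as three bytes.
        System.out.println(hex(encode("\u2192", StandardCharsets.UTF_8)));      // E2 86 92
    }
}
```

In a plain servlet the same switch is usually made with response.setCharacterEncoding("UTF-8") before the writer is obtained; in a JSP the page directive above takes care of it.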

2 Comments

When I print the contents of the Java String variable to the console with System.out.println(...), I see "?" instead of the right arrow, so my guess is that the JSP receives question marks and that is why it displays question marks in the browser. I think the problem is in my Java code, and maybe I have to specify the encoding of the String content in some other way.
Then the console should also be configured to use UTF-8. You can find more background information and detailed solutions here: balusc.blogspot.com/2009/05/… Hope this helps.
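
As a rough sketch of the console side of this: System.out encodes with the platform default charset unless told otherwise, so a ? on the console does not by itself prove the String is damaged. The wrapper below forces UTF-8 output; the class and method names are made up for the example, and whether the arrow is then displayed still depends on the terminal or IDE console being set to UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of the linked article.
class Utf8ConsoleDemo {
    // Wrap a stream in a PrintStream that always encodes text as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u2192"); // written as bytes E2 86 92 plus a line separator
    }
}
```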
1

Possibilities:

  1. The browser does not support UTF-8
  2. You don't have Content-Type: text/html; charset=utf-8 in your HTTP Headers.


0

The lazy developer (= me) uses Apache Commons Lang's StringEscapeUtils.escapeHtml, http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html#escapeHtml(java.lang.String), which will help you handle all 'odd' characters. Let the browser do the final translation of the HTML entities.
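
The idea can be sketched without the library. The hand-rolled helper below is not the Commons Lang implementation (its name and behaviour are assumptions made for this example): it rewrites only non-ASCII characters as numeric character references and leaves ASCII alone. Note that escapeHtml also escapes markup characters such as < and &, so it is meant for text content rather than for a String that already contains tags.

```java
// Hypothetical helper, written for illustration only.
class EntityEscapeDemo {
    // Replace every non-ASCII code point with a numeric character reference.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);                   // plain ASCII passes through
            } else {
                sb.append("&#").append(cp).append(';');  // e.g. U+2192 becomes &#8594;
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("<body>\u2192</body>")); // <body>&#8594;</body>
    }
}
```

Because the result is pure ASCII, it survives any ASCII-compatible response encoding, and the browser turns the reference back into the arrow.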
