8

I am developing a simple web application using Java/JSP/Tomcat/MySQL, and my biggest problem is character encoding, because I need to work with UTF-8 instead of the default ISO-8859-1.

First, I'd like to describe my program structure. I am using a servlet called Controller.java to handle all requests. So in web.xml, I have a Controller servlet mapped to every *.do request.

The Controller then dispatches each request based on the requested URL. For example, if the client asks for register.do, the Controller dispatches the request to Register.java.

In Register.java, there is a method that takes the request as a parameter, namely:

public String perform(HttpServletRequest request) {
    // do something with the request...
}

The problem is that if I print something in UTF-8 inside this method, I get garbled characters. For example, I have an Enum that stores several constants, and one of its properties is the constant's name in Traditional Chinese. If I print it in

public static void main(String[] args) {
    System.out.println(MyEnum.One.getChn());
    logger.info(MyEnum.One.getChn());
}

it is printed correctly in Chinese. However, if I put exactly the same code inside the method that deals with the HttpServletRequest:

public String perform(HttpServletRequest request) {
    System.out.println(MyEnum.One.getChn());
    logger.info(MyEnum.One.getChn());
}

the values are printed as garbled characters, even though I can see in the Eclipse debug window that the variables hold the correct Chinese characters.

The same thing happens when I store the value from request.getParameter(). In the debug window, I can see that the variable holds the correct characters, but once I print it or try to store it in the database, it becomes garbled.

I don't know why it behaves like this, and it is blocking me from reading submitted form values and storing them in the database. Could someone give me some hints?

Great thanks.

4
  • 1
    I can't understand whether you are worried about corrupted output on the server's console and logs, or corrupted output in the resulting response to the browser. Can you clarify? Commented Jun 7, 2012 at 17:46
  • 2
    What is the value of System.getProperty("file.encoding")? Commented Jun 7, 2012 at 17:48
  • What I am worried about is that in the debug window I can see the correct characters, but when I pass the variable to my database access object and store it in the db, it becomes garbled. I then found that, in the method that handles requests, even simply printing a UTF-8 Enum value doesn't work. Commented Jun 7, 2012 at 18:38
  • How do you know that the value that is actually stored in the database is junk, when it could be that the value is corrupted during retrieval from the database? Commented Jun 7, 2012 at 19:18

2 Answers

12

Here is a small tutorial on what you need to do to make UTF-8 work in your web application:

You have to implement a Filter in your application that sets the character encoding (it also has to be registered in web.xml so that it runs for your requests):

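import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig)
            throws ServletException {
        // No initialization needed.
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
            throws IOException, ServletException {
        // Must be set before any request parameter is read, otherwise it has no effect.
        servletRequest.setCharacterEncoding("UTF-8");
        // Declares every response as UTF-8 HTML; adjust if you also serve other content types.
        servletResponse.setContentType("text/html; charset=UTF-8");
        filterChain.doFilter(servletRequest, servletResponse);
    }

    @Override
    public void destroy() {
        // Nothing to clean up.
    }
}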
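
To see why the request encoding matters, here is a minimal, standalone sketch (not part of the filter itself) of what happens when UTF-8 bytes are decoded as ISO-8859-1, which is the servlet default for request bodies when no encoding is set. The class name MojibakeDemo and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates a container decoding UTF-8 form data
// with the wrong charset (ISO-8859-1).
final class MojibakeDemo {

    // Encode as UTF-8 (what the browser sends), then decode as ISO-8859-1
    // (what happens when the request encoding was never set).
    static String decodeWrongly(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // two Chinese characters
        String garbled = decodeWrongly(original);
        // Each UTF-8 byte became its own character, so 2 characters turn into 6.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The repair step is only there to show that the bytes themselves were intact; in a real application the fix is to set the encoding up front, as the filter does, rather than to repair strings afterwards.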

You have to make sure that the Connector element in Tomcat's server.xml has a URIEncoding attribute whose value is UTF-8. This controls how the request URI and its query string (GET parameters) are decoded.

<Connector port="8080" 
           protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443"/>
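
As a rough illustration of what URIEncoding changes, the sketch below decodes the same percent-encoded query value with two different charsets using the JDK's URLDecoder. It only imitates the decoding step and is not Tomcat's own code; the class name UriDecodingDemo is made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: shows how the charset chosen for URI decoding
// changes the resulting parameter value.
final class UriDecodingDemo {

    // Decode a percent-encoded value with the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%E4%B8%AD"; // UTF-8 bytes of U+4E2D, percent-encoded
        // Decoded as UTF-8: one Chinese character.
        System.out.println(decode(encoded, "UTF-8").length());
        // Decoded as ISO-8859-1: three unrelated Latin-1 characters.
        System.out.println(decode(encoded, "ISO-8859-1").length());
    }
}
```

With the wrong charset the value is three characters long instead of one, which is the kind of garbling you would see in GET parameters when the connector does not decode URIs as UTF-8.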

Also you need to specify this in every JSP page:

<%@page contentType="text/html" pageEncoding="UTF-8"%>

3 Comments

Actually, instead of the filter, I think you can put this in your JSPs: <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
Stijn de Witt using only your answer worked for me! Didn't use any filters. Thanks.
the universal advice))

6

If you need to use UTF-8 encoding (and really, everybody should be doing this these days), then you can follow the "UTF-8 everywhere HOWTO" found in the Tomcat FAQ:

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

Remember that you also need to support UTF-8 in your database's text fields.
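
Besides the column character sets, the JDBC connection has to agree on the encoding too. The following is only a sketch under assumptions: it uses the MySQL Connector/J connection properties useUnicode and characterEncoding, and the class name Utf8JdbcConfig, the user name and the password are placeholders invented for the example.

```java
import java.util.Properties;

// Hypothetical helper: builds connection properties that ask MySQL Connector/J
// to exchange text with the server as UTF-8.
final class Utf8JdbcConfig {

    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Connector/J properties that select the connection character encoding.
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        Properties props = utf8Properties("appuser", "secret");
        System.out.println(props.getProperty("characterEncoding"));
    }
}
```

The resulting Properties object could then be passed to DriverManager.getConnection(url, props) together with your own JDBC URL.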

Also remember that sometimes "printing" a String with non-ASCII characters in it to a log file or the console can be affected by

  1. The character encoding of the output stream
  2. The character encoding of the file reader (e.g. cat/less/vi)
  3. The character encoding of the terminal
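
For the first point, System.out typically encodes with the platform default, which may differ between a standalone main() run and a server process. One way to rule that out is to print through a stream whose encoding is fixed explicitly. This is a sketch only; the class name Utf8Console and its methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: wraps an output stream so that text is always
// written as UTF-8, regardless of the platform default encoding.
final class Utf8Console {

    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Print the text into an in-memory buffer and return the raw bytes.
    static byte[] printToBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = utf8Stream(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The terminal still has to interpret these bytes as UTF-8 (point 3).
        utf8Stream(System.out).println("\u4E2D\u6587");
    }
}
```

This only fixes the encoding of the output stream; the reader and the terminal from points 2 and 3 can still misinterpret correctly written bytes.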

You might be better off writing the values to a file and then using a hex editor to examine the contents to be sure that you are getting the byte values you are looking for.
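
If a hex editor is not at hand, a similar check can be done in code. The sketch below is an in-memory variant of the same idea: it encodes a string as UTF-8 and prints the byte values in hex. The class name HexDump is made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows the UTF-8 byte values of a string in hex,
// so the check does not depend on any console or viewer encoding.
final class HexDump {

    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("\u4E2D\u6587")); // prints "E4 B8 AD E6 96 87"
    }
}
```

Because the output is plain ASCII, it reads the same in any log or terminal. If the hex values match the expected UTF-8 bytes, the string is fine and the corruption happens later, in the viewer or the terminal.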

1 Comment

UTF-8 everywhere in Eclipse: Unicode/UTF-8 in your Eclipse Java projects
