2

I was really discouraged by Java's string encoding. It performs many automatic conversions, and I can't find the rule behind them. Does anyone have a good idea? For example, a JSP page has a link like this:

http://localhost:8080/helloworld/hello?world=凹ㄉ

And then we need to process it, so we do this:

String a = new String(request.getParameter("world").toString().getBytes("ISO-8859-1"), 
                      "UTF-8");
a = "http://localhost/" + a;

When I debug it, a is correct.

Then I store it in the session: request.getSession().setAttribute("hello", a);

Later, in a JSP page with encoding "Big5", I read the attribute and display it, and the characters "凹ㄉ" are corrupted.

How can I solve this?

3 Answers

12

That is not how you convert between character sets. What you need to be worrying about is this part:

 request.getParameter("world").toString().getBytes("ISO-8859-1")

Once you have it as a String, it is stored internally as UTF-16 (16-bit Unicode code units). Getting its bytes and then telling Java to treat those bytes as if they were UTF-8 is not going to do anything good.

If it looked fine to you, that is just a coincidence. Once you call getParameter("world") you already have your Unicode string (the extra toString() does nothing). The further encoding and decoding will only break certain characters; it just happens not to break yours.
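To make this concrete, here is a minimal sketch using only the JDK. The class and method names are hypothetical, and the second case assumes the servlet container decoded the raw UTF-8 query bytes as ISO-8859-1, which the question does not confirm. It shows that the getBytes/new String pattern converts nothing: it destroys a String that is already correct, and only appears to work when the String was mis-decoded beforehand.

```java
import java.nio.charset.StandardCharsets;

public class ReinterpretDemo {

    // The pattern from the question: encode the String as ISO-8859-1 bytes,
    // then decode those same bytes as UTF-8.
    static String reinterpret(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "凹ㄉ" written with escapes so the source file encoding does not matter.
        String original = "\u51F9\u3109";

        // Case 1: the String is already correct. ISO-8859-1 cannot represent
        // either character, so getBytes substitutes '?' and the text is lost.
        System.out.println(reinterpret(original)); // prints "??"

        // Case 2 (assumption): the container decoded the UTF-8 bytes as
        // ISO-8859-1. Every byte maps to one char, so the bytes survive and
        // the pattern happens to recover the original text.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(reinterpret(misdecoded).equals(original)); // prints "true"
    }
}
```

Which case you are in depends entirely on how the container decoded the request, so the more robust approach is usually to configure that decoding rather than to patch the String afterwards.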

The real question is how you display that attribute later. You say the JSP page's encoding is not Unicode but Big5, so what are you doing to get that string out of the attribute map and onto that page? That is the likely source of the problem. Given the misunderstanding about character conversion when reading the parameter, it is likely that there are some mistakes on that Big5 page as well.
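As a rough illustration of what should happen on output, the sketch below encodes the String to the page's charset exactly once and decodes it the way a browser would. The class and method names are hypothetical, and it assumes the JDK at hand ships the "Big5" charset (it is an extended charset, so the code checks first). In a real JSP the response writer performs this encoding step for you, provided the page's declared charset matches what is actually sent.

```java
import java.nio.charset.Charset;

public class Big5OutputDemo {

    // Encode the text for the page's charset, then decode the bytes again
    // as a browser that was told the same charset would.
    static String roundTrip(String text, Charset pageCharset) {
        byte[] wire = text.getBytes(pageCharset);
        return new String(wire, pageCharset);
    }

    public static void main(String[] args) {
        String text = "\u51F9\u3109"; // "凹ㄉ"

        if (!Charset.isSupported("Big5")) {
            System.out.println("Big5 is not available in this JDK");
            return;
        }
        Charset big5 = Charset.forName("Big5");

        // If the charset can represent every character, a single
        // encode/decode pair returns the text unchanged.
        System.out.println("encodable: " + big5.newEncoder().canEncode(text));
        System.out.println("intact:    " + roundTrip(text, big5).equals(text));
    }
}
```

If the text comes back intact here but is corrupted on the page, the String in the session is fine and the mismatch is between the bytes written and the charset the browser was told to use.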

By the way, do you really need to use Big5? Would UTF-16 work (if not UTF-8)? It could certainly remove some headaches.

0

The way I handle encodings in Java is by not allowing text encoded in something other than UTF-8 to be uploaded to my site. This is how I do it:

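// Needs java.io.*, java.nio.charset.*, and org.apache.commons.io.IOUtils.
CharsetDecoder charsetDecoder = StandardCharsets.UTF_8.newDecoder();
// Fail on invalid UTF-8 instead of silently substituting replacement characters.
charsetDecoder.onMalformedInput(CodingErrorAction.REPORT);

// try-with-resources closes the underlying file stream.
try (Reader reader = new InputStreamReader(new FileInputStream(filePath), charsetDecoder)) {
    return IOUtils.toString(reader);
}
catch (MalformedInputException e) {
    // The file was not saved with UTF-8 encoding.
    throw new IllegalArgumentException("File is not valid UTF-8: " + filePath, e);
}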
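If you would rather not depend on Commons IO, the same strict check can be sketched with the JDK alone. This is only an illustration on an in-memory byte array; the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {

    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data)); // throws on the first bad sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "\u51F9\u3109".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 0xFF, (byte) 0x41}; // 0xFF never occurs in UTF-8

        System.out.println(isValidUtf8(good)); // prints "true"
        System.out.println(isValidUtf8(bad));  // prints "false"
    }
}
```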

I recommend reading https://www.baeldung.com/java-char-encoding. It contains a very good summary of what you need to know regarding String encoding in Java.

-1

The following code will work:

String a = new String(request.getParameter("world").toString().getBytes("ISO-8859-1"), 
                      "UTF-16");
