I have a program that I run with mvn exec:java. My main source file is encoded in UTF-8 and my system's default charset is windows-1252:

System.out.println(Charset.defaultCharset()); // prints windows-1252
String s = "éàè";
System.out.println(new String(s.getBytes(Charset.forName("UTF-8")))); // OK: prints éàè
System.out.println(new String(s.getBytes(Charset.forName("windows-1252")))); // not OK: prints ▒▒▒

I don't understand why the first print works. According to the documentation, getBytes encodes the String into a sequence of bytes using the given charset, and the String constructor constructs a new String by decoding the specified array of bytes using the platform's default charset.

So the first print encodes in UTF-8 and then decodes with the platform's default charset, which is windows-1252. How could this work? A UTF-8 encoded byte array should not decode correctly with the platform charset windows-1252.

The second print is wrong, and I don't understand why. Since my file is encoded in UTF-8 and the platform charset is windows-1252, my intention is to encode the String with the windows-1252 charset. So I call s.getBytes(Charset.forName("windows-1252")) and then create a String from the result, but it doesn't work.

  • Try PrintStream out = new PrintStream(System.out, true, "windows-1252"); out.println(s); Commented Mar 3, 2016 at 17:00
  • As a side note, the MS-DOS default charset is not 1252; see docs.oracle.com/javase/7/docs/technotes/guides/intl/… Commented Mar 3, 2016 at 17:01
  • PrintStream out = new PrintStream(System.out, true, "windows-1252"); doesn't work, but PrintStream out = new PrintStream(System.out, true, "utf-8"); does. Commented Mar 3, 2016 at 17:04
  • @Berger You are right. I am using MinGW to execute my program; with MS-DOS the program works correctly. Commented Mar 3, 2016 at 17:13

1 Answer

The String value éàè is encoded in UTF-8 as byte octets 0xC3 0xA9 0xC3 0xA0 0xC3 0xA8. Those same byte octets interpreted as Windows-1252 are the String value Ã©Ã<nbsp>Ã¨ (where <nbsp> is a non-breaking space character, Unicode codepoint U+00A0), because Windows-1252 maps 0xC3 to Ã, 0xA9 to ©, 0xA0 to <nbsp>, and 0xA8 to ¨.
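
The byte values can be checked directly. The following is a minimal sketch, not part of the original answer; the class name Utf8BytesAsCp1252 and the helpers hex and misdecode are hypothetical names chosen for illustration. The string is written with Unicode escapes so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8BytesAsCp1252 {
    // Formats each byte as 0xNN, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes with UTF-8, then (wrongly) decodes the bytes as windows-1252.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String s = "\u00E9\u00E0\u00E8"; // éàè
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));
        // prints 0xC3 0xA9 0xC3 0xA0 0xC3 0xA8

        // Print code points rather than glyphs, so the console encoding
        // cannot distort the result.
        StringBuilder codePoints = new StringBuilder();
        for (char c : misdecode(s).toCharArray()) {
            codePoints.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(codePoints.toString().trim());
        // prints U+00C3 U+00A9 U+00C3 U+00A0 U+00C3 U+00A8
    }
}
```

Printing code points instead of the characters themselves keeps the console's own encoding out of the experiment.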

In the first example, you are converting a String to the above UTF-8 bytes, and then you are converting the bytes back to a String using Windows-1252 (the platform default) instead of UTF-8. So you should be getting a new String value of Ã©Ã<nbsp>Ã¨, not éàè. You are then writing that String to the console, so it gets encoded using Windows-1252 back to byte octets 0xC3 0xA9 0xC3 0xA0 0xC3 0xA8, which should be displayed as Ã©Ã<nbsp>Ã¨ (or something similar to it) if the console is displaying the bytes as Windows-1252. On the other hand, if the console is configured for UTF-8 instead, those bytes would display as éàè when interpreted as UTF-8.
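
The round trip in the first example can be sketched as follows. This is an illustration under the assumption that System.out encodes with windows-1252, as the question's default charset suggests; FirstPrintRoundTrip and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FirstPrintRoundTrip {
    // Mimics the first print: encode as UTF-8, decode as windows-1252,
    // then encode as windows-1252 again, as a windows-1252 System.out would.
    static byte[] roundTrip(String s) {
        Charset cp1252 = Charset.forName("windows-1252");
        String misdecoded = new String(s.getBytes(StandardCharsets.UTF_8), cp1252);
        return misdecoded.getBytes(cp1252);
    }

    public static void main(String[] args) {
        String s = "\u00E9\u00E0\u00E8"; // éàè
        byte[] sentToConsole = roundTrip(s);
        // The bytes reaching the console equal the original UTF-8 bytes.
        System.out.println(Arrays.equals(sentToConsole, s.getBytes(StandardCharsets.UTF_8)));
        // prints true for these characters
    }
}
```

For these particular bytes, the wrong decode and the matching re-encode cancel each other out, which is why a UTF-8 console still shows éàè.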

In the second example, since you are using Windows-1252 for both encoding and decoding, and the particular characters in question are supported by Windows-1252, you should end up with the original String value éàè before writing it to the console. If that String gets encoded to bytes using Windows-1252, and the console is configured for UTF-8, that would explain why you don't see éàè displayed. The String value éàè is encoded in Windows-1252 as byte octets 0xE9 0xE0 0xE8, which is not a valid UTF-8 byte octet sequence: in UTF-8, 0xE9 starts a three-byte sequence and must be followed by two continuation bytes in the range 0x80 to 0xBF, and 0xE0 is not one.
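
A strict decoder makes the invalid sequence visible. The sketch below is illustrative only; Cp1252BytesAsUtf8 and isValidUtf8 are hypothetical names. It relies on CharsetDecoder.decode(ByteBuffer), which throws a CharacterCodingException when the error action is REPORT.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Cp1252BytesAsUtf8 {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "\u00E9\u00E0\u00E8"; // éàè
        byte[] cp1252 = s.getBytes(Charset.forName("windows-1252")); // 0xE9 0xE0 0xE8
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(cp1252)); // prints false
        System.out.println(isValidUtf8(utf8));   // prints true
    }
}
```

A console that expects UTF-8 has no valid character to show for the Windows-1252 bytes, so it falls back to a placeholder glyph such as ▒.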

In short, the behavior you are seeing would happen when your console is configured to interpret outgoing bytes as UTF-8, but you are not giving it proper UTF-8 encoded bytes as output.
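
One way to give the console properly encoded bytes, following the PrintStream suggestion in the comments on the question, is to wrap System.out with an explicit charset. This is a sketch that assumes the console interprets bytes as UTF-8 (as the MinGW console in the question appears to); ConsoleEncodingSketch and bytesWritten are hypothetical names, and the right charset name depends on the actual console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingSketch {
    // Returns the bytes a PrintStream with the given charset would emit for s.
    static byte[] bytesWritten(String s, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(s);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u00E9\u00E0\u00E8"; // éàè

        // Assumption: the console decodes incoming bytes as UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(s);

        // For comparison: byte counts under each encoding.
        System.out.println(bytesWritten(s, "UTF-8").length);        // prints 6
        System.out.println(bytesWritten(s, "windows-1252").length); // prints 3
    }
}
```

With this approach the String is never re-encoded through the platform default charset, so no intermediate mis-decoding is needed to make the output look right.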


4 Comments

This is a restatement of the question, not an answer. If you are asserting that his console is set to UTF-8, you should lead off with that.
I did not just restate the question. The question is asking to understand how the encodings are behaving differently than expected. I think I answered that.
The lede was buried. It makes more sense now with the concluding paragraph.
Exactly, @RemyLebeau. I forgot to take the encoding of the console into consideration. Indeed, the result differs depending on whether I use the MinGW console or MS-DOS.