Given that I have the following function:
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

static void fun(String str) {
    System.out.println(String.format(
            "%s | length in String: %d | length in bytes: %d | bytes: %s",
            str, str.length(), str.getBytes().length, Arrays.toString(str.getBytes())));
}
On invoking fun("ó"); the output is:
ó | length in String: 1 | length in bytes: 2 | bytes: [-61, -77]
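For readability, here are the same two bytes in hex; this is just the unsigned view of the signed byte values printed above:

byte[] b = "ó".getBytes();
System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF); // prints: C3 B3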
So it means the character ó needs 2 bytes to be represented, and as per the Character class documentation Java's internal representation is UTF-16. Considering that, when I do the following:
System.out.println(new String("ó".getBytes(), StandardCharsets.UTF_16));// output=쎳
System.out.println(new String("ó".getBytes(), StandardCharsets.ISO_8859_1));// output=ó
System.out.println(new String("ó".getBytes(), StandardCharsets.US_ASCII));// output=��
System.out.println(new String("ó".getBytes(), StandardCharsets.UTF_8));// output=ó
System.out.println(new String("ó".getBytes(), StandardCharsets.UTF_16BE));// output=쎳
System.out.println(new String("ó".getBytes(), StandardCharsets.UTF_16LE));// output=돃
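To see what the UTF-16 decoders are actually looking at, I also hand-assembled the 16-bit code units from the two bytes (the hex values are just 0xC3 and 0xB3 from above; this is my own arithmetic, not anything from the docs):

byte[] b = "ó".getBytes();
// big-endian pairing: 0xC3B3 is the Hangul syllable 쎳
char be = (char) (((b[0] & 0xFF) << 8) | (b[1] & 0xFF));
// little-endian pairing: 0xB3C3 is the Hangul syllable 돃
char le = (char) (((b[1] & 0xFF) << 8) | (b[0] & 0xFF));
System.out.println(be + " " + le); // prints: 쎳 돃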
Why are none of the UTF_16, UTF_16BE, and UTF_16LE charsets able to decode the bytes properly, given that the bytes represent a 16-bit character? And how is UTF-8 able to decode it properly, given that (as I understand it) UTF-8 considers each character to be only 8 bits long, so it should have printed 2 chars (1 char for each byte) like ISO_8859_1 did?