I'm trying to write a Java equivalent to PHP's ord():
public static int ord(char c) {
    return (int) c;
}

public static int ord(String s) {
    return s.length() > 0 ? ord(s.charAt(0)) : 0;
}
This seems to work well for characters with an ordinal value of up to 127, i.e. within ASCII. However, PHP returns 195 (and higher) for characters from the extended ASCII table or beyond. A comment by Mr. Llama on the answer to a related question explains this as follows:
To elaborate, the reason é showed ASCII 195 is because it's actually a two-byte character (UTF-8), the first byte of which is ASCII 195. – Mr. Llama
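That two-byte encoding is easy to verify directly. The sketch below (class name is my own, for illustration) prints the unsigned values of the UTF-8 bytes of "é":

```java
import java.nio.charset.StandardCharsets;

// Illustration: "é" (U+00E9) occupies two bytes in UTF-8.
public class Utf8Bytes {
    public static void main(String[] args) {
        byte[] bytes = "\u00E9".getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            // Mask with 0xFF to print the byte as an unsigned value.
            System.out.println(b & 0xFF); // prints 195, then 169
        }
    }
}
```

The first byte, 195 (0xC3), is exactly what PHP's ord() reports.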
I hence changed my ord(char c) method to mask out all but the most significant byte:
public static int ord(char c) {
    return (int) (c & 0xFF);
}
Still, the results differ. Two examples:
- ord('é') (U+00E9) gives 195 in PHP while my Java function yields 233
- ord('⸆') (U+2E06) gives 226 in PHP while my Java function yields 6
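The arithmetic behind those two mismatches can be sketched as follows (class name is hypothetical). Masking a Java char with 0xFF keeps the low byte of the UTF-16 code unit, which is not the same thing as the first byte of the character's UTF-8 encoding:

```java
// Illustration: (c & 0xFF) yields the low byte of the UTF-16 code
// unit, not the UTF-8 lead byte that PHP's ord() returns.
public class MaskDemo {
    public static void main(String[] args) {
        System.out.println('\u00E9' & 0xFF); // 0x00E9 & 0xFF = 233, not 195
        System.out.println('\u2E06' & 0xFF); // 0x2E06 & 0xFF = 6, not 226
    }
}
```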
I managed to get the same behavior for the method that accepts a String by first turning the String into a byte array, explicitly using UTF-8 encoding:
public static int ord(String s) {
    return s.length() > 0 ? ord((char) s.getBytes(StandardCharsets.UTF_8)[0]) : 0;
}
However, using the method that accepts a char still behaves as before and I could not yet find a solution for that. In addition, I don't understand why the change actually worked: Charset.defaultCharset() returns UTF-8 on my platform anyway. So...
- How can I make my function behave similar to that of PHP?
- Why does the change to ord(String s) actually work?
Explanatory answers are much appreciated, as I want to grasp what's going on exactly.
é: ascii-code.com. 195 is the code for Ã, so who knows WTF is going on under the hood in PHP. ord() does not work correctly with characters outside the ASCII range. – However, I'm trying to replicate that behavior.