9

For example, in the emoji character set, U+1F601 is the Unicode code point for "GRINNING FACE WITH SMILING EYES", and \xF0\x9F\x98\x81 is its UTF-8 byte sequence.

\xE2\x9D\xA4 is the UTF-8 byte sequence for HEAVY BLACK HEART, whose code point is U+2764.

So my question is: if I have a byte array with the values (0xF0, 0x9F, 0x98, 0x81, 0xE2, 0x9D, 0xA4), how can I convert it into Unicode code points?

For the input above, what I want is a String array with the values "1F601" and "2764".

I know I could write a complex method to do this myself, but I hope there is already a library that does it.

4
  • 1
    You can refer to this question; it has already been answered. Commented Sep 4, 2013 at 6:16
  • 1
    Do you just need a Unicode String or do you actually need the value 1F601? Because for the latter you'll need String.codePointAt() in addition to producing the String as explained in the answers. Commented Sep 4, 2013 at 6:24
  • @JoachimSauer Yes, this is what I want. Thanks for pointing out the method codePointAt. Here I updated my question to make it clear. Can you have a look again? Thanks. Commented Sep 4, 2013 at 10:38
  • Could you explain how to convert the UTF-8 values into the unicode? When I use the code below, it gives me the emoji instead of the unicode values such as U+1F601. @XWang Commented Jun 5, 2016 at 15:20

4 Answers 4

8

So my question is, if I have a byte array with value (0xF0, 0x9F, 0x98, 0x81), then how can I convert it into a Unicode value?

Simply call the String constructor specifying the data and the encoding:

String text = new String(bytes, "UTF-8");

You can specify a Charset instead of the name of the encoding - I like Guava's simple Charsets class, which allows you to write:

String text = new String(bytes, Charsets.UTF_8);

Or, on Java 7 and later, use StandardCharsets without even needing Guava:

String text = new String(bytes, StandardCharsets.UTF_8);
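To make the difference between the overloads concrete, here is a minimal sketch. The class and method names are hypothetical, chosen only for illustration. It assumes nothing beyond the JDK: the name-based constructor declares a checked UnsupportedEncodingException, while the Charset-based one does not. It also shows that the decoded emoji occupies two chars (a surrogate pair) but is a single code point.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DecodeVariants {
    // Name-based overload: the compiler forces handling of UnsupportedEncodingException.
    public static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    // Charset-based overload: no checked exception to handle.
    public static String decodeByCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81};
        String a = decodeByName(bytes);
        String b = decodeByCharset(bytes);
        System.out.println(a.equals(b));                     // true
        System.out.println(a.length());                      // 2 (one surrogate pair)
        System.out.println(a.codePointCount(0, a.length())); // 1
    }
}
```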

8 Comments

If you use Java 7's java.nio.charset.StandardCharsets you don't even need Guava
@artbristol: Thanks - I had a quick look, but missed that. Will edit it in.
@JonSkeet please what's the equivalent in .net or c#
@CharlesO: You'd use Encoding.UTF8.GetBytes(text), and in reverse, Encoding.UTF8.GetString(bytes).
@CharlesO: Okay - I was giving the .NET equivalent of the code in my question, which is what I thought you were after. So do you now have everything you need?
1

Simply use String class:

import java.nio.charset.Charset;

byte[] bytesArray = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81}; // UTF-8 bytes of U+1F601

String string = new String(bytesArray, Charset.forName("UTF-8")); // decode the bytes as UTF-8

System.out.println(string); // print the decoded character
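Printing the decoded String shows the emoji itself. To get the hexadecimal values the question asks for ("1F601" and "2764"), the String can be walked by code point, as the comment about String.codePointAt() suggests. The following is only a sketch; the class name Utf8CodePoints and the helper toHexCodePoints are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8CodePoints {
    // Hypothetical helper: decode UTF-8 bytes and list each code point as upper-case hex.
    public static List<String> toHexCodePoints(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        List<String> result = new ArrayList<>();
        // Step by code point, not by char: U+1F601 occupies two chars (a surrogate pair).
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            result.add(Integer.toHexString(codePoint).toUpperCase());
            i += Character.charCount(codePoint);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = {
            (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81, // U+1F601
            (byte) 0xE2, (byte) 0x9D, (byte) 0xA4               // U+2764
        };
        System.out.println(toHexCodePoints(bytes)); // prints [1F601, 2764]
    }
}
```

If the "U+" prefix is wanted, String.format("U+%04X", codePoint) can replace the Integer.toHexString call.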


0

Here is an example using InputStreamReader:

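import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

InputStream inputStream = new FileInputStream("utf-8-text.txt");
Reader      reader      = new InputStreamReader(inputStream,
                                                Charset.forName("UTF-8"));

// read() returns one UTF-16 code unit per call, or -1 at end of stream
int data = reader.read();
while(data != -1){
    char theChar = (char) data; // a character above U+FFFF arrives as two surrogate chars
    data = reader.read();
}

reader.close();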

Ref: Java I18N example
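Since Reader.read() hands back UTF-16 code units, an emoji such as U+1F601 comes out as two separate chars. A rough sketch of combining them into full code points while reading is below. ReaderCodePoints and readCodePoints are hypothetical names, and a ByteArrayInputStream stands in for the file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReaderCodePoints {
    // Hypothetical helper: read chars and merge surrogate pairs into code points.
    public static List<Integer> readCodePoints(Reader reader) throws IOException {
        List<Integer> codePoints = new ArrayList<>();
        int data = reader.read();
        while (data != -1) {
            char current = (char) data;
            int next = reader.read(); // one char of lookahead
            if (Character.isHighSurrogate(current)
                    && next != -1 && Character.isLowSurrogate((char) next)) {
                codePoints.add(Character.toCodePoint(current, (char) next));
                next = reader.read(); // the low surrogate has been consumed
            } else {
                codePoints.add(data);
            }
            data = next;
        }
        return codePoints;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {
            (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81, // U+1F601
            (byte) 0xE2, (byte) 0x9D, (byte) 0xA4               // U+2764
        };
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            for (int codePoint : readCodePoints(reader)) {
                System.out.println(String.format("U+%04X", codePoint)); // U+1F601, then U+2764
            }
        }
    }
}
```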


0

Here is a function that repairs a String whose UTF-8 bytes were wrongly decoded as ISO_8859_1, by recovering the original bytes and decoding them as UTF-8

public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
    // Assumes the input was produced by decoding bytes as ISO_8859_1,
    // so every char is in the range 0x00-0xFF and holds exactly one original byte.
    final byte[] data = new byte[strISO_8859_1.length()];
    for (int i = 0; i < strISO_8859_1.length(); i++) {
        data[i] = (byte) strISO_8859_1.charAt(i);
    }
    // Decode the recovered bytes with the encoding they were actually written in.
    return new String(data, StandardCharsets.UTF_8);
}

TEST

String strA_ISO_8859_1_i = new String("الغلاف".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

System.out.println("ISO_8859_1 strA est = "+ strA_ISO_8859_1_i + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));

RESULT

ISO_8859_1 strA est = Ø§ÙØºÙا٠String_ISO_8859_1To_UTF_8 = الغلاف
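The same round trip can probably be written more briefly with String.getBytes, because ISO_8859_1 maps each of the 256 byte values to the char with the same number. This is only a sketch; MojibakeRepair and repair are hypothetical names, and the Arabic test word is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Hypothetical helper: undo a wrong ISO_8859_1 decoding of UTF-8 bytes.
    public static String repair(String misdecoded) {
        // Encoding back to ISO_8859_1 recovers the original bytes one-to-one.
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes as what they really are: UTF-8.
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Arabic word from the test above, as Unicode escapes.
        String correct = "\u0627\u0644\u063A\u0644\u0627\u0641";
        // Simulate the damage: UTF-8 bytes read as ISO_8859_1.
        String broken = new String(correct.getBytes(StandardCharsets.UTF_8),
                                   StandardCharsets.ISO_8859_1);
        System.out.println(repair(broken).equals(correct)); // true
    }
}
```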

