I want to convert a Unicode string to a UTF-8 string so that I can pass it to an SMS API and send Unicode SMS messages. The conversion should match the output of this tool: https://cafewebmaster.com/online_tools/utf8_encode

eg. I have unicode string "हैलो फ़्रेंड्स" and it should be converted into "हà¥à¤²à¥ à¥à¥à¤°à¥à¤à¤¡à¥à¤¸"

I have tried this, but I am not getting the expected output:

    private string UnicodeToUTF8(string strFrom)
    {
        byte[] bytes = Encoding.Default.GetBytes(strFrom);

        return Encoding.UTF8.GetString(bytes);
    }

I am calling the function like this:

    string myUTF8String = UnicodeToUTF8("हैलो फ़्रेंड्स");

  • hrmm ASCII is not utf-8 on the best of days Commented Dec 27, 2018 at 8:11
  • tried this too byte[] bytes = Encoding.Default.GetBytes(myString); myString = Encoding.UTF8.GetString(bytes); Commented Dec 27, 2018 at 8:13
  • @satyender replace Encoding.Default with Encoding.UTF8 and then use the resulting byte[] array as-is, don't pass it back to GetString() at all. Which SMS API are you using exactly? If it supports Unicode properly, it should be taking a Unicode string as input and handle the UTF-8 encoding internally for you Commented Dec 30, 2018 at 6:13

2 Answers

I don't think this can be answered concretely without knowing more about the SMS API you want to use. The string type in C# is always UTF-16. Text in any other encoding is represented as a byte[], not as a string.

You could 'cast' that into a string by doing something like this:

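    using System.Linq;
    using System.Text;

    static string UnicodeToUTF8(string from) {
        // Encode to UTF-8, then widen each byte to a char (U+0000 to U+00FF).
        var bytes = Encoding.UTF8.GetBytes(from);
        return new string(bytes.Select(b => (char)b).ToArray());
    }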

As far as I can tell this yields the same output as the website you linked. However, without knowing what API you're handing this string off to, I can't guarantee that this will ultimately work.

The point of string is that we don't need to worry about its underlying encoding. This cast is a giant hack: the result holds one UTF-8 byte per char, so there is no guarantee that it still represents a well-formed string.
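To make the nature of that hack concrete, here is a minimal sketch showing that the cast is reversible and behaves like decoding the UTF-8 bytes as ISO-8859-1. The helper names `Utf8BytesAsChars` and `CharsAsUtf8Bytes` are made up for this illustration, and the sample uses only the first word of the question's text.

```csharp
using System;
using System.Linq;
using System.Text;

// Same idea as the cast above: one char per UTF-8 byte (hypothetical helper name).
static string Utf8BytesAsChars(string text) =>
    new string(Encoding.UTF8.GetBytes(text).Select(b => (char)b).ToArray());

// Inverse: narrow each char back to a byte, then decode the bytes as UTF-8.
static string CharsAsUtf8Bytes(string hacked) =>
    Encoding.UTF8.GetString(hacked.Select(c => (byte)c).ToArray());

string original = "हैलो";
string hacked = Utf8BytesAsChars(original);

// ISO-8859-1 maps every byte value to the code point with the same number.
string latin1 = Encoding.GetEncoding("ISO-8859-1")
    .GetString(Encoding.UTF8.GetBytes(original));

Console.WriteLine(hacked.Length);                        // 12: one char per UTF-8 byte
Console.WriteLine(hacked == latin1);                     // True
Console.WriteLine(CharsAsUtf8Bytes(hacked) == original); // True
```

The receiving side has to undo the cast in the same way, which is why this only helps if the API really expects this byte-per-char form.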

If something expects a UTF-8 encoding, it should accept a byte[], not a string.
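As a rough sketch of what handing over real UTF-8 looks like: the byte[] from `Encoding.UTF8.GetBytes` is the UTF-8 form of the text. The question does not say how the SMS API is called, so the percent-encoding step is only an assumption (an HTTP-based gateway taking the message as a query parameter).

```csharp
using System;
using System.Text;

string message = "हैलो";

// The UTF-8 form of the text is a byte[], not a string.
byte[] utf8 = Encoding.UTF8.GetBytes(message);
string hex = BitConverter.ToString(utf8);

Console.WriteLine(utf8.Length); // 12
Console.WriteLine(hex);         // E0-A4-B9-E0-A5-88-E0-A4-B2-E0-A5-8B

// Assumption: the gateway is HTTP-based and takes the message as a query value.
// Uri.EscapeDataString percent-encodes the UTF-8 bytes of the string.
string queryValue = Uri.EscapeDataString(message);
Console.WriteLine(queryValue);
```

Which of these forms is needed (raw bytes in a request body, or a percent-encoded query value) depends entirely on the API in question.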

1 Comment

Thanks, it's working; it returns the expected output.

Try this:

    using System.Text;

    string output = "hello world";
    byte[] bytes1 = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(output));
    byte[] bytes2 = Encoding.Convert(Encoding.Unicode, Encoding.Unicode, Encoding.Unicode.GetBytes(output));
    var output1 = Encoding.UTF8.GetString(bytes1);
    var output2 = Encoding.Unicode.GetString(bytes2);

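You will see that bytes1 is 11 bytes (UTF-8 uses 1 byte per character for ASCII text such as this) and bytes2 is 22 bytes (Encoding.Unicode is UTF-16, which uses 2 bytes per character here). Note that output1 and output2 are both ordinary UTF-16 strings equal to the original, and that non-ASCII text takes more than 1 byte per character in UTF-8.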
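The byte counts differ for non-ASCII text. A small sketch comparing the ASCII sample with the first word of the question's Devanagari text (the variable names are only for illustration):

```csharp
using System;
using System.Text;

string ascii = "hello world";
string hindi = "हैलो"; // 4 code points, all in the Devanagari block

int asciiUtf8 = Encoding.UTF8.GetByteCount(ascii);
int asciiUtf16 = Encoding.Unicode.GetByteCount(ascii);
int hindiUtf8 = Encoding.UTF8.GetByteCount(hindi);
int hindiUtf16 = Encoding.Unicode.GetByteCount(hindi);

Console.WriteLine(asciiUtf8);  // 11: 1 byte per ASCII character
Console.WriteLine(asciiUtf16); // 22: 2 bytes per character
Console.WriteLine(hindiUtf8);  // 12: 3 bytes per Devanagari code point
Console.WriteLine(hindiUtf16); // 8: 2 bytes per code point
```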
