I'm using Json.NET to read data sent in JSON format from a server. The server encodes all string data it sends in the JSON as UTF-8.

To read the data in C#, I do something like this: string s = json.Value<string>("data");

I assume the string s is now in UTF-8, whereas the default encoding for strings in C# is UTF-16 (Unicode).

To convert the string to unicode, would this be correct?

byte[] bytes = Encoding.Unicode.GetBytes(s);
string unicode = Encoding.UTF8.GetString(bytes);

What I want (I think) is to take the raw bytes from s and pass them to the UTF-8 decoder to get Unicode, but I'm not sure what exactly Encoding.Unicode.GetBytes gives me, or what I should use instead.

  • You can't double parse it. But what is wrong with your string in the first place, since all strings in .NET are UTF16? Commented Apr 4, 2016 at 14:02
  • Well the string is received as utf-8, I assumed I had to do something, but if json.net automatically handles this then it's ok as you say, but I don't know if that's the case. Commented Apr 4, 2016 at 14:03
  • I think you need to swap it. Encoding.UTF8.GetBytes(s) and then Encoding.Unicode.GetString(bytes). This way you will convert the UTF8 to Unicode. Commented Apr 4, 2016 at 14:05
  • In your question you have a variable called json -- how does that get populated? Is there some kind of stream being read from a web response? If so, you want to pass Encoding.UTF8 to the stream reader. Commented Apr 4, 2016 at 14:34
  • You are right, I just discovered the data is read from the socket using Encoding.Default.GetString, which isn't exactly optimal. Using Encoding.UTF8 there directly should fix all problems with UTF-8 encoded strings. Commented Apr 4, 2016 at 14:49
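
The suggestion in the comments to pass Encoding.UTF8 to the stream reader can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the MemoryStream stands in for the network stream, and ReaderDemo and ReadAllUtf8 are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class ReaderDemo
{
    // Hypothetical helper: reads the whole stream as text, decoding it as UTF-8.
    public static string ReadAllUtf8(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Stand-in for the server response: the UTF-8 bytes of a small JSON document.
        byte[] wire = Encoding.UTF8.GetBytes("{ \"data\" : \"str\u00F6\" }");

        using (var stream = new MemoryStream(wire))
        {
            string jsonText = ReadAllUtf8(stream);

            // The decoded text contains the real U+00F6 character, ready for the JSON parser.
            Console.WriteLine(jsonText.Contains("str\u00F6")); // True
        }
    }
}
```

The resulting string can then be handed to the JSON parser as usual; no further conversion step is needed.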

1 Answer

There is no need to convert anything: string objects in .NET are always UTF-16, so once you have a string there is no separate UTF-8 form of it left to convert.

If there is anything to change, change it where the incoming JSON text is decoded and handed to Json.NET: you can't double parse it. By that point the incoming JSON has already been interpreted with a specific encoding. You can't go back from there without the original bytes.
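
To see why the conversion from the question does not help, here is a small sketch of what Encoding.Unicode.GetBytes returns for an already decoded string. The sample value "strö" and the names RoundTripDemo and QuestionRoundTrip are assumptions made for illustration.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // The conversion proposed in the question, applied to an already decoded string.
    public static string QuestionRoundTrip(string s)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(s); // UTF-16LE bytes of s, not the bytes from the wire
        return Encoding.UTF8.GetString(bytes);       // decoding UTF-16 bytes as UTF-8 garbles the text
    }

    static void Main()
    {
        string s = "str\u00F6"; // "strö"

        // Each of these characters becomes two bytes in UTF-16LE.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s))); // 73-00-74-00-72-00-F6-00

        // The round trip does not return the original text.
        Console.WriteLine(QuestionRoundTrip(s) == s); // False
    }
}
```

Encoding.Unicode.GetBytes re-encodes the string's characters as UTF-16; it never gives back the bytes that originally arrived from the server.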

3 Comments

If the JSON data that is received looks like this: { "data" : "strö" } it definitely needs to be converted, because it will look exactly like that in the C# string as well.
Are you sure all goes well on the other end?
You were correct; the string that was parsed by json was created from the raw data from the socket using Encoding.Default instead of Encoding.UTF8.
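
The "strö" symptom can be reproduced with a short sketch. It assumes that Encoding.Default resolved to a Western single-byte code page on the asker's machine; ISO-8859-1 is used here as a stand-in for it, and MojibakeDemo, DecodeWrong and DecodeRight are hypothetical names.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Decodes UTF-8 bytes with a single-byte encoding (stand-in for Encoding.Default).
    public static string DecodeWrong(byte[] utf8Bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return latin1.GetString(utf8Bytes);
    }

    // Decodes the same bytes with the encoding the server actually used.
    public static string DecodeRight(byte[] utf8Bytes)
    {
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    static void Main()
    {
        // "strö" as sent by the server: 'ö' is the two bytes C3 B6 in UTF-8.
        byte[] wire = Encoding.UTF8.GetBytes("str\u00F6");

        // Wrong decoder: each of the two bytes becomes its own character, giving "strö".
        Console.WriteLine(DecodeWrong(wire) == "str\u00C3\u00B6"); // True

        // Right decoder: the two bytes become the single character 'ö'.
        Console.WriteLine(DecodeRight(wire) == "str\u00F6"); // True
    }
}
```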
