42

How do I decode the string 'Sch\u00f6nen' (@"Sch\u00f6nen") in C#? I've tried HttpUtility, but it doesn't give me the result I need, which is "Schönen".

1
  • Have you looked at the System.Text.Encoding classes? You might be able to use the UTF-8 encoding to decode the Unicode string content. Commented Feb 15, 2012 at 23:50

3 Answers

94

Regex.Unescape did the trick:

System.Text.RegularExpressions.Regex.Unescape(@"Sch\u00f6nen");

Note that you need to be careful when testing variants or writing unit tests: in C# source, "Sch\u00f6nen" is already "Schönen", because the compiler processes the escape. You need @ in front of the string literal so that \u00f6 stays in the string as literal characters.
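To make the difference between the two literals concrete, here is a minimal sketch. The variable names are illustrative only, and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim literal: the backslash is kept, so the string contains the
// six characters \ u 0 0 f 6 instead of the single character ö.
string escaped = @"Sch\u00f6nen";

// Regular literal: the compiler has already turned \u00f6 into ö.
string alreadyDecoded = "Sch\u00f6nen";

// Regex.Unescape converts the textual escape into the real character.
string decoded = Regex.Unescape(escaped);

Console.WriteLine(escaped.Length);             // 12
Console.WriteLine(decoded.Length);             // 7
Console.WriteLine(decoded == alreadyDecoded);  // True
```

A unit test that passes "Sch\u00f6nen" without the @ would therefore never exercise the unescaping at all.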


1 Comment

I know this answer is old, but your note about adding @ before the string with the Unicode escape fixed my issue. Thank you so much.
4

If you landed on this question because you see "Sch\u00f6nen" (or similar \uXXXX values in a string constant) - it is not an encoding. It is a way to represent Unicode characters as escape sequences, similar to how a string represents a newline with \n and a carriage return with \r.

I don't think you have to decode.

string unicodestring = "Sch\u00f6nen";
Console.WriteLine(unicodestring);

The output is Schönen.

2 Comments

Well, it shows up as "Sch\u00f6nen" when I output it on the Windows Phone emulator, so it needs to be unescaped. The user who answered my question and then deleted his post had the correct answer; I don't know why he deleted it.
I think you two misunderstood each other :) @findcaiyzh, if you update your example with string unicodestring = "Sch\\u00f6nen"; you'll get the case @M_K is talking about. This scenario is common when working with a JSON result retrieved from a remote endpoint.
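For the JSON scenario mentioned in this comment, a JSON parser resolves \uXXXX escapes by itself. The sketch below assumes a modern .NET where System.Text.Json is available (it would not have been an option on Windows Phone at the time); the payload and the property name "greeting" are made up for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical payload: the backslash is part of the text, as it would be
// in a raw response body received from a remote endpoint.
string json = @"{""greeting"": ""Sch\u00f6nen""}";

// Parsing the document decodes JSON string escapes such as \u00f6.
using JsonDocument doc = JsonDocument.Parse(json);
var greeting = doc.RootElement.GetProperty("greeting").GetString();

Console.WriteLine(greeting);
```

If the text really is JSON, parsing it this way is likely safer than unescaping the raw text, since the parser also handles the other JSON escapes.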
1

I wrote some code that converts Unicode escape sequences to actual characters. (The top answer in this thread works fine and is less complex, though.)

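// Sample JSON in which the \uXXXX escapes are literal text (verbatim string).
string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
// Splitting on a pattern with a capture group keeps the captured hex digits in the result array.
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([a-fA-F\d]{4})");
string outString = "";
foreach (var s in splitted)
{
    try
    {
        // Every 4-character piece is treated as the hex digits of a UTF-16 code unit.
        if (s.Length == 4)
        {
            var decoded = ((char) Convert.ToUInt16(s, 16)).ToString();
            outString += decoded;
        }
        else
        {
            outString += s;
        }
    }
    catch (Exception)
    {
        // The piece was not valid hex (ordinary 4-character text), so keep it unchanged.
        outString += s;
    }
}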

1 Comment

With the length == 4 check, I'm pretty sure this could give false results for something like \uAAAAAAA\uAAAA that has length-4 strings between the Unicode escapes. The core conversion is decent enough to write a parser from, though.
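One way to avoid relying on the length of the split pieces is to convert only the text that the pattern actually matched, using Regex.Replace with a match evaluator. This is a sketch, not the answer author's code; DecodeUnicodeEscapes is a hypothetical helper name.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces each \uXXXX escape with the UTF-16 code unit
// it names. Text that is not part of a match is never reinterpreted.
static string DecodeUnicodeEscapes(string input)
{
    return Regex.Replace(
        input,
        @"\\u([0-9a-fA-F]{4})",
        match => ((char)ushort.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}

string sample = @"{""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439"", ""code"": ""beef""}";
Console.WriteLine(DecodeUnicodeEscapes(sample));
```

Here the 4-character value beef is left alone because it is not preceded by \u. The sketch still does not treat an escaped backslash (\\u00f6) specially, so it is not a full parser.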
