While reviewing some old code of mine, I wondered whether there is a better way to create a literal string containing Unicode escape sequences.

I have a REST interface that requires certain characters to be escaped; for example, a property called username with a value of john%foobar+Smith must be sent like this:

{"username":"john\u0025foobar\u002bSmith"}

My C# method to replace certain characters like % and + is pretty basic:

public static string EncodeUTF8(string unescaped) {
    string utf8_ampersand = @"\u0026";
    string utf8_percent = @"\u0025";
    string utf8_plus = @"\u002b";
    return unescaped.Replace("&", utf8_ampersand).Replace("+", utf8_plus).Replace("%", utf8_percent);
}

This seems like an antiquated way to do it; surely there is some single-line method using Encoding that would output the literal Unicode escape codes, but I can't find any examples that aren't essentially replace statements like mine. Is there a better way?

  • There is nothing about UTF8 in this question. It is Unicode. Commented Apr 6, 2015 at 16:46
  • Don't do that. You should use a JSON serializer. Commented Apr 6, 2015 at 16:47
  • @SLaks You don't need to escape + and % in JSON Commented Apr 6, 2015 at 16:47
  • It's a limitation of the back-end web service, I know json doesn't ordinarily need to be escaped like this. Commented Apr 6, 2015 at 16:54
  • @brnwdrng You could use a regex with a replacer method, but I don't think it would be a big win. Commented Apr 6, 2015 at 16:55

1 Answer

You could do it with Regex:

static readonly Regex ReplacerRegex = new Regex("[&+%]");

public static string Replace(Match match)
{
    // 4-digits hex of the matched char
    return @"\u" + ((int)match.Value[0]).ToString("x4");
}

public static string EncodeUTF8(string unescaped)
{
    return ReplacerRegex.Replace(unescaped, Replace);
}

But I wouldn't particularly recommend it (unless you have tens of replacements). I think it would be slower, and longer to write.
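For comparison, here is a minimal sketch of the same idea without Regex: it walks the string once and computes the escape from each character's code point, as the Replace method above does. The class name UnicodeEscaper and the CharsToEscape constant are hypothetical names chosen for illustration, and I have not measured whether it is actually faster.

```csharp
using System;
using System.Text;

public static class UnicodeEscaper
{
    // Hypothetical set of characters the back end wants escaped.
    private const string CharsToEscape = "&+%";

    public static string Escape(string unescaped)
    {
        var sb = new StringBuilder(unescaped.Length);
        foreach (char c in unescaped)
        {
            if (CharsToEscape.IndexOf(c) >= 0)
            {
                // \u followed by the 4-digit lowercase hex code of the char
                sb.Append(@"\u").Append(((int)c).ToString("x4"));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Calling UnicodeEscaper.Escape("john%foobar+Smith") should give the same escaped username as in the question.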


4 Comments

Yeah that's more handsome code, but I agree it's overkill; my version is easier to interpret at a glance, was just hoping a slicker native method had arrived in the three years since I wrote it - guess not - but thanks!
@brnwdrng The only advantage is if you have tens of characters to replace: the unicode code is automatically calculated, so you don't have to write it and risk writing it wrong.
Another advantage of this approach (assuming you'd use a dictionary to map each match to its replacement instead of String.Format) is that it allows replacing characters with different values in one pass.
Right; I do have a couple of service requests that might conceivably have tens of replacements... this would appear safer.
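The dictionary variant mentioned in the comments could be sketched roughly as follows. The class name MappedEscaper and the Replacements table are hypothetical; the table reuses the three escapes from the question, but each value could be any replacement string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MappedEscaper
{
    // Hypothetical mapping from matched character to its replacement.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            { "&", @"\u0026" },
            { "+", @"\u002b" },
            { "%", @"\u0025" },
        };

    // Must match exactly the keys of the dictionary above.
    private static readonly Regex ReplacerRegex = new Regex("[&+%]");

    public static string Escape(string unescaped)
    {
        // Each match is looked up in the dictionary, so all replacements happen in one pass.
        return ReplacerRegex.Replace(unescaped, m => Replacements[m.Value]);
    }
}
```

The trade-off against the computed version is that the escapes are typed by hand again, in exchange for being free to map a character to something other than its own code.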
