
I have a UI that needs Unicode escape values in order to display superscript characters. The inbound data arrives as HTML code. The only problem I can see is that the result appears to contain an extra backslash. I am passing the string "®" into EncodeNonAsciiCharacters.

Is there any way to return \u00AE and not \\u00AE?

static string EncodeNonAsciiCharacters(string value)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in value)
    {
        if (c > 127)
        {
            string encodedtext = ((int)c).ToString("x4");
            //string encodedValue = "\\u" + encodedtext.ToUpper();
            string encodedValue = @"\u" + encodedtext.ToUpper();
            sb.Append(encodedValue);
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}
  • Aren't C# strings UTF-16 already? Commented Mar 18, 2019 at 16:22
  • This is fundamentally wrong because a C# char is a 16-bit UTF-16 code unit, not an 8-bit UTF-8 byte. Commented Mar 18, 2019 at 16:22
  • You're not "returning" \\u00AE. You need to write \\u in your code editor because \ is an escape character in C# string literals. You could write @"\u" instead if you wanted. Commented Mar 18, 2019 at 16:33
  • When you see "\\u1234" in the debugger, it actually represents the string @"\u1234". The debugger shows the value as a C# literal, so it escapes the backslash by doubling it. Commented Mar 18, 2019 at 16:49
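
The point made in these comments can be checked directly. The following is a minimal sketch; the class name `BackslashDemo` and its field names are made up for illustration. Both literals should produce the same six-character string, and the doubled backslash only shows up when a tool such as the debugger renders the value as a C# literal.

```csharp
using System;

public static class BackslashDemo
{
    // Regular literal: the two characters \\ are an escape sequence for ONE backslash.
    public static readonly string Regular = "\\u00AE";

    // Verbatim literal: the backslash is taken as-is, no escape processing.
    public static readonly string Verbatim = @"\u00AE";

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim); // True
        Console.WriteLine(Regular.Length);      // 6
        Console.WriteLine(Regular);             // \u00AE
    }
}
```

Printing the string, or writing it to a file or a response body, yields a single backslash in both cases.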

1 Answer


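I have written a program to demonstrate your requirement. You do not need to escape the backslash if you put @ before the string literal. The @ prefix makes it a verbatim string literal: backslashes are taken literally and no escape sequences are processed (the one exception is a double quote, which is written as ""). It improves readability in cases where it can be used. Note that "\\u" and @"\u" produce exactly the same two-character string; the function never returns a doubled backslash.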

using System;
using System.Text;

public class Program
{
    public static void Main()
    {
        string value = "⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎ ®";
        StringBuilder sb = new StringBuilder();
        foreach (char c in value)
        {
            if (c > 127)
            {
                // Non-ASCII: emit \u followed by four uppercase hex digits.
                string encodedtext = ((int)c).ToString("x4");
                string encodedValue = @"\u" + encodedtext.ToUpper();
                sb.Append(encodedValue);
                //Console.WriteLine(encodedValue);
            }
            else
            {
                // ASCII (including the spaces) is copied through unchanged.
                sb.Append(c);
            }
        }
        Console.WriteLine(sb.ToString());
    }
}

Output:

\u2078 \u2079 \u207A \u207B \u207C \u207D \u207E \u2080 \u2081 \u2082 \u2083 \u2084 \u2085 \u2086 \u2087 \u2088 \u2089 \u208A \u208B \u208C \u208D \u208E \u00AE

2 Comments

Should WriteLine display the symbol, and not the encoded value?
It won't display the symbol, because the code converts each symbol to its hex value and then prefixes that value with \u.
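
If the symbol itself is needed again later, the escaped text can be turned back into characters. The sketch below is one possible approach rather than part of the original answer: it uses `Regex.Unescape` from `System.Text.RegularExpressions`, which converts `\uXXXX` sequences back to characters. The class name `RoundTripDemo` and the method name `Encode` are hypothetical. `Regex.Unescape` also interprets other backslash sequences, so this assumes the original text contains no literal backslashes of its own.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RoundTripDemo
{
    // Same logic as EncodeNonAsciiCharacters in the question:
    // each UTF-16 code unit above 127 becomes \u plus four uppercase hex digits.
    public static string Encode(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (c > 127)
                sb.Append(@"\u").Append(((int)c).ToString("X4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string encoded = Encode("x² ®");
        Console.WriteLine(encoded); // x\u00B2 \u00AE

        // Assumption: the input had no backslashes, so unescaping is lossless.
        string decoded = Regex.Unescape(encoded);
        Console.WriteLine(decoded == "x² ®"); // True
    }
}
```

Whether the console then renders ² and ® correctly depends on the console's output encoding, which is why the sketch compares strings instead of printing the decoded value.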