6

I have some strings in a file that are already escaped. So the content of the file looks like this:

Hello\nWorld. This is\tGreat.

When I read the file, I get \n as two different characters instead of one.

How can I convert an escaped string to a non-escaped one?

1
  • Can it contain anything a C# string literal can contain, like Unicode escape sequences? What about quotation marks? Commented Jul 8, 2011 at 19:00

5 Answers

8

Based on @deAtog's code, I made some minor additions:

  • support \U00000000 format chars
  • simplify the hex conversions somewhat

    string UnEscape(string s)
    {
        StringBuilder sb = new StringBuilder();
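        // Alternatives: simple escapes, octal, \uXXXX, \UXXXXXXXX, or any single character.
        // RegexOptions.Singleline lets '.' also match a real newline already present in the input;
        // without it such characters match nothing and are silently dropped from the result.
        Regex r = new Regex("\\\\[abfnrtv?\"'\\\\]|\\\\[0-3]?[0-7]{1,2}|\\\\u[0-9a-fA-F]{4}|\\\\U[0-9a-fA-F]{8}|.", RegexOptions.Singleline);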
        MatchCollection mc = r.Matches(s, 0);
    
        foreach (Match m in mc)
        {
            if (m.Length == 1)
            {
                sb.Append(m.Value);
            }
            else
            {
                if (m.Value[1] >= '0' && m.Value[1] <= '7')
                {
                    int i = Convert.ToInt32(m.Value.Substring(1), 8);
                    sb.Append((char)i);
                }
                else if (m.Value[1] == 'u')
                {
                    int i = Convert.ToInt32(m.Value.Substring(2), 16);
                    sb.Append((char)i);
                }
                else if (m.Value[1] == 'U')
                {
                    int i = Convert.ToInt32(m.Value.Substring(2), 16);
                    sb.Append(char.ConvertFromUtf32(i));
                }
                else
                {
                    switch (m.Value[1])
                    {
                        case 'a':
                            sb.Append('\a');
                            break;
                        case 'b':
                            sb.Append('\b');
                            break;
                        case 'f':
                            sb.Append('\f');
                            break;
                        case 'n':
                            sb.Append('\n');
                            break;
                        case 'r':
                            sb.Append('\r');
                            break;
                        case 't':
                            sb.Append('\t');
                            break;
                        case 'v':
                            sb.Append('\v');
                            break;
                        default:
                            sb.Append(m.Value[1]);
                            break;
                    }
                }
            }
        }
    
        return sb.ToString();
    }
    

4

You can try using System.Text.RegularExpressions.Regex.Unescape.

There's also an entry on the MSDN forums.

See also How can I Unescape and Reescape strings in .net? .
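As a quick check against the sample line from the question, the sketch below passes it through Regex.Unescape. It assumes C# 9 or later (top-level statements), and the variable names are only illustrative. Regex.Unescape follows regular-expression escape rules rather than C# string-literal rules, so it may treat other sequences differently from what a C# literal would mean.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: the backslashes are literal, as if the text had been read from the file.
string fromFile = @"Hello\nWorld. This is\tGreat.";

// Regex.Unescape applies regular-expression escape rules, not C# literal rules.
string result = Regex.Unescape(fromFile);

// Each two-character escape (\n and \t) collapses to one character,
// so the result is two characters shorter than the input.
Console.WriteLine(fromFile.Length - result.Length);
Console.WriteLine(result);
```

For input that may contain quotation marks or other escapes, it is worth testing those cases separately before relying on this call.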

2 Comments

Quick correction: that's System.Text.RegularExpressions.Regex.Unescape. Please revise.
Regex.Unescape is not applicable here; it is meant for unescaping regex control characters only.
3

Like you, I was unable to find a decent solution to this problem. While you can certainly use String.Replace, chaining one call per escape sequence performs poorly, and it is hard to support octal and Unicode escape sequences that way. A much better alternative is a simple regex-based parser. Here's a method that un-escapes standard escape sequences, octal escape sequences, and Unicode (\uXXXX) escape sequences.

string UnEscape(string s) {
    StringBuilder sb = new StringBuilder();
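    // RegexOptions.Singleline lets '.' also match a real newline already present in the input;
    // without it such characters match nothing and are silently dropped from the result.
    Regex r = new Regex("\\\\[abfnrtv?\"'\\\\]|\\\\[0-3]?[0-7]{1,2}|\\\\u[0-9a-fA-F]{4}|.", RegexOptions.Singleline);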
    MatchCollection mc = r.Matches(s, 0);

    foreach (Match m in mc) {
        if (m.Length == 1) {
            sb.Append(m.Value);
        } else {
            if (m.Value[1] >= '0' && m.Value[1] <= '7') {
                int i = 0;

                for (int j = 1; j < m.Length; j++) {
                    i *= 8;
                    i += m.Value[j] - '0';
                }

                sb.Append((char)i);
            } else if (m.Value[1] == 'u') {
                int i = 0;

                for (int j = 2; j < m.Length; j++) {
                    i *= 16;

                    if (m.Value[j] >= '0' && m.Value[j] <= '9') {
                        i += m.Value[j] - '0';
                    } else if (m.Value[j] >= 'A' && m.Value[j] <= 'F') {
                        i += m.Value[j] - 'A' + 10;
                    } else if (m.Value[j] >= 'a' && m.Value[j] <= 'f') {
                        i += m.Value[j] - 'a' + 10;
                    }
                }

                sb.Append((char)i);
            } else {
                switch (m.Value[1]) {
                    case 'a':
                        sb.Append('\a');
                        break;
                    case 'b':
                        sb.Append('\b');
                        break;
                    case 'f':
                        sb.Append('\f');
                        break;
                    case 'n':
                        sb.Append('\n');
                        break;
                    case 'r':
                        sb.Append('\r');
                        break;
                    case 't':
                        sb.Append('\t');
                        break;
                    case 'v':
                        sb.Append('\v');
                        break;
                    default:
                        sb.Append(m.Value[1]);
                        break;
                }
            }
        }
    }

    return sb.ToString();
}

1 Comment

Thank you, this was great. I made some minor improvements to support \U00000000 format chars and simplify the hex conversions somewhat. I have submitted my version as a separate answer, but feel free to incorporate it into yours instead.
2

You could do something like:

str = str.Replace(@"\n", "\n");

Update:

This is obviously a workaround, since the scenario itself is unnatural. The Regex.Unescape solution is not applicable here, as it is intended for unescaping regex control characters, not newlines and the like.

To support other relevant characters, one can write a replacement function like this one:

public string ReEscapeControlCharacters(string str) {
   return str.Replace(@"\n","\n").Replace(@"\r","\r").Replace(@"\t","\t");
}
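One limitation of chained Replace calls is that each call rescans the whole string, so an escaped backslash followed by n (the text \\next) is misread as a backslash plus a newline. A minimal single-pass sketch avoids this by consuming each escape exactly once. UnescapeBasic is a hypothetical helper name, it handles only \n, \r, \t and \\, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text;

// Hypothetical helper: handles only \n, \r, \t and \\; anything else is copied unchanged.
static string UnescapeBasic(string s)
{
    var sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (c == '\\' && i + 1 < s.Length)
        {
            // Consume the backslash and the following character together.
            switch (s[i + 1])
            {
                case 'n': sb.Append('\n'); i++; continue;
                case 'r': sb.Append('\r'); i++; continue;
                case 't': sb.Append('\t'); i++; continue;
                case '\\': sb.Append('\\'); i++; continue;
            }
        }
        sb.Append(c);
    }
    return sb.ToString();
}

// Verbatim string: two literal backslashes followed by "next".
string input = @"\\next";

string chained = input.Replace(@"\n", "\n"); // backslash, real newline, "ext"
string singlePass = UnescapeBasic(input);    // one backslash followed by "next"

Console.WriteLine(chained.Contains("\n"));
Console.WriteLine(singlePass.Contains("\n"));
```

The first line printed is True and the second is False: only the chained version introduces a newline that the input never asked for.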

3 Comments

This is a workaround. What about \t and the other hidden and control characters? Should he do this for all of them?
Obviously it is a workaround... I am updating the answer with further detail
If the string contains an escaped backslash followed by an n (for example \\next), this gives the wrong result.
-3

Try this:

String replaced = startstring.Replace(System.Environment.NewLine, desirevalue);

This is only valid for "\n".

1 Comment

That would be if it was an interpreted \n, not an escaped literal "slash-enn".
