Evaluate escaped string

Question

I have some strings in a file that are already escaped. So the content of the file looks like this:

Hello\nWorld. This is\tGreat.

When I read the file, I get \n as two different characters instead of one.

How can I convert an escaped string to a non-escaped one?

Can it contain anything a C# string literal can contain, like Unicode escape sequences? What about quotation marks?
– svick, Commented Jul 8, 2011 at 19:00

mcdrewski · Accepted Answer · 2014-08-24 12:43:30Z

Based on @deAtog's code, I made two minor changes: support for \U00000000-format characters, and somewhat simpler hex conversions.

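// Requires: using System; using System.Text; using System.Text.RegularExpressions;
string UnEscape(string s)
{
    StringBuilder sb = new StringBuilder();
    // Alternatives, in order: simple escapes, octal, \uXXXX, \UXXXXXXXX, then any single character.
    // RegexOptions.Singleline lets the final '.' also match real newlines, so they are not dropped.
    Regex r = new Regex("\\\\[abfnrtv?\"'\\\\]|\\\\[0-3]?[0-7]{1,2}|\\\\u[0-9a-fA-F]{4}|\\\\U[0-9a-fA-F]{8}|.", RegexOptions.Singleline);
    MatchCollection mc = r.Matches(s, 0);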

    foreach (Match m in mc)
    {
        if (m.Length == 1)
        {
            sb.Append(m.Value);
        }
        else
        {
            if (m.Value[1] >= '0' && m.Value[1] <= '7')
            {
                int i = Convert.ToInt32(m.Value.Substring(1), 8);
                sb.Append((char)i);
            }
            else if (m.Value[1] == 'u')
            {
                int i = Convert.ToInt32(m.Value.Substring(2), 16);
                sb.Append((char)i);
            }
            else if (m.Value[1] == 'U')
            {
                int i = Convert.ToInt32(m.Value.Substring(2), 16);
                sb.Append(char.ConvertFromUtf32(i));
            }
            else
            {
                switch (m.Value[1])
                {
                    case 'a':
                        sb.Append('\a');
                        break;
                    case 'b':
                        sb.Append('\b');
                        break;
                    case 'f':
                        sb.Append('\f');
                        break;
                    case 'n':
                        sb.Append('\n');
                        break;
                    case 'r':
                        sb.Append('\r');
                        break;
                    case 't':
                        sb.Append('\t');
                        break;
                    case 'v':
                        sb.Append('\v');
                        break;
                    default:
                        sb.Append(m.Value[1]);
                        break;
                }
            }
        }
    }

    return sb.ToString();
}
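The \U branch above calls char.ConvertFromUtf32 instead of casting to char, and a small sketch may help show why. The class and method names here (AstralDemo, Decode) are made up for illustration and are not part of the answer; the code point U+1F600 is just an arbitrary value above U+FFFF.

```csharp
using System;

static class AstralDemo
{
    // Hypothetical helper: decodes eight hex digits the same way the \U branch does.
    public static string Decode(string hex)
    {
        int codePoint = Convert.ToInt32(hex, 16);
        // A code point above U+FFFF does not fit in one UTF-16 char,
        // so ConvertFromUtf32 returns a two-char surrogate pair.
        return char.ConvertFromUtf32(codePoint);
    }

    static void Main()
    {
        string s = Decode("0001F600");
        Console.WriteLine(s.Length);                    // number of UTF-16 code units
        Console.WriteLine(char.IsHighSurrogate(s[0]));  // first unit is a high surrogate

        // For comparison: an unchecked cast keeps only the low 16 bits.
        int cp = 0x1F600;
        char truncated = unchecked((char)cp);
        Console.WriteLine(((int)truncated).ToString("X4"));
    }
}
```

Note that char.ConvertFromUtf32 throws for values outside the valid Unicode range (above 0x10FFFF, or inside the surrogate range), so input such as \UFFFFFFFF would raise an exception in UnEscape unless the caller guards against it.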

Community · Accepted Answer · 2017-05-23 10:33:58Z

Score: 4

You can try using System.Text.RegularExpressions.Regex.Unescape.

There's also an entry on the MSDN forums.

See also "How can I Unescape and Reescape strings in .net?".
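A minimal sketch of what that call does with the sample line from the question. The class name RegexUnescapeDemo is only for illustration. Keep in mind that Regex.Unescape is designed for regex syntax: it also removes the backslash in front of regex metacharacters, and as far as I know it throws an ArgumentException on escape sequences it does not recognize, so whether it fits depends on what the file can contain.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexUnescapeDemo
{
    static void Main()
    {
        // Verbatim literal: the backslashes are real characters, as when read from the file.
        string raw = @"Hello\nWorld. This is\tGreat.";

        string text = Regex.Unescape(raw);

        Console.WriteLine(text.Contains("\n"));             // a real newline is now present
        Console.WriteLine(text.Contains("\t"));             // a real tab is now present
        Console.WriteLine(text.Length == raw.Length - 2);   // two 2-char escapes became 1 char each
    }
}
```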

answered Jul 8, 2011 at 18:52 by Brad Christie
edited May 23, 2017 at 10:33 by Community Bot

2 Comments

Adriano Carneiro Over a year ago

Quick correction: that's System.Text.RegularExpressions.Regex.Unescape. Please revise.

Variant Over a year ago

Regex.Unescape is not applicable here; it is meant for unescaping regex control characters only.

deAtog · Accepted Answer · 2014-09-29 21:00:42Z

Like you, I was unable to find a decent solution to this problem. While you can certainly use String.Replace, the performance of that approach is terrible. Furthermore, it's hard to support octal and Unicode escape sequences that way. A much better alternative is a simple regex-based parser. Here's a method that un-escapes a given string. It supports standard escape sequences, octal escape sequences, and Unicode (\uXXXX) escape sequences.

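// Requires: using System.Text; using System.Text.RegularExpressions;
string UnEscape(string s) {
    StringBuilder sb = new StringBuilder();
    // RegexOptions.Singleline lets the final '.' also match real newlines, so they are not dropped.
    Regex r = new Regex("\\\\[abfnrtv?\"'\\\\]|\\\\[0-3]?[0-7]{1,2}|\\\\u[0-9a-fA-F]{4}|.", RegexOptions.Singleline);
    MatchCollection mc = r.Matches(s, 0);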

    foreach (Match m in mc) {
        if (m.Length == 1) {
            sb.Append(m.Value);
        } else {
            if (m.Value[1] >= '0' && m.Value[1] <= '7') {
                int i = 0;

                for (int j = 1; j < m.Length; j++) {
                    i *= 8;
                    i += m.Value[j] - '0';
                }

                sb.Append((char)i);
            } else if (m.Value[1] == 'u') {
                int i = 0;

                for (int j = 2; j < m.Length; j++) {
                    i *= 16;

                    if (m.Value[j] >= '0' && m.Value[j] <= '9') {
                        i += m.Value[j] - '0';
                    } else if (m.Value[j] >= 'A' && m.Value[j] <= 'F') {
                        i += m.Value[j] - 'A' + 10;
                    } else if (m.Value[j] >= 'a' && m.Value[j] <= 'f') {
                        i += m.Value[j] - 'a' + 10;
                    }
                }

                sb.Append((char)i);
            } else {
                switch (m.Value[1]) {
                    case 'a':
                        sb.Append('\a');
                        break;
                    case 'b':
                        sb.Append('\b');
                        break;
                    case 'f':
                        sb.Append('\f');
                        break;
                    case 'n':
                        sb.Append('\n');
                        break;
                    case 'r':
                        sb.Append('\r');
                        break;
                    case 't':
                        sb.Append('\t');
                        break;
                    case 'v':
                        sb.Append('\v');
                        break;
                    default:
                        sb.Append(m.Value[1]);
                        break;
                }
            }
        }
    }

    return sb.ToString();
}

Thank you, this was great. I made some minor improvements to support \U00000000-format chars and to simplify the hex conversions somewhat. I have submitted my version as a separate answer, but feel free to incorporate it into yours instead.

Variant · Accepted Answer · 2011-07-08 21:56:50Z

Score: 2

You could do something like:

str = str.Replace(@"\n", "\n");

Update:

Obviously this is a workaround, as the scenario is "unnatural" in itself. The Regex.Unescape solution is not applicable here, as it is intended for unescaping regex control characters, not newlines and the like.

In order to support other relevant characters one can write a replacing function like this one:

public string ReEscapeControlCharacters(string str) {
    return str.Replace(@"\n", "\n").Replace(@"\r", "\r").Replace(@"\t", "\t");
}

answered Jul 8, 2011 at 18:52 by Variant
edited Jul 8, 2011 at 21:56

3 Comments

Tocco Over a year ago

This is a workaround. What about \t and the other hidden and control characters? Should he do it for all the others?

Variant Over a year ago

Obviously it is a workaround... I am updating the answer with further detail

Ian Mercer Over a year ago

If the string contains an escaped \ followed by an n, this will give the wrong result, e.g. for \\next.
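The problem raised in that last comment can be reproduced with a short sketch. NaiveUnescape is a hypothetical name that wraps the chained Replace calls from the answer; the input is the \\next example from the comment, where the intended result is a single backslash followed by "next".

```csharp
using System;

static class ReplaceOrderDemo
{
    // Hypothetical wrapper around the chained-Replace approach from the answer.
    public static string NaiveUnescape(string s)
    {
        return s.Replace(@"\n", "\n").Replace(@"\r", "\r").Replace(@"\t", "\t");
    }

    static void Main()
    {
        // Six characters: \ \ n e x t  (an escaped backslash, then the word "next").
        string input = @"\\next";
        string result = NaiveUnescape(input);

        // Replace sees the second backslash plus 'n' as an escaped newline.
        Console.WriteLine(result == "\\\next");   // backslash, newline, "ext"
        Console.WriteLine(result == @"\next");    // the intended result; not what we get
    }
}
```

A left-to-right scan that consumes each escape sequence as a unit, like the regex-based answers in this thread, does not have this problem, because the \\ pair is matched before the n is ever looked at.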

Adriano Carneiro · Accepted Answer · 2011-07-08 18:53:55Z

Score: -3

Try this:

String replaced = startstring.Replace(System.Environment.NewLine, desirevalue);

This should be valid only for "\n".

answered Jul 8, 2011 at 18:52 by Tigran
edited Jul 8, 2011 at 18:53 by Adriano Carneiro

1 Comment

Brad Christie Over a year ago

That would be if it was an interpreted \n, not an escaped literal "slash-enn".
