238

In C#, can I convert a string value to a string literal, the way I would see it in code? I would like to replace tabs, newlines, etc. with their escape sequences.

If this code:

Console.WriteLine(someString);

produces:

    Hello
    World!

I want this code:

Console.WriteLine(ToLiteral(someString));

to produce:

\tHello\r\n\tWorld!\r\n

17 Answers

214

A long time ago, I found this:

private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);
            return writer.ToString();
        }
    }
}

This code:

var input = "\tHello\r\n\tWorld!";
Console.WriteLine(input);
Console.WriteLine(ToLiteral(input));

Produces:

    Hello
    World!
"\tHello\r\n\tWorld!"

More recently, Graham discovered that you can use Roslyn's Microsoft.CodeAnalysis.CSharp package on NuGet:

private static string ToLiteral(string valueTextForCompiler)
{
    return Microsoft.CodeAnalysis.CSharp.SymbolDisplay.FormatLiteral(valueTextForCompiler, false);
}

13 Comments

Just found this by googling the subject. This has to be the best answer; there is no point in reinventing what .NET can already do for us.
Nice one, but be aware that for longer strings, this will insert "+" operators, newlines and indentation. I couldn't find a way to turn that off.
What about the inverse? If you have a file with text containing escape sequences, including special characters escaped with their ASCII codes, how do you produce the raw version?
If you run: void Main() { Console.WriteLine(ToLiteral("test \"\'\\\0\a\b\f\n\r\t\v\uaaaa \\\blah")); } you'll notice that this doesn't take care of a few escapes. Ronnie Overby pointed \f, the others are \a and \b
Is there a way to make it output verbatim (@"...") literals?
55

Use Regex.Escape(String):

Regex.Escape escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes.
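
To make the difference from a C# literal concrete, here is a small sketch of what Regex.Escape returns for two sample inputs. The class and method names (RegexEscapeDemo, Escaped) are made up for illustration; only Regex.Escape itself is the real API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscapeDemo
{
    // Hypothetical wrapper: returns the regex-escaped form of the input.
    public static string Escaped(string input) => Regex.Escape(input);

    public static void Main()
    {
        // The space and '?' get a backslash, although neither is escaped in a C# literal.
        Console.WriteLine(Escaped("Hello World?"));   // Hello\ World\?

        // The newline becomes \n, but the double quotes are left untouched,
        // so the result is not a valid C# string literal body.
        Console.WriteLine(Escaped("say \"hi\"\n"));    // say\ "hi"\n
    }
}
```

So the output overlaps with C# escaping for characters such as \n and \t, but it is regex escaping, not literal escaping.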

6 Comments

+1 no idea why this is way below. Other answers are just too verbose and look like reinventing wheels
This is not what OP is asking for. It doesn't return a string literal, it returns a string with Regex special characters escaped. This would turn Hello World? into Hello World\?, but that is an invalid string literal.
I agree with @atheaos, this is a great answer to a very different question.
+1 even though it doesn't quite answer the OP's question it was what I (and so I suspect maybe others) were looking for when I came across this question. :)
This will not work as needed. The regex special characters are not the same. It will work for \n for example, but when you have a space, it will be converted to "\ " which is not what C# would do...
50

There's a method for this in Roslyn's Microsoft.CodeAnalysis.CSharp package on NuGet:

private static string ToLiteral(string valueTextForCompiler)
{
    return Microsoft.CodeAnalysis.CSharp.SymbolDisplay.FormatLiteral(valueTextForCompiler, false);
}

Obviously, this didn't exist at the time of the original question, but it might help people who end up here from Google Search.

7 Comments

this is a nice way to do it from .net core.
Yes, the package supports .NET Core and .NET Standard 2.0 - meaning it can also be referenced from .NET Framework 4.6.1+
Also useful in source generators.
This is the only way that escaped all the characters for me
Works great, but I need a way to restore the original unescaped string.
33

This is a fully working implementation, including escaping of Unicode and ASCII non-printable characters. It does not insert "+" signs like Hallgrim's answer.

static string ToLiteral(string input) {
    StringBuilder literal = new StringBuilder(input.Length + 2);
    literal.Append("\"");
    foreach (var c in input) {
        switch (c) {
            case '\"': literal.Append("\\\""); break;
            case '\\': literal.Append(@"\\"); break;
            case '\0': literal.Append(@"\0"); break;
            case '\a': literal.Append(@"\a"); break;
            case '\b': literal.Append(@"\b"); break;
            case '\f': literal.Append(@"\f"); break;
            case '\n': literal.Append(@"\n"); break;
            case '\r': literal.Append(@"\r"); break;
            case '\t': literal.Append(@"\t"); break;
            case '\v': literal.Append(@"\v"); break;
            default:
                // ASCII printable character
                if (c >= 0x20 && c <= 0x7e) {
                    literal.Append(c);
                // As UTF16 escaped character
                } else {
                    literal.Append(@"\u");
                    literal.Append(((int)c).ToString("x4"));
                }
                break;
        }
    }
    literal.Append("\"");
    return literal.ToString();
}

Note that this also escapes all Unicode characters. If your environment supports them, you could change that part to escape only control characters:

// UTF16 control characters
} else if (Char.GetUnicodeCategory(c) == UnicodeCategory.Control) {
    literal.Append(@"\u");
    literal.Append(((int)c).ToString("x4"));
} else {
    literal.Append(c);
}

7 Comments

You should use Char.GetUnicodeCategory(c) == UnicodeCategory.Control to decide whether to escape it, or people who don't speak ASCII won't be very happy.
That depends on whether the resulting string will be used in an environment that supports Unicode.
I added input = input ?? string.Empty; as the first line of the method so I could pass null and get back "" instead of a null reference exception.
Nice. Change enclosing quotes to ' and now you have what Python gives you out of the box with repr(a_string) :).
Why did you escape ' as that is not necessary?
26

A more structured approach, including all escape sequences for strings and chars, is:

It doesn't replace Unicode characters with their literal equivalent. It doesn't cook eggs, either.

public class ReplaceString
{
    static readonly IDictionary<string, string> m_replaceDict
        = new Dictionary<string, string>();

    const string ms_regexEscapes = @"[\a\b\f\n\r\t\v\\""]";

    public static string StringLiteral(string i_string)
    {
        return Regex.Replace(i_string, ms_regexEscapes, match);
    }

    public static string CharLiteral(char c)
    {
        return c == '\'' ? @"'\''" : string.Format("'{0}'", c);
    }

    private static string match(Match m)
    {
        string match = m.ToString();
        if (m_replaceDict.ContainsKey(match))
        {
            return m_replaceDict[match];
        }

        throw new NotSupportedException();
    }

    static ReplaceString()
    {
        m_replaceDict.Add("\a", @"\a");
        m_replaceDict.Add("\b", @"\b");
        m_replaceDict.Add("\f", @"\f");
        m_replaceDict.Add("\n", @"\n");
        m_replaceDict.Add("\r", @"\r");
        m_replaceDict.Add("\t", @"\t");
        m_replaceDict.Add("\v", @"\v");

        m_replaceDict.Add("\\", @"\\");
        m_replaceDict.Add("\0", @"\0");

        //The SO parser gets fooled by the verbatim version
        //of the string to replace - @"\"""
        //so use the 'regular' version
        m_replaceDict.Add("\"", "\\\"");
    }

    static void Main(string[] args){

        string s = "here's a \"\n\tstring\" to test";
        Console.WriteLine(ReplaceString.StringLiteral(s));
        Console.WriteLine(ReplaceString.CharLiteral('c'));
        Console.WriteLine(ReplaceString.CharLiteral('\''));

    }
}

4 Comments

This is not all escape sequences ;)
Works better than the solution above - and other escape sequences can easily be added.
Verbatim in the accepted answer was driving me bonkers. This works 100% for my purpose. Replaced regex with @"[\a\b\f\n\r\t\v\\""/]" and added m_replaceDict.Add("/", @"\/"); for JSON.
Also, you have to add the enclosing quotations to this if you want those.
21

Try:

var t = HttpUtility.JavaScriptStringEncode(s);
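
A minimal sketch of how this call behaves on the string from the question, assuming System.Web.HttpUtility is available in your target framework. The wrapper name JsEncodeDemo.Encode is hypothetical. The second argument of the two-parameter overload controls whether enclosing double quotes are added.

```csharp
using System;
using System.Web;

public static class JsEncodeDemo
{
    // Hypothetical wrapper around the real two-argument overload.
    public static string Encode(string s, bool addDoubleQuotes) =>
        HttpUtility.JavaScriptStringEncode(s, addDoubleQuotes);

    public static void Main()
    {
        var s = "\tHello\r\n\tWorld!";
        Console.WriteLine(Encode(s, false)); // \tHello\r\n\tWorld!
        Console.WriteLine(Encode(s, true));  // "\tHello\r\n\tWorld!"
    }
}
```

Keep in mind that the method follows JavaScript escaping rules rather than C# ones, so characters outside the common set (tab, newline, carriage return, quote, backslash) may be written differently from what the C# compiler would accept or produce.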

3 Comments

Does not work. If I have "abc\n123" (without quotes, 8 chars), I want "abc" + \n + "123" (7 chars). Instead it produces "abc" + "\\" + "\n123" (9 chars). Notice the slash was doubled and it still contains a string literal of "\n" as two characters, not the escaped character.
@Paul What you want is the opposite of what the question is asking, though. This, according to your description, answers the question, and therefore does work.
I found this useful to escape active directory names in the frontend
19

Hallgrim's answer is excellent, but the "+", newline and indent additions were breaking functionality for me. An easy way around it is:

private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, new CodeGeneratorOptions {IndentString = "\t"});
            var literal = writer.ToString();
            literal = literal.Replace(string.Format("\" +{0}\t\"", Environment.NewLine), "");
            return literal;
        }
    }
}

4 Comments

Works great. I also added one line before the return literal to make it more readable: literal = literal.Replace("\\r\\n", "\\r\\n\"+\r\n\"");
Added this literal = literal.Replace("/", @"\/"); for JSON functionality.
This is 100% straight forward and the only correct answer! All other answers either didn't understand the question or re-invented the wheel.
Sadly, I cannot get this to work under .NET Core. Does anyone have a better answer?
18

public static class StringHelpers
{
    private static Dictionary<string, string> escapeMapping = new Dictionary<string, string>()
    {
        {"\"", @"\\\"""},
        {"\\\\", @"\\"},
        {"\a", @"\a"},
        {"\b", @"\b"},
        {"\f", @"\f"},
        {"\n", @"\n"},
        {"\r", @"\r"},
        {"\t", @"\t"},
        {"\v", @"\v"},
        {"\0", @"\0"},
    };

    private static Regex escapeRegex = new Regex(string.Join("|", escapeMapping.Keys.ToArray()));

    public static string Escape(this string s)
    {
        return escapeRegex.Replace(s, EscapeMatchEval);
    }

    private static string EscapeMatchEval(Match m)
    {
        if (escapeMapping.ContainsKey(m.Value))
        {
            return escapeMapping[m.Value];
        }
        return escapeMapping[Regex.Escape(m.Value)];
    }
}

2 Comments

Why are there 3 backslashes and two speech marks in the first value of the dictionary?
Nice answer, @JamesYeoman that's because regex pattern needs to be escaped.
10

Here is a small improvement on Smilediver's answer. It does not escape all non-ASCII characters, only those that really need it.

using System;
using System.Globalization;
using System.Text;

public static class CodeHelper
{
    public static string ToLiteral(this string input)
    {
        var literal = new StringBuilder(input.Length + 2);
        literal.Append("\"");
        foreach (var c in input)
        {
            switch (c)
            {
                case '\'': literal.Append(@"\'"); break;
                case '\"': literal.Append("\\\""); break;
                case '\\': literal.Append(@"\\"); break;
                case '\0': literal.Append(@"\0"); break;
                case '\a': literal.Append(@"\a"); break;
                case '\b': literal.Append(@"\b"); break;
                case '\f': literal.Append(@"\f"); break;
                case '\n': literal.Append(@"\n"); break;
                case '\r': literal.Append(@"\r"); break;
                case '\t': literal.Append(@"\t"); break;
                case '\v': literal.Append(@"\v"); break;
                default:
                    if (Char.GetUnicodeCategory(c) != UnicodeCategory.Control)
                    {
                        literal.Append(c);
                    }
                    else
                    {
                        literal.Append(@"\u");
                        literal.Append(((ushort)c).ToString("x4"));
                    }
                    break;
            }
        }
        literal.Append("\"");
        return literal.ToString();
    }
}


8

Interesting question.

If you can't find a better method, you can always replace.
In case you're opting for it, you could use this C# Escape Sequence List:

  • \' - single quote, needed for character literals
  • \" - double quote, needed for string literals
  • \\ - backslash
  • \0 - Unicode character 0
  • \a - Alert (character 7)
  • \b - Backspace (character 8)
  • \f - Form feed (character 12)
  • \n - New line (character 10)
  • \r - Carriage return (character 13)
  • \t - Horizontal tab (character 9)
  • \v - Vertical tab (character 11)
  • \uxxxx - Unicode escape sequence for character with hex value xxxx
  • \xn[n][n][n] - Unicode escape sequence for character with hex value nnnn (variable length version of \uxxxx)
  • \Uxxxxxxxx - Unicode escape sequence for character with hex value xxxxxxxx (for generating surrogates)

This list can be found in the C# Frequently Asked Questions entry "What character escape sequences are available?"
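
As a quick illustration of the three Unicode forms in the list, the following sketch compares them for the letter A and shows how the variable-length \x form differs from the fixed-length \u form. The class and method names are made up for the example.

```csharp
using System;

public static class EscapeSequenceDemo
{
    // All three escape forms below denote the same character, 'A' (U+0041).
    public static bool AllFormsOfAAreEqual() =>
        "\u0041" == "A" && "\x41" == "A" && "\U00000041" == "A";

    public static void Main()
    {
        Console.WriteLine(AllFormsOfAAreEqual());  // True

        // \x consumes up to four hex digits, so "BC" is swallowed into the escape:
        Console.WriteLine("\x41BC".Length);        // 1 (the single character U+41BC)

        // \u always takes exactly four digits, so "BC" stays as ordinary text:
        Console.WriteLine("\u0041BC".Length);      // 3 ("ABC")

        // \U can denote a character outside the BMP, stored as a surrogate pair:
        Console.WriteLine("\U0001F600".Length);    // 2
    }
}
```

The greedy behaviour of \x is a good reason to prefer \u when generating literals programmatically.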

2 Comments

This link no longer works, a textbook example of why link-only answers are discouraged.
Very true, @James, but thanks to Jamie Twells the information is available again :+1:
5

If JSON conventions are sufficient for the strings you want to escape and you already use Json.NET (Newtonsoft.Json) in your project (it adds considerable overhead otherwise), you can use that package as follows:

using System;
using Newtonsoft.Json;

public class Program
{
    public static void Main()
    {
        Console.WriteLine(ToLiteral(@"abc\n123"));
    }

    private static string ToLiteral(string input)
    {
        return JsonConvert.DeserializeObject<string>("\"" + input + "\"");
    }
}

1 Comment

This seems to be the opposite of what OP wants? JsonConvert.SerializeObject(input).Trim('"') works great though.
1

public static class StringEscape
{
  static char[] toEscape = "\0\x1\x2\x3\x4\x5\x6\a\b\t\n\v\f\r\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\"\\".ToCharArray();
  static string[] literals = @"\0,\x0001,\x0002,\x0003,\x0004,\x0005,\x0006,\a,\b,\t,\n,\v,\f,\r,\x000e,\x000f,\x0010,\x0011,\x0012,\x0013,\x0014,\x0015,\x0016,\x0017,\x0018,\x0019,\x001a,\x001b,\x001c,\x001d,\x001e,\x001f".Split(new char[] { ',' });

  public static string Escape(this string input)
  {
    int i = input.IndexOfAny(toEscape);
    if (i < 0) return input;

    var sb = new System.Text.StringBuilder(input.Length + 5);
    int j = 0;
    do
    {
      sb.Append(input, j, i - j);
      var c = input[i];
      if (c < 0x20) sb.Append(literals[c]); else sb.Append(@"\").Append(c);
    } while ((i = input.IndexOfAny(toEscape, j = ++i)) > 0);

    return sb.Append(input, j, input.Length - j).ToString();
  }
}

1 Comment

An explanation would be in order. E.g., what is the idea/gist? E.g., is it due to performance considerations? Please respond by editing your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
1

My attempt at adding ToVerbatim to Hallgrim's accepted answer:

private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, new CodeGeneratorOptions { IndentString = "\t" });
            var literal = writer.ToString();
            literal = literal.Replace(string.Format("\" +{0}\t\"", Environment.NewLine), "");
            return literal;
        }
    }
}

private static string ToVerbatim(string input)
{
    string literal = ToLiteral(input);
    string verbatim = "@" + literal.Replace(@"\r\n", Environment.NewLine);
    return verbatim;
}
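
For comparison, here is an alternative sketch that builds the verbatim literal directly from the raw string instead of going through ToLiteral. It relies on the rule that inside a verbatim (@"...") literal backslashes are ordinary characters and the only character needing treatment is the double quote, which is written twice. The name ToVerbatimDirect is hypothetical and not part of the answer above.

```csharp
using System;

public static class VerbatimDemo
{
    // Hypothetical helper: wraps the raw text in @"..." and doubles embedded quotes.
    // Tabs and line breaks are kept as-is, which a verbatim literal allows.
    public static string ToVerbatimDirect(string input)
    {
        return "@\"" + input.Replace("\"", "\"\"") + "\"";
    }

    public static void Main()
    {
        // Raw text: C:\temp\"quoted".txt
        Console.WriteLine(ToVerbatimDirect("C:\\temp\\\"quoted\".txt"));
        // @"C:\temp\""quoted"".txt"
    }
}
```

Because nothing is escaped first, backslashes and tabs in the input come out unchanged and still mean the same thing when the result is pasted back into source code.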


0

Hallgrim's answer was excellent. Here's a small tweak in case you need to parse out additional white space characters and linebreaks with a C# regular expression. I needed this in the case of a serialized JSON value for insertion into Google Sheets and ran into trouble as the code was inserting tabs, +, spaces, etc.

  provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);
  var literal = writer.ToString();
  var r2 = new Regex(@"\"" \+.\n[\s]+\""", RegexOptions.ECMAScript);
  literal = r2.Replace(literal, "");
  return literal;


0

I feel like the JsonEncodedText.Encode method is simpler to use, as it's part of the runtime (System.Text.Json):

https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonencodedtext.encode?view=net-8.0

Example:

JsonEncodedText.Encode("\tHello\r\n\tWorld!\r\n")

Will return \tHello\r\n\tWorld!\r\n

Note that you can also pass JavaScriptEncoder.UnsafeRelaxedJsonEscaping so that " is encoded as \" rather than \u0022. It's a matter of preference and requirements.
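
A short sketch of both variants side by side, assuming a runtime that ships System.Text.Json. The wrapper names EncodeDefault and EncodeRelaxed are made up for the example; JsonEncodedText.Encode and JavaScriptEncoder.UnsafeRelaxedJsonEscaping are the real APIs mentioned above.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

public static class JsonEscapeDemo
{
    // Hypothetical wrapper: escape with the default encoder.
    public static string EncodeDefault(string s) =>
        JsonEncodedText.Encode(s).ToString();

    // Hypothetical wrapper: escape with the relaxed encoder.
    public static string EncodeRelaxed(string s) =>
        JsonEncodedText.Encode(s, JavaScriptEncoder.UnsafeRelaxedJsonEscaping).ToString();

    public static void Main()
    {
        Console.WriteLine(EncodeDefault("\tHello\r\n\tWorld!\r\n")); // \tHello\r\n\tWorld!\r\n
        Console.WriteLine(EncodeDefault("say \"hi\""));              // say \u0022hi\u0022
        Console.WriteLine(EncodeRelaxed("say \"hi\""));              // say \"hi\"
    }
}
```

Neither variant adds the enclosing quotes, and both follow JSON rules, so the result is close to, but not guaranteed to be identical to, what the C# compiler would accept as a literal.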


-1

I submit my own implementation, which handles null values and should be more performant on account of using array lookup tables, manual hex conversion, and avoiding switch statements.

using System;
using System.Text;
using System.Linq;

public static class StringLiteralEncoding {
  private static readonly char[] HEX_DIGIT_LOWER = "0123456789abcdef".ToCharArray();
  private static readonly char[] LITERALENCODE_ESCAPE_CHARS;

  static StringLiteralEncoding() {
    // Per http://msdn.microsoft.com/en-us/library/h21280bw.aspx
    // Note: "\?" is a C/C++ escape, not a C# one, so '?' is deliberately left out of this table.
    var escapes = new string[] { "\aa", "\bb", "\ff", "\nn", "\rr", "\tt", "\vv", "\"\"", "\\\\", "\00" };
    LITERALENCODE_ESCAPE_CHARS = new char[escapes.Max(e => e[0]) + 1];
    foreach(var escape in escapes)
      LITERALENCODE_ESCAPE_CHARS[escape[0]] = escape[1];
  }

  /// <summary>
  /// Convert the string to the equivalent C# string literal, enclosing the string in double quotes and inserting
  /// escape sequences as necessary.
  /// </summary>
  /// <param name="s">The string to be converted to a C# string literal.</param>
  /// <returns><paramref name="s"/> represented as a C# string literal.</returns>
  public static string Encode(string s) {
    if(null == s) return "null";

    var sb = new StringBuilder(s.Length + 2).Append('"');
    for(var rp = 0; rp < s.Length; rp++) {
      var c = s[rp];
      if(c < LITERALENCODE_ESCAPE_CHARS.Length && '\0' != LITERALENCODE_ESCAPE_CHARS[c])
        sb.Append('\\').Append(LITERALENCODE_ESCAPE_CHARS[c]);
      else if('~' >= c && c >= ' ')
        sb.Append(c);
      else
        sb.Append(@"\x")
          .Append(HEX_DIGIT_LOWER[c >> 12 & 0x0F])
          .Append(HEX_DIGIT_LOWER[c >>  8 & 0x0F])
          .Append(HEX_DIGIT_LOWER[c >>  4 & 0x0F])
          .Append(HEX_DIGIT_LOWER[c       & 0x0F]);
    }

    return sb.Append('"').ToString();
  }
}

1 Comment

Why are switch statements bad? Aren't they optimised by the compiler (lookup tables or similar)?
-10

Code:

string someString1 = "\tHello\r\n\tWorld!\r\n";
string someString2 = @"\tHello\r\n\tWorld!\r\n";

Console.WriteLine(someString1);
Console.WriteLine(someString2);

Output:

    Hello
    World!

\tHello\r\n\tWorld!\r\n

2 Comments

I have someString1, but it is read from a file. I want it to appear as someString2 after calling some method.
The string may be dynamically created or obtained; he needs a method that handles any string.
