My scenario is:

  • Create an email in Outlook Express and save it as an .eml file;
  • Read the file as a string in a C# console application.

I'm saving the .eml file encoded in UTF-8. An example of the text I wrote is:

  1. 'Goiânia é badalação.'

It contains Portuguese special characters such as â, é, ç and ã. When I open the file in Notepad++, the text is shown like this:

  1. 'Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.'

If I open it in Outlook Express again, it is displayed correctly, as in the first example. But when I read the file in the console application using UTF-8 decoding, the string appears as in the second example.

The code I'm using is:

    string text = File.ReadAllText(@"C:\fromOutlook.eml", Encoding.UTF8);
    Console.WriteLine(text);

I've tried every Encoding option and many methods I found on the web, but nothing works. Can someone help me with this simple conversion?
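A likely reason no Encoding option helps: the quoted-printable body is pure 7-bit ASCII, so every ASCII-compatible decoder reads exactly the same characters, escapes included. The sketch below (C# 9 top-level statements; the temporary file is a stand-in for C:\fromOutlook.eml, and qpLine is an assumed sample line) writes the escaped text to disk and reads it back with three encodings.

```csharp
using System;
using System.IO;
using System.Text;

// A quoted-printable body line as it appears inside the .eml file.
// Every character is plain ASCII; the accents exist only as "=XX" escapes.
string qpLine = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";

// Write it to a temporary file (stand-in for C:\fromOutlook.eml), without a BOM.
string path = Path.GetTempFileName();
File.WriteAllText(path, qpLine, new UTF8Encoding(false));

// Read it back with several encodings: for ASCII-only bytes they all agree.
string asUtf8 = File.ReadAllText(path, Encoding.UTF8);
string asLatin1 = File.ReadAllText(path, Encoding.Latin1);
string asAscii = File.ReadAllText(path, Encoding.ASCII);

Console.WriteLine(asUtf8 == asLatin1 && asLatin1 == asAscii); // True
Console.WriteLine(asUtf8 == qpLine); // True: the escapes are still there

File.Delete(path);
```

In other words, the "=C3=A2" sequences have to be decoded as quoted-printable first; choosing a different text encoding for the read cannot do that.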

'Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.' to 'Goiânia é badalação.'

    string text = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";

    byte[] bytes = new byte[text.Length * sizeof(char)];
    System.Buffer.BlockCopy(text.ToCharArray(), 0, bytes, 0, bytes.Length);
    Encoding.UTF8.GetString(bytes, 0, bytes.Length);

    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    Console.WriteLine(new string(chars));
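This attempt cannot decode anything, because Buffer.BlockCopy copies raw memory: it turns each UTF-16 char into two bytes and back again without ever interpreting the "=XX" escapes. A small sketch (C# 9 top-level statements) that makes this visible:

```csharp
using System;
using System.Text;

string text = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";

// BlockCopy copies memory as-is: each char becomes its two UTF-16 (little-endian) bytes.
byte[] bytes = new byte[text.Length * sizeof(char)];
Buffer.BlockCopy(text.ToCharArray(), 0, bytes, 0, bytes.Length);

// Copying the bytes back into a char[] simply restores the same code units.
char[] chars = new char[bytes.Length / sizeof(char)];
Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
string roundTrip = new string(chars);

// Decoding those UTF-16 bytes as UTF-8 gives 'G', '\0', 'o', '\0', ... rather than 'â'.
string misread = Encoding.UTF8.GetString(bytes);

Console.WriteLine(roundTrip == text);      // True: nothing was decoded
Console.WriteLine(misread.Contains('\0')); // True: interleaved NUL characters
```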

In this UTF-8 table you can see the hex values of these characters, e.g. 'é' is C3 A9: http://www.utf8-chartable.de/
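The relationship between the two forms can be checked directly: each "=XX" is one byte written in hex, and the byte pairs are the UTF-8 encodings of the accented letters. A small sketch (C# 9 top-level statements, using only System.Text):

```csharp
using System;
using System.Text;

// The two bytes written as "=C3=A9" in the .eml file.
byte[] eAcuteBytes = { 0xC3, 0xA9 };
string eAcute = Encoding.UTF8.GetString(eAcuteBytes);
Console.WriteLine(eAcute == "\u00E9"); // True: C3 A9 is UTF-8 for 'é'

// The reverse direction: the UTF-8 bytes of each accented letter, as hex.
foreach (char c in "\u00E2\u00E9\u00E7\u00E3") // â é ç ã
{
    string hex = BitConverter.ToString(Encoding.UTF8.GetBytes(c.ToString()));
    Console.WriteLine($"U+{(int)c:X4} -> {hex}"); // e.g. U+00E2 -> C3-A2
}
```

The four lines printed are C3-A2, C3-A9, C3-A7 and C3-A3, which are exactly the escapes Notepad++ shows.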

Thanks.

  • That's quoted-printable... see stackoverflow.com/questions/2226554/… Commented Feb 15, 2013 at 12:03
  • Try Encoding.Unicode - just a hunch. Commented Feb 15, 2013 at 12:04
  • @ShellShock Unicode and UTF-8 are related but still different. Here Commented Feb 15, 2013 at 12:09
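To illustrate the last comment: in .NET, Encoding.Unicode means UTF-16 little-endian, which encodes 'é' with different bytes than UTF-8 does. A short sketch (C# 9 top-level statements):

```csharp
using System;
using System.Text;

string s = "\u00E9"; // é

// Encoding.Unicode is .NET's name for UTF-16 little-endian.
Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s))); // E9-00
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(s)));    // C3-A9

// Reading the UTF-8 bytes C3 A9 as UTF-16LE yields one code unit, U+A9C3, not 'é',
// so even after un-escaping, these bytes must be decoded as UTF-8.
string wrong = Encoding.Unicode.GetString(new byte[] { 0xC3, 0xA9 });
Console.WriteLine((int)wrong[0] == 0xA9C3); // True
```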

2 Answers

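    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;

    var input = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";
    var buffer = new List<byte>();
    var i = 0;
    while (i < input.Length)
    {
        var character = input[i];
        if (character == '=' && i + 2 < input.Length)
        {
            // "=XX" encodes one byte as two hex digits, e.g. "=C3" -> 0xC3
            // (soft line breaks "=\r\n" are not handled here)
            var part = input.Substring(i + 1, 2);
            buffer.Add(byte.Parse(part, NumberStyles.HexNumber));
            i += 3;
        }
        else
        {
            // Any other character is plain ASCII and stands for itself
            buffer.Add((byte)character);
            i++;
        }
    }
    // The collected bytes are UTF-8, so decode them as UTF-8
    var output = Encoding.UTF8.GetString(buffer.ToArray());
    Console.WriteLine(output); // prints: Goiânia é badalação.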

Knowing that the problem is quoted-printable encoding, I found a good decoder here:

http://www.dpit.co.uk/2011/09/decoding-quoted-printable-email-in-c.html

This works for me.

Thanks folks.

Update: The link above is dead; here is a working implementation:

How to convert Quoted-Print String
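Since the linked code lives elsewhere, here is a minimal, self-contained sketch of a quoted-printable decoder along the lines of RFC 2045, section 6.7. Unlike a plain "=XX" loop, it also removes soft line breaks ("=" at the end of a line), which quoted-printable encoders insert to keep encoded lines within 76 characters. It assumes the charset is already known (UTF-8 here) and that the input is a single body part; DecodeQuotedPrintable is an illustrative name, not a library API, and trailing-whitespace stripping from the RFC is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sample body containing a soft line break after "=C3=A9".
string body = "Goi=C3=A2nia =C3=A9=\r\n badala=C3=A7=C3=A3o.";
string decoded = DecodeQuotedPrintable(body, Encoding.UTF8);
Console.WriteLine(decoded == "Goi\u00E2nia \u00E9 badala\u00E7\u00E3o."); // True

// Minimal quoted-printable decoder: "=XX" becomes the byte 0xXX, "=" at the end of a
// line is a soft line break and is dropped, everything else is copied as one byte.
// The collected bytes are then decoded with the caller-supplied charset.
static string DecodeQuotedPrintable(string input, Encoding charset)
{
    var bytes = new List<byte>(input.Length);
    int i = 0;
    while (i < input.Length)
    {
        char c = input[i];
        if (c == '=')
        {
            // Soft line break: "=\r\n" or "=\n" is removed entirely.
            if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') { i += 3; continue; }
            if (i + 1 < input.Length && input[i + 1] == '\n') { i += 2; continue; }

            // Encoded octet: "=" followed by two hex digits.
            if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 3;
                continue;
            }
            // A malformed "=" falls through and is kept literally.
        }
        // Quoted-printable text is 7-bit ASCII, so each char maps to one byte
        // (a non-ASCII char here would be truncated, as the input is not valid QP).
        bytes.Add((byte)c);
        i++;
    }
    return charset.GetString(bytes.ToArray());
}
```

In the real program, body would be the part of the .eml file after the blank line that ends the headers, and the charset would come from the Content-Type header.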

1 Comment

@IlliaRatkevych See the update above; I've edited in a link to working code: stackoverflow.com/questions/37540244/…
