Is there any way to determine a byte array's encoding in C#?

I have an arbitrary string, such as "Lorem ipsum áéíóú ñÑç", and I convert it to a byte array using one of several encodings.

I would like a single method that detects the encoding of a byte array so that I can get the string value back.

Another issue: I may have a database column that stores a BLOB (as a byte array). It holds a string that was previously converted to a byte array in UTF-8, but another application might convert a string to a byte array using the Unicode (UTF-16) encoding.

So the database column contains byte arrays in several encodings. It would be very useful to detect a byte array's encoding, and I need a way to do that.

Tests:

        string DataXmlForSupport = "<support><machinename></machinename><comments>Este es el log 1 áéíóú</comments></support>";
        string DataXmlForSupport2 = "Lorem ipsum áéíóú ñÑç";

        [TestMethod]
        public void Encoding_byte_array_string()
        {
            // Encode as Unicode (UTF-16): decoding as Unicode round-trips, decoding as UTF-8 does not.
            var uencoding = new System.Text.UnicodeEncoding();
            byte[] data = uencoding.GetBytes(DataXmlForSupport);

            var dataXml = Encoding.Unicode.GetString(data);
            Assert.AreEqual(DataXmlForSupport, dataXml, "Expected Unicode results");

            dataXml = Encoding.UTF8.GetString(data);
            Assert.AreNotEqual(DataXmlForSupport, dataXml, "Did NOT expect UTF8 results");

            // Encode as UTF-8: decoding as UTF-8 round-trips, decoding as Unicode does not.
            var utf8 = new System.Text.UTF8Encoding();
            data = utf8.GetBytes(DataXmlForSupport2);

            dataXml = Encoding.UTF8.GetString(data);
            Assert.AreEqual(DataXmlForSupport2, dataXml, "Expected UTF8 results");

            dataXml = Encoding.Unicode.GetString(data);
            Assert.AreNotEqual(DataXmlForSupport2, dataXml, "Did NOT expect Unicode results");
        }
  • You should fix your database to only have one encoding, or store the encoding name in a separate column. It is not possible to reliably detect encodings. Commented Oct 22, 2013 at 13:47
  • Typically it's your job to associate the encoding with the data. For example in most XML/HTML files one of the first things you'll see is an attribute that describes the encoding. If the encoding is not supplied then based on the spec there is usually a default encoding which is presumed. Commented Oct 22, 2013 at 13:48
  • possible duplicate of How to detect the character encoding of a text file? Commented Oct 22, 2013 at 14:09
  • @JimDagg A text file is not the same as a string; there are a few differences, I think. Anyway, the two questions may share some knowledge. Commented Oct 23, 2013 at 6:16
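
The first comment's suggestion of storing the encoding name next to the data could look roughly like the sketch below. StoredText and its members are hypothetical names made up for illustration; in a real schema the two properties would map to two columns. It assumes the name is taken from Encoding.WebName and resolved again with Encoding.GetEncoding.

```csharp
using System;
using System.Text;

// Hypothetical record pairing a BLOB with the name of the encoding that produced it.
public sealed class StoredText
{
    public string EncodingName { get; }   // e.g. "utf-8" or "utf-16"
    public byte[] Data { get; }

    public StoredText(string encodingName, byte[] data)
    {
        EncodingName = encodingName;
        Data = data;
    }

    // Encode a string and remember which encoding was used.
    public static StoredText FromString(string text, Encoding encoding)
    {
        return new StoredText(encoding.WebName, encoding.GetBytes(text));
    }

    // Decode using the stored name instead of guessing.
    public string ToText()
    {
        return Encoding.GetEncoding(EncodingName).GetString(Data);
    }
}
```

With this, a row written by one application as UTF-8 and a row written by another as Unicode both decode correctly, because the reader never has to guess.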

3 Answers

4

In short, no. Please see How to detect the character encoding of a text file? for a detailed answer on various encodings and why they can't be automatically determined.

Your best solution is to convert the string from its original encoding to UTF-8 and store that as the byte array. Then you'll know your byte array's encoding.
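
A minimal sketch of that conversion, assuming you control the code that writes the bytes. Utf8Blob, ToBytes and FromBytes are hypothetical names. The UTF8Encoding instance is created with throwOnInvalidBytes set to true, so malformed input raises an exception instead of being silently replaced.

```csharp
using System;
using System.Text;

public static class Utf8Blob
{
    // Strict UTF-8: GetPreamble() returns no BOM, and invalid input throws
    // instead of being replaced with U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Convert a string to UTF-8 bytes for storage.
    public static byte[] ToBytes(string text) => StrictUtf8.GetBytes(text);

    // Convert stored UTF-8 bytes back to a string;
    // throws DecoderFallbackException if the bytes are not valid UTF-8.
    public static string FromBytes(byte[] data) => StrictUtf8.GetString(data);
}
```

If every writer goes through something like ToBytes, the reader can always call FromBytes and never needs to detect anything.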

1 Comment

If I convert the string to UTF-8, the byte array's encoding is UTF-8. Anyway, what is the best way to safely convert a string to UTF-8?
4

I realize I'm late to the party here, but I just had a need to do this very thing and found a good way to do it:

using System.IO;
using System.Text;

// ReadText is just a wrapper name for illustration; populate data however you see fit.
static string ReadText(byte[] data, out Encoding enc)
{
    using (StreamReader reader = new StreamReader(new MemoryStream(data),
                                                  detectEncodingFromByteOrderMarks: true))
    {
        string text = reader.ReadToEnd();
        enc = reader.CurrentEncoding; // the reader detects the encoding from the byte order mark for you!
        return text;
    }
}

1 Comment

This will only work if the data contains a BOM at the beginning, which is not always the case. Otherwise, it will pretty much just default to assuming it is UTF-8.
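
For data without a BOM, one possible heuristic is sketched below. This is a best guess and not reliable detection: it checks for the UTF-8 and UTF-16 byte order marks, then accepts UTF-8 only if the whole array decodes strictly, and otherwise returns whatever fallback the caller assumes. EncodingGuesser and Guess are hypothetical names. UTF-32 is ignored, and BOM-less UTF-16 containing only ASCII characters would be misreported as UTF-8, because its zero bytes are valid UTF-8.

```csharp
using System;
using System.Text;

public static class EncodingGuesser
{
    // Heuristic only: returns a best guess, never a guarantee.
    public static Encoding Guess(byte[] data, Encoding fallback)
    {
        // 1. Byte order marks, if present.
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith(data, 0xFF, 0xFE)) return Encoding.Unicode;          // UTF-16 LE
        if (StartsWith(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode; // UTF-16 BE

        // 2. No BOM: accept UTF-8 only if the whole array decodes without error.
        try
        {
            new UTF8Encoding(false, true).GetString(data);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            return fallback; // the caller's assumption, not a detected fact
        }
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For the question's sample string this happens to work, since the accented characters make the UTF-16 bytes invalid as UTF-8, but it is no substitute for recording the encoding alongside the data.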
-2

Complementing the other response, you could try:

string str = BitConverter.ToString(byte_array);
byte[] byte_array = Encoding.UTF8.GetBytes(str);

1 Comment

This won't work, since BitConverter.ToString(byte_array) converts the array to a string of hex values.
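
To illustrate the difference the comment describes, here is a small sketch comparing the two calls on the same bytes. HexVsText, AsHex and AsText are hypothetical names, and AsText assumes the bytes are UTF-8.

```csharp
using System;
using System.Text;

public static class HexVsText
{
    // Hyphen-separated hex dump of the bytes, not the original text.
    public static string AsHex(byte[] data) => BitConverter.ToString(data);

    // Actual decoding, which requires knowing the encoding (UTF-8 assumed here).
    public static string AsText(byte[] data) => Encoding.UTF8.GetString(data);
}
```

For the three bytes of "Lor" in UTF-8, AsHex returns "4C-6F-72" while AsText returns "Lor".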
